Re: swap space issues

2020-06-29 Thread Donald Wilde
[adding maintainers of synth and ccache]

On 6/29/20, Mark Millard  wrote:
> Based on "small arm system" context experiments
> mostly . . .
>
> If your console messages do not include
> messages about "swap_pager_getswapspace(...): failed",
> then it is unlikely that being out of swap space
> is the actual issue, even when "was killed:
> out of swap space" messages are reported. For such
> contexts, making the swap area bigger does not help.
>

It did not show those getswapspace messages.

> In other words, "was killed: out of swap space"
> is frequently a misnomer; without other supporting
> evidence, it should not be believed as the reason
> for the kill or as a guide to what should be
> done about it.
>
> Other causes include:
>
> Sustained low free RAM (via stays-runnable processes).
> A sufficiently delayed pageout.
> The swap blk uma zone was exhausted.
> The swap pctrie uma zone was exhausted.
>
> (stays-runnable processes are not swapped out
> [kernel stacks are not swapped out] but do actively
> compete for RAM via paging activity. In such a
> context, free RAM can stay low.)
>
> The below material does not deal with the
> "exhausted" causes but does deal with
> the other two.
>
> Presuming that you are getting "was killed: out
> of swap space" notices but are not getting
> "swap_pager_getswapspace failed" notices and
> that kern.maxswzone vs. system load has not
> been adjusted in a way that leads to bad
> memory tradeoffs . . .
>
> I recommend trying, say, the following (from
> my /etc/sysctl.conf):
>
Attached is what I tried, but when I ran synth again, I got a
corrupted HDD that fsck refuses to fix, whether in single-user mode or
with the filesystem mounted. It just will not SALVAGE even when I add
the -y flag.
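
(For context, the sort of invocation involved, from single-user mode
with the filesystem unmounted; the device name here is hypothetical
and would come from /etc/fstab:

  umount /usr                    # no-op if it never mounted
  fsck_ffs -f -y /dev/ada0s1f    # -f forces a full check even if marked clean;
                                 # -y answers yes to every prompt, SALVAGE included
)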

What got corrupted was one of the /usr/.ccache directories, but
'ccache -C' doesn't clear it.

I restored the original /etc/sysctl.conf, but I can't add packages or
ports any more, so I'm afraid I'm going to have to dd if=/dev/zero the
disk and reload from 12.1R and start over again.

I can't even 'rm -Rf /usr/.ccache'. It says 'Directory not empty'.

I don't need this system up and running, so I'm not going to make any
more changes until I see if any of you have suggestions to try first.
-- 
Don Wilde

* What is the Internet of Things but a system *
* of systems including humans? *



[Attachment: sysctl.conf.new]
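
(The attachment itself is not preserved here. Judging from the values
Mark quotes back later in the thread, the relevant addition was
presumably along these lines; a reconstruction, not the verbatim file:

  # For possibly insufficient swap/paging space (might run out),
  # increase the pageout delay that leads to Out Of Memory
  # killing of processes:
  vm.pfault_oom_attempts=10
  vm.pfault_oom_wait=1
)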


Re: swap space issues

2020-06-29 Thread Mark Millard via freebsd-stable
[I'm now subscribed so my messages should go through to
the list.]

On 2020-Jun-29, at 06:17, Donald Wilde  wrote:

> [adding maintainers of synth and ccache]
> 
> On 6/29/20, Mark Millard  wrote:
>> Based on "small arm system" context experiments
>> mostly . . .
>> 
>> If your console messages do not include
>> messages about "swap_pager_getswapspace(...): failed",
>> then it is unlikely that being out of swap space
>> is the actual issue, even when "was killed:
>> out of swap space" messages are reported. For such
>> contexts, making the swap area bigger does not help.
>> 
> 
> It did not show those getswapspace messages.

Any other console messages of potential interest?
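
(A quick way to pull the relevant lines, assuming the default
/var/log/messages location:

  dmesg | grep -E 'killed|swap_pager'
  grep -E 'killed|swap_pager' /var/log/messages
)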

>> In other words, "was killed: out of swap space"
>> is frequently a misnomer; without other supporting
>> evidence, it should not be believed as the reason
>> for the kill or as a guide to what should be
>> done about it.
>> 
>> Other causes include:
>> 
>> Sustained low free RAM (via stays-runnable processes).
>> A sufficiently delayed pageout.
>> The swap blk uma zone was exhausted.
>> The swap pctrie uma zone was exhausted.
>> 
>> (stays-runnable processes are not swapped out
>> [kernel stacks are not swapped out] but do actively
>> compete for RAM via paging activity. In such a
>> context, free RAM can stay low.)
>> 
>> The below material does not deal with the
>> "exhausted" causes but does deal with
>> the other two.
>> 
>> Presuming that you are getting "was killed: out
>> of swap space" notices but are not getting
>> "swap_pager_getswapspace failed" notices and
>> that kern.maxswzone vs. system load has not
>> been adjusted in a way that leads to bad
>> memory tradeoffs . . .
>> 
>> I recommend trying, say, the following (from
>> my /etc/sysctl.conf):
>> 
> Attached is what I tried, but when I ran synth again, I got a
> corrupted HDD that fsck refuses to fix, whether in single-user mode or
> with the filesystem mounted. It just will not SALVAGE even when I add
> the -y flag.

That is a horrible result.

I assume that you either rebooted after editing
sysctl.conf or manually applied the
values separately instead.
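
(Either works; for reference, the values can also be applied to a
running system without a reboot, e.g.:

  sysctl vm.pfault_oom_attempts=10 vm.pfault_oom_wait=1
  # or re-read /etc/sysctl.conf wholesale:
  service sysctl reload
)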

What sort of console messages were generated?
Was the corruption the only issue? Did the system
crash? In what way?

Your notes on what you set have an incorrect
comment about a case that you did not use:

# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1 # infinite

vm.pfault_oom_attempts being -1 is a special
value that disables the logic for the
vm.pfault_oom_attempts and vm.pfault_oom_wait
pair: willing to wait indefinitely relative to
how long the pageout takes, no retries. (Other
OOM criteria may still be active.)

You report using:

# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
vm.pfault_oom_attempts=10
vm.pfault_oom_wait=1
# (The product of the two is the total delay, but
# there are other potential tradeoffs among factors
# multiplied for the same total.)

Note: kib might be interested in what happens
for, say, 10 and 1, 5 and 2, and 1 and 10.
He has asked for such before from someone
having OOM problems but, to my knowledge,
no one has taken him up on such testing.
(He might be only after 10/1 and 1/10 or
other specific figures. Best to ask him if
you want to try such things for him.)
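
(Should someone want to run those tests, the combinations are quick
to cycle through on a live system before starting the build that
triggers the kills, e.g.:

  sysctl vm.pfault_oom_attempts=10 vm.pfault_oom_wait=1
  sysctl vm.pfault_oom_attempts=5 vm.pfault_oom_wait=2
  sysctl vm.pfault_oom_attempts=1 vm.pfault_oom_wait=10
)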

I've always set up to use vm.pfault_oom_attempts=-1
(avoiding running out of swap space by how I
configure things and what I choose to run). I
avoid things like tmpfs that compete for RAM,
especially in low memory contexts.

For 64-bit environments I've never had to have
enough swap space that the boot reported an issue
for kern.maxswzone: more swap is allowed for
the same amount of RAM than is allowed for a 32-bit
environment.

In the 64-bit type of context with 1 GiByte+
of RAM I do -j4 buildworld buildkernel with 3072 MiBytes
of swap. For 2 GiByte+ of RAM I use 4 poudriere builders
(one per core), each allowed 4 processes
(ALLOW_MAKE_JOBS=yes), so the load average can at times
reach around 16 over significant periods. I also use
USB SSDs instead of spinning rust. The port builds
include a couple of llvm's and other toolchains. But
there could be other stuff around that would not fit.
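
(Roughly, that setup corresponds to something like the following in
/usr/local/etc/poudriere.conf; a sketch, not my literal config:

  # 4 parallel builders, one per core:
  PARALLEL_JOBS=4
  # let each builder's make also run parallel jobs; with 4 cores that
  # defaults to 4 jobs per builder, hence the load average near 16:
  ALLOW_MAKE_JOBS=yes
)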

(So synth for you vs. poudriere for me is a
difference in our contexts. Also, I stick to
default kern.maxswzone use without boot
messages about exceeding the maximum
recommended amount. Increasing kern.maxswzone
trades off KVM available for other purposes, and
I avoid tradeoffs that I do not understand.)

For 32-bit environments with 2 GiByte+ of RAM
I have to be more careful to be sure of
avoiding running out of swap for what I do.
PARALLEL_JOBS=2 and ALLOW_MAKE_JOBS=yes
for poudriere (so load average around 8 over some
periods). -j4 for buildworld buildkernel.

For 32-bit 1 GiByte I used -j2 for buildworld
buildkernel, 1800 MiBytes of swap. As I rememb

Re: swap space issues

2020-06-29 Thread Donald Wilde
On 6/29/20, Mark Millard  wrote:
> [I'm now subscribed so my messages should go through to
> the list.]
>
> On 2020-Jun-29, at 06:17, Donald Wilde  wrote:
>
>> [adding maintainers of synth and ccache]
>>
>> On 6/29/20, Mark Millard  wrote:
>>> Based on "small arm system" context experiments
>>> mostly . . .
>>>
>>> If your console messages do not include
>>> messages about "swap_pager_getswapspace(...): failed",
>>> then it is unlikely that being out of swap space
>>> is the actual issue, even when "was killed:
>>> out of swap space" messages are reported. For such
>>> contexts, making the swap area bigger does not help.
>>>
>>
>> It did not show those getswapspace messages.
>
> Any other console messages of potential interest?
>
>>> In other words, "was killed: out of swap space"
>>> is frequently a misnomer; without other supporting
>>> evidence, it should not be believed as the reason
>>> for the kill or as a guide to what should be
>>> done about it.
>>>
>>> Other causes include:
>>>
>>> Sustained low free RAM (via stays-runnable processes).
>>> A sufficiently delayed pageout.
>>> The swap blk uma zone was exhausted.
>>> The swap pctrie uma zone was exhausted.
>>>
>>> (stays-runnable processes are not swapped out
>>> [kernel stacks are not swapped out] but do actively
>>> compete for RAM via paging activity. In such a
>>> context, free RAM can stay low.)
>>>
>>> The below material does not deal with the
>>> "exhausted" causes but does deal with
>>> the other two.
>>>
>>> Presuming that you are getting "was killed: out
>>> of swap space" notices but are not getting
>>> "swap_pager_getswapspace failed" notices and
>>> that kern.maxswzone vs. system load has not
>>> been adjusted in a way that leads to bad
>>> memory tradeoffs . . .
>>>
>>> I recommend trying, say, the following (from
>>> my /etc/sysctl.conf):
>>>
>> Attached is what I tried, but when I ran synth again, I got a
>> corrupted HDD that fsck refuses to fix, whether in single-user mode or
>> with the filesystem mounted. It just will not SALVAGE even when I add
>> the -y flag.
>
> That is a horrible result.
>
> I assume that you either rebooted after editing

Yes.

> sysctl.conf or manually applied the
> values separately instead.
>
> What sort of console messages were generated?
> Was the corruption the only issue? Did the system
> crash? In what way?

The symptoms used to be that, once synth crashed from OOM, it would
refuse to allow any packages to be added, because the synth data
structures were blasted. Synth is unfortunately pretty blockheaded in
that you can't control the order in which the ports are built. You can
control how many ports get built at a time and how many tasks per port
get started and managed. As I say, though, this change led to -- I won't
say _caused_ -- corruption of the disk area used for ccache. I could
not recover from that, although I could have turned ccache off.

I was able to manually build llvm80 from the ports, so the dependency
gotcha that gave me the issues was in some other dependency. As I said
in my other message, it seems the Google performance tools or Brotli
may have been what broke.

What's great, Mark, is that you've given me a lot more insight into
the tunability of the kernel. As I said, I do a lot of IoT and
embedded work with ARM-based microcontrollers.
>
> Your notes on what you set have an incorrect
> comment about a case that you did not use:
>
> # For plenty of swap/paging space (will not
> # run out), avoid pageout delays leading to
> # Out Of Memory killing of processes:
> #vm.pfault_oom_attempts=-1 # infinite
>
> vm.pfault_oom_attempts being -1 is a special
> value that disables the logic for the
> vm.pfault_oom_attempts and vm.pfault_oom_wait
> pair: willing to wait indefinitely relative to
> how long the pageout takes, no retries. (Other
> OOM criteria may still be active.)

Ah, I appreciate the distinction. :)
>
> You report using:
>
> # For possibly insufficient swap/paging space
> # (might run out), increase the pageout delay
> # that leads to Out Of Memory killing of
> # processes:
> vm.pfault_oom_attempts=10
> vm.pfault_oom_wait=1
> # (The product of the two is the total delay, but
> # there are other potential tradeoffs among factors
> # multiplied for the same total.)
>
> Note: kib might be interested in what happens
> for, say, 10 and 1, 5 and 2, and 1 and 10.
> He has asked for such before from someone
> having OOM problems but, to my knowledge,
> no one has taken him up on such testing.
> (He might be only after 10/1 and 1/10 or
> other specific figures. Best to ask him if
> you want to try such things for him.)

Who is 'kib'? I'm still learning the current team of the Project.
>
> I've always set up to use vm.pfault_oom_attempts=-1
> (avoiding running out of swap space by how I
> configure things and what I choose to run). I
> avoid things like tmpfs that compete for RAM,
> especially in low memory contexts.

Until you explained all this, I thought these were
swap-related issues.

TBH, I am getting disgusted with Synth, as good as it (by spec, not
actuality) is supposed to be.

Re: swap space issues

2020-06-29 Thread Mark Millard via freebsd-stable



On 2020-Jun-29, at 14:12, Donald Wilde  wrote:

> On 6/29/20, Mark Millard  wrote:
>> [I'm now subscribed so my messages should go through to
>> the list.]
>> 
>> On 2020-Jun-29, at 06:17, Donald Wilde  wrote:
>> 
>>> . . .
>> 
>> You report using:
>> 
>> # For possibly insufficient swap/paging space
>> # (might run out), increase the pageout delay
>> # that leads to Out Of Memory killing of
>> # processes:
>> vm.pfault_oom_attempts=10
>> vm.pfault_oom_wait=1
>> # (The product of the two is the total delay, but
>> # there are other potential tradeoffs among factors
>> # multiplied for the same total.)
>> 
>> Note: kib might be interested in what happens
>> for, say, 10 and 1, 5 and 2, and 1 and 10.
>> He has asked for such before from someone
>> having OOM problems but, to my knowledge,
>> no one has taken him up on such testing.
>> (He might be only after 10/1 and 1/10 or
>> other specific figures. Best to ask him if
>> you want to try such things for him.)
> 
> Who is 'kib'? I'm still learning the current team of the Project.

Konstantin Belousov

Also known as kib (from kib at freebsd.org).
Also known as kostik (from part of his gmail address?).


>> I've always set up to use vm.pfault_oom_attempts=-1
>> (avoiding running out of swap space by how I
>> configure things and what I choose to run). I
>> avoid things like tmpfs that compete for RAM,
>> especially in low memory contexts.
> 
> Until you explained all this, I thought these were
> swap-related issues.
> 
> TBH, I am getting disgusted with Synth, as good as it (by spec, not
> actuality) is supposed to be.

While I experimented with Synth a little a long time ago,
I normally stick to tools and techniques that work across
amd64, powerpc64, aarch64, 32-bit powerpc, and armv7 when
I can. So, the experiment was strictly temporary on one
environment at the time.

> CCache I've used for years, and never had this kind of issue.
>> 
>> For 64-bit environments I've never had to have
>> enough swap space that the boot reported an issue
>> for kern.maxswzone: more swap is allowed for
>> the same amount of RAM than is allowed for a 32-bit
>> environment.
> 
> Now that you've opened the possibility, it would explain how it goes
> from <3% swap use to OOM in moments... it's not a swap usage issue!
> That's an important thing to learn.
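
(One way to watch that while a build runs, free RAM and paging
activity alongside actual swap use:

  vmstat -w 5    # memory and paging statistics every 5 seconds
  swapinfo -m    # swap device usage in MiBytes; run periodically
)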
> 
> Not having heard from anyone else, I'm in the process of zeroing my
> drive and starting over.
>> 
>> In the 64-bit type of context with 1 GiByte+
>> of RAM I do -j4 buildworld buildkernel with 3072 MiBytes
>> of swap. For 2 GiByte+ of RAM I use 4 poudriere builders
>> (one per core), each allowed 4 processes
>> (ALLOW_MAKE_JOBS=yes), so the load average can at times
>> reach around 16 over significant periods. I also use
>> USB SSDs instead of spinning rust. The port builds
>> include a couple of llvm's and other toolchains. But
>> there could be other stuff around that would not fit.
>> 
>> (So synth for you vs. poudriere for me is a
>> difference in our contexts. Also, I stick to
>> default kern.maxswzone use without boot
>> messages about exceeding the maximum
>> recommended amount. Increasing kern.maxswzone
>> trades off KVM available for other purposes, and
>> I avoid tradeoffs that I do not understand.)
> [snip]
>> (My context is head, not stable.)
> 
> Thanks for documenting your usage. I'll store a pointer to this week's
> -stable archives so I can come back to this when I get to smaller
> builds.
>> 
>> . . .
>> 
>>> What got corrupted was one of the /usr/.ccache directories, but
>>> 'ccache -C' doesn't clear it.
>> 
>> I've not used ccache. So that is another variation
>> in our contexts.
>> 
>> I use UFS, not ZFS. I avoid tmpfs and such that compete
>> for memory.
> 
> I'm using UFS on MBR partitions.

GPT for root file systems for me, other than any old PowerMacs
(APM). (On the small arms I just use microSD cards to get to
booting the root file system on a GPT-based USB SSD via a
technique that works the same for all such arms that I
sometimes have access to, other than the RPi4's at this stage.)

>> . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
