Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-04-02 Thread Vitaly Zuevsky
Hi Harald,

Thanks for the comprehensive account of the job flow - everything now works as expected.

Interestingly, I originally assumed it was a bug due to the observed discrepancy 
with bash...

On 29/03/2020 23:07, Jilles Tjoelker wrote:
> I agree that the change is incorrect, but I do not agree that this kind
> of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous
> Lists, an implementation has ways to forget about jobs other than an
> appropriate completion of the wait utility: if another asynchronous job
> is started when $! was not referenced, or if the number of known process
> IDs would exceed {CHILD_MAX} (which tends to be rather big, though).

I can see in the Bash man pages:

CHILD_MAX
  Set the number of exited child status values for the shell to
  remember. Bash will not allow this value to be decreased below a
  POSIX-mandated minimum, and there is a maximum value (currently 8192)
  that this may not exceed. The minimum value is system-dependent.

Running the above scripts under bash shows RSS growing only initially; 
with sleep .1 that growth ends after some 800 seconds (8192 remembered 
statuses at roughly one job per 0.1 s), as one could deduce from the man 
pages. That is, RSS growth is bounded.
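
For instance, a side-by-side comparison could look like this (my own 
sketch, not from the thread; it assumes a ps supporting -o rss= and 
writes a hypothetical /tmp/loop.sh):
---
#!/bin/sh
# reproduce the leaky loop from the bug report under both shells
cat > /tmp/loop.sh <<'EOF'
while true
do
  true &
  sleep .1
done
EOF
dash /tmp/loop.sh & dash_pid=$!
bash /tmp/loop.sh & bash_pid=$!
# sample the RSS (KiB) of both shells once a minute; dash keeps
# growing, while bash levels off once CHILD_MAX statuses are remembered
while true
do
  echo "dash: $(ps -o rss= -p $dash_pid)  bash: $(ps -o rss= -p $bash_pid)"
  sleep 60
done
---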

I understand that dash's philosophy may be focussed on the most minimal 
solution possible. Such a solution would inherently be less forgiving of 
mistakes. On the other hand, the mistake in question can be difficult to 
detect and leads all the way to OOM, negatively affecting the user 
experience.

It is up to you and the community where that balance should rest. And if 
you decide the change is worth making - I will be more than happy to 
contribute.

Cheers,
Vitaly 



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-31 Thread Harald van Dijk

Hi Vitaly,

On 31/03/2020 20:07, Vitaly Zuevsky wrote:

> I must have confused two concepts: a waited process in the OS -vs- a
> waited job inside the shell interpreter. I am trying to see how it
> works in practice:
>
> # true & false &
> #
> [2] + Done(1)                false
> [1] + Done                   true
> # wait 2
> # echo $?
> 127
>
> As we preserve job exit codes, I would expect the wait command to read
> them and free the associated jobtab slots. In the above example I
> expect to see 1 in place of 127. What am I missing here?


There are two problems here. The first is that waiting for jobs involves 
the % symbol: "wait 2" always means "wait for process 2", never "wait for 
job 2". To wait for job 2, the syntax is "wait %2". The second problem 
is that as soon as you see that "[2] + Done(1)", the job has already 
been removed from the job table, so "wait %2" no longer works after 
that. The blank line (except for the #) indicates that you pressed 
Return at that point. In interactive shells, jobs are checked for 
completion and removed from the job table automatically when prompting 
for input. Since it was practically impossible to press Return before 
the "true" and "false" commands had finished, the jobs were cleaned up 
when the next prompt was shown. If you had typed "wait %2" before 
pressing Return, it would have worked and the subsequent "echo $?" would 
have printed "1".
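
For illustration, the corrected sequence would look roughly like this (a 
sketch; the exact prompt and job-notice formatting may differ between 
builds):

# true & false &
# wait %2
# echo $?
1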



> Many thanks for your help.
>
> Best,
> Vitaly




Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-31 Thread Vitaly Zuevsky
Hi Harald,


>   set -- $(seq 1 100)
>   for i
>   do
>     : &
>     sleep .1
>   done
>   for i
>   do
>     wait %$i
>   done
>
> This is a valid script and works fine in dash. Your change breaks this
> by not keeping the jobs around long enough, and I hope this test script
> shows that there is no way to keep the jobs around long enough other
> than by allocating ever more memory.

I must have confused two concepts: a waited process in the OS -vs- a 
waited job inside the shell interpreter. I am trying to see how it works 
in practice:

# true & false &
#
[2] + Done(1)                false
[1] + Done                   true
# wait 2
# echo $?
127

As we preserve job exit codes, I would expect the wait command to read 
them and free the associated jobtab slots. In the above example I expect 
to see 1 in place of 127. What am I missing here?

Many thanks for your help.

Best,
Vitaly  



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-29 Thread Harald van Dijk

On 29/03/2020 23:07, Jilles Tjoelker wrote:

> On Sun, Mar 29, 2020 at 08:06:31PM +0100, Harald van Dijk wrote:

>> On 29/03/2020 18:54, Vitaly Zuevsky wrote:

>>> I have now fixed this bug locally.

>>> The leak is in the jobtab array (jobs.c). I concluded that the most
>>> logical approach would be eliminating the inconsistency between the
>>> makejob() and dowait() functions. My fix in a forked repo:

>>> https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7



>> This change is incorrect. The reason dash keeps on allocating memory
>> is because dash needs to keep on allocating memory. Consider this
>> script:

>>   set -- $(seq 1 100)
>>   for i
>>   do
>>     : &
>>     sleep .1
>>   done
>>   for i
>>   do
>>     wait %$i
>>   done

>> This is a valid script and works fine in dash. Your change breaks this
>> by not keeping the jobs around long enough, and I hope this test
>> script shows that there is no way to keep the jobs around long enough
>> other than by allocating ever more memory.


> I agree that the change is incorrect, but I do not agree that this kind
> of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous
> Lists, an implementation has ways to forget about jobs other than an
> appropriate completion of the wait utility: if another asynchronous job
> is started when $! was not referenced, or if the number of known
> process IDs would exceed {CHILD_MAX} (which tends to be rather big,
> though).


That says the process ID need not be remembered, which makes sense: 
process IDs are a very limited resource on some systems. This reasoning 
does not apply to job IDs and I do not see any corresponding wording for 
those. Am I overlooking something?


Regardless, it does not leak memory. It continually uses more memory, 
but it is not a leak if the memory is still in use and can/will be freed 
when it is no longer needed. Memory used for jobs is released when the 
job is removed from the job table.



> POSIX does not seem to expect using % in scripts like this; it seems
> highly fragile to me anyway (although $! has problems with process ID
> reuse).


In what way does it not seem to expect this? POSIX says "The job control 
job ID type of pid is only available on systems supporting the User 
Portability Utilities option" but does not appear to have any other 
relevant restrictions.


As for fragile, sure, but if a script is correct, it should be processed 
correctly by a shell.



> FreeBSD sh implements forgetting when $! was not referenced (and the
> job has terminated), but not the {CHILD_MAX} limit. This avoids the
> increasing memory usage in the example script.


See above for the note on process ID vs job ID.


>> Your change makes it impossible to keep track of the background
>> process's status, but if you do not care about that anyway, you can
>> avoid the increasing memory use without modifying dash by launching a
>> background process without including it in the current shell's job
>> table, by launching it from a subshell:

>>   while true
>>   do
>>     (true &)
>>     sleep .1
>>   done


> Certainly a good idea.

> Another option may be to include regular invocations of the wait
> utility without parameters, although this is not suitable for all
> scripts.


A third option is invoking the 'jobs' command, possibly redirecting the 
output to /dev/null. When the 'jobs' command reports that a background 
job is done, that job is removed from the list of jobs the shell has to 
remember.
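
For example, something along these lines should keep the job table small 
(a sketch, not a tested recommendation):

  while true
  do
    true &
    jobs > /dev/null   # reporting completed jobs lets the shell drop them
    sleep .1
  done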


Cheers,
Harald van Dijk



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-29 Thread Jilles Tjoelker
On Sun, Mar 29, 2020 at 08:06:31PM +0100, Harald van Dijk wrote:
> On 29/03/2020 18:54, Vitaly Zuevsky wrote:
> > I have now fixed this bug locally.

> > The leak is in jobtab array (jobs.c). I concluded that the most
> > logical approach would be eliminating inconsistency between
> > makejob() and dowait() functions. My fix in a forked repo:

> > https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7

> This change is incorrect. The reason dash keeps on allocating memory is
> because dash needs to keep on allocating memory. Consider this script:

>   set -- $(seq 1 100)
>   for i
>   do
>     : &
>     sleep .1
>   done
>   for i
>   do
>     wait %$i
>   done

> This is a valid script and works fine in dash. Your change breaks this
> by not keeping the jobs around long enough, and I hope this test script
> shows that there is no way to keep the jobs around long enough other
> than by allocating ever more memory.

I agree that the change is incorrect, but I do not agree that this kind
of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous
Lists, an implementation has ways to forget about jobs other than an
appropriate completion of the wait utility: if another asynchronous job
is started when $! was not referenced, or if the number of known process
IDs would exceed {CHILD_MAX} (which tends to be rather big, though).

POSIX does not seem to expect using % in scripts like this; it
seems highly fragile to me anyway (although $! has problems with process
ID reuse).

FreeBSD sh implements forgetting when $! was not referenced (and the job
has terminated), but not the {CHILD_MAX} limit. This avoids the
increasing memory usage in the example script.

> Your change makes it impossible to keep track of the background process's
> status, but if you do not care about that anyway, you can avoid the
> increasing memory use without modifying dash by launching a background
> process without including it in the current shell's job table, by launching
> it from a subshell:

>   while true
>   do
>     (true &)
>     sleep .1
>   done

Certainly a good idea.

Another option may be to include regular invocations of the wait utility
without parameters, although this is not suitable for all scripts.
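
For example (a sketch; the batch size of 100 is arbitrary, and the bare 
wait blocks until every outstanding job has finished, which is why this 
is not suitable for all scripts):

  n=0
  while true
  do
    true &
    n=$((n+1))
    if [ "$n" -ge 100 ]; then
      wait    # reap all known jobs, emptying the job table
      n=0
    fi
    sleep .1
  done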

-- 
Jilles Tjoelker



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-29 Thread Harald van Dijk

Hi Vitaly,

On 29/03/2020 18:54, Vitaly Zuevsky wrote:

> I have now fixed this bug locally.
>
> The leak is in the jobtab array (jobs.c). I concluded that the most
> logical approach would be eliminating the inconsistency between the
> makejob() and dowait() functions. My fix in a forked repo:
>
> https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7


This change is incorrect. The reason dash keeps on allocating memory is 
because dash needs to keep on allocating memory. Consider this script:


  set -- $(seq 1 100)
  for i
  do
    : &
    sleep .1
  done
  for i
  do
    wait %$i
  done

This is a valid script and works fine in dash. Your change breaks this 
by not keeping the jobs around long enough, and I hope this test script 
shows that there is no way to keep the jobs around long enough other 
than by allocating ever more memory.


Your change makes it impossible to keep track of the background 
process's status, but if you do not care about that anyway, you can 
avoid the increasing memory use without modifying dash by launching a 
background process without including it in the current shell's job 
table, by launching it from a subshell:


  while true
  do
    (true &)
    sleep .1
  done

Cheers,
Harald van Dijk



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-29 Thread Vitaly Zuevsky
I have now fixed this bug locally.

The leak is in the jobtab array (jobs.c). I concluded that the most 
logical approach would be eliminating the inconsistency between the 
makejob() and dowait() functions. My fix in a forked repo:

https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7

I suspect that the imbalance between allocating and freeing jobtab slots 
was introduced as SIGCHLD handling evolved, because I can see two 
waitforjob() calls, wherein the first call can wait for both done jobs 
while gotsigchld is set. By the second call, gotsigchld has been cleared 
and waitforjob() does nothing, so one of the two jobtab slots goes 
astray.

Simplified bug repro steps:

Script
---
while true
do
  true &
  sleep .1
done

Execution
---
# dash Script &

You can now observe the resident set size (RSS) being bumped every few 
seconds:
# ps aux | grep 'RSS\|dash'
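
A simple way to watch the growth over time (a sketch; it assumes a ps 
that supports -o rss= and that $! still holds the PID from the command 
above):

# while true; do ps -o rss= -p $!; sleep 10; done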

What should I do to have this peer-reviewed and, potentially, merged into 
mainline?

Cheers,
Vitaly
--
Fix details:

Function makejob() calls freejob() when appropriate. Specifically, it 
detects a free slot in the jobtab array by this condition:

(jp->state != JOBDONE || !jp->waited)

The problem is that dowait() never sets the waited flag together with the 
JOBDONE state. Consequently, jobtab may leak; repro steps are provided in 
bug report #953421.

One solution could be to not rely on the waited flag at all, i.e. to 
convert makejob()'s condition into (jp->state != JOBDONE). As an 
alternative, I set the waited flag in dowait().



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-17 Thread Vitaly Zuevsky
Hi Andrej,

I cloned and compiled dash (0.5.10.2-6) from 
https://salsa.debian.org/debian/dash

I did further testing and found that my initial repro steps were 
incorrect. The offending script showing RSS growth is:
--
#!/bin/sh
while true
do
  true || echo &
  sleep .1
done
--
RSS will not grow if precedence is addressed like so: true || (echo &)
I interpret this as the start/termination cycle of a background job being 
leaky.
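
(To spell out the precedence issue: & applies to the whole AND-OR list, 
so the two forms differ as in this sketch of mine:)
--
# leaky form: the entire list becomes one background job per iteration
true || echo &          # parses as: { true || echo; } &

# non-leaky form: no background job is ever created, because true
# succeeds and the subshell on the right is never executed at all
true || (echo &)
--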

Br Vitaly

-Original Message-
From: Andrej Shadura [mailto:and...@shadura.me] 
Sent: Sunday, March 15, 2020 6:47 PM
To: Vitaly Zuevsky; 953...@bugs.debian.org
Cc: Debian Bug Tracking System
Subject: Re: Bug#953421: dash: Resident Set Size growth is unbound (memory 
leak) on an infinite shell loop

Thanks for your report; this is most likely a bug in the upstream 
package. While I doubt it was any of our patches that introduced this 
undesired behaviour, it would be great if you could verify that this 
still happens with the latest upstream Git version of dash and, if it 
does, follow up with upstream at d...@vger.kernel.org.

If that’s too much work for you right now (it’s okay if it is), I can do that 
for you, but, unfortunately, I cannot provide a timeline at the moment.

--
Cheers,
  Andrej



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-15 Thread Andrej Shadura
Hi,

On Mon, 9 Mar 2020 at 17:09, Vitaly Zuevsky  wrote:
>  [VZ] I use a shell script to supervise processes in a docker/kubernetes
>  container. I noticed steady growth in the cgroup's CPU utilization
>  from 15 to 35 millicores within 17 days in the absence of any external
>  stimuli (dry run). If the container was restarted, the pattern
>  reappeared from the base level of 15 millicores.

>  [VZ] I reproduced the issue in local docker, sampled the cgroup's
>  threads with perf, and ran perf-diff on the reports. It emerged that
>  the CPU cost grew in dash's freejob() and the kernel's copy_page() at
>  page_fault(). The findings were consistent with dash growing a data
>  structure, possibly the one indexed by nprocs in freejob(), on every
>  iteration of the script's loop. That data structure would have to be
>  cloned at each fork() happening along the loop, hence the kernel page
>  faults.

Thanks for your report; this is most likely a bug in the upstream
package. While I doubt it was any of our patches that introduced this
undesired behaviour, it would be great if you could verify that this
still happens with the latest upstream Git version of dash and, if it
does, follow up with upstream at d...@vger.kernel.org.

If that’s too much work for you right now (it’s okay if it is), I can
do that for you, but, unfortunately, I cannot provide a timeline at
the moment.

-- 
Cheers,
  Andrej



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-09 Thread Vitaly Zuevsky
Package: dash
Version: 0.5.8-2.1ubuntu2
Severity: important

Dear Maintainer,

   * What led up to the situation?

 [VZ] I use a shell script to supervise processes in a docker/kubernetes 
 container. I noticed steady growth in the cgroup's CPU utilization from 
 15 to 35 millicores within 17 days in the absence of any external 
 stimuli (dry run). If the container was restarted, the pattern 
 reappeared from the base level of 15 millicores.

   * What exactly did you do (or not do) that was effective (or
     ineffective)?

 [VZ] I reproduced the issue in local docker, sampled the cgroup's 
 threads with perf, and ran perf-diff on the reports. It emerged that 
 the CPU cost grew in dash's freejob() and the kernel's copy_page() at 
 page_fault(). The findings were consistent with dash growing a data 
 structure, possibly the one indexed by nprocs in freejob(), on every 
 iteration of the script's loop. That data structure would have to be 
 cloned at each fork() happening along the loop, hence the kernel page 
 faults.

 To verify my suspicions, I reduced the sleep interval in the loop and 
 monitored the Resident Set Size. Reproduction steps without containers 
 follow. The shell script:
-
#!/bin/sh

cd /tmp
while true
do
  [ "`pgrep rsyslogd`" ] || service rsyslog start
  [ "`pgrep cron`" ] || service cron start
  [ "`pgrep sshd`" ] || service ssh start
  env >/root/env
  sleep .5
done
-
 Save it as a file called woof, chmod it executable, and run it as
# nohup ./woof &

 Monitoring with ps reveals unbounded growth, e.g. (note the first 
 figure, the RSS):

1628 wait   S?  0:00 /bin/sh ./woof
3456 wait   S?  1:02 /bin/sh ./woof
5188 wait   S?  2:59 /bin/sh ./woof
6584 wait   S?  5:19 /bin/sh ./woof
7076 wait   S?  6:17 /bin/sh ./woof
9300 wait   S? 23:54 /bin/sh ./woof

   * What was the outcome of this action?

 [VZ] I had to resort to the #!/bin/bash shebang, as it rendered the 
 expected behaviour:

2684 wait   S?  0:00 /bin/bash ./woof
3896 wait   S?  1:32 /bin/bash ./woof
3896 wait   S?  3:10 /bin/bash ./woof
3896 wait   S?  3:45 /bin/bash ./woof
3896 pipe_w S? 15:20 /bin/bash ./woof

 RSS remains constant at 3896

   * What outcome did you expect instead?

 [VZ] I expect the RSS of dash to remain bounded in the case of an 
 infinite shell loop, the same way the RSS of bash does.



-- System Information:
Debian Release: stretch/sid
  APT prefers xenial-updates
  APT policy: (500, 'xenial-updates'), (500, 'xenial-security'), (500, 
'xenial'), (100, 'xenial-backports')
Architecture: amd64 (x86_64)

Kernel: Linux 4.4.0-1074-aws (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=ANSI_X3.4-1968) 
(ignored: LC_ALL set to C)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages dash depends on:
ii  debianutils  4.7
ii  dpkg 1.18.4ubuntu1.4
ii  libc62.23-0ubuntu10

dash recommends no packages.

dash suggests no packages.

-- debconf-show failed