Hello Gerlof.

Am Samstag, 12. November 2016, 14:18:02 CET schrieb Gerlof Langeveld:
> Hi Martin,
> 
> Thanks for the system call traces!
> I analyzed them in the meantime and have some questions (see end of the
> mail).

Answers below:

> On 11/09/2016 09:37 AM, Martin Steigerwald wrote:
> > Hi Gerlof.
> > 
> > Very busy week, so feel free to ask for further information, but please
> > understand that I may only report back next week.
> > 
> > Am Samstag, 29. Oktober 2016, 10:26:38 CET schrieben Sie:
> >>> Hi Marc, hi Gerlof (on CC).
> >>> 
> >>> Am Mittwoch, 26. Oktober 2016, 10:44:30 CEST schrieb Marc Haber:
> >>>> On Wed, Oct 26, 2016 at 10:31:54AM +0200, Martin Steigerwald wrote:
> >>>>> merkaba:~> systemctl status atopacct
> >>>>> ● atopacct.service - Atop process accounting daemon
> >>>>> 
> >>>>>      Loaded: loaded (/lib/systemd/system/atopacct.service; enabled;
> >>>>>      vendor
> >>>>>      preset: enabled) Active: active (running) since Mi 2016-10-26
> >>>>>      10:21:29
> >>>>>      CEST; 5min ago>
> >>>>>      
> >>>>>        Docs: man:atopacctd(8)
> >>>>>     
> >>>>>     Process: 31802 ExecStart=/usr/sbin/atopacctd (code=exited,
> >>>>>     status=0/SUCCESS)>
> >>>>>    
> >>>>>    Main PID: 31804 (atopacctd)
> >>>>>    
> >>>>>       Tasks: 1 (limit: 4915)
> >>>>>      
> >>>>>      CGroup: /system.slice/atopacct.service
> >>>>>      
> >>>>>              └─31804 /usr/sbin/atopacctd
> >>>>> 
> >>>>> Okt 26 10:21:29 merkaba atopacctd[31804]: Version: 2.2-4 - 2016/10/14
> >>>>> 23:17:04  <gerlof.langev...@atoptool.nl> Okt 26 10:21:29 merkaba
> >>>>> systemd[1]: Starting Atop process accounting daemon... Okt 26 10:21:29
> >>>>> merkaba atopacctd[31804]: accounting to /run/pacct_source Okt 26
> >>>>> 10:21:29
> >>>>> merkaba systemd[1]: Started Atop process accounting daemon.
> >>>>> 
> >>>>> 
> >>>>> Yet no procacct warning in atop.
> >>>> 
> >>>> I'm running out of ideas. Can you strace atop and see where it fails?
> >>> 
> >>> After your answering your question below I think it is still atopacctd
> >>> having>
> >>> 
> >>> issues:
> >>>>> But accounting seems to work:
> >>>>> 
> >>>>> merkaba:~> find /run/pacct_source -ls
> >>>>> 
> >>>>>     1297627     44 -rw-------   1 root     root        44928 Okt 26
> >>>>>     10:28
> >>>>>     /run/pacct_source
> >>>> 
> >>>> How about the files in /run/pacct_shadow.d/? These are the ones an
> >>>> atop process tries to read.
> >>>> 
> >>>> I have:
> >>>> -rw-r--r--  1 root root 21184 Oct 26 00:46 0000000000.paf
> >>>> -rw-r--r--  1 root root     7 Oct 26 00:00 current
> >>>> 
> >>>> and the paf file is growing.
> >>> 
> >>> There is the issue:
> >>> 
> >>> merkaba:~> ls -l /run/pacct_source
> >>> -rw------- 1 root root 1174656 Okt 27 22:46 /run/pacct_source
> >>> merkaba:~> ls -l /run/pacct_shadow.d
> >>> insgesamt 4
> >>> -rw-r--r-- 1 root root 0 Okt 27 06:53 0000000000.paf
> >>> -rw-r--r-- 1 root root 7 Okt 27 06:53 current
> >>> 
> >>> The pacct_source file grows, but the paf file does not.
> >>> 
> >>> merkaba:~> ps aux | grep "[a]top"
> >>> root      1573  0.0  0.0   4184    84 ?        S<   06:53   0:00
> >>> /usr/sbin/
> >>> atopacctd
> >>> root      1904  0.0  0.0  27500 10492 ?        S<Ls 06:53   0:26
> >>> /usr/bin/atop -a -R -w /var/log/atop/atop_20161027 600
> >>> 
> >>> merkaba:~> LANG=C systemctl status atopacct
> >>> * atopacct.service - Atop process accounting daemon
> >>> 
> >>>      Loaded: loaded (/lib/systemd/system/atopacct.service; enabled;
> >>>      vendor
> >>> 
> >>> preset: enabled)
> >>> 
> >>>      Active: active (running) since Thu 2016-10-27 06:53:31 CEST; 15h
> >>>      ago
> >>>      
> >>>        Docs: man:atopacctd(8)
> >>>    
> >>>    Main PID: 1573 (atopacctd)
> >>>    
> >>>       Tasks: 1 (limit: 4915)
> >>>      
> >>>      CGroup: /system.slice/atopacct.service
> >>>      
> >>>              `-1573 /usr/sbin/atopacctd
> >>> 
> >>> Oct 27 06:53:31 merkaba systemd[1]: Starting Atop process accounting
> >>> daemon... Oct 27 06:53:31 merkaba atopacctd[1573]: Version: 2.2-4 -
> >>> 2016/10/14 23:17:04 <gerlof[…]>
> >>> Oct 27 06:53:31 merkaba atopacctd[1573]: accounting to /run/pacct_source
> >>> Oct 27 06:53:31 merkaba systemd[1]: Started Atop process accounting
> >>> daemon.
> >>> 
> >>> 
> >>> Hmmm, but atopacctd actually accounts to /run/pacct_source? Is that a
> >>> configuration mismatch?
> >>> 
> >>> Okay, from strace atop opens the paf file and so… reads from a zero byte
> >>> file.
> >>> 
> >>> 25046 open("/run/pacct_shadow.d/current", O_RDONLY) = 3
> >>> 25046 open("/run/pacct_shadow.d/0000000000.paf", O_RDONLY) = 3
> >>> 
> >>> I attach the complete output of strace -e file -f -o /tmp/atop.strace
> >>> atop.
> >>> 
> >>> from manpage of atopacctd:
> >>>          The directory /run is used as the default topdirectory.   Below
> >>>          this  top-directory, the source file pacct_source is created to
> >>>          which the kernel writes the process accounting records.
> >>>          Furthermore, the subdirectory pacct_shadow.d is  created  as  a
> >>>          'container' for the shadow files.  Apart from the shadow files,
> >>>          also the file current is maintained in this subdirectory,  con‐
> >>>          taining the sequence number of the current (newest) shadow file
> >>>          and the maximum number of records that will be written in  each
> >>>          shadow file.
> >>> 
> >>> I don´t fully understand the mechanism atopacctd uses here from above
> >>> description, but it for some reason file pacct_source definately grows,
> >>> but
> >>> 00000….paf remains empty.
> > 
> > […]
> > 
> >> It would be of great help if you could do the following:
> >> 
> >> 1) Stop atopacctd and stop all atop incarnations (clean situation).
> >> 
> >> 2) Start atopacctd again (via systemctl).
> >> 
> >> 3) Start 'strace -tt -f -o atopacct.strace -p <pid_of_atopacctd>
> >> 
> >> 4) Manually start command 'strace -tt -f -o atop.strace  atop 1 1 >
> >> /tmp/atop.out'
> >> 
> >>       You could verify if the file /tmp/atop.out really contains the
> >> 
> >> indication 'no procacct'
> >> 
> >>       to be sure that the problem is caught.
> >> 
> >> 5) Stop the 'strace' command on atopacctd.
> >> 
> >> Please send me the files atopacct.strace, atop.strace and /tmp/atop.out.
> > 
> > 1) merkaba:~> ps aux |  grep "[a]top"
> > merkaba:~#1>
> > 
> > 2 und 3) merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/
> > atop.strace -p $( pidof /usr/sbin/atopacctd )
> > Job for atopacct.service failed because of unavailable resources or
> > another
> > system error.
> > See "systemctl status atopacct.service" and "journalctl -xe" for details.
> > strace: option requires an argument -- 'p'
> > Try 'strace -h' for more information.
> > merkaba:~#1> systemctl status atopacct
> > ● atopacct.service - Atop process accounting daemon
> > 
> >     Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor
> > 
> > preset: ena
> > 
> >     Active: failed (Result: resources) since Wed 2016-11-09 09:25:10 CET;
> >     14s
> > 
> > ago
> > 
> >       Docs: man:atopacctd(8)
> >    
> >    Process: 13973 ExecStart=/usr/sbin/atopacctd (code=exited,
> >    status=0/SUCCESS)>   
> >   Main PID: 1050 (code=killed, signal=KILL)
> > 
> > Nov 09 09:25:10 merkaba systemd[1]: Starting Atop process accounting
> > daemon... Nov 09 09:25:10 merkaba atopacctd[13973]: /run/pacct_shadow.d:
> > File exists Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: PID
> > file /run/ atopacctd.pid not readable (yet?) after start: No such file or
> > directory Nov 09 09:25:10 merkaba systemd[1]: Failed to start Atop
> > process accounting daemon.
> > Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Unit entered failed
> > state.
> > Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Failed with result
> > 'resources'
> > 
> > (I report this as one another bug when I get the chance, Mark, this again
> > is the /run/pacct_shadow.d: File exists issue I stumbled across before)
> > 
> > merkaba:~#3> rm -r /run/pacct_shadow.d
> > merkaba:~> systemctl stop atopacct
> > merkaba:~> ps aux |  grep "[a]top"
> > merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/atop.strace
> > -p $ ( pidof /usr/sbin/atopacctd )
> > strace: Process 14074 attached
> > 
> > 4) merkaba:~> strace -tt -f -o /tmp/atop.strace  atop 1 1 >/tmp/atop.out
> > 
> > Waited till some seconds after completion
> > 
> > 5) merkaba:~> systemctl start atopacct ; strace -tt -f -o
> > /tmp/atopacct.strace -p $( pidof /usr/sbin/atopacctd )
> > strace: Process 14233 attached
> > ^Cstrace: Process 14233 detached
> > 
> > 
> > Files attached. atopacct.strace is quite small. The strace misses the
> > startup of atopacctd even tough I directly start if after the systemctl
> > start.
> > 
> > Thank you,
> 
> Observations ATOP system call trace:
> ------------------------------------
> 
> When atop starts, the following steps are taken:
> 
> 1) Check if process accounting is controlled by the atopacct
>      daemon (verify the presence of a semaphore for this).
> 
> 2) Read the first process accounting record in the current
>      shadow file to determine the type of accounting record.
> 
> 3) If the current shadow file is empty (in case that the first
>      atop incarnation is started), for a child process which
>      immediately terminates to force an accounting record to
>      be written to the current shadow file.
>      Afterwards repeat step 2).
> 
> 4) If the current shadow file is still empty after
>      forking/terminating a child process, apparently process
>      accounting does not work properly.
> 
>      This is the reason why atop indicates "no procacct"!!
> 
> 
> Observations ATOPACCTD system call trace:
> ------------------------------------
> 
> When atopacctd runs, the following steps are taken:
> 
> 1) Wait for a message from the netlink socket (neutral
>      state).
> 
> 2) When a process terminates, atopacctd receives
>      a message from the netlink socket releasing atopacctd
>      from its wait state.
> 
> 3) Read the new process accounting record(s) from the pacct_source file
> and write them
>      to the shadow file if at least one atop process is active (which is
> the case here).
> 
> 4) Continue with step 1.
> 
> The system call trace shows that atopacctd does NOT receive
> a message from the netlink socket at the moment that atop
> forks/terminates a child process. It remains pending in step 1.
> Therefore atop thinks that process accounting is not activated.
> 
> So the question is why atopacctd is not informed by the kernel
> via the netlink socket about the termination of a process?
> And why does it only happen on this test system (I never encounter
> these problems on my own system, neither do a lot of other atop users)?
> 
> 1) Is the netlink interface configured in this kernel?
>      It is one of the options you can select while configuring the kernel.

I tested this with standard Debian kernel 4.8.0-1-amd64 (x86_64) as self-
compiled 4.9-rc4 has severe regressions in Intel DRM driver.

I am not sure which option to look for. Is it CONFIG_NETLINK_DIAG? It is 
enabled in the Debian kernel and my self-compiled kernels.

> 2) Can you anyhow see that the pacct_source ever grows on this system?
>      If so, the process accounting mechanism and the netlink socket
> anyhow works.

Yes, it grows:

merkaba:/run> for (( I=1 ; I<=5 ; I=I+1 )); do ls -l pacct_source ; sleep 1; 
done
-rw------- 1 root root 299008 Nov 14 18:33 pacct_source
-rw------- 1 root root 299136 Nov 14 18:33 pacct_source
-rw------- 1 root root 299264 Nov 14 18:33 pacct_source
-rw------- 1 root root 299392 Nov 14 18:34 pacct_source
-rw------- 1 root root 299520 Nov 14 18:34 pacct_source

> 3) Is the psacct package installed on this system?
>      Sometimes this package disturbs the process accounting mechanism,
> because
>      it switches on process accounting by itself.

It seem that I do not have this at all in Debian:

merkaba:~> locate psacct
merkaba:~#1> 
merkaba:~> dpkg -l | grep psacct
merkaba:~#1>       
merkaba:~#1> apt-cache search psacct
merkaba:~> apt-cache search process accounting | sort 
chiark-utils-bin - chiark system administration utilities
cups-tea4cups - Swiss Army's knife of advanced CUPS administrators
flow-tools - collects and processes NetFlow data
gnucash-docs - Documentation for gnucash, a personal finance tracking program
munge - authentication service to create and validate credentials
python-gnucash - Gnucash interface for Python
sql-ledger - Web based double-entry accounting program

(Interestingly it doesn´t find the atopacctd package either, but thats more an 
issue of the description.)

Thanks,
-- 
Martin

Reply via email to