> -----Original Message-----
> From: Paul Moore [mailto:p...@paul-moore.com]
> Sent: September 18, 2019 3:17
> To: Li,Rongqing <lirongq...@baidu.com>
> Cc: Eric Paris <epa...@redhat.com>; linux-audit@redhat.com
> Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
>
> On Mon, Sep 16, 2019 at 9:08 PM Li,Rongqing <lirongq...@baidu.com> wrote:
> > > -----Original Message-----
> > > From: Paul Moore [mailto:p...@paul-moore.com]
> > > Sent: September 17, 2019 6:52
> > > To: Li,Rongqing <lirongq...@baidu.com>
> > > Cc: Eric Paris <epa...@redhat.com>; linux-audit@redhat.com
> > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> > >
> > > On Sun, Sep 15, 2019 at 10:55 PM Li,Rongqing <lirongq...@baidu.com> wrote:
> > > > > > if audit_log_start failed because queue is full, kauditd is
> > > > > > waiting the receiving queue empty, but no receiver, a task
> > > > > > will be forced to wait 60 seconds for each audited syscall,
> > > > > > and it will be hang for a very long time
> > > > > >
> > > > > > so at this condition, set the wait time to zero to reduce
> > > > > > wait, and restore wait time when audit works again
> > > > > >
> > > > > > it partially restore the commit 3197542482df ("audit: rework
> > > > > > audit_log_start()")
> > > > > >
> > > > > > Signed-off-by: Li RongQing <lirongq...@baidu.com>
> > > > > > Signed-off-by: Liang ZhiCheng <liangzhich...@baidu.com>
> > > > > > ---
> > > > > > reboot is taking a very long time on my machine(centos 6u4
> > > > > > +kernel 5.3) since TIF_SYSCALL_AUDIT is set by default, and when
> > > > > > reboot, userspace process which receiver audit message , will
> > > > > > be killed, and lead to that no user drain the audit queue
> > > > > >
> > > > > > git bisect show it is caused by 3197542482df ("audit: rework
> > > > > > audit_log_start()")
> > > > > >
> > > > > >  kernel/audit.c | 9 +++++++--
> > > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > >
> > > > > This is typically solved by increasing the backlog using the
> > > > > "audit_backlog_limit" kernel parameter (link to the docs below).
> > > >
> > > > It should be able to avoid my issue, but the default behaviors
> > > > does not working for me; And not all have enough knowledge about
> > > > audit, who maybe spend lots of effort to find the root cause, and
> > > > estimate how large should be "audit_backlog_limit"
> > >
> > > The pause/sleep behavior is desired behavior and is intended to help
> > > kauditd/auditd process the audit backlog on a busy system. If we
> > > didn't sleep the current process and give kauditd/auditd a chance to
> > > flush the backlog when it was full, a lot of bad things could happen
> > > with respect to audit. We generally select the backlog limit so
> > > that this is not a problem for most systems, although there will
> > > always be edge cases where the default does not work well; it is
> > > impossible to pick defaults that work well for every case.
> > >
> >
> > I just want to it as before 3197542482df ("audit: rework
> > audit_log_start()"), wait 60 seconds once if
> > auditd/readahead-collector have some problem to drain the audit backlog.
>
> The patch you mention fixed what was deemed to be buggy behavior; as
> mentioned previously in this thread I see no good reason to go back to
> the old behavior.
>
> > > If you are not using audit, you can always disable it via the kernel
> > > command line, or at runtime (look at what Fedora does).
> > >
> > > > > You might also want to investigate what is generating so many
> > > > > audit records prior to starting the audit daemon.
> > > >
> > > > It is /sbin/readahead-collector, in fact, we stop the auditd; We
> > > > are doing a reboot test, which rebooting machine continue to test
> > > > hardware/software.
> > > >
> > > > it is same as below:
> > > > auditctl -a always,exit -S all -F pid='xxx'
> > > > kill -s 19 `pidof auditd`
> > > >
> > > > then the audited task will be hung
> > >
> > > So you are seeing this problem only when you run a test, or did you
> > > provide this as a reproducer?
> >
> > auditctl -a always,exit -S all -F ppid=`pidof sshd`
> > kill -s 19 `pidof auditd`
> > ssh root@127.0.0.1
> >
> > then ssh will be hung forever
>
> That is expected behavior. You are putting a massive audit load on the
> system by telling the kernel to audit every syscall that sshd makes, then
> you are intentionally killing the audit daemon and attempting to ssh into
> the system. The proper fix(es) here would be to 1) set reasonable audit
> rules and/or 2) use an init system that monitors and restarts auditd when
> it fails (systemd has this capability, I believe some others do as well).
>

Neither of these works in my case. auditd is not dead; it is in the
stopped state (kill -s 19, i.e. SIGSTOP), so systemd/init will not
restart it.

Even with only a few audit rules, the backlog will eventually fill up
after enough accesses, because there is no receiver.

I think the original behavior may be better:

commit ac4cec443a80bfde829516e7a7db10f7325aa528
Author: David Woodhouse <dw...@shinybook.infradead.org>
Date:   Sat Jul 2 14:08:48 2005 +0100

    AUDIT: Stop waiting for backlog after audit_panic() happens

    We force a rate-limit on auditable events by making them wait for space
    on the backlog queue. However, if auditd really is AWOL then this could
    potentially bring the entire system to a halt, depending on the audit
    rules in effect.

    Firstly, make sure the wait time is honoured correctly -- it's the
    maximum time the process should wait, rather than the time to wait
    _each_ time round the loop. We were getting re-woken _each_ time a
    packet was dequeued, and the timeout was being restarted each time.

    Secondly, reset the wait time after audit_panic() is called. In general
    this will be reset to zero, to allow progress to be made. If the system
    is configured to _actually_ panic on audit_panic() then that will
    already have happened; otherwise we know that audit records are being
    lost anyway.

    These two tunables can't be exposed via AUDIT_GET and AUDIT_SET because
    those aren't particularly well-designed. It probably should have been
    done by sysctls or sysfs anyway -- one for a later patch.

Thanks

-RongQing

> --
> paul moore
> www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit
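
To make the difference between the two wait policies in this thread
concrete, here is a toy model in Python. It is only a sketch, not kernel
code: the names AuditModel, log_start, receiver_resumed and total_stall
are made up for illustration. It assumes the backlog is already full, the
receiver is stopped, and each audited syscall that has to wait sleeps for
the full wait time (60 seconds, the figure quoted in this thread) before
its record is dropped.

```python
from dataclasses import dataclass

# Wait time in seconds; 60 is the figure quoted in this thread.
BACKLOG_WAIT_TIME = 60


@dataclass
class AuditModel:
    """Toy model of a full backlog with a stopped receiver (hypothetical)."""

    reset_wait_on_failure: bool
    wait_time: int = BACKLOG_WAIT_TIME
    lost_records: int = 0
    stalled_seconds: int = 0

    def log_start(self) -> None:
        # Assumption: the queue is full and nobody drains it, so the caller
        # sleeps for the whole current wait time and the record is lost.
        self.stalled_seconds += self.wait_time
        self.lost_records += 1
        if self.reset_wait_on_failure:
            # Policy described in the 2005 commit message: once records
            # are known to be lost, stop making later callers wait.
            self.wait_time = 0

    def receiver_resumed(self) -> None:
        # Restore the normal wait once the backlog is drained again.
        self.wait_time = BACKLOG_WAIT_TIME


def total_stall(syscalls: int, reset_wait_on_failure: bool) -> int:
    """Return the total seconds a task stalls over `syscalls` audited calls."""
    model = AuditModel(reset_wait_on_failure)
    for _ in range(syscalls):
        model.log_start()
    return model.stalled_seconds
```

In this model, 100 audited syscalls stall for total_stall(100, False) ==
6000 seconds when every call waits, but only total_stall(100, True) == 60
seconds when the wait time is reset to zero after the first failure. Real
kernels differ in many details (wakeups, rate limiting, audit_panic()
handling), so the numbers only illustrate the shape of the problem.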