On Wed, Nov 5, 2008 at 4:19 PM, Steve Grubb <[EMAIL PROTECTED]> wrote:
> On Wednesday 05 November 2008 11:30:16 Lucas C. Villa Real wrote:
>> I'm facing a situation where -ENOBUFS is returned from both
>> audit_send() and audit_get_reply(). The system is under high stress,
>> with 250k files being created and having creat() and chmod() syscalls
>> audited.
>
> Is this what you really wanted to audit? :)
Yes, not a single event can be missed in the system I'm working on, unfortunately :)

>> Looking at the code in lib/netlink.c, I saw that audit_send() doesn't
>> handle -ENOBUFS. Would it be possible to replace the condition
>> "while (retval < 0 && errno == EINTR)" with "while (retval < 0 && (errno
>> == EINTR || errno == ENOBUFS))" to fix the problem when sending
>> packets from userspace to the kernel?
>
> Have you tried that? Does it fix the problem or just hang the utility?

So far it hasn't hung. However, just in case, I added a maximum number of retries (currently set to 64). I'm about to launch a new batch to stress the system once again, and then I'll be able to see if it works as expected.

>> My understanding of the problem in audit_get_reply() is that the I/O
>> buffers are all full and auditd was just not scheduled at the expected
>> rate, causing these buffers to overflow. Does that make sense?
>
> If you go over the backlog limit, you get a syslog message about that unless
> you have it set to ignore. My guess would be that you have a general network
> memory pool depletion and it is not related to audit specifically.

Yes. I hope that increasing auditd's priority will help drain that. I'll let you know if that works.

>> If it does, do you have a suggestion about the best way to approach this
>> problem, besides changing auditd's priority?
>
> Increase the backlog and increase auditd's priority. I have not played with
> running auditd with a different scheduler policy than whatever the default
> is. But you may want to see if one of the other scheduler policies treats audit
> better. Or maybe you want to tune /proc/sys/kernel/sched_granularity_ns.
>
>> One interesting thing which I noticed is that 'auditctl -s' doesn't
>> report that messages were lost,
>
> They weren't lost by the audit system so it doesn't know they didn't arrive.
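Returning to the audit_send() change discussed above: a minimal sketch of the bounded retry, assuming a plain sendto() on an already-open socket. The helper name send_with_retry() and the constant SEND_MAX_RETRIES are hypothetical; this is not the actual audit_send() from lib/netlink.c, only an illustration of the extended loop condition combined with the 64-retry cap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical cap, matching the 64 retries mentioned in the thread. */
#define SEND_MAX_RETRIES 64

/*
 * Hypothetical helper (not libaudit's audit_send()): send one datagram,
 * retrying when the call is interrupted (EINTR) or the kernel is
 * temporarily out of buffer space (ENOBUFS). After SEND_MAX_RETRIES
 * retries it gives up and returns -1, with errno left as set by the
 * last sendto() call. Any other error is returned immediately.
 */
static ssize_t send_with_retry(int fd, const void *buf, size_t len,
                               const struct sockaddr *addr, socklen_t addrlen)
{
    ssize_t retval;
    int retries = 0;

    do {
        retval = sendto(fd, buf, len, 0, addr, addrlen);
    } while (retval < 0 && (errno == EINTR || errno == ENOBUFS) &&
             retries++ < SEND_MAX_RETRIES);

    return retval;
}
```

Retrying immediately may not give the kernel time to free buffers under memory pressure; inserting a short sleep or sched_yield() between attempts is one option to consider, although the thread does not establish whether it helps.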
Do you think it would make sense to add an extra member to struct sk_buff (a pointer to a callback function) and then have skb_queue_tail() signal if it failed to send a message? That would allow audit to keep track of such losses, as would any other subsystem using netlink to communicate with userspace.

>> This is happening with an old kernel, 2.6.16.46 + a bunch of patches,
>> and audit 1.7.4. I cannot completely upgrade it to a new release, but
>> I can certainly backport audit-specific bits if you remember having
>> fixed something similar since then.
>
> Well, that proc tunable is only available for the CFS scheduler. Not sure what
> you have for older kernels.

It's not CFS, but I'll keep looking for other ways to improve the responsiveness of auditd here.

Thanks!
Lucas

--
Linux-audit mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-audit
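The per-message callback idea for struct sk_buff can be modeled in userspace to show how a sender would learn about losses. This is a sketch under stated assumptions: it is not kernel code and not the sk_buff or skb_queue_tail() API; the types msg and msg_queue and the functions queue_tail(), queue_head_pop() and count_loss() are all hypothetical. A bounded queue invokes the message's drop callback when it is full, and the callback increments a loss counter of the kind 'auditctl -s' could report.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4 /* hypothetical queue size */

struct msg;
typedef void (*drop_cb)(struct msg *m);

/* Hypothetical message: stands in for an sk_buff with the proposed hook. */
struct msg {
    int seq;         /* message sequence number */
    drop_cb on_drop; /* invoked if the message cannot be queued */
};

/* Hypothetical bounded FIFO (ring buffer) standing in for the socket queue. */
struct msg_queue {
    struct msg *slots[QUEUE_CAPACITY];
    size_t head;  /* index of the oldest queued message */
    size_t count; /* number of queued messages */
};

/* Loss counter the sender could expose, e.g. in a status report. */
static unsigned long lost_messages;

static void count_loss(struct msg *m)
{
    (void)m;
    lost_messages++;
}

/* Append m; if the queue is full, run m's drop callback and return -1. */
static int queue_tail(struct msg_queue *q, struct msg *m)
{
    if (q->count == QUEUE_CAPACITY) {
        if (m->on_drop)
            m->on_drop(m);
        return -1;
    }
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = m;
    q->count++;
    return 0;
}

/* Remove and return the oldest message, or NULL if the queue is empty. */
static struct msg *queue_head_pop(struct msg_queue *q)
{
    struct msg *m;

    if (q->count == 0)
        return NULL;
    m = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return m;
}
```

With this shape, the failure is reported at the point where the message is discarded, so the subsystem that produced the message can account for it even though the receiver never sees it.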
