Fast network file copy; "recvfile()" ?
I need to copy large (> 100GB) files between machines on a fast network. Both machines have reasonably fast disk subsystems, with read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP throughput better than 600 MB/sec.

My question is how best to move the actual file. NFS writes appear to max out at a little over 100 MB/sec on this configuration. FTP and rcp give me around 250 MB/sec.

Thus I am planning to write custom code to send and receive the file. For sending, I believe my best options are:

1) O_DIRECT read() + send()
2) mmap() + madvise(WILLNEED) + send()
3) fadvise(WILLNEED) + sendfile()

I am leaning towards (3), since I gather that sendfile() is supposed to be pretty fast.

My question is what to do on the receiving end. In short, if I want the equivalent of a "recvfile()" to go with sendfile(), what is my best bet on Linux?

I will probably try recv() + O_DIRECT write(), but I am curious if there are other approaches I should try.

Thanks!

 - Pat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
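For illustration only, here is a minimal sketch of option (3) on the sending side, paired with a plain recv() + write() loop on the receiving side. It is written in Python for brevity; `os.posix_fadvise` and `os.sendfile` are thin wrappers over the corresponding system calls. The names `send_file`, `recv_file` and `CHUNK` are made up for this example, the receiver is assumed to learn the file size out of band, and the receiver uses ordinary buffered writes rather than O_DIRECT, which would additionally need block-aligned buffers and lengths.

```python
import os
import socket

CHUNK = 1 << 20  # bytes per sendfile()/recv() call; an illustrative value, not a tuned one


def send_file(sock: socket.socket, path: str) -> int:
    """Send the whole file at `path` over a connected socket; return bytes sent."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Hint that the file will be read soon (length 0 means "to end of file").
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), fd, offset, min(CHUNK, size - offset))
            if sent == 0:  # file shrank underneath us
                break
            offset += sent
        return offset
    finally:
        os.close(fd)


def recv_file(sock: socket.socket, path: str, size: int) -> int:
    """Receive `size` bytes from the socket into `path`; return bytes written."""
    buf = bytearray(CHUNK)  # one reusable buffer, so no per-chunk allocation
    view = memoryview(buf)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = size
        while remaining > 0:
            n = sock.recv_into(view[:min(CHUNK, remaining)])
            if n == 0:  # peer closed the connection early
                break
            written = 0
            while written < n:  # write() may accept fewer bytes than offered
                written += os.write(fd, view[written:n])
            remaining -= n
        return size - remaining
    finally:
        os.close(fd)
```

The sender never copies file data into user space; the receiver still does one copy per chunk, which is the gap the "recvfile()" question is about.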
oom-killer with 27G free swap and overcommit_memory=2
I am using Linux 2.6.16.46-0.12-smp (SUSE 10 SP1 stock kernel). I do intend to bother SUSE, but I am hoping some kind kernel savant can help me interpret these log messages and/or give me some suggestions for how to proceed.

My system is a SunFire x4100 (x86_64) with 16G of RAM and 32G of swap in a single partition. I have an application which consumes a lot of memory, and after a few hours the oom-killer kills it. This would not be surprising, except

a) the machine still has 27G of free swap at the time; and
b) it happens even when I set vm.overcommit_memory=2 (with overcommit_ratio=50).

It is time-consuming to reproduce, but consistent.

Below is the first batch of messages from the oom-killer. In this case, overcommit_memory was set to 2, and the application received no failures from malloc() or new. 9-10 more such batches appear in the log in rapid succession before I get the "Kill process ... Killed process " pair of messages.

Here are my questions. I realize these log messages may be insufficient to answer them, and I am ready to provide more information upon request. (Unfortunately I cannot provide the application itself. But I can perform tests and provide logs.)

1) Does this behavior definitely indicate a kernel bug? If not, what could my application possibly be doing to cause it?
2) Is there any sysctl setting that might "avoid" this problem? Or a way to change my application to avoid it?
3) Should I attempt to reproduce this problem on the stock kernel.org kernel?

Thank you for reading, and please CC me on any replies.
 - Pat

Aug 24 15:02:49 localhost kernel: oom-killer: gfp_mask=0x201d2, order=0
Aug 24 15:02:49 localhost kernel:
Aug 24 15:02:49 localhost kernel: Call Trace: {out_of_memory+93} {__alloc_pages+552}
Aug 24 15:02:49 localhost kernel: {page_cache_alloc_cold+13} {__do_page_cache_readahead+149}
Aug 24 15:02:49 localhost kernel: {sync_page+0} {getnstimeofday+16}
Aug 24 15:02:49 localhost kernel: {ktime_get_ts+26} {delayacct_end+93}
Aug 24 15:02:49 localhost kernel: {blockable_page_cache_readahead+83}
Aug 24 15:02:49 localhost kernel: {make_ahead_window+130} {page_cache_readahead+337}
Aug 24 15:02:49 localhost kernel: {filemap_nopage+161} {__handle_mm_fault+511}
Aug 24 15:02:49 localhost kernel: {do_page_fault+965} {default_wake_function+0}
Aug 24 15:02:49 localhost kernel:{error_exit+0}
Aug 24 15:02:49 localhost kernel: Mem-info:
Aug 24 15:02:49 localhost kernel: Node 1 DMA per-cpu: empty
Aug 24 15:02:50 localhost kernel: Node 1 DMA32 per-cpu: empty
Aug 24 15:02:50 localhost kernel: Node 1 Normal per-cpu:
Aug 24 15:02:50 localhost kernel: cpu 0 hot: high 186, batch 31 used:30
Aug 24 15:02:50 localhost kernel: cpu 0 cold: high 62, batch 15 used:27
Aug 24 15:02:50 localhost kernel: cpu 1 hot: high 186, batch 31 used:160
Aug 24 15:02:50 localhost kernel: cpu 1 cold: high 62, batch 15 used:16
Aug 24 15:02:50 localhost kernel: cpu 2 hot: high 186, batch 31 used:0
Aug 24 15:02:50 localhost kernel: cpu 2 cold: high 62, batch 15 used:45
Aug 24 15:02:50 localhost kernel: cpu 3 hot: high 186, batch 31 used:94
Aug 24 15:02:50 localhost kernel: cpu 3 cold: high 62, batch 15 used:60
Aug 24 15:02:50 localhost kernel: Node 1 HighMem per-cpu: empty
Aug 24 15:02:50 localhost kernel: Node 0 DMA per-cpu:
Aug 24 15:02:50 localhost kernel: cpu 0 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 0 cold: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 1 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 1 cold: high 0, batch 1 used:0
Aug 24 15:02:50 localhost rpc.mountd: export request from 192.146.3.6
Aug 24 15:02:50 localhost kernel: cpu 2 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost rpc.mountd: export request from 192.146.3.6
Aug 24 15:02:50 localhost kernel: cpu 2 cold: high 0, batch 1 used:0
Aug 24 15:02:50 localhost rpc.mountd: export request from 192.146.3.6
Aug 24 15:02:50 localhost kernel: cpu 3 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 3 cold: high 0, batch 1 used:0
Aug 24 15:02:51 localhost kernel: Node 0 DMA32 per-cpu:
Aug 24 15:02:51 localhost kernel: cpu 0 hot: high 186, batch 31 used:145
Aug 24 15:02:51 localhost rpc.mountd: authenticated mount request from server:936 for /share (/share)
Aug 24 15:02:51 localhost kernel: cpu 0 cold: high 62, batch 15 used:42
Aug 24 15:02:51 localhost kernel: cpu 1 hot: high 186, batch 31 used:160
Aug 24 15:02:51 localhost kernel: cpu 1 cold: high 62, batch 15 used:14
Aug 24 15:02:51 localhost kernel: cpu 2 hot: high 186, batch 31 used:0
Aug 24 15:02:51 localhost kernel: cpu 2 cold: high 62, batch 15 used:0
Aug 24 15:02:51 localhost kernel: cpu 3 hot: high 186, batch 31 used:4
Aug 24 15:02:51 localhost kernel: cpu 3 cold: high 62, batch 15 used:0
Aug 24 15:02:51 localhost kernel: Node 0 Normal per-cpu:
Aug 24 15:02:51 localhost kernel: cpu 0 hot: high 186, batch 31 used:7
Aug 24 15:02:51 localhost kernel: cpu 0 cold: high 62, batch 15 used:58
Aug 24
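To make the overcommit settings in this report concrete, the following sketch works out the commit limit that strict accounting (vm.overcommit_memory=2) implies, using the commonly documented formula of swap plus overcommit_ratio percent of RAM. The function names are made up for this example, huge-page reservations are ignored, and the result is only the address-space commitment ceiling; it is not offered as an explanation of the trace above.

```python
def commit_limit_kib(ram_kib: int, swap_kib: int, overcommit_ratio: int) -> int:
    """Commit ceiling under vm.overcommit_memory=2, ignoring huge pages."""
    return swap_kib + ram_kib * overcommit_ratio // 100


def commit_headroom_kib(meminfo_text: str) -> int:
    """CommitLimit minus Committed_AS, parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens and tokens[0].isdigit():
            fields[name.strip()] = int(tokens[0])  # values are reported in kB
    return fields["CommitLimit"] - fields["Committed_AS"]


GIB = 1024 * 1024  # KiB per GiB
limit = commit_limit_kib(16 * GIB, 32 * GIB, 50)
print(limit // GIB)  # prints 40
```

On a live system, passing the contents of /proc/meminfo to `commit_headroom_kib` shows how close the machine is to that ceiling at a given moment.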
Re: Fortuna
Theodore Ts'o <[EMAIL PROTECTED]> writes:

> With a properly set up set of init scripts, /dev/random is
> initialized with seed material for all but the initial boot

What about CD-ROM based distros (e.g., Knoppix), where every boot is the initial boot?

> (and even that problem can be solved by having the distribution's
> install scripts set up /var/lib/urandom/random-seed after installing
> the system).

Could you elaborate? How should Knoppix seed its /dev/urandom? Would reading 256 bits from /dev/random and writing them to /dev/urandom do the job?

 - Pat
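The operation asked about in the last paragraph can be sketched as follows. The function name `transfer_seed` and its parameters are invented for this example; the device paths default to the ones named in the message but can be pointed elsewhere for testing. Note that, as far as the random(4) documentation describes it, writing to the device mixes the bytes into the pool without crediting any entropy, so this sketch shows the mechanics only and does not settle whether the approach is sufficient.

```python
def transfer_seed(src: str = "/dev/random", dst: str = "/dev/urandom",
                  nbits: int = 256) -> int:
    """Read `nbits` of seed material from `src` and write it to `dst`."""
    nbytes = nbits // 8
    seed = bytearray()
    # Unbuffered, so we never pull more from the source than requested.
    with open(src, "rb", buffering=0) as source:
        while len(seed) < nbytes:  # the source may return short reads
            chunk = source.read(nbytes - len(seed))
            if not chunk:
                raise EOFError("seed source ran dry")
            seed += chunk
    with open(dst, "wb", buffering=0) as pool:
        pool.write(seed)
    return len(seed)
```

Reading from /dev/random may block until the kernel considers enough entropy available, which on a freshly booted live CD is exactly the situation in question.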
Re: [NFS] Help: 2.2.18 NFS is corrupting our files
Trond Myklebust <[EMAIL PROTECTED]> writes:

> What filesystem are you exporting?

Just ext2; all of our file systems are ext2.

The disks here are a mixture of IDE, SCSI (aic7xxx and sym53c8xx), and Mylex DAC960 RAID. In this case, the machine running 2.2.18 has aic7xxx SCSI. I suspect I could reproduce the problem on other configurations; let me know if that would be useful.

 - Pat
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Igmar Palsenberg <[EMAIL PROTECTED]> writes:

> On 23 Oct 2000, Patrick J. LoPresti wrote:
>
> > Not true. The named on our loghost is authoritative for the reverse
> > mappings for all of the machines which can log there.
>
> Put the names of your machines in /etc/hosts on your logmachine.

This is not an option for us, unfortunately. Many of our IP addresses are dynamically assigned, with the DNS tables dynamically updated.

Thank you for the patch to syslogd, though! Can you try to get your "-x" option into the standard distributions of syslogd, or should I work up a bug report / feature request for Red Hat myself?

 - Pat
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Ricky Beam <[EMAIL PROTECTED]> writes:

> Personally, I'd look closely at your setup to determine exactly why
> this has become a problem. named is being blocked on writing to
> /dev/log. This should only happen if there is sufficient _local_
> syslog traffic to fill the buffer or syslogd has too much remote
> traffic to ever read from /dev/log.

There is a lot of local traffic as well, yes.

Lots of local traffic means named eventually finds itself waiting in line to log. Lots of remote traffic means syslogd is trying to talk to named a lot (to do reverse lookups). named waiting in line + syslogd trying to talk to named == deadlock; this is not too hard to see.

Once the name resolution times out, you might expect things to become unstuck. But they don't. Perhaps syslogd is not giving higher priority to local messages; if it did, maybe it could recover from the deadlock. But this would not be a reliable solution; the only reliable solution is for syslogd to be independent of any processes which need to talk to it.

> Perchance are you running the name service caching daemon (nscd)?

No.

> I'd also guess you aren't disabling fsync() for your syslog files
> (it's part of the syslog.conf format) -- this is a considerable
> drain on syslogd.

I see no documentation for such an option in the syslog.conf man page. This is with the current Red Hat 6.2 syslogd (package sysklogd-1.3.31-17).

 - Pat
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Ricky Beam <[EMAIL PROTECTED]> writes:

> syslogd isn't the blocker. The syslog functions in glibc being
> called by named are the problem. Stop named from blocking on syslog
> writes and the world will be happy again.

So I have the glibc maintainer (and others) saying that syslog messages should never be dropped, and you saying that named should be dropping its syslog messages.

One more time, from the top: named is calling syslog(), a glibc function. This function *blocks* waiting for delivery to the local syslogd, even though it is using SOCK_DGRAM sockets. There is no option to openlog() or syslog() to get non-blocking behavior (the LOG_NDELAY option means something else entirely).

You are effectively suggesting that named should be rewritten not to use the glibc syslog functions at all. That strikes me as the worst suggestion so far; it would be far better for syslogd not to do name lookups. But my syslogd has no option to avoid name lookups; I will submit a request to add one.

> I've gotta ask what kind of "load" can cause this to happen.
> And for the record, syslogd shouldn't be doing DNS lookups for
> things arriving via /dev/log -- that's always the local machine.

This particular syslogd also accepts messages from remote hosts. So when there is a lot of syslog traffic, this syslogd talks to named a lot.

named occasionally sends messages to syslog. Since syslog pauses waiting for named to respond to name queries, and named blocks waiting for syslog to consume the message, a deadlock is triggered.

True, it is not a full deadlock, because the name query times out eventually. But it is bad enough that the system becomes largely non-responsive.

 - Pat
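To illustrate what "non-blocking behavior" for a log client could look like, here is a hypothetical sketch of a datagram client that drops a message when the receiver's queue is full instead of waiting. The class name `DroppingLogger` and its counters are invented for this example; this is not glibc's syslog() and not a proposed patch, only a demonstration of the MSG_DONTWAIT flag on an AF_UNIX datagram socket.

```python
import socket


class DroppingLogger:
    """Datagram log client that drops messages rather than blocking."""

    def __init__(self, path: str = "/dev/log"):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.sock.connect(path)
        self.sent = 0
        self.dropped = 0

    def log(self, message: bytes) -> bool:
        """Try to deliver one message; return False if it was dropped."""
        try:
            # MSG_DONTWAIT makes just this send non-blocking.
            self.sock.send(message, socket.MSG_DONTWAIT)
        except BlockingIOError:
            self.dropped += 1  # receiver is not keeping up: drop on the floor
            return False
        self.sent += 1
        return True

    def close(self) -> None:
        self.sock.close()
```

With a client like this, a stalled log daemon costs the caller lost messages rather than a stalled process, which is the trade-off being argued over in this thread.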
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Ricky Beam <[EMAIL PROTECTED]> writes:

> I would suggest disabling name resolution for syslog, but that's an
> ugly option. There's no way to stop a glibc system from doing a DNS
> query for a reverse lookup. HOWEVER, you can set the DNS timeout to
> 1 second and set the resolver options to prevent recursion (answer
> from cache only.)

Recursion has nothing to do with it; as I said, the named on this system is itself authoritative for all of the reverse lookups.

Turning down the DNS timeout would affect *all* name resolution on the system, right? That is not acceptable.

As I said, I already have a workaround, which is to have named log to a flat file. I agree that this is a poor workaround, and the "right fix" is to modify syslogd not to perform blocking operations. My only quibble is that SOCK_DGRAM is an odd transport to use here, even over AF_UNIX.

> PS: Technically, this is not a lockup. syslogd should eventually
> time out waiting for the DNS query and go about its business. Of
> course, that may be upwards of 45 seconds -- very annoying.

Yes. We are able to log in to the machine eventually and restart the offending processes. But that is little consolation to our users who notice the hang and the fallout afterward.

 - Pat
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Ulrich Drepper <[EMAIL PROTECTED]> writes:

> If anything has to be changed it's (as suggested) the configuration
> or even the implementation of syslogd. Make it robust.

OK, but my current syslogd only listens to /dev/log as a SOCK_DGRAM. If I wanted reliable syslogging, it would be listening on it as a SOCK_STREAM. Maybe I care more about performance and backwards compatibility than reliable syslogging. But whatever my reasons, my connection to syslogd is already unreliable and therefore *should not block*.

(Could a syslogd listen on /dev/log both as SOCK_DGRAM and as SOCK_STREAM? If so, then your philosophy implies that glibc should be trying SOCK_STREAM *first*, falling back to SOCK_DGRAM for historical compatibility. Either way, when it uses datagrams, it should never block, period.)

 - Pat
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Jesse Pollard <[EMAIL PROTECTED]> writes:

> Don't configure syslogd to do reverse lookups.

Our syslogd has no option to disable the reverse lookups.

> You can NEVER guarantee that the reverse lookup will succeed, and
> can be delayed several minutes for a single reply.

Not true. The named on our loghost is authoritative for the reverse mappings for all of the machines which can log there.

> I consider this a configuration error. I don't believe syslogd
> should ever do a reverse lookup, since the name you are trying to
> get may never arrive, or if arrives, it may be spoofed.

There *is* no configuration for these tools which gives the behavior you describe, so this is not a "configuration error".

> It's not a bug, but a security feature. NO log to syslogd should be
> lost, since it may be related to an attack.

Historically, no other Unix system has had reliable syslogging. It would require very defensive programming for syslogd, and that has clearly not been performed.

And if this is what GNU/Linux intends, why does glibc use a SOCK_DGRAM socket for communication with syslogd? By definition, such sockets are *unreliable*. If syslog is supposed to be reliable, a different connection type must be used.

Your philosophy that "no syslog message should ever be lost" is not necessarily bad. But it is clearly at odds with historical practice, the current glibc syslog() implementation, and the current syslogd itself.

It is true that glibc falls back to using SOCK_STREAM if the SOCK_DGRAM connection fails. Does that mean GNU/Linux expects syslog to be reliable eventually? If so, then my problem is entirely a bug in syslogd and I will report it as such.

 - Pat
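The datagram-then-stream fallback mentioned in the last paragraph can be sketched roughly as below. This is an illustration, not glibc's code: the function name `connect_log` is invented, and the sketch assumes the fallback is triggered specifically by EPROTOTYPE, the error Linux reports when the socket type does not match what the listener at that path was created with. That trigger condition is an assumption on top of what the message states.

```python
import errno
import socket


def connect_log(path: str = "/dev/log") -> socket.socket:
    """Connect to a log socket, trying SOCK_DGRAM first, then SOCK_STREAM."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.connect(path)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno != errno.EPROTOTYPE:
            raise  # a different failure (e.g. no daemon at all): give up
    # The listener is not a datagram socket; retry as a stream.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock
```

Which transport ends up in use is therefore decided by how the daemon created its socket, not by the client, which is why the question of intent is directed at syslogd.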
syslog() blocks on glibc 2.1.3 with kernel 2.2.x
If you send SIGSTOP to syslogd on a Red Hat 6.2 system (glibc 2.1.3, kernel 2.2.x), within a few minutes you will find your entire machine grinds to a halt. For example, nobody can log in.

This happens because once the /dev/log buffer fills, the syslog() glibc function blocks.

This is a problem for us in Real Life because named and syslogd are deadlocking while trying to talk to each other. On our loghost, syslogd needs to do reverse name lookups while named needs to call syslog(). When traffic gets heavy all around, the deadlock is triggered, and it is quite unpleasant. We are about to configure named to use flat files instead of syslog, but that feels more like a workaround than a fix.

I am not sure whether this is a glibc bug or a kernel bug. I have used netstat and the glibc sources to confirm that glibc is using a SOCK_DGRAM Unix socket to send to /dev/log. I thought DGRAM sockets were supposed to drop packets on the floor instead of blocking. But perhaps I am wrong and glibc is supposed to be explicitly marking the socket as non-blocking.

Regardless of whose bug it is, I suggest that it needs to be fixed. To my knowledge, other Unix systems do not behave this way; they simply drop messages when syslogd is not responding. IMO, that is correct behavior.

 - Pat
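The behavior described here can be reproduced in miniature without touching syslogd. The sketch below pairs two AF_UNIX SOCK_DGRAM sockets, never reads from one of them (standing in for a stopped daemon), and counts how many datagrams the other side can queue before send() stalls. The function name, the message text and the timeout value are all made up for this example; the stall is detected with a socket timeout so that the demonstration terminates instead of hanging.

```python
import socket


def datagrams_until_stall(message: bytes = b"<13>test: hello",
                          stall_seconds: float = 0.2,
                          limit: int = 1_000_000) -> int:
    """Count datagrams queued to an unread AF_UNIX peer before send() stalls."""
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    writer.settimeout(stall_seconds)  # turn an indefinite block into an error
    sent = 0
    try:
        while sent < limit:
            try:
                writer.send(message)
            except socket.timeout:
                break  # send() waited stall_seconds for queue space and got none
            sent += 1
    finally:
        writer.close()
        reader.close()
    return sent
```

On Linux the count is finite and fairly small, meaning the sender waits for queue space rather than having the datagram discarded; a fully blocking socket, as used by syslog(), would wait indefinitely at that point.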
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Ricky Beam <[EMAIL PROTECTED]> writes:

> syslogd isn't the blocker. The syslog functions in glibc being called by named are the problem. Stop named from blocking on syslog writes and the world will be happy again.

So I have the glibc maintainer (and others) saying that syslog messages should never be dropped, and you saying that named should be dropping its syslog messages.

One more time, from the top: named is calling syslog(), a glibc function. This function *blocks* waiting for delivery to the local syslogd, even though it is using SOCK_DGRAM sockets. There is no option to openlog() or syslog() to get non-blocking behavior (the LOG_NDELAY option means something else entirely).

You are effectively suggesting that named should be rewritten not to use the glibc syslog functions at all. That strikes me as the worst suggestion so far; it would be far better for syslogd not to do name lookups. But my syslogd has no option to avoid name lookups; I will submit a request to add one.

> I've gotta ask what kind of "load" can cause this to happen. And for the record, syslogd shouldn't be doing DNS lookups for things arriving via /dev/log -- that's always the local machine.

This particular syslogd also accepts messages from remote hosts. So when there is a lot of syslog traffic, this syslogd talks to named a lot. named occasionally sends messages to syslogd. Since syslogd pauses waiting for named to respond to name queries, and named blocks waiting for syslogd to consume the message, a deadlock is triggered.

True, it is not a full deadlock, because the name query times out eventually. But it is bad enough that the system becomes largely non-responsive.

- Pat
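The circular wait described in this message can be written down as a tiny "who waits for whom" table. The Python sketch below is only a model of the argument, with hypothetical names (find_wait_cycle, blocking, independent); it assumes each process waits on at most one other.

```python
def find_wait_cycle(waits_for):
    """Follow 'A waits for B' edges from each process.

    Returns the first cycle found as a list of process names,
    or None if nobody ends up waiting on themselves.
    """
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current is not None:
            # We came back to a process already on the path: a cycle.
            return path[path.index(current):]
    return None


# named blocks in syslog() until syslogd reads its message;
# syslogd blocks on a reverse lookup until named answers.
blocking = {"named": "syslogd", "syslogd": "named"}

# If syslogd stops depending on named, the cycle disappears.
independent = {"named": "syslogd"}

if __name__ == "__main__":
    print(find_wait_cycle(blocking))
    print(find_wait_cycle(independent))
```

Removing either edge breaks the cycle: syslog() could stop blocking, or syslogd could stop waiting on name lookups.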
Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x
Ricky Beam <[EMAIL PROTECTED]> writes:

> Personally, I'd look closely at your setup to determine exactly why this has become a problem. named is being blocked on writing to /dev/log. This should only happen if there is sufficient _local_ syslog traffic to fill the buffer or syslogd has too much remote traffic to ever read from /dev/log.

There is a lot of local traffic as well, yes.

Lots of local traffic means named eventually finds itself waiting in line to log. Lots of remote traffic means syslogd is trying to talk to named a lot (to do reverse lookups). named waiting in line + syslogd trying to talk to named == deadlock; this is not too hard to see.

Once the name resolution times out, you might expect things to become unstuck. But they don't. Perhaps syslogd is not giving higher priority to local messages; if it did, maybe it could recover from the deadlock. But this would not be a reliable solution; the only reliable solution is for syslogd to be independent of any processes which need to talk to it.

> Perchance are you running the name service caching daemon (nscd)?

No.

> I'd also guess you aren't disabling fsync() for your syslog files (it's part of the syslog.conf format) -- this is a considerable drain on syslogd.

I see no documentation for such an option in the syslog.conf man page. This is with the current Red Hat 6.2 syslogd (package sysklogd-1.3.31-17).

- Pat
Re: Availability of kdb
Linus Torvalds <[EMAIL PROTECTED]> writes:

> Sure. I just don't see many end-users single-stepping through interrupt handlers etc.
>
> But yes, there probably are a few.

I think you would be surprised, and I speak as someone who has found and fixed race conditions in your kernel. There are more Linux users who are competent with x86 hardware and SMP issues than there are Linux developers. A *lot* more.

When these technically savvy users have a problem, they want to diagnose it as best they can and then hand it off to a kernel expert to analyse and to fix. They wish they had the time to understand the kernel deeply and come up with the "right" solution, but they don't; and the expert can do the job ten times faster anyway.

If you give us better diagnostic tools, your kernel *will* improve faster. Whether this benefit outweighs the cost is, of course, up to you.

- Pat