Fast network file copy; "recvfile()" ?

2008-01-17 Thread Patrick J. LoPresti
I need to copy large (> 100GB) files between machines on a fast
network.  Both machines have reasonably fast disk subsystems, with
read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
throughput better than 600 MB/sec.

My question is how best to move the actual file.  NFS writes appear to
max out at a little over 100 MB/sec on this configuration.  FTP and
rcp give me around 250 MB/sec.  Thus I am planning to write custom
code to send and receive the file.

For sending, I believe my best options are:

1) O_DIRECT read() + send()
2) mmap() + madvise(WILLNEED) + send()
3) fadvise(WILLNEED) + sendfile()

I am leaning towards (3), since I gather that sendfile() is supposed
to be pretty fast.
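
As a rough illustration of option (3), here is a minimal Python sketch; it is not from the original post. The function name send_file and the 1 MiB chunk size are assumptions made for the example. It relies on os.posix_fadvise() and os.sendfile() as exposed by Python on Linux.

```python
import os
import socket


def send_file(sock, path, chunk=1 << 20):
    """Send the whole file at `path` over the connected socket `sock`.

    Returns the number of bytes handed to the kernel.
    """
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        # Hint that the file will be read soon; a length of 0 means
        # "from offset to end of file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        offset = 0
        while offset < size:
            # sendfile() may transfer fewer bytes than requested, so loop.
            sent = os.sendfile(sock.fileno(), fd, offset,
                               min(chunk, size - offset))
            if sent == 0:
                break  # file shrank underneath us
            offset += sent
    return offset
```

Whether this beats O_DIRECT read() + send() on a given machine is something only a benchmark on that hardware can settle.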

My second question is what to do on the receiving end.  In short, if I want
the equivalent of a "recvfile()" to go with sendfile(), what is my
best bet on Linux?

I will probably try recv() + O_DIRECT write(), but I am curious if
there are other approaches I should try.
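
The recv() + O_DIRECT write() idea could be sketched roughly as follows. This is an illustration under stated assumptions, not a tuned implementation: the names recv_to_file, BLOCK and CHUNK are invented for the example, and the 4096-byte alignment is a guess that must be checked against the actual device and filesystem. The buffer comes from an anonymous mmap so that it is page-aligned, and the final partial block is zero-padded and then trimmed with ftruncate().

```python
import mmap
import os
import socket

BLOCK = 4096      # assumed alignment; depends on device and filesystem
CHUNK = 1 << 20   # bytes per write; must be a multiple of BLOCK


def recv_to_file(sock, path, total, direct=True):
    """Receive exactly `total` bytes from `sock` and store them at `path`."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        flags |= os.O_DIRECT          # Linux-specific; bypasses the page cache
    fd = os.open(path, flags, 0o644)
    buf = mmap.mmap(-1, CHUNK)        # anonymous mapping, hence page-aligned
    view = memoryview(buf)
    try:
        remaining = total
        while remaining > 0:
            want = min(CHUNK, remaining)
            got = 0
            while got < want:
                n = sock.recv_into(view[got:want])
                if n == 0:
                    raise ConnectionError("peer closed before sending all data")
                got += n
            # O_DIRECT wants block-multiple lengths: zero-pad the tail
            # here and cut the padding off again after the loop.
            padded = -(-got // BLOCK) * BLOCK
            view[got:padded] = bytes(padded - got)
            if os.write(fd, view[:padded]) != padded:
                raise OSError("short write")
            remaining -= got
        os.ftruncate(fd, total)
    finally:
        view.release()
        buf.close()
        os.close(fd)
```

Passing direct=False gives an ordinary buffered write with the same loop, which is convenient for comparing the two on the same hardware.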

Thanks!

 - Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


oom-killer with 27G free swap and overcommit_memory=2

2007-08-27 Thread Patrick J. LoPresti
I am using Linux 2.6.16.46-0.12-smp (SUSE 10 SP1 stock kernel).
I do intend to bother SUSE, but I am hoping some kind kernel savant
can help me interpret these log messages and/or give me some
suggestions for how to proceed.

My system is a SunFire x4100 (x86_64) with 16G of RAM and 32G of swap
in a single partition.  I have an application which consumes a lot of
memory, and after a few hours the oom-killer kills it.

This would not be surprising, except a) the machine still has 27G of
free swap at the time; and b) it happens even when I set
vm.overcommit_memory=2 (with overcommit_ratio=50).  It is
time-consuming to reproduce, but consistent.
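
For reference, the kernel's overcommit documentation describes the ceiling in mode 2 as swap plus overcommit_ratio percent of RAM (huge pages excluded). The sketch below, which is not part of the original post, works that out for the figures given here and also parses the CommitLimit and Committed_AS fields from /proc/meminfo. The function names are made up for the example.

```python
def commit_limit_gib(ram_gib, swap_gib, overcommit_ratio):
    """Commit ceiling under vm.overcommit_memory=2, ignoring huge pages."""
    return swap_gib + ram_gib * overcommit_ratio / 100


def parse_commit_fields(meminfo_text):
    """Return CommitLimit and Committed_AS (in kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("CommitLimit", "Committed_AS"):
            fields[name] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    # 16G RAM, 32G swap, overcommit_ratio=50
    print(commit_limit_gib(16, 32, 50))  # 40.0
    with open("/proc/meminfo") as f:
        print(parse_commit_fields(f.read()))
```

Comparing Committed_AS against CommitLimit while the application runs would show how close the system is to that ceiling. It would not by itself explain an oom-killer invocation.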

Below is the first batch of messages from the oom-killer.  In this
case, overcommit_memory was set to 2, and the application
received no failures from malloc() or new.

9-10 more such batches appear in the log in rapid succession before
I get the "Kill process  ... Killed process " pair of messages.

Here are my questions.  I realize these log messages may be insufficient
to answer them, and I am ready to provide more information upon
request.  (Unfortunately I cannot provide the application itself.  But
I can perform tests and provide logs.)

1) Does this behavior definitely indicate a kernel bug?  If not, what
could my application possibly be doing to cause it?

2) Is there any sysctl setting that might "avoid" this problem?  Or a
way to change my application to avoid it?

3) Should I attempt to reproduce this problem on the stock kernel.org
kernel?

Thank you for reading, and please CC me on any replies.

 - Pat

Aug 24 15:02:49 localhost kernel: oom-killer: gfp_mask=0x201d2, order=0
Aug 24 15:02:49 localhost kernel:
Aug 24 15:02:49 localhost kernel: Call Trace: {out_of_memory+93} {__alloc_pages+552}
Aug 24 15:02:49 localhost kernel: {page_cache_alloc_cold+13} {__do_page_cache_readahead+149}
Aug 24 15:02:49 localhost kernel: {sync_page+0} {getnstimeofday+16}
Aug 24 15:02:49 localhost kernel: {ktime_get_ts+26} {delayacct_end+93}
Aug 24 15:02:49 localhost kernel: {blockable_page_cache_readahead+83}
Aug 24 15:02:49 localhost kernel: {make_ahead_window+130} {page_cache_readahead+337}
Aug 24 15:02:49 localhost kernel: {filemap_nopage+161} {__handle_mm_fault+511}
Aug 24 15:02:49 localhost kernel: {do_page_fault+965} {default_wake_function+0}
Aug 24 15:02:49 localhost kernel: {error_exit+0}
Aug 24 15:02:49 localhost kernel: Mem-info:
Aug 24 15:02:49 localhost kernel: Node 1 DMA per-cpu: empty
Aug 24 15:02:50 localhost kernel: Node 1 DMA32 per-cpu: empty
Aug 24 15:02:50 localhost kernel: Node 1 Normal per-cpu:
Aug 24 15:02:50 localhost kernel: cpu 0 hot: high 186, batch 31 used:30
Aug 24 15:02:50 localhost kernel: cpu 0 cold: high 62, batch 15 used:27
Aug 24 15:02:50 localhost kernel: cpu 1 hot: high 186, batch 31 used:160
Aug 24 15:02:50 localhost kernel: cpu 1 cold: high 62, batch 15 used:16
Aug 24 15:02:50 localhost kernel: cpu 2 hot: high 186, batch 31 used:0
Aug 24 15:02:50 localhost kernel: cpu 2 cold: high 62, batch 15 used:45
Aug 24 15:02:50 localhost kernel: cpu 3 hot: high 186, batch 31 used:94
Aug 24 15:02:50 localhost kernel: cpu 3 cold: high 62, batch 15 used:60
Aug 24 15:02:50 localhost kernel: Node 1 HighMem per-cpu: empty
Aug 24 15:02:50 localhost kernel: Node 0 DMA per-cpu:
Aug 24 15:02:50 localhost kernel: cpu 0 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 0 cold: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 1 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 1 cold: high 0, batch 1 used:0
Aug 24 15:02:50 localhost rpc.mountd: export request from 192.146.3.6
Aug 24 15:02:50 localhost kernel: cpu 2 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost rpc.mountd: export request from 192.146.3.6
Aug 24 15:02:50 localhost kernel: cpu 2 cold: high 0, batch 1 used:0
Aug 24 15:02:50 localhost rpc.mountd: export request from 192.146.3.6
Aug 24 15:02:50 localhost kernel: cpu 3 hot: high 0, batch 1 used:0
Aug 24 15:02:50 localhost kernel: cpu 3 cold: high 0, batch 1 used:0
Aug 24 15:02:51 localhost kernel: Node 0 DMA32 per-cpu:
Aug 24 15:02:51 localhost kernel: cpu 0 hot: high 186, batch 31 used:145
Aug 24 15:02:51 localhost rpc.mountd: authenticated mount request from server:936 for /share (/share)
Aug 24 15:02:51 localhost kernel: cpu 0 cold: high 62, batch 15 used:42
Aug 24 15:02:51 localhost kernel: cpu 1 hot: high 186, batch 31 used:160
Aug 24 15:02:51 localhost kernel: cpu 1 cold: high 62, batch 15 used:14
Aug 24 15:02:51 localhost kernel: cpu 2 hot: high 186, batch 31 used:0
Aug 24 15:02:51 localhost kernel: cpu 2 cold: high 62, batch 15 used:0
Aug 24 15:02:51 localhost kernel: cpu 3 hot: high 186, batch 31 used:4
Aug 24 15:02:51 localhost kernel: cpu 3 cold: high 62, batch 15 used:0
Aug 24 15:02:51 localhost kernel: Node 0 Normal per-cpu:
Aug 24 15:02:51 localhost kernel: cpu 0 hot: high 186, batch 31 used:7
Aug 24 15:02:51 localhost kernel: cpu 0 cold: high 62, batch 15 used:58
Aug 24 

Re: Fortuna

2005-04-19 Thread Patrick J. LoPresti
Theodore Ts'o <[EMAIL PROTECTED]> writes:

> With a properly set up set of init scripts, /dev/random is
> initialized with seed material for all but the initial boot

What about CD-ROM based distros (e.g., Knoppix), where every boot is
the initial boot?

> (and even that problem can be solved by having the distribution's
> install scripts set up /var/lib/urandom/random-seed after installing
> the system).

Could you elaborate?  How should Knoppix seed its /dev/urandom?

Would reading 256 bits from /dev/random and writing them to
/dev/urandom do the job?
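
The operation being asked about could look like the sketch below, which is illustrative only and not part of the original message. The function name reseed and its parameters are invented for the example. Per the random(4) manual page, bytes written to /dev/urandom are mixed into the pool but do not raise the kernel's entropy count; crediting entropy needs the RNDADDENTROPY ioctl, which this sketch does not use.

```python
def reseed(src="/dev/random", dst="/dev/urandom", nbytes=32):
    """Copy `nbytes` (32 bytes = 256 bits) from `src` to `dst`.

    Returns the number of bytes written.
    """
    seed = b""
    with open(src, "rb", buffering=0) as s:
        # A single read may return fewer bytes than requested.
        while len(seed) < nbytes:
            part = s.read(nbytes - len(seed))
            if not part:
                raise EOFError("source ran out of data")
            seed += part
    with open(dst, "wb", buffering=0) as d:
        d.write(seed)
    return len(seed)
```

On kernels of that era the read from /dev/random could block until enough entropy had been gathered, which is exactly the concern on a freshly booted live CD.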

 - Pat


Re: [NFS] Help: 2.2.18 NFS is corrupting our files

2001-01-22 Thread Patrick J. LoPresti

Trond Myklebust <[EMAIL PROTECTED]> writes:

> What filesystem are you exporting?

Just ext2; all of our file systems are ext2.

The disks here are a mixture of IDE, SCSI (aic7xxx and sym53c8xx), and
Mylex DAC960 RAID.  In this case, the machine running 2.2.18 has
aic7xxx SCSI.  I suspect I could reproduce the problem on other
configurations; let me know if that would be useful.

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-26 Thread Patrick J. LoPresti

Igmar Palsenberg <[EMAIL PROTECTED]> writes:

> On 23 Oct 2000, Patrick J. LoPresti wrote:
>
> > Not true.  The named on our loghost is authoritative for the reverse
> > mappings for all of the machines which can log there.
> 
> Put the names of your machines in /etc/hosts on your logmachine.

This is not an option for us, unfortunately.  Many of our IP addresses
are dynamically assigned, with the DNS tables dynamically updated.

Thank you for the patch to syslogd, though!  Can you try to get your
"-x" option into the standard distributions of syslogd, or should I
work up a bug report / feature request for Red Hat myself?

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ricky Beam <[EMAIL PROTECTED]> writes:

> Personally, I'd look closely at your setup to determine exactly why
> this has become a problem.  named is being blocked on writing to
> /dev/log.  This should only happen if there is sufficient _local_
> syslog traffic to fill the buffer or syslogd has too much remote
> traffic to ever read from /dev/log.

There is a lot of local traffic as well, yes.

Lots of local traffic means named eventually finds itself waiting in
line to log.  Lots of remote traffic means syslogd is trying to talk
to named a lot (to do reverse lookups).  named waiting in line +
syslogd trying to talk to named == deadlock; this is not too hard to
see.

Once the name resolution times out, you might expect things to become
unstuck.  But they don't.

Perhaps syslogd is not giving higher priority to local messages; if it
did, maybe it could recover from the deadlock.  But this would not be
a reliable solution; the only reliable solution is for syslogd to be
independent of any processes which need to talk to it.

> Perchance are you running the name service caching daemon (nscd)?

No.

> I'd also guess you aren't disabling fsync() for your syslog files
> (it's part of the syslog.conf format) -- this is a considerable
> drain on syslogd.

I see no documentation for such an option in the syslog.conf man page.
This is with the current Red Hat 6.2 syslogd (package
sysklogd-1.3.31-17).

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ricky Beam <[EMAIL PROTECTED]> writes:

> syslogd isn't the blocker.  The syslog functions in glibc being
> called by named are the problem.  Stop named from blocking on syslog
> writes and the world will be happy again.

So I have the glibc maintainer (and others) saying that syslog
messages should never be dropped, and you saying that named should be
dropping its syslog messages.

One more time, from the top:

named is calling syslog(), a glibc function.  This function *blocks*
waiting for delivery to the local syslogd, even though it is using
SOCK_DGRAM sockets.  There is no option to openlog() or syslog() to
get non-blocking behavior (the LOG_NDELAY option means something else
entirely).

You are effectively suggesting that named should be rewritten not to
use the glibc syslog functions at all.  That strikes me as the worst
suggestion so far; it would be far better for syslogd not to do name
lookups.  But my syslogd has no option to avoid name lookups; I will
submit a request to add one.

> I've gotta ask what kind of "load" can cause this to happen.

> And for the record, syslogd shouldn't be doing DNS lookups for
> things arriving via /dev/log -- that's always the local machine.

This particular syslogd also accepts messages from remote hosts.  So
when there is a lot of syslog traffic, this syslogd talks to named a
lot.  named occasionally sends messages to syslog.  Since syslog
pauses waiting for named to respond to name queries, and named blocks
waiting for syslog to consume the message, a deadlock is triggered.
True, it is not a full deadlock, because the name query times out
eventually.  But it is bad enough that the system becomes largely
non-responsive.

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ricky Beam <[EMAIL PROTECTED]> writes:

> I would suggest disabling name resolution for syslog, but that's an
> ugly option.  There's no way to stop a glibc system from doing a DNS
> query for a reverse lookup.  HOWEVER, you can set the DNS timeout to
> 1 second and set the resolver options to prevent recursion (answer
> from cache only.)

Recursion has nothing to do with it; as I said, the named on this
system is itself authoritative for all of the reverse lookups.

Turning down the DNS timeout would affect *all* name resolution on the
system, right?  That is not acceptable.

As I said, I already have a workaround, which is to have named log to
a flat file.  I agree that this is a poor workaround, and the "right
fix" is to modify syslogd not to perform blocking operations.  My only
quibble is that SOCK_DGRAM is an odd transport to use here, even over
AF_UNIX.
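
One way a logging daemon could avoid stalling on reverse lookups is to bound each lookup with a timeout and fall back to the numeric address. The Python sketch below illustrates that idea only; it is not how sysklogd works, and the names name_or_address and _pool, the pool size, and the default timeout are all assumptions made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout

# Hypothetical shared worker pool for lookups.
_pool = ThreadPoolExecutor(max_workers=4)


def name_or_address(ip, timeout=0.5, resolver=socket.gethostbyaddr):
    """Return the host name for `ip`, or `ip` itself if the lookup is slow
    or fails."""
    future = _pool.submit(resolver, ip)
    try:
        # gethostbyaddr() returns (hostname, aliases, addresses).
        return future.result(timeout=timeout)[0]
    except (LookupTimeout, OSError):
        return ip
```

The caller is no longer held up, but the worker thread stays busy until the resolver gives up, so a small pool can still fill under sustained load. A production daemon would need a cache or a genuinely asynchronous resolver.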

> PS: Technically, this is not a lockup.  syslogd should eventually
> time out waiting for the DNS query and go about its business.  Of
> course, that may be upwards of 45 seconds -- very annoying.

Yes.  We are able to log in to the machine eventually and restart the
offending processes.  But that is little consolation to our users who
notice the hang and the fallout afterward.

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ulrich Drepper <[EMAIL PROTECTED]> writes:

> If anything has to be changed it's (as suggested) the configuration
> or even the implementation of syslogd.  Make it robust.

OK, but my current syslogd only listens to /dev/log as a SOCK_DGRAM.
If I wanted reliable syslogging, it would be listening on it as a
SOCK_STREAM.  Maybe I care more about performance and backwards
compatibility than reliable syslogging.  But whatever my reasons, my
connection to syslogd is already unreliable and therefore *should not
block*.

(Could a syslogd listen on /dev/log both as SOCK_DGRAM and as
SOCK_STREAM?  If so, then your philosophy implies that glibc should be
trying SOCK_STREAM *first*, falling back to SOCK_DGRAM for historical
compatibility.  Either way, when it uses datagrams, it should never
block, period.)
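
The "try SOCK_STREAM first, fall back to SOCK_DGRAM" ordering suggested above could be sketched on the client side as follows. This is a hypothetical illustration in Python, not glibc's actual logic; the function name connect_log is invented for the example. On Linux, connecting a stream socket to a path bound by a datagram socket fails with an error, which is what triggers the fallback here.

```python
import socket


def connect_log(path="/dev/log"):
    """Connect to the log socket at `path`, preferring SOCK_STREAM."""
    last_error = None
    for kind in (socket.SOCK_STREAM, socket.SOCK_DGRAM):
        sock = socket.socket(socket.AF_UNIX, kind)
        try:
            sock.connect(path)
            return sock
        except OSError as exc:
            # Wrong socket type, nothing listening, etc.: try the next kind.
            sock.close()
            last_error = exc
    raise last_error
```

The returned socket's type attribute tells the caller which transport was obtained, and therefore whether blocking sends are a reasonable choice.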

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Jesse Pollard <[EMAIL PROTECTED]> writes:

> Don't configure syslogd to do reverse lookups.

Our syslogd has no option to disable the reverse lookups.

> You can NEVER guarantee that the reverse lookup will succeed, and
> it can be delayed several minutes for a single reply.

Not true.  The named on our loghost is authoritative for the reverse
mappings for all of the machines which can log there.

> I consider this a configuration error. I don't believe syslogd
> should ever do a reverse lookup, since the name you are trying to
> get may never arrive, or if it arrives, it may be spoofed.

There *is* no configuration for these tools which gives the behavior
you describe, so this is not a "configuration error".

> It's not a bug, but a security feature. NO log to syslogd should be
> lost, since it may be related to an attack.

Historically, no other Unix system has had reliable syslogging.  It
would require very defensive programming for syslogd, and that has
clearly not been performed.

And if this is what GNU/Linux intends, why does glibc use a SOCK_DGRAM
socket for communication with syslogd?  By definition, such sockets
are *unreliable*.  If syslog is supposed to be reliable, a different
connection type must be used.

Your philosophy that "no syslog message should ever be lost" is not
necessarily bad.  But it is clearly at odds with historical practice,
the current glibc syslog() implementation, and the current syslogd
itself.

It is true that glibc falls back to using SOCK_STREAM if the
SOCK_DGRAM connection fails.  Does that mean GNU/Linux expects
syslog to be reliable eventually?  If so, then my problem is entirely
a bug in syslogd and I will report it as such.

 - Pat



syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

If you send SIGSTOP to syslogd on a Red Hat 6.2 system (glibc 2.1.3,
kernel 2.2.x), within a few minutes you will find your entire machine
grinds to a halt.  For example, nobody can log in.

This happens because once the /dev/log buffer fills, the syslog()
glibc function blocks.

This is a problem for us in Real Life because named and syslogd are
deadlocking while trying to talk to each other.  On our loghost,
syslogd needs to do reverse name lookups while named needs to call
syslog().  When traffic gets heavy all around, the deadlock is
triggered, and it is quite unpleasant.  We are about to configure
named to use flat files instead of syslog, but that feels more like a
workaround than a fix.

I am not sure whether this is a glibc bug or a kernel bug.  I have
used netstat and the glibc sources to confirm that glibc is using a
SOCK_DGRAM Unix socket to send to /dev/log.  I thought DGRAM sockets
were supposed to drop packets on the floor instead of blocking.  But
perhaps I am wrong and glibc is supposed to be explicitly marking the
socket as non-blocking.
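
The behavior in question can be observed with a small experiment, sketched here in Python as an illustration that is not part of the original report. A connected pair of AF_UNIX datagram sockets stands in for the client and a stopped syslogd; the function name and the 100-byte payload are assumptions for the example. With the sender marked non-blocking, send() raises BlockingIOError once the kernel buffer is full. Without that flag the same call would simply wait.

```python
import socket


def datagrams_until_full(payload=b"x" * 100, limit=1_000_000):
    """Count datagrams accepted before a non-blocking send would block.

    The receiver never reads, like a syslogd that has been sent SIGSTOP.
    """
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender.setblocking(False)
    sent = 0
    try:
        while sent < limit:
            try:
                sender.send(payload)
            except BlockingIOError:
                break  # buffer full: a blocking socket would stall here
            sent += 1
    finally:
        sender.close()
        receiver.close()
    return sent
```

The count depends on kernel version and socket buffer settings, so no particular number should be expected.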

Regardless of whose bug it is, I suggest that it needs to be fixed.
To my knowledge, other Unix systems do not behave this way; they
simply drop messages when syslogd is not responding.  IMO, that is
correct behavior.

 - Pat



syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

If you send SIGSTOP to syslogd on a Red Hat 6.2 system (glibc 2.1.3,
kernel 2.2.x), within a few minutes you will find your entire machine
grinds to a halt.  For example, nobody can log in.

This happens because once the /dev/log buffer fills, the syslog()
glibc function blocks.

This is a problem for us in Real Life because named and syslogd are
deadlocking while trying to talk to each other.  On our loghost,
syslogd needs to do reverse name lookups while named needs to call
syslog().  When traffic gets heavy all around, the deadlock is
triggered, and it is quite unpleasant.  We are about to configure
named to use flat files instead of syslog, but that feels more like a
workaround than a fix.

I am not sure whether this is a glibc bug or a kernel bug.  I have
used netstat and the glibc sources to confirm that glibc is using a
SOCK_DGRAM Unix socket to send to /dev/log.  I thought DGRAM sockets
were supposed to drop packets on the floor instead of blocking.  But
perhaps I am wrong and glibc is supposed to be explicitly marking the
socket as non-blocking.

Regardless of whose bug it is, I suggest that it needs to be fixed.
To my knowledge, other Unix systems do not behave this way; they
simply drop messages when syslogd is not responding.  IMO, that is
correct behavior.

 - Pat
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Jesse Pollard [EMAIL PROTECTED] writes:

 Don't configure syslogd to do reverse lookups.

Our syslogd has no option to disable the reverse lookups.

 You can NEVER guarantee that the reverse lookup will succeed, and
 can be delayed several minutes for a single reply.

Not true.  The named on our loghost is authoritative for the reverse
mappings for all of the machines which can log there.

 I consider this a configuration error. I don't believe syslogd
 should ever do a reverse lookup, since the name you are trying to
 get may never arrive, or if arrives, it may be spoofed.

There *is* no configuration for these tools which gives the behavior
you describe, so this is not a "configuration error".

 It's not a bug, but a security feature. NO log to syslogd should be
 lost, since it may be related to an attack.

Historically, no other Unix system has had reliable syslogging.  It
would require very defensive programming for syslogd, and that has
clearly not been performed.

And if this is what GNU/Linux intends, why does glibc use a SOCK_DGRAM
socket for communication with syslogd?  By definition, such sockets
are *unreliable*.  If syslog is supposed to be reliable, a different
connection type must be used.

Your philosophy that "no syslog message should ever be lost" is not
necessarily bad.  But it is clearly at odds with historical practice,
the current glibc syslog() implementation, and the current syslogd
itself.

It is true that glibc falls back to using SOCK_STREAM if the
SOCK_DGRAM connection fails.  Does that mean GNU/Linux is expects
syslog to be reliable eventually?  If so, then my problem is entirely
a bug in syslogd and I will report it as such.

 - Pat
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ulrich Drepper [EMAIL PROTECTED] writes:

 If anything has to be changed it's (as suggested) the configuration
 or even the implementation of syslogd.  Make it robust.

OK, but my current syslogd only listens to /dev/log as a SOCK_DGRAM.
If I wanted reliable syslogging, it would be listening on it as a
SOCK_STREAM.  Maybe I care more about performance and backwards
compatibility than reliable syslogging.  But whatever my reasons, my
connection to syslogd is already unreliable and therefore *should not
block*.

(Could a syslogd listen on /dev/log both as SOCK_DGRAM and as
SOCK_STREAM?  If so, then your philosophy implies that glibc should be
trying SOCK_STREAM *first*, falling back to SOCK_DGRAM for historical
compatibility.  Either way, when it uses datagrams, it should never
block, period.)

 - Pat
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ricky Beam <[EMAIL PROTECTED]> writes:

> I would suggest disabling name resolution for syslog, but that's an
> ugly option.  There's no way to stop a glibc system from doing a DNS
> query for a reverse lookup.  HOWEVER, you can set the DNS timeout to
> 1 second and set the resolver options to prevent recursion (answer
> from cache only.)

Recursion has nothing to do with it; as I said, the named on this
system is itself authoritative for all of the reverse lookups.

Turning down the DNS timeout would affect *all* name resolution on the
system, right?  That is not acceptable.

As I said, I already have a workaround, which is to have named log to
a flat file.  I agree that this is a poor workaround, and the "right
fix" is to modify syslogd not to perform blocking operations.  My only
quibble is that SOCK_DGRAM is an odd transport to use here, even over
AF_UNIX.

> PS: Technically, this is not a lockup.  syslogd should eventually
> time out waiting for the DNS query and go about its business.  Of
> course, that may be upwards of 45 seconds -- very annoying.

Yes.  We are able to log in to the machine eventually and restart the
offending processes.  But that is little consolation to our users who
notice the hang and the fallout afterward.

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ricky Beam <[EMAIL PROTECTED]> writes:

> syslogd isn't the blocker.  The syslog functions in glibc being
> called by named are the problem.  Stop named from blocking on syslog
> writes and the world will be happy again.

So I have the glibc maintainer (and others) saying that syslog
messages should never be dropped, and you saying that named should be
dropping its syslog messages.

One more time, from the top:

named is calling syslog(), a glibc function.  This function *blocks*
waiting for delivery to the local syslogd, even though it is using
SOCK_DGRAM sockets.  There is no option to openlog() or syslog() to
get non-blocking behavior (the LOG_NDELAY option means something else
entirely).

You are effectively suggesting that named should be rewritten not to
use the glibc syslog functions at all.  That strikes me as the worst
suggestion so far; it would be far better for syslogd not to do name
lookups.  But my syslogd has no option to avoid name lookups; I will
submit a request to add one.

> I've gotta ask what kind of "load" can cause this to happen.
>
> And for the record, syslogd shouldn't be doing DNS lookups for
> things arriving via /dev/log -- that's always the local machine.

This particular syslogd also accepts messages from remote hosts.  So
when there is a lot of syslog traffic, this syslogd talks to named a
lot.  named occasionally sends messages to syslogd.  Since syslogd
pauses waiting for named to respond to name queries, and named blocks
waiting for syslogd to consume the message, a deadlock is triggered.
True, it is not a full deadlock, because the name query times out
eventually.  But it is bad enough that the system becomes largely
non-responsive.

 - Pat



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Patrick J. LoPresti

Ricky Beam <[EMAIL PROTECTED]> writes:

> Personally, I'd look closely at your setup to determine exactly why
> this has become a problem.  named is being blocked on writing to
> /dev/log.  This should only happen if there is sufficient _local_
> syslog traffic to fill the buffer or syslogd has too much remote
> traffic to ever read from /dev/log.

There is a lot of local traffic as well, yes.

Lots of local traffic means named eventually finds itself waiting in
line to log.  Lots of remote traffic means syslogd is trying to talk
to named a lot (to do reverse lookups).  named waiting in line +
syslogd trying to talk to named == deadlock; this is not too hard to
see.

Once the name resolution times out, you might expect things to become
unstuck.  But they don't.

Perhaps syslogd is not giving higher priority to local messages; if it
did, maybe it could recover from the deadlock.  But this would not be
a reliable solution; the only reliable solution is for syslogd to be
independent of any processes which need to talk to it.

> Perchance are you running the name service caching daemon (nscd)?

No.

> I'd also guess you aren't disabling fsync() for your syslog files
> (it's part of the syslog.conf format) -- this is a considerable
> drain on syslogd.

I see no documentation for such an option in the syslog.conf man page.
This is with the current Red Hat 6.2 syslogd (package
sysklogd-1.3.31-17).

 - Pat



Re: Availability of kdb

2000-09-08 Thread Patrick J. LoPresti

Linus Torvalds <[EMAIL PROTECTED]> writes:

> Sure. I just don't see many end-users single-stepping through
> interrupt handlers etc.
> 
> But yes, there probably are a few.

I think you would be surprised, and I speak as someone who has found
and fixed race conditions in your kernel.

There are more Linux users who are competent with x86 hardware and SMP
issues than there are Linux developers.  A *lot* more.

When these technically savvy users have a problem, they want to
diagnose it as best they can and then hand it off to a kernel expert
to analyse and to fix.  They wish they had the time to understand the
kernel deeply and come up with the "right" solution, but they don't;
and the expert can do the job ten times faster anyway.

If you give us better diagnostic tools, your kernel *will* improve
faster.  Whether this benefit outweighs the cost is, of course, up to
you.

 - Pat


