Re: Kernel bug with UNIX sockets not detecting other end gone?

2001-05-17 Thread Chris Evans


On Thu, 17 May 2001, Alan Cox wrote:

> > The following program blocks indefinitely on Linux (2.2, 2.4 not tested).
> > Since the other end is clearly gone, I would expect some sort of error
> > condition. Indeed, FreeBSD gives ECONNRESET.
>
> Since it's a datagram socket I'm not convinced that's a justifiable assumption.

Hmm - there's definitely a Linux inconsistency here. With SOCK_DGRAM,
read() is blocking but write() is giving ECONNRESET.

The ECONNRESET makes sense to me (despite this being a datagram socket),
because the sockets are anonymous. Once one end goes away, the other end
is pretty useless.
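
The inconsistency can be probed without hanging the process. The following is a minimal sketch and not part of the original report: `probe_dgram_peer_gone` and `struct probe_result` are hypothetical names, and `MSG_DONTWAIT` and `MSG_NOSIGNAL` are used so that neither call blocks or raises a signal. The errno values recorded depend on the kernel, so none are hard-coded.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Result of probing the surviving end of a socketpair (hypothetical type). */
struct probe_result {
  int recv_errno;  /* errno from a non-blocking recv(), 0 if it returned >= 0 */
  int send_errno;  /* errno from a non-blocking send(), 0 if it succeeded */
};

/* Create a datagram socketpair, close one end, then try a non-blocking
 * recv() and send() on the other end. Returns 0 on success, -1 if the
 * socketpair could not be created. */
int probe_dgram_peer_gone(struct probe_result *out)
{
  int sv[2];
  char c = 'x';

  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
    return -1;
  close(sv[0]);

  /* MSG_DONTWAIT avoids the indefinite block described above. */
  out->recv_errno =
      (recv(sv[1], &c, sizeof(c), MSG_DONTWAIT) < 0) ? errno : 0;
  out->send_errno =
      (send(sv[1], &c, sizeof(c), MSG_DONTWAIT | MSG_NOSIGNAL) < 0) ? errno : 0;

  close(sv[1]);
  return 0;
}
```

If the read side only reports "nothing to read yet" while the write side reports a connection error, the two paths disagree about whether the peer is gone, which is the inconsistency discussed here.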

Cheers
Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Kernel bug with UNIX sockets not detecting other end gone?

2001-05-17 Thread Chris Evans


Hi,

I wonder if the following is a bug? It certainly differs from FreeBSD 4.2
behaviour, which gives the behaviour I would expect.

The following program blocks indefinitely on Linux (2.2, 2.4 not tested).
Since the other end is clearly gone, I would expect some sort of error
condition. Indeed, FreeBSD gives ECONNRESET.

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, const char* argv[])
{
  int the_sockets[2];
  int retval;
  char the_char;
  int opt = 1;

  retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, the_sockets);
  if (retval != 0)
  {
    perror("socketpair");
    exit(1);
  }
  close(the_sockets[0]);
  /* Linux (2.2) blocks here; FreeBSD does not */
  retval = read(the_sockets[1], &the_char, sizeof(the_char));
  if (retval < 0)
    perror("read");
  return 0;
}

Cheers
Chris




2.2, 2.4 bug in sock_no_fcntl()/F_SETOWN? (fwd)

2001-05-17 Thread Chris Evans


Resend (no response first time)

-- Forwarded message --
Date: Wed, 24 Jan 2001 21:09:09 +0000 (GMT)
From: Chris Evans <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: 2.2, 2.4 bug in sock_no_fcntl()/F_SETOWN?


Hi,

Looking at the code for sock_no_fcntl() in net/core/sock.c, I cannot specify
"0" as a value for F_SETOWN unless I'm the superuser. I believe this to
be a bug: it prevents de-registering an interest in SIGURG signals. Let me
know if you want a patch.

Cheers
Chris





Re: Linux 2.4.4-ac10

2001-05-17 Thread Chris Evans


On Thu, 17 May 2001, Alan Cox wrote:

> 2.4.4-ac10
[...]
>   - now 2.4.5pre vm seems sane dump other vmscan
> experiments

Has anyone benched 2.4.5pre3 vs 2.4.4 vs. ?

Cheers
Chris




Re: Linux 2.4 Scalability, Samba, and Netbench

2001-05-09 Thread Chris Evans


On Wed, 9 May 2001, Alan Cox wrote:

> > significant problems with lockmeter.  csum_partial_copy_generic was the
> > highest % in profile, at 4.34%.  I'll see if we can get some space on
>
> Are you using Anton's optimisations to samba to use sendfile?

And you might like to try 2.4.4 (I saw 2.4.0 and 2.4.3 mentioned). 2.4.4
has the zerocopy TCP stuff (or was it 2.4.3 :)

Also, if the load is not disk limited, you might like to try Mingo's
pagecache/timers scalability patches, etc.

Cheers
Chris




Re: [CHECKER] copy_*_user length bugs?

2001-04-18 Thread Chris Evans


On Wed, 18 Apr 2001, Russell King wrote:

> > Now, providing the malicious user passes a low user space pointer (e.g.
> > just above 0), the kernel's virtual address space wrap check will not
> > trigger because ~0 + ~2Gb does not exceed 4G. And the result is the user
> > being able to read kernel memory.
>
> But ~0 + ~2GB = ~2GB.  Last time I checked, ~2GB is less than 3GB, and 3GB
> is the start of kernel memory on x86.  Therefore, I don't see that the
> user will be able to read kernel memory.

The problem is that (up to) a 2Gb copy is attempted into userspace. The
source is a kernel object which is not 2Gb large! So, we read off the end
of some kernel object, and there is often something very interesting after
it ;-)

For a good real-world example, please see my Bugtraq post regarding
sysctl():

http://www.securityfocus.com/archive/1/161764

Cheers
Chris




Re: [CHECKER] copy_*_user length bugs?

2001-04-18 Thread Chris Evans


On Wed, 18 Apr 2001, David Schleef wrote:

> On Tue, Apr 17, 2001 at 09:39:15PM -0700, Dawson Engler wrote:
> > Hi All,
> >
> > at the suggestion of Chris ([EMAIL PROTECTED]) I wrote a simple
> > checker to warn when the length parameter to copy_*_user was (1) an
> > integer and (2) not checked < 0.
> >
> > As an example, the ipv6 routine rawv6_geticmpfilter gets an integer 'len'
> > from user space, checks that it is smaller than a struct size and then
> > uses length as an argument to copy_to_user:
> >
> >     if (get_user(len, optlen))
> >         return -EFAULT;
> >     if (len > sizeof(struct icmp6_filter))
> >         len = sizeof(struct icmp6_filter);
> >     if (put_user(len, optlen))
> >         return -EFAULT;
> >     if (copy_to_user(optval, &sk->tp_pinfo.tp_raw.filter, len))
> >         return -EFAULT;
> >
> > Is this a real bug?  Or is the checked rule only applicable to
> > __copy_*_user routines rather than copy_*_user routines?  (If it's a real
> > bug, there's about 8 others that we found).
>
> The len parameter is an unsigned value, so this code is ok as
> long as access_ok() correctly checks that the range to copy
> doesn't stray outside of the userspace range, including the
> possible wraparound for a very large len.  access_ok() on i386
> checks for the wraparound.  m68k doesn't use it.  PowerPC
> is correct, but only because TASK_SIZE is 0x80000000.  If it
> is ever changed, there could be a problem.  I didn't check
> other architectures, because I don't understand the asm.

Incorrect - if the "len" variable is a signed integer, this is a nasty
bug.

To justify this, consider what happens if len is set to minus 2 billion. This
passes the sanity check, and the value goes straight on to copy_to_user. The
copy_to_user length parameter is unsigned, so this value becomes approximately
+2Gb.

Now, providing the malicious user passes a low user space pointer (e.g.
just above 0), the kernel's virtual address space wrap check will not
trigger because ~0 + ~2Gb does not exceed 4G. And the result is the user
being able to read kernel memory.
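
The arithmetic can be checked in userspace. This is a sketch under assumptions: `as_unsigned_len` and `range_ok` are hypothetical names, 32-bit types stand in for the 32-bit x86 case discussed, and `range_ok` is a simplified stand-in for an address-range check, not the kernel's actual macro.

```c
#include <stdint.h>

/* What a signed 32-bit length becomes when it is passed to a function
 * that takes an unsigned 32-bit size. */
uint32_t as_unsigned_len(int32_t len)
{
  return (uint32_t)len;
}

/* Simplified range check: reject the range if addr + size wraps past 4G
 * or ends above limit. Returns 1 if the range is accepted, 0 otherwise. */
int range_ok(uint32_t addr, uint32_t size, uint32_t limit)
{
  uint32_t end = addr + size;          /* wraps modulo 2^32 */
  return end >= addr && end <= limit;  /* end < addr means it wrapped */
}
```

With `len = -2000000000` the converted size is 2294967296, a little over 2Gb. Starting from a low address such as 0x1000, the end of the range neither wraps nor exceeds a 3Gb limit (0xC0000000), so a check of this shape accepts it.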

Cheers
Chris




Re: [CHECKER] security rules?

2001-04-13 Thread Chris Evans


Hi Dawson,

Excellent project.

Can I suggest that you check for signedness issues? A typical signature of
a signedness problem is:

int i = get_from_userspace_somehow();
/* Sanity check i */
if (i > MAX_LEN_FOR_I)
  goto bad_bad_out;
/* Bug here!! i can be negative! */


I suspect you'll find a lot of these sorts of errors. I've already nailed a
few.
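
A compilable version of the pattern, next to one possible fix, might look like this. It is only a sketch: `accepts_buggy`, `accepts_fixed` and the value of `MAX_LEN_FOR_I` are made up for illustration, and the limit is assumed to be a plain int constant so that the comparison is a signed one.

```c
#define MAX_LEN_FOR_I 64  /* plain int constant, so comparisons stay signed */

/* Buggy pattern: only the upper bound is checked. Returns 1 if accepted. */
int accepts_buggy(int i)
{
  if (i > MAX_LEN_FOR_I)
    return 0;
  return 1;  /* a negative i reaches this point */
}

/* Fixed pattern: reject negative values as well. */
int accepts_fixed(int i)
{
  if (i < 0 || i > MAX_LEN_FOR_I)
    return 0;
  return 1;
}
```

Note that when the bound is a `sizeof` expression, C's usual arithmetic conversions typically turn the comparison into an unsigned one, so a checker would want to look at the type of the bound as well as the type of the variable.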

Cheers
Chris

On Fri, 13 Apr 2001, Dawson Engler wrote:

>
> We're looking at making a set of security checkers.  Does anyone have
> suggestions for good things to go after in addition to the usual
> copy_*_user and buffer overrun bugs?  For example, are there any
> documents that describe the rules for when/how 'capable' is supposed to
> be used?
>
> Thanks for any help,
> Dawson



Re: [PATCH] Non-root sshd and capabilities

2001-03-19 Thread Chris Evans


[cc: security-audit, because it's interesting :-)]

On Sun, 18 Mar 2001, Topi Miettinen wrote:

> (Please cc: me, I'm not subscribed.)
>
> Using the magical prctl() call it's possible to run daemons as non-root
> while still possessing some capabilities. For full support, patched kernel
> with ext2 capabilities is required, but if the daemon doesn't exec()
> anything (for example, by emulating exec() with mmap()), stock 2.4 is
> enough.

Kernel 2.2.18 (I think) also added this prctl().
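
Assuming the prctl() being discussed is `PR_SET_KEEPCAPS` (the thread does not name it), the call looks roughly like this. `enable_keepcaps` is a hypothetical helper name.

```c
#include <sys/prctl.h>

/* Ask the kernel to keep permitted capabilities across a later setuid()
 * from root to a non-root uid. Returns the flag value read back with
 * PR_GET_KEEPCAPS (1 when set), or -1 if either prctl() call failed. */
int enable_keepcaps(void)
{
  if (prctl(PR_SET_KEEPCAPS, 1L, 0L, 0L, 0L) == -1)
    return -1;
  return prctl(PR_GET_KEEPCAPS, 0L, 0L, 0L, 0L);
}
```

A daemon would call this while still root, switch uid, and then reduce its capability sets to the few it actually needs.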

> This works well for programs like pppd, hwclock and XFree86. There is a
> problem if the daemon uses setuid() and setgid() to change identity, like
> sshd or cron. In function cap_emulate_setxuid() (in kernel/sys.c) the
> capabilities are cleared when IDs are switched. However, the check misses
> the case where old_*uid are already nonzero. This patch attempts to fix
> the problem.

[...]

> Any suggestions?

No comments on the patch/bug you've highlighted, but I've got some
comments on the general approach.

Firstly, changing sshd so it runs with minimal privilege, is an excellent
project. You only need to look at the recent deattack.c vulnerability to
see why. I was going to tackle this once I finished vsftpd (also makes use
of capabilities and the prctl()).

However, I don't think running any daemon with CAP_SETUID can be
considered running with "minimal privilege". With CAP_SETUID, you can
change your uid to the owner of any number of critical system files, and
gain full access, as if you hadn't bothered using capabilities at all.
Even inside a chroot() jail, you have to be careful with CAP_SETUID. Think
"ptrace(), sysctl()".

Of course, _something_ needs to have CAP_SETUID, otherwise you cannot
switch to the authenticated userid at all. The solution is to have a
minimal privileged helper process, which takes authentication details from
the main sshd process over a pipe or socket. The helper process carefully
validates the authentication details, and if they are correct, switches to
the authenticated user, drops privileges, and runs some action on behalf
of sshd.
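
The shape of such a helper can be sketched as follows. Everything here is illustrative: `struct auth_request`, `credentials_ok`, `helper_main` and `ask_helper` are hypothetical names, the hard-coded credentials stand in for real authentication, and neither process in the sketch actually holds any privilege or calls setuid().

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fixed-size request sent from the main process to the helper. */
struct auth_request {
  char user[32];
  char secret[32];
};

/* Hypothetical stand-in for real authentication. */
static int credentials_ok(const struct auth_request *req)
{
  return strcmp(req->user, "demo") == 0 &&
         strcmp(req->secret, "letmein") == 0;
}

/* Helper side: read one request, validate it, reply with one byte.
 * A real helper would, on success, setuid() to the authenticated user,
 * drop its remaining privileges and only then act on behalf of sshd. */
static void helper_main(int fd)
{
  struct auth_request req;
  char verdict = 0;

  if (recv(fd, &req, sizeof(req), MSG_WAITALL) == (ssize_t)sizeof(req)) {
    /* Never trust the peer to NUL-terminate its strings. */
    req.user[sizeof(req.user) - 1] = '\0';
    req.secret[sizeof(req.secret) - 1] = '\0';
    verdict = (char)credentials_ok(&req);
  }
  (void)send(fd, &verdict, 1, MSG_NOSIGNAL);
  _exit(0);
}

/* Main-process side: fork a helper, send it the credentials over a
 * socketpair and return 1 if the helper accepted them, 0 otherwise. */
int ask_helper(const char *user, const char *secret)
{
  int sv[2];
  struct auth_request req;
  char verdict = 0;
  pid_t pid;

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
    return 0;
  pid = fork();
  if (pid < 0) {
    close(sv[0]);
    close(sv[1]);
    return 0;
  }
  if (pid == 0) {
    close(sv[0]);
    helper_main(sv[1]);  /* does not return */
  }
  close(sv[1]);

  memset(&req, 0, sizeof(req));
  strncpy(req.user, user, sizeof(req.user) - 1);
  strncpy(req.secret, secret, sizeof(req.secret) - 1);
  if (send(sv[0], &req, sizeof(req), MSG_NOSIGNAL) == (ssize_t)sizeof(req))
    (void)recv(sv[0], &verdict, 1, 0);

  close(sv[0]);
  waitpid(pid, NULL, 0);
  return verdict == 1;
}
```

The point of the fixed-size request and the one-byte reply is that the helper has very little input to parse, which keeps the privileged code path small enough to audit.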

The above is a bit of hassle, but extremely powerful and secure. If you
also throw in a bit of chroot(), you can make future sshd holes very low
severity indeed.

For bonus points, make sure that sensitive information, such as the private
host key, is only accessible to the privileged helper. Trickier (maybe not
feasible), but useful.

Finally, a compromised sshd session should not be able to compromise other
sshd sessions. This can be accomplished by ensuring the sshd session
processes all have "dumpable == 0" in the kernel, e.g. by starting sshd as
root and doing setuid() to some other userid without any exec().
Cheers
Chris




Re: system hang with "__alloc_page: 1-order allocation failed"

2001-03-13 Thread Chris Evans


On Tue, 13 Mar 2001, Manfred Spraul wrote:

> * bugfixes for get_pid(). This is the longest part of the patch, but
> it's only necessary if you have more than 10.000 threads running. If you
> have enough memory: launch a forkbomb. If ~ 32760 thread are running the
> kernel enters an endless loop in get_pid() (or around 11000 threads if
> they intentionally create additional sessions and process groups)

I thought (on Intel) there was a 4092 hard limit?

Cheers
Chris




Re: [patch][rfc][rft] vm throughput 2.4.2-ac4

2001-03-01 Thread Chris Evans


On Thu, 1 Mar 2001, Rik van Riel wrote:

> True. I think we want something in-between our ideas...
^^^
> a while. This should make it possible for the disk reads to
^^

Oh dear.. not more "vm design by waving hands in the air". Come on people,
improve the vm by careful profiling, tweaking and benching, not by
throwing random patches in that seem cool in theory.

Cheers
Chris




Re: 2.4.1 under heavy network load - more info

2001-02-23 Thread Chris Evans


On Wed, 21 Feb 2001, Rik van Riel wrote:

> I'm really interested in things which make Linux 2.4 break
> performance-wise since I'd like to have them fixed before the
> distributions start shipping 2.4 as default.

Hi Rik,

With kernel 2.4.1, I found that caching is way too aggressive. I was
running konqueror in 32Mb (the quest for a lightweight browser!)
Unfortunately, the system seemed to insist on keeping 16Mb used for
caches, with 15Mb given to the application and X. This led to a lot of
swapping and paging by konqueror. I think the browser would be fully
usable in 32Mb, were the caching not out of balance.

Cheers
Chris




Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-19 Thread Chris Evans


On Mon, 19 Feb 2001 [EMAIL PROTECTED] wrote:

> Wakeup does not happen until _enough_ (1/3 of sndbuf) of space in sndbuf
> is released, otherwise you will overschedule. So, as soon as
> write() goes to sleep, it will sleep waiting until 1/3 is released.

Of course. Thank you.

> If it is interrupted, it use all the released space immediately before
> exit. Again, to make more for in this context. This can be even wrong
> and, probably, we should return instantly with -EAGAIN/-EINTR/partial
> count, but it is most likely suboptimal (though I have already changed
> this to instant return). But this does not look essential from
> caller's viewpoint, except for sendfile() of course. 8)

Cool.

I think the proper fix, long term, is to fix our internal I/O routine APIs
so that they are capable of returning a byte count _and_ an error. One
day, that might be a useful thing to export to userspace.
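
To make the idea concrete, here is a userspace sketch of an I/O routine that reports a byte count and an error separately. `struct io_result` and `write_report` are hypothetical names; this illustrates the shape of such an API and is not a proposal for the kernel's internal interfaces.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Two separate pieces of information instead of one overloaded return. */
struct io_result {
  size_t bytes;  /* bytes actually written */
  int err;       /* 0 on success, otherwise an errno value */
};

/* Write up to len bytes, reporting both the progress made and the first
 * error encountered. A failure after partial progress is returned
 * together with the byte count instead of hiding one of the two. */
struct io_result write_report(int fd, const void *buf, size_t len)
{
  struct io_result res = { 0, 0 };
  const char *p = buf;

  while (res.bytes < len) {
    ssize_t n = write(fd, p + res.bytes, len - res.bytes);
    if (n < 0) {
      res.err = errno;
      break;
    }
    if (n == 0)
      break;  /* no progress; stop rather than spin */
    res.bytes += (size_t)n;
  }
  return res;
}
```

A caller can then distinguish "4096 of 8192 bytes written, then EINTR" from both a clean short write and an outright failure.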

Cheers
Chris




sendfile() breakage was Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-18 Thread Chris Evans


On Mon, 19 Feb 2001, Chris Evans wrote:

> > BTW, if you have enough fast network, you probably can observe
> > that sendfile() is even not interrupted by signals. 8) But this
> > is possible to fix at least. BTW the same fix will repair SO_*TIMEO
> > partially, i.e. it will timeout after n*timeo, where n is an arbitrary
> > number not exceeding size/sndbuf.
>
> Hi Alexey,
>
> You are right - our sendfile() implementation is broken. I have fixed it
> (patch at end of mail).

Actually the whole mess stems from our broken internal ->write() and
->read() APIs.

The _single_ return value is trying to convey _two_ pieces of information
- always a bad move. They are:
1) Success/failure (and error code if it's a failure)
2) Amount of bytes read or written

This bogon does not allow for the following information to be returned
(assume I asked for 8192 bytes to be written):
"4096 bytes were written, and the operation was aborted due to EINTR"

Cheers
Chris




Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-18 Thread Chris Evans


On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile():
>
> None of the options apply to sendfile(). It is not socket level
> operation. You have to use alarm for it.
>
> BTW, if you have enough fast network, you probably can observe
> that sendfile() is even not interrupted by signals. 8) But this
> is possible to fix at least. BTW the same fix will repair SO_*TIMEO
> partially, i.e. it will timeout after n*timeo, where n is an arbitrary
> number not exceeding size/sndbuf.

Hi Alexey,

You are right - our sendfile() implementation is broken. I have fixed it
(patch at end of mail).

However, I believe something is still wrong in the networking layer, even
with my fix applied.

Before I go into details, I want to step back and describe things from a
_user's_ perspective. That is most important after all.

Take two different operations: write() to a socket and sendfile() down a
socket. In both cases, the socket has a send timeout of 10 seconds. From a
user's point of view, these are two socket write operations. The source of
data is different (a buffer or a file descriptor), but that is irrelevant.
The user has the right to expect a timeout after 10 seconds of no
progress, on both operations.

I have tried this on FreeBSD, and this is what happens: both sendfile()
and write() time out in the same way.

On Linux, this is not the case => bug. I fixed a small sendfile() issue,
which did not recognise partial writes as an interruption, but as I said
above, the bug still remains.
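
To make the effect of that small fix concrete, here is a toy user-space model of the page loop. It is only a sketch under stated assumptions: `toy_actor` and `page_loop` are hypothetical names, the "socket" is just a byte budget, and every short or zero-length write is assumed to cost one send timeout. It is not the kernel code.

```c
#include <stddef.h>

#define TOY_PAGE 4096UL

/*
 * Hypothetical stand-in for the actor: the socket accepts at most
 * `*space` more bytes. Anything less than `nr` is a short (or zero)
 * write, which is counted as one expired send timeout.
 */
static unsigned long toy_actor(unsigned long *space, unsigned long nr,
                               int *timeouts)
{
    unsigned long done = nr < *space ? nr : *space;

    *space -= done;
    if (done < nr)
        (*timeouts)++; /* the write blocked and was cut short */
    return done;
}

/*
 * Returns how many timeouts are suffered before the loop gives up.
 * fixed == 0: old test, continue while the actor wrote anything at all
 * fixed != 0: patched test, continue only after a full write
 */
static int page_loop(unsigned long count, unsigned long space, int fixed)
{
    int timeouts = 0;

    while (count) {
        unsigned long nr = count < TOY_PAGE ? count : TOY_PAGE;
        unsigned long ret = toy_actor(&space, nr, &timeouts);

        count -= ret;
        if (fixed ? (ret == nr) : (ret != 0))
            continue;
        break;
    }
    return timeouts;
}
```

With room for one and a half pages, the old test sits through two timeouts (a short write, then a zero-byte write) while the patched test stops after the first.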

Investigation shows that the Linux network layer is behaving oddly. It
seems that we are writing 4096 bytes to a socket. This proceeds in 4096
byte chunks until the send buffer on the socket is full, and a 4096 byte
write blocks. This blocking write is eventually interrupted by the
timeout, and the write call returns.. wait for it.. 4096! This suggests
there was socket space after all, and the call should not have blocked.

I wonder what is going on? I'd like to get this fixed. I think the FreeBSD
behaviour is definitely correct and we want it on Linux.

Cheers
Chris

--- filemap.c.old   Sun Feb 18 23:35:06 2001
+++ filemap.c   Mon Feb 19 00:13:38 2001
@@ -1062,7 +1062,7 @@

 	for (;;) {
 		struct page *page, **hash;
-		unsigned long end_index, nr;
+		unsigned long end_index, nr, actor_ret;

 		end_index = inode->i_size >> PAGE_CACHE_SHIFT;
 		if (index > end_index)
@@ -1110,13 +1110,13 @@
 		 * "pos" here (the actor routine has to update the user buffer
 		 * pointers and the remaining count).
 		 */
-		nr = actor(desc, page, offset, nr);
-		offset += nr;
+		actor_ret = actor(desc, page, offset, nr);
+		offset += actor_ret;
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;

 		page_cache_release(page);
-		if (nr && desc->count)
+		if (actor_ret == nr && desc->count)
 			continue;
 		break;




Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-18 Thread Chris Evans


On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > So the actual timeout would be 2 * SO_SNDTIMEO.
>
> It will timeout if write of some page blocks for SO_SNDTIMEO.

.. unless that page was partially written, in which case a short write
count is returned (rather than a timeout error), and the loop goes around
again.

> If transmission of any page never takes more than SO_SNDTIMEO it never
> times out.

Which is good, because SO_SNDTIMEO is an inactivity monitor.

Cheers
Chris




Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-18 Thread Chris Evans


On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile():
>
> None of the options apply to sendfile(). It is not socket level
> operation. You have to use alarm for it.

Hi Alexey,

Actually sendfile() _does_ time out using SO_SNDTIMEO. It just takes longer
to time out because the kernel sendfile() page loop will (usually) need to
time out a short write, and then time out a 0 byte write.

So the actual timeout would be 2 * SO_SNDTIMEO.

Unfortunately, I'm seeing a timeout at (I think) 3 * SO_SNDTIMEO, which I
can't account for.

Cheers
Chris




Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-17 Thread Chris Evans


Hi,

By the way - I tested SO_RCVLOWAT, another 2.4 addition. Good news this
time - seems to work fine.

Cheers
Chris




Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-17 Thread Chris Evans


Hi Alexey,

This patch fixes my simple read()/write() tests, nice one. The behaviour
also now matches BSD (someone kindly donated me a FreeBSD shell for
testing).

Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile():

- Connect an AF_INET, SOCK_STREAM socket to a local listening socket.
- Set 5 seconds SO_SNDTIMEO on the connected socket
- Do a sendfile() from a big file down the connected socket. Make sure the
size is big (e.g. 1Mb) so the call blocks.
--> BUG!! The call blocks indefinitely rather than being interrupted after
5 seconds.
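
The steps above can be sketched as a small probe. This is an illustration under assumptions, not the test actually used: `sendfile_probe` is a hypothetical name, the source file is a sparse 1 MiB temporary file, and the file is sent repeatedly so that the call is sure to block whatever the buffer sizes are. It uses the Linux `sendfile()` from `<sys/sendfile.h>`.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Connect to a local listener that never accepts or reads, set
 * SO_SNDTIMEO to `seconds`, then sendfile() a 1 MiB file repeatedly.
 * Returns the errno of the sendfile() that finally fails, 0 if none
 * failed within `max_calls`, or -1 if the setup itself failed.
 */
static int sendfile_probe(int seconds, int max_calls)
{
    struct sockaddr_in addr = { 0 };
    socklen_t alen = sizeof(addr);
    struct timeval tv = { seconds, 0 };
    int result = -1;
    int lfd = -1, cfd = -1;
    FILE *tmp = tmpfile();

    if (tmp == NULL)
        return -1;
    if (ftruncate(fileno(tmp), 1 << 20) != 0) /* sparse 1 MiB source */
        goto out;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) != 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        setsockopt(cfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0)
        goto out;

    result = 0;
    for (int i = 0; i < max_calls; i++) {
        off_t off = 0; /* resend the file from the start each time */

        /* A partial count is not an error; keep going until one fails. */
        if (sendfile(cfd, fileno(tmp), &off, 1 << 20) < 0) {
            result = errno;
            break;
        }
    }
out:
    if (lfd >= 0)
        close(lfd);
    if (cfd >= 0)
        close(cfd);
    fclose(tmp);
    return result;
}
```

A call such as `sendfile_probe(5, 64)` either comes back with an errno once the timeout fires, or never returns on a kernel showing the bug described here.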

Cheers
Chris

On Sat, 17 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, it seems to be very buggy. Here are two buggy scenarios.
>
>
> --- ../vger3-010210/linux/net/ipv4/tcp.c  Sat Feb 10 23:16:51 2001
> +++ linux/net/ipv4/tcp.c  Sat Feb 17 23:27:43 2001
> @@ -691,6 +691,8 @@
>
>   set_current_state(TASK_INTERRUPTIBLE);
>
> + if (!timeo)
> + break;
>   if (signal_pending(current))
>   break;
>   if (tcp_memory_free(sk) && !vm_wait)
> --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001
> +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001
> @@ -727,6 +727,8 @@
>   clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
>   add_wait_queue(sk->sleep, &wait);
>   for (;;) {
> + if (!timeo)
> + break;
>   if (signal_pending(current))
>   break;
>   set_bit(SOCK_NOSPACE, &sk->socket->flags);
>





Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-17 Thread Chris Evans


Alexey,

Damn you are quick! :) Testing immediately

Cheers
Chris

On Sat, 17 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, it seems to be very buggy. Here are two buggy scenarios.
>
>
> --- ../vger3-010210/linux/net/ipv4/tcp.c  Sat Feb 10 23:16:51 2001
> +++ linux/net/ipv4/tcp.c  Sat Feb 17 23:27:43 2001
> @@ -691,6 +691,8 @@
>
>   set_current_state(TASK_INTERRUPTIBLE);
>
> + if (!timeo)
> + break;
>   if (signal_pending(current))
>   break;
>   if (tcp_memory_free(sk) && !vm_wait)
> --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001
> +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001
> @@ -727,6 +727,8 @@
>   clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
>   add_wait_queue(sk->sleep, &wait);
>   for (;;) {
> + if (!timeo)
> + break;
>   if (signal_pending(current))
>   break;
>   set_bit(SOCK_NOSPACE, &sk->socket->flags);
>




SO_SNDTIMEO: 2.4 kernel bugs

2001-02-16 Thread Chris Evans


Hi,

I was glad to see Linux gain SO_SNDTIMEO in kernel 2.4. It is a very useful
feature which can avoid complexity and pain in userspace programs.

Unfortunately, it seems to be very buggy. Here are two buggy scenarios.

1)
Create a socketpair(), PF_UNIX, SOCK_STREAM.
Set a 5 second SO_SNDTIMEO on the socket.
write() 100k down the socket in one write(), i.e. enough to cause the
write to have to block.
--> BUG!!! The call blocks indefinitely instead of returning after 5
seconds

(Note that the same test but with SO_RCVTIMEO and a read() works as
expected - I get EAGAIN after 5 seconds).


2)
Create a localhost listening socket - AF_INET, SOCK_STREAM.
Connect to the listening port
Set a 5 second SO_SNDTIMEO on the socket.
write() 1Mb down the socket in one write(), i.e. enough to cause it to
have to block
-> The write() will return after 5 seconds with a partial write count.
GOOD!
Repeat the write() - send another 1Mb.
--> BUG!! The call blocks indefinitely instead of returning with EAGAIN
after 5s.
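
Scenario 1 can be sketched as follows. This is a hedged illustration rather than the exact test program: `sndtimeo_probe` is a hypothetical name, and it keeps issuing 100k writes (instead of a single one) so that it blocks regardless of the socket buffer size. A partial write count is treated as success, as in scenario 2.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Fill one end of a PF_UNIX stream socketpair whose peer never reads,
 * with SO_SNDTIMEO set to `seconds`. Returns the errno of the write()
 * that finally fails, or 0 if none failed within `max_writes` attempts.
 */
static int sndtimeo_probe(int seconds, int max_writes)
{
    int sv[2];
    int result = 0;
    struct timeval tv = { seconds, 0 };
    static char buf[100 * 1024]; /* 100k, as in scenario 1 */

    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return errno;
    if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0) {
        result = errno;
        goto out;
    }
    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < max_writes; i++) {
        /* A partial count is fine; keep writing until nothing fits. */
        if (write(sv[0], buf, sizeof(buf)) < 0) {
            result = errno;
            break;
        }
    }
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On a kernel with working send timeouts, `sndtimeo_probe(5, 64)` should come back with EAGAIN after the buffer fills; on a kernel with the bug reported here it would block forever.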


I hope this is detailed enough. I'm trying to gain access to a FreeBSD
box to compare results..

Cheers
Chris





SO_RCVTIMEO, SO_SNDTIMEO

2001-02-12 Thread Chris Evans


Hi,

I notice the entities in the subject line have appeared in Linux 2.4.

What is their functional specification? I guess they trigger if no bytes
are received/sent within a consecutive period. How does the app get the
error? -EPIPE for a blocking read/write? If so, does SIGPIPE
get raised? Or is -ETIMEDOUT used? ...
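
One way to answer the errno question empirically is a small probe like the one below. It is a sketch with a hypothetical function name (`rcvtimeo_probe`); it simply reports whichever errno a timed-out blocking read() produces. A later message in this thread reports EAGAIN for exactly this case.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * read() from a socket that never receives data, with SO_RCVTIMEO set
 * to `seconds`. Returns the errno from the timed-out read(), or 0 if
 * read() did not fail.
 */
static int rcvtimeo_probe(int seconds)
{
    int sv[2];
    int result = 0;
    char c;
    struct timeval tv = { seconds, 0 };

    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return errno;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0 ||
        read(sv[0], &c, 1) < 0)
        result = errno;
    close(sv[0]);
    close(sv[1]);
    return result;
}
```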

TIA,
Chris




Re: BUG: SO_LINGER + shutdown() does not block?

2001-02-11 Thread Chris Evans


On Sun, 11 Feb 2001, Andi Kleen wrote:

> On Sun, Feb 11, 2001 at 08:41:04PM +0000, Chris Evans wrote:
> >
> > [cc: Andi]
>
> Missing context..

[...]

> What do you exactly think is wrong?

man socket(7) says that setting SO_LINGER on a socket will make shutdown()
and close() block. That's incorrect; only close() blocks.
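
For reference, a minimal sketch of how SO_LINGER is enabled and read back. `linger_roundtrip` is a hypothetical helper name; the socket here is never connected, so nothing is queued and the close() returns at once. The point is only to show the option and where the blocking would occur.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Enable SO_LINGER with the given timeout on a fresh TCP socket and
 * read the setting back. Returns the linger time the kernel reports,
 * or -1 on any failure (including the option reading back as off).
 */
static int linger_roundtrip(int seconds)
{
    struct linger in = { .l_onoff = 1, .l_linger = seconds };
    struct linger out = { 0, 0 };
    socklen_t len = sizeof(out);
    int result = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &in, sizeof(in)) == 0 &&
        getsockopt(fd, SOL_SOCKET, SO_LINGER, &out, &len) == 0 &&
        out.l_onoff)
        result = out.l_linger;
    /* Per this thread, close() is the call that lingers when unsent
     * data is queued; shutdown() is reported not to block. */
    close(fd);
    return result;
}
```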

Sorry for the missing context.

Cheers
Chris




Re: BUG: SO_LINGER + shutdown() does not block?

2001-02-11 Thread Chris Evans


[cc: Andi]

On Sun, 11 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > I'm not seeing shutdown(2) block on a TCP socket. This is Linux kernel
> > 2.2.16 (RH7.0). Is this a kernel bug, a documentation bug,
>
> Man page is wrong.

Yes, man socket(7) seems to be wrong.

I don't have access to a genuine BSD at the moment, but from man pages:
- HP/UX specifically states that SO_LINGER has no effect on shutdown()
- Solaris SO_LINGER only mentions that close() is affected.
- Likewise FreeBSD

Cheers
Chris




BUG: SO_LINGER + shutdown() does not block?

2001-02-11 Thread Chris Evans


Hi,

From socket(7):

   SO_LINGER
...
  When  enabled,  a  close(2) or shutdown(2) will not
  return until all queued  messages  for  the  socket
  have  been  successfully sent or the linger timeout
  has been reached.

I'm not seeing shutdown(2) block on a TCP socket. This is Linux kernel
2.2.16 (RH7.0). Is this a kernel bug, a documentation bug, or does it all
work fine and it's a Chris bug?

Cheers
Chris




Re: sard on kernel 2.4

2001-02-02 Thread Chris Evans


On Fri, 2 Feb 2001, Marcelo Tosatti wrote:

>
> Linus,
>
> There is a significative amount of people who use sard's additional block
> layer statistics (I'm one of them). It would be nice to have it in the
> official free.

Definitely.

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


[cc: davem because of the severity]

On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> rid of the hang. So it looks as though some combination of
> shutdown(2) and SIGABRT is at fault. After the hang the kernel-side

Nope - I've nailed it to a _really_ simple test case. It looks like a
read() on a shutdown() unix dgram socket just kills the kernel. Demo code
below. I wonder if this affects UP or is SMP only?

Malcolm, does the below code reproduce the problem for you?

Cheers
Chris

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(int argc, const char* argv[])
{
  int retval;
  int sockets[2];
  char buf[1];

  retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets);
  if (retval != 0)
  {
    perror("socketpair");
    exit(1);
  }
  shutdown(sockets[0], SHUT_RDWR);
  /* read() on the shut-down socket: this is the call that triggers the hang */
  read(sockets[0], buf, 1);
  return 0;
}




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Mapping the addresses from whichever ScrollLock combination produced
> the task list to symbols produces the call trace
>  do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return
>
> The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
> anywhere at all and the next function before it in memory is
> inet_shutdown which looks more believable. I have checked I'm looking

Probably, the empty SIGPIPE handler triggered. The response to this is a
lot of shutdown() close() and finally an exit().

The trace you give above looks like the child process trace. I always see
the parent process go nuts. The parent process is almost always blocking
on read() of a unix dgram socket, which it shares with the child. The
child does a shutdown() on this socket just before exit().

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Chris Evans writes:
> > I've just managed to reproduce this personally on 2.4.0. I've had a report
> > that 2.4.1 is also affected. Both myself and the other person who
> > reproduced this have SMP i686 machines, which may or may not be relevant.
> >
> > To reproduce, all you need to do is get my vsftpd ftp server:
> > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz
>
> I got this just before lunch too. I was trying out 2.4.1 + zerocopy
> (with netfilter configured off, see the sendfile/zerocopy thread for

[...]

I reproduced with 2.4.1.

> Looking at the kernel's EIP every so often to see what was going
> showed remove_wait_queue, add_wait_queue, skb_recv_datagram and
> wait_for_packet mostly. Random thought: if vsftpd did a sendfile and
> then exited, becoming a zombie, could there be a problem with
> tearing down a sendfile mapping? I'm off to read some code.

I get it simply doing CTRL-C at the ftp logon prompt. No sendfile has been
used at this point. Trying to distill a test case...

Cheers
Chris




Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


Hi,

I've just managed to reproduce this personally on 2.4.0. I've had a report
that 2.4.1 is also affected. Both myself and the other person who
reproduced this have SMP i686 machines, which may or may not be relevant.

To reproduce, all you need to do is get my vsftpd ftp server:
ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz

It runs from inetd. Connect using the Linux command line ftp client, to
localhost, and simply press CTRL-C. If it matters, I'm using RH7.0
software.

After the first iteration of this, I'm left with:
[chris@localhost chris]$ ps auwx | grep ftp
root       713 99.9  0.4  1416  592 ?        SN   22:01  38:17 vsftpd /etc/vsftpd.conf
nobody     715  0.0  0.0     0    0 ?        ZN   22:01   0:00 [vsftpd <defunct>]

As you can see, the root process is burning 100% of one of my CPUs. It
_cannot_ be killed with kill -9!

From Alt-Sysrq-T:
Jan 30 22:01:52 localhost kernel: vsftpdS    860   713670
715
 (NOTLB)
Jan 30 22:01:52 localhost kernel: Call Trace:
[smp_apic_timer_interrupt+240/272] [smp_apic_timer_interrupt+240/272]
[update_process_times+32/160] [smp_apic_timer_interrupt+240/272]
[remove_wait_queue+6/48] [wait_for_packet+273/288]
[skb_recv_datagram+205/240]
Jan 30 22:01:52 localhost kernel:[unix_dgram_recvmsg+69/256]
[sock_recvmsg+53/176] [sock_read+134/144] [sys_read+150/208]
[system_call+51/56]
Jan 30 22:01:52 localhost kernel: vsftpdZ C5E07040  1408   715713
 (L-TLB)
Jan 30 22:01:52 localhost kernel: Call Trace: [do_exit+628/672]
[system_call+51/56]

As we can see, the 100% CPU broken process has got stuck in a blocking
read() on a unix socket.

If I repeat the ftp connect/CTRL-C process again, I get a totally dead
machine.

Hope this is sufficient info. I'll try and write a minimal test case.

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Chris Evans writes:
> > I've just managed to reproduce this personally on 2.4.0. I've had a report
> > that 2.4.1 is also affected. Both myself and the other person who
> > reproduced this have SMP i686 machines, which may or may not be relevant.
> >
> > To reproduce, all you need to do is get my vsftpd ftp server:
> > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz
>
> I got this just before lunch too. I was trying out 2.4.1 + zerocopy
> (with netfilter configured off, see the sendfile/zerocopy thread for

[...]

I reproduced with 2.4.1.

> Looking at the kernel's EIP every so often to see what was going
> showed remove_wait_queue, add_wait_queue, skb_recv_datagram and
> wait_for_packet mostly. Random thought: if vsftpd did a sendfile and
> then exited, becoming a zombie, could there be a problem with
> tearing down a sendfile mapping? I'm off to read some code.

I get it simply doing CTRL-C at the ftp logon prompt. No sendfile has been
used at this point. Trying to distill a test case...

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Mapping the addresses from whichever ScrollLock combination produced
> the task list to symbols produces the call trace
>  do_exit - do_signal - tcp_destroy_sock - inet_ioctl - signal_return
>
> The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
> anywhere at all and the next function before it in memory is
> inet_shutdown which looks more believable. I have checked I'm looking

Probably, the empty SIGPIPE handler triggered. The response to this is a
lot of shutdown() close() and finally an exit().

The trace you give above looks like the child process trace. I always see
the parent process go nuts. The parent process is almost always blocking
on read() of a unix dgram socket, which it shares with the child. The
child does a shutdown() on this socket just before exit().

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


[cc: davem because of the severity]

On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> rid of the hang. So it looks as though some combination of
> shutdown(2) and SIGABRT is at fault. After the hang the kernel-side

Nope - I've nailed it to a _really_ simple test case. It looks like a
read() on a shutdown() unix dgram socket just kills the kernel. Demo code
below. I wonder if this affects UP or is SMP only?

Malcolm, does the below code reproduce the problem for you?

Cheers
Chris

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

int
main(int argc, const char* argv[])
{
  int retval;
  int sockets[2];
  char buf[1];

  retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets);
  if (retval != 0)
  {
    perror("socketpair");
    exit(1);
  }
  shutdown(sockets[0], SHUT_RDWR);
  /* On affected 2.4.x kernels this read() never returns */
  read(sockets[0], buf, 1);
  return 0;
}
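
Since the stuck process reportedly cannot be killed, it may be safer to run
the same sequence in a child process and give up after a deadline rather
than run it in the foreground. What follows is only a sketch under
assumptions: the helper name read_after_shutdown_hangs() and its timeout
argument are invented for illustration, and on a kernel with this bug the
child may well keep spinning even after the SIGKILL.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the original report.
 * Returns 0 if read() on a shut-down UNIX datagram socket came back
 * within timeout_sec, 1 if the child was still stuck at the deadline,
 * and -1 on a setup error.
 */
int read_after_shutdown_hangs(int timeout_sec)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;

  if (pid == 0)
  {
    /* Child: same sequence as the test case above. */
    int sockets[2];
    char buf[1];
    ssize_t n;

    if (socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets) != 0)
      _exit(2);
    shutdown(sockets[0], SHUT_RDWR);
    n = read(sockets[0], buf, sizeof(buf));
    (void) n;
    _exit(0);
  }

  /* Parent: check on the child every 100ms until the deadline. */
  struct timespec tick = { 0, 100 * 1000 * 1000 };
  int ticks;
  for (ticks = 0; ticks < timeout_sec * 10; ticks++)
  {
    int status;
    pid_t done = waitpid(pid, &status, WNOHANG);
    if (done == pid)
      return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    if (done < 0)
      return -1;
    nanosleep(&tick, NULL);
  }

  /* Still running: try to kill it, but never block waiting for it. */
  kill(pid, SIGKILL);
  waitpid(pid, NULL, WNOHANG);
  return 1;
}
```

Calling read_after_shutdown_hangs(2) from a small main() and printing the
result keeps the shell usable on a kernel where read() returns promptly; it
does nothing to protect the machine itself on an affected kernel.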




[SLUG] RE: Linux Disk Performance/File IO per process

2001-01-29 Thread Chris Evans


On Mon, 29 Jan 2001 [EMAIL PROTECTED] wrote:

> Thanks to both Jens and Chris - this provides the information I need to
> obtain our busy rate
> It's unfortunate that the kernel needs to be patched to provide this
> information - hopefully it will become part of the kernel soon.
>
> I had a response saying that this shouldn't become part of the kernel due to
> the performance cost that obtaining such data will involve. I agree that a
> cost is involved here, however I think it's up to the user to decide which
> cost is more expensive to them - getting the data, or not being able to see
> how busy their disks are. My feeling here is that this support could be user
> configurable at run time - eg 'cat 1 > /proc/getdiskperf'.

Hi,

I disagree with this runtime variable. It is unnecessary complexity.
Maintaining a few counts is total noise compared with the time I/O takes.

Cheers
Chris



-- 
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://slug.org.au/lists/listinfo/slug



Re: Linux Disk Performance/File IO per process

2001-01-28 Thread Chris Evans


On Mon, 29 Jan 2001 [EMAIL PROTECTED] wrote:

> All,
>
> I work for a company that develops a systems and performance management
> product for Unix (as well as PC and TANDEM) called PROGNOSIS. Currently we
> support AIX, HP, Solaris, UnixWare, IRIX, and Linux.
>
> I've hit a bit of a wall trying to expand the data provided by our Linux
> solution - I can't seem to find anywhere that provides the metrics needed to
> calculate disk busy in the kernel! This is a major piece of information that
> any mission critical system administrator needs to successfully monitor
> their systems.

Stephen Tweedie has a rather funky i/o stats enhancement patch which
should provide what you need. It comes with RedHat7.0 and gives decent
disk statistics in /proc/partitions.

Unfortunately this patch is not yet in the 2.2 or 2.4 kernel. I'd like to
see it make the kernel as a 2.4.x item. Failing that, it'll probably make
the 2.5 kernel.

Cheers
Chris




2.2, 2.4 bug in sock_no_fcntl()/F_SETOWN?

2001-01-24 Thread Chris Evans


Hi,

Looking at the code for sock_no_fcntl() in net/core/sock.c, I cannot specify
"0" as a value for F_SETOWN unless I'm the superuser. I believe this to
be a bug: it prevents de-registering an interest in SIGURG signals. Let me
know if you want a patch.

Cheers
Chris




Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Chris Evans


On Tue, 9 Jan 2001, Ingo Molnar wrote:

> This is one of the busiest and most complex block-IO Linux systems i've
> ever seen, this is why i quoted it - the talk was about block-IO
> performance, and Stephen said that our block IO sucks. It used to suck,
> but in 2.4, with the right patch from Jens, it doesnt suck anymore. )

Is this "right patch from Jens" on the radar for 2.4 inclusion?

Cheers
Chris




2.2 vs. 2.4 benchmarks

2001-01-08 Thread Chris Evans


Hi,

I ran some 2.2 vs. 2.4 benchmarks, particularly in the area of file i/o,
using bonnie++.

The machine is a SMP 128Mb PII-350 with a udma2 drive capable of some
20Mb/sec+. Kernels involved are 2.4.0, and the default RH7.0 kernel
(2.2.16 plus more patches than you can shake a stick at).

Not going too much into the gory details, here are the differences exposed
between 2.2 and 2.4:

1) Amazing 2.4 increase in streaming write performance; 13Mb/sec ->
20Mb/sec. I suspect this is the result of the "last minute" 2.4.0 dirty
buffer/sync waiting handling changes.

2) Slight 2.4 increase in streaming read performance; 16Mb/sec ->
17Mb/sec. This leaves 2.4.0 writing faster than reading, which I find
surprising.

3) Some 10% drop in rewrite performance from 2.2 -> 2.4 (possibly because
page aging, like LRU, isn't too hot for the 2nd+ linear scan over data)

4) File creation 30% faster in 2.4; random deletes 30% faster; sequential
deletes 10% slower.


I did one other quick test, with disappointing results for 2.4.0. I did a
kernel build with 32Mb of RAM.

2.4.0 was taking about 10 mins to do the build. 2.2.x was 1min30 quicker
:( I was hoping/expecting the 2.4.0 page aging to do better, due to
keeping the more useful pages in RAM better. I have no explanation.

Cheers
Chris
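
For anyone wanting the figures above as relative changes, a small sketch
follows. The function name percent_change() is just an illustrative choice;
the inputs are the numbers quoted in this message, with the 2.2.x build
time taken as 8.5 minutes (10 minutes less 1min30).

```c
/* Relative change from `before` to `after`, in percent.
 * Positive means `after` is larger. */
double percent_change(double before, double after)
{
  return (after - before) / before * 100.0;
}
```

With these inputs, streaming writes (13 to 20) come out at roughly +54%,
streaming reads (16 to 17) at about +6%, and the 2.4.0 kernel build (8.5 to
10 minutes) at roughly 18% longer than on 2.2.x.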




Re: reiserfs patch for 2.4.0-final

2001-01-05 Thread Chris Evans


On Fri, 5 Jan 2001, Chris Mason wrote:

> > Could someone create one single patch for the 2.4.0 ?
> >
> I put all the code into CVS, and Yura is making the official patch now.

Since 2.4.0 final should fix a few i/o performance issues (particularly
under heavy write loads), a quick few ext2 vs. reiserfs benchmarks would
make very interesting reading ;-)

Cheers
Chris




Re: [PATCH] dcache 2nd chance replacement

2001-01-05 Thread Chris Evans


On Thu, 4 Jan 2001, Alan Cox wrote:

> > On Thu, Jan 04, 2001 at 02:59:49PM -0200, Rik van Riel wrote:
> > > Unfortunately you seem to ignore my arguments, so lets
> > I've not ignored them, as said they were either obviously wrong of offtopic.
>
> Would the two of you ajourn this debate to alt.flame

Better still stop _theorizing_ and start _measuring_

Cheers
Chris




Re: test13-pre7...

2000-12-31 Thread Chris Evans


On Sat, 30 Dec 2000, Linus Torvalds wrote:

> On Sat, 30 Dec 2000, Steven Cole wrote:
> >
> > It looks like 2.4.0-test13-pre7 is a clear winner when running dbench 48
> > on my somewhat slow test machine (450 Mhz P-III, 192MB, IDE).
>
> This is almost certainly purely due to changing (some would say "fixing")
> the bdflush synchronous wait point.

Nice:)

Did Rik's drop_behind performance fix make it in or can we look forward to
another jump in the dbench benchmarks?

Cheers
Chris




Re: Modprobe local root exploit

2000-11-14 Thread Chris Evans


On Tue, 14 Nov 2000, Jakub Jelinek wrote:

> > Rather than add sanity checking to modprobe, it would be a lot easier
> > and safer from a security audit point of view to have the kernel call
> > /sbin/kmodprobe instead of /sbin/modprobe. Then kmodprobe can sanitise
> > all the data and exec the real modprobe. That way the only thing that
> > needs auditing is a string munging/sanitising program.
>
> Well, no matter what kernel needs auditing as well, the fact that dev_load
> will without any check load any module the user wants is already problematic
> and no munging helps with it at all, especially loading old ISA drivers
> might not be a good idea.

FWIW: A quick look at the kernel source, and dev_load() seems to be the
only place that does this. Other places apply prefixes to user supplied
names.

Cheers
Chris




Re: Modprobe local root exploit

2000-11-13 Thread Chris Evans


On Mon, 13 Nov 2000, Torsten Duwe wrote:

> > "Francis" == Francis Galiegue <[EMAIL PROTECTED]> writes:
> 
> >> + if ((*p & 0xdf) >= 'a' && (*p & 0xdf) <= 'z') continue;
> 
> Francis> Just in case... Some modules have uppercase letters too :)
> 
> That's what the &0xdf is intended for...

Code in a security sensitive area needs to be crystal clear.

What's wrong with isalnum() ?

Cheers
Chris
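
To make the clarity point concrete, here is a small sketch comparing the
check exactly as quoted above with two alternatives. The function names are
invented for illustration and ASCII is assumed. Read literally, the quoted
test clears bit 0x20 and then compares against 'a'..'z', a range in which
bit 0x20 is always set, so it can never match.

```c
#include <ctype.h>

/* The letter test as quoted in the patch: mask off bit 0x20, then
 * compare against the lowercase range. */
int quoted_letter_check(unsigned char c)
{
  return (c & 0xdf) >= 'a' && (c & 0xdf) <= 'z';
}

/* Same masking idea, but compared against the uppercase range that the
 * mask actually produces (ASCII only). */
int masked_letter_check(unsigned char c)
{
  return (c & 0xdf) >= 'A' && (c & 0xdf) <= 'Z';
}

/* The readable alternative suggested in the reply. */
int alnum_check(unsigned char c)
{
  return isalnum(c) != 0;
}
```

In the default C locale, alnum_check() accepts letters of either case and
digits in one obvious call, while the masked variants need the reader to
work out which range the mask maps onto.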




Re: 2.2.18Pre Lan Performance Rocks!

2000-10-30 Thread Chris Evans


On Mon, 30 Oct 2000, Andrea Arcangeli wrote:

> functionality that needs high performance completly in kernel? People
> may need to write high performance network code for custom protocols,
> this way they will end creating kernel modules with system-crashing
> bugs, memory leaks and kernel buffer overflows (chroot+nobody+logging
> won't work anymore). (plus they will get into pain while debugging)

I'm glad _someone_ is connected to reality with regards the security
implications of throwing loads of servers into kernel space.

Cheers
Chris




Re: LMbench 2.4.0-test10pre-SMP vs. 2.2.18pre-SMP

2000-10-23 Thread Chris Evans


On Mon, 23 Oct 2000, Jeff Garzik wrote:

> First test was with 2.4.0-test10-pre3.
> Next four tests were with 2.4.0-test10-pre4.
> Final four tests were with 2.2.18-pre17.
> 
> All are 'virgin' kernels, without any patches.

[...]

I'll take the liberty of highlighting some big changes, v2.2 vs v2.4

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host      OS            2p/0K Pipe AF   UDP RPC/ TCP RPC/ TCP
                        ctxsw      UNIX     UDP      TCP  conn
--------- ------------- ----- ---- ---- --- ---- --- ---- ----
rum.normn Linux 2.4.0-t     6   20   45  63  106  81  157  146
rum.normn Linux 2.2.18p     2   12   18  56  123 106  159  237

- So we broke pipe/AF UNIX latencies

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot  Page
                        Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
rum.normn Linux 2.4.0-t 15  1 28  3 1016 10.0K
rum.normn Linux 2.2.18p 16  1 29  2 7658 20.6K

- But gave steroids to mmap latencies

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe AF   TCP File   Mmap   Bcopy  Bcopy  Mem  Mem
                             UNIX     reread reread (libc) (hand) read write
--------- ------------- ---- ---- --- ------ ------ ------ ------ ---- -----
rum.normn Linux 2.4.0-t  152  105  98    151    326    138    144  326   171
rum.normn Linux 2.2.18p  264  106  55    152    326    137    142  326   180

- Mixed fortunes here. A serious boost to TCP bandwidth but pipe bandwidth
dies a bit


Cheers
Chris





Re: [patch(?)] question wrt context switching during disk i/o

2000-10-21 Thread Chris Evans


On Sat, 21 Oct 2000, Bill Wendling wrote:

> } > bdflush is broken in current kernels.  I posted to linux-mm about this,
> } > but Rik et al haven't shown any interest.  I normally see bursts of 
> } > up to around 40K cs/second when doing writes; I hacked a little 
> } > premption counter into the kernel and verified that they're practially
> } > all bdflush...
> } 
> There's some strangness in bdflush(). The comment says:
> 
> /*
>  * If there are still a lot of dirty buffers around,
>  * skip the sleep and flush some more. Otherwise, we
>  * go to sleep waiting a wakeup.
>  */
> if (!flushed || balance_dirty_state(NODEV) < 0) {
> run_task_queue(&tq_disk);
> schedule();
> }

Speaking of bdflush brokenness, I was trying to tune it using
/proc/sys/vm/bdflush. I was trying to eliminate the bursty write behaviour
Linux always seems to have had (exhibited by e.g. find /).

Unfortunately, different /proc/sys/vm/bdflush settings didn't seem to have
much (if any) effect. Is this another case of /proc/sys/vm/* settings
being ignored? If so they should be removed.

I was hoping to get a steady trickle of writes instead of the occasional
mammoth burst.

Cheers
Chris




[FIXED] Re: 2.4.0test-9: IDE problems

2000-10-11 Thread Chris Evans


On Wed, 11 Oct 2000, Alan Cox wrote:

> > > have the rules for testing if the driver/host/device register and report
> > > that all signals are valid and stable.
> > 
> > Yes, I had some "interesting" modifications to a lot of my /usr when I
> > tried to activate UDMA4 under RH7.0 (I don't believe my hardware is
> > capable of UDMA4!)
> 
> The 2.2 kernel we ship doesnt have the ide patches either so Im not suprised
> it got upset 8)

OK, so in case anyone is tracking open issues, this was "pilot error". My
motherboard only does ATA33 (UDMA2). It just happens to work under ATA44
(UDMA3).

Since ATA44 is out of my machine's spec, and could corrupt data, the 2.4
kernel is correct to reject attempts to set UDMA3.[1]

Cheers
Chris

[1] But if you're mad, you can still boot with idex=ata66 and force the
issue.




Raw i/o usage wrecks block device performance??

2000-10-11 Thread Chris Evans


Hi,

Here's a very strange (and repeatable) result. Affects 2.2.x + raw device
patches (i.e. RH7.0). Also had a similar effect on 2.4.0test9!

The problem is best described with a little sequence. After using raw i/o
facilities, streamed block device reads from the same underlying device
exhibit much poorer performance than before the raw i/o.

Example

[root@localhost /root]# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  64 MB in  3.81 seconds = 16.80 MB/sec
[root@localhost /root]# time dd if=/dev/raw/raw1 of=/dev/null bs=1024k count=64
64+0 records in
64+0 records out

real    0m2.990s
user    0m0.010s
sys     0m0.450s
[root@localhost /root]# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  64 MB in  6.12 seconds = 10.46 MB/sec


The read figures before and after the raw i/o are repeatable with only
little jitter.

Raw device reads are consistent and not affected by this phenomenon.

Anyone know what's going on? Looks like a bug or inefficiency somewhere in
the kernel.

Cheers
Chris
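
The dd run above is timed but prints no rate, so a quick conversion helps
when comparing it with the hdparm figures. This is a minimal sketch with an
illustrative function name, treating the 64 blocks of 1024k as 64 MB in the
same sense hdparm uses.

```c
/* Throughput in MB/s for a transfer of `megabytes` that took `seconds`. */
double mb_per_sec(double megabytes, double seconds)
{
  return megabytes / seconds;
}
```

By this arithmetic the raw read of 64 MB in 2.990 seconds is about 21.4
MB/sec, against 16.80 MB/sec for buffered reads before the raw i/o and
10.46 MB/sec afterwards.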




2.4.0test9 vm: disappointing streaming i/o under load

2000-10-10 Thread Chris Evans


Hi,

Finally got round to checking out 2.4.0test9.

Unfortunately, 2.4.0test9 exhibits poor streaming i/o performance when
under a bit of memory pressure.

The test is this: boot with mem=32M, log onto GNOME and start xmms playing
a big .wav ripped from a CD (this requires 100-200k read i/o per second).

Then I started and then killed netscape. After that I started a find / and
fired up gnumeric at the same time.

Results
===

2.2 RH7.0: the music skipped maybe twice briefly during the test.

2.4.0test9: music stuttered repeatedly while netscape started. Worse, when
firing up gnumeric with the find / on the go, there were big pauses in
sound output. One pause was over 5 seconds!!!


So not so hot.

Could this perhaps be related to the drop_behind magic penalizing
streaming i/o pages too much? Perhaps the greater age on the i/o pages
means that when there is a little memory pressure, they are getting thrown
out of the page cache before the app (xmms) gets a chance to use them!

Might it be useful for me to try pre10-1? I note it has more "balancing
fixes".

Cheers
Chris





Re: 2.4.0test-9: IDE problems

2000-10-10 Thread Chris Evans


On Wed, 11 Oct 2000, Alan Cox wrote:

> The 2.2 kernel we ship doesnt have the ide patches either so Im not suprised
> it got upset 8)

Ah yes you're correct. I saw the patch in the kernel SRPM but didn't look
far enough to see:

...
# IDE patch provides UDMA66 support, but is known to corrupt filesystems
# on a few systems, so is not applied by default.
Patch151: linux-2.2.16-ide-2805.patch
...
# Dangerous IDE patch available but off by default
#%patch151 -p1
...

Still, with hdparm -d1 -X67, I can presumably get UDMA3 and good 2.2
speeds (17MB/s, or 21MB/s with rawio) without this patch.
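
In case the -X67 looks like a magic number: per the hdparm man page, the -X argument is a base value for the transfer class plus the mode number (8 + n for PIO, 32 + n for multiword DMA, 64 + n for Ultra DMA). A minimal sketch of that encoding follows; the enum and function names are hypothetical, and the 0..7 bound is a simplification that does not check which modes each class really defines.

```c
/* Base values for hdparm -X, as described in the hdparm man page.
 * The identifiers themselves are invented for this example. */
enum xfer_class {
    XFER_CLASS_PIO   = 8,   /* PIO mode n           -> 8 + n  */
    XFER_CLASS_MWDMA = 32,  /* multiword DMA mode n -> 32 + n */
    XFER_CLASS_UDMA  = 64   /* Ultra DMA mode n     -> 64 + n */
};

/* Value to pass to hdparm -X for the given class and mode,
 * or -1 if the mode number is outside 0..7. */
int hdparm_x_value(enum xfer_class cls, int mode)
{
    if (mode < 0 || mode > 7)
        return -1;
    return (int)cls + mode;
}

/* Inverse for the Ultra DMA range only: returns the UDMA mode
 * number, or -1 if x is not an Ultra DMA value. */
int udma_mode_from_x(int x)
{
    if (x < 64 || x > 71)
        return -1;
    return x - 64;
}
```

hdparm_x_value(XFER_CLASS_UDMA, 3) yields 67, and udma_mode_from_x(66) yields 2, the mode the motherboard is actually specified for.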

Cheers
Chris




Re: 2.4.0test-9: IDE problems

2000-10-10 Thread Chris Evans


On Wed, 11 Oct 2000, Alan Cox wrote:

> > hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> 
> Bad CRC is a cable error. That could be misconfiguration but could also be
> crap cables

It went away when I enabled PIIX4 support + PIIX4 tuning support.

Cheers
Chris




Re: 2.4.0test-9: IDE problems

2000-10-10 Thread Chris Evans


On Tue, 10 Oct 2000, Andre Hedrick wrote:

> Also set this option "CONFIG_IDEDMA_IVB" because you are in the
> transistion period of drive manufacturing.

Turned that on, applied the patch. BTW, your patch seems to make the
"Speed warnings" failure _more_ likely??

Still refuses to activate UDMA3. 11.5MB/sec vs. 17MB/sec in 2.2. Is my
hardware trying to tell me that 2.2 shouldn't be allowing me to run with
UDMA3? It's rock solid and yes I've given it a pounding.

Cheers
Chris




Re: 2.4.0test-9: IDE problems

2000-10-10 Thread Chris Evans


On Tue, 10 Oct 2000, Andre Hedrick wrote:

> Basically you have drive that caught in the word93 rules change.
> 
> However, the error you got were real and the kernel did properly respeed
> the drive to one step slower.  The problem above prevented you from going
> from ATA66 to ATA44, thus you fell to ATA33.
> 
> You RHS 7.0 kernel does not have all the fallback and rules testing to
> keep things running the very best and in the safest way.  Also you do not
> have the rules for testing if the driver/host/device register and report
> that all signals are valid and stable.

Yes, I had some "interesting" modifications to a lot of my /usr when I
tried to activate UDMA4 under RH7.0 (I don't believe my hardware is
capable of UDMA4!)

> If you did not set TUNING option if the chipset has it specifically
> flagged then you will not be able to retune the chipset/drive and the IO
> will be out of sync.

Shortly after my first post, I noticed and activated the Intel PIIX4
support + tuning. This got rid of the nasty errors but didn't get my
17MB/sec back.

Trying your patch now.

Cheers
Chris




2.4.0test-9: IDE problems

2000-10-10 Thread Chris Evans


Hi,

Finally got around to trying out 2.4.0test9. I'm going to do some VM
performance comparisons (incidentally, because VM should be a carefully
measured science, not the random cool idea of the day, which we have seen
too much of recently).

Unfortunately, I can't start fair tests yet because UDMA3 refuses to
activate in 2.4.0test-9.

I get these messages on boot

ide0: Speed warnings UDMA 3/4/5 is not functional.
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide0: reset: success
hda: DMA disabled
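
For anyone puzzling over the hex values, the flags in braces are just the set bits of the ATA status and error registers. The sketch below decodes them using the standard ATA bit positions. The labels for bits that do not appear in the log above are from memory of the driver source and should be treated as assumptions, as should the function names.

```c
#include <stddef.h>
#include <string.h>

/* One register bit and the label printed for it. */
struct bit_name {
    unsigned char mask;
    const char *name;
};

/* Status register bits, in the order the flags appear in the log. */
static const struct bit_name status_bits[] = {
    { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },
    { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },
};

/* Error register bits. 0x80 is labelled BadCRC because these errors
 * came from DMA transfers; 0x04 is the "command aborted" bit. */
static const struct bit_name error_bits[] = {
    { 0x04, "DriveStatusError" },
    { 0x80, "BadCRC" },
    { 0x40, "UncorrectableError" },
    { 0x10, "SectorIdNotFound" },
    { 0x02, "TrackZeroNotFound" },
    { 0x01, "AddrMarkNotFound" },
};

/* Write the space-separated names of all set bits into out.
 * outlen must be at least 1; output is truncated if it does not fit. */
static void decode_bits(unsigned char value, const struct bit_name *tab,
                        size_t n, char *out, size_t outlen)
{
    size_t i;

    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (!(value & tab[i].mask))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, tab[i].name, outlen - strlen(out) - 1);
    }
}

/* Decode the status register; a set busy bit (0x80) makes the
 * other bits meaningless, so only "Busy" is reported then. */
void decode_status(unsigned char status, char *out, size_t outlen)
{
    if (status & 0x80) {
        out[0] = '\0';
        strncat(out, "Busy", outlen - 1);
        return;
    }
    decode_bits(status, status_bits,
                sizeof status_bits / sizeof status_bits[0], out, outlen);
}

/* Decode the error register. */
void decode_error(unsigned char err, char *out, size_t outlen)
{
    decode_bits(err, error_bits,
                sizeof error_bits / sizeof error_bits[0], out, outlen);
}
```

Feeding it 0x51 gives "DriveReady SeekComplete Error", and 0x84 gives "DriveStatusError BadCRC", matching the messages above: the drive aborted the command because the data failed its CRC check on the way across the cable.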

Subsequently, if I run
hdparm -d1 -X67 /dev/hda

I get told
ide0: Speed warnings UDMA 3/4/5 is not functional.

And this leaves me with
/dev/hda:
 Timing buffered disk reads:  64 MB in  5.71 seconds = 11.21 MB/sec

Under the stock 2.2 RedHat 7.0 kernel, the same hdparm tuning gives me
about 17MB/s.
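
To put a number on the gap, here is the arithmetic behind the hdparm line and the comparison with 2.2, as a small sketch (the function names are invented for this example):

```c
/* Throughput in MB/s for `megabytes` read in `seconds`.
 * Returns 0.0 if the elapsed time is not positive. */
double mb_per_sec(double megabytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return megabytes / seconds;
}

/* How much slower `slow` is than `fast`, as a percentage of `fast`.
 * Returns 0.0 if `fast` is not positive. */
double percent_slower(double slow, double fast)
{
    if (fast <= 0.0)
        return 0.0;
    return 100.0 * (1.0 - slow / fast);
}
```

mb_per_sec(64.0, 5.71) comes to about 11.21, agreeing with what hdparm printed, and percent_slower(11.21, 17.0) is roughly 34, so 2.4.0test9 is reading about a third slower than 2.2 on this machine.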

Anyone got any hints? I selected the ATA option. Might this be causing the
failure?

Cheers
Chris




Re: VM in v2.4.0test9

2000-10-04 Thread Chris Evans


On Wed, 4 Oct 2000, Rik van Riel wrote:

> Handling out-of-memory in a clean and predictable way is the
> next thing on the feature list. I'll add it RSN (I'm reasonably
> sure now that the current VM features are stable ... time for
> OOM handling).

Stable is good. But before moving on, wouldn't it be nice to have some
test8 vs. test9 vs. 2.2.14 (or so) benchmarks, to confirm it was worth the
pain of a whole pre-patch series weeding out deadlocks?

Cheers
Chris



