date:20010109

Re: [OT]: DRI doesn't work on 2.4.0 but does on prerelease-ac5

2001-01-09 Thread Alan Olsen

On Mon, 8 Jan 2001, J Sloan wrote:

> This is a little OT for linux-kernel, but I'll take a swing at it
> since I'm running 2.4 and Xfree 4 with a voodoo 3.
> 
> After upgrading to Red Hat 7.0, I noticed 3D screensavers
> and Quake 3 Arena were dog slow - in the end, I basically
> had to make sure the mesa libs didn't get found before the
> real opengl libs.
> 
> In my case, that meant nuking mesa from my system and
> letting Linux use what was left, which got me back the good
> accelerated performance - you may choose a less drastic
> option. I don't see any breakage from the absence of mesa.

Sounds like the version you blew away was not the one built in 4.0.2.
(Mesa is built along with XFree86 now, not as an add-on.)

I will test with my current configuration and see if I can duplicate the
slowdown.

I am currently using a Matrox G400 max card with 4.0.2cvs.  I get about
1285 frames per second on the gears demo currently. We will see if that
changes with the 2.4.0 final release version.

[EMAIL PROTECTED] | Note to AOL users: for a quick shortcut to reply
Alan Olsen| to my mail, just hit the ctrl, alt and del keys.
"In the future, everything will have its 15 minutes of blame."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.2 vs. 2.4 benchmarks

2001-01-09 Thread Helge Hafting


Chris Evans wrote:
> 
> Hi,
> 
> I ran some 2.2 vs. 2.4 benchmarks, particularly in the area of file i/o,
> using bonnie++.
> 
> The machine is a SMP 128Mb PII-350 with a udma2 drive capable of some
> 20Mb/sec+. Kernels involved are 2.4.0, and the default RH7.0 kernel
> (2.2.16 plus more patches than you can shake a stick at).
> 
> Not going too much into the gory details, here are the differences exposed
> between 2.2 and 2.4:
> 
> 1) Amazing 2.4 increase in streaming write performance; 13Mb/sec ->
> 20Mb/sec. I suspect this is the result of the "last minute" 2.4.0 dirty
> buffer/sync waiting handling changes.
> 
> 2) Slight 2.4 increase in streaming read performance; 16Mb/sec ->
> 17Mb/sec. This leaves 2.4.0 writing faster than reading, I find that
> surprising.
>
I am not surprised.  A read _has_ to fetch the data from disk before it
can present a result, so you are completely bound by IO waiting unless
the data is cached, and test programs tend to empty the cache first.
Writes can be buffered partially even if the testfile is much larger
than memory.  The extra 3Mb/s might be going into RAM.  Filling 128M at
3M/s takes about 43s, and 20M/s for 43s is about 850M.  Did you use a
testfile in the 500MB-1000MB range?
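
The estimate above can be written out as a short C sketch. The function
names are made up for this note, and the assumption that all 128M of RAM
is available for write buffering is a simplification, not something
bonnie++ or the kernel reports:

```c
#include <stdio.h>

/* Seconds needed to fill ram_mb of buffer space when the application
 * writes surplus_mb_s faster than the disk can drain (illustrative). */
double seconds_to_fill(double ram_mb, double surplus_mb_s)
{
    return ram_mb / surplus_mb_s;
}

/* Amount of data written in `seconds` at the observed write rate. */
double mb_written(double rate_mb_s, double seconds)
{
    return rate_mb_s * seconds;
}

/* Print the estimate for the figures quoted in the mail. */
void print_estimate(void)
{
    double t = seconds_to_fill(128.0, 3.0);
    printf("fill time: %.0f s, data written: %.0f MB\n",
           t, mb_written(20.0, t));
}
```

With the quoted figures this gives a fill time of about 43 s and roughly
853 MB written, so a testfile much smaller than that could plausibly
show an inflated write rate.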
 
> 3) Some 10% drop in rewrite performance from 2.2 -> 2.4 (possibly because
> page aging, like LRU, isn't too hot for the 2nd+ linear scan over data)
> 
> 4) File creation 30% faster in 2.4; random deletes 30% faster; sequential
> deletes 10% slower.
> 
> I did one other quick test, with disappointing results for 2.4.0. I did a
> kernel build with 32Mb.
> 
> 2.4.0 was taking about 10 mins to do the build. 2.2.x was 1min30 quicker
> :( I was hoping/expecting the 2.4.0 page aging to do better, due to
> keeping the more useful pages in RAM better. I have no explanation.

Did you build exactly the same kernel in each case (version and options),
with the same amount of other software (X, daemons, ...) running, and
from the same source tree?  (Different disk locations may have large
speed differences.)  Were the circumstances the same?  (Doing a make
[dist]clean to get rid of files from the previous build will cache the
directory contents, which is unfair if it happened in only one of the
cases.)

Helge Hafting

Failure building 2.4 while running 2.4. Success in building 2.4 while running 2.2.

2001-01-09 Thread Silviu Marin-Caea


I have RedHat7, glibc-2.2-9, gcc-2.96-69.

I can build 2.4.0 while running kernel 2.2.16.

If I try to rebuild 2.4.0 while running the new kernel, I get random
compiler errors.

It happens on two machines.  One of them runs 2.4.0-test12, the other
2.4.0.  Both have the updates mentioned above.

I know this is a RedHat issue, but it may be useful to know for some.

-- 
Systems and Network Administrator - Delta Romania
Phone +4093-267961

2.4.0-vmbigpatch compile problem

2001-01-09 Thread Boszormenyi Zoltan


Hi!

PF_RSSTRIM is not declared anywhere either in the linux-2.4.0 sources
or in the 2.4.0-vmbigpatch.

Regards,
Zoltan Boszormenyi



Re: [PATCH] hisax/sportster dependency error

2001-01-09 Thread Daniel Stodden



hi.

Alan Cox <[EMAIL PROTECTED]> writes:

> > > according to sportster.c:get_io_range, this appears to be perfectly
> > > intentional, request_regioning 64x8 byte from 0x268 in 1024byte-steps.
> > 
> > > AFAIK, this is because the hardware is stupid and does not decode the higher
> > address lines. Therefore, the IO ports are mirrored every 1024 bytes and
> > should be reserved to avoid potential conflicts with other devices.
> 
> Almost every 10bit decode ISA card is like that. You don't need to do the
> work. The PCI alloc rules already cover it.

so, if i understand this correctly, since all offsets actually in use
are 1024B multiples the following would be sufficient, or more elegant..?

or should 
#define  SPORTSTER_ISAC 0xC000
#define  SPORTSTER_HSCXA0x
#define  SPORTSTER_HSCXB0x4000
#define  SPORTSTER_RES_IRQ  0x8000
still get requested explicitly in such cases?


--- linux-2.4/drivers/isdn/hisax/sportster.c.orig   Tue Jan  9 09:31:36 2001
+++ linux-2.4/drivers/isdn/hisax/sportster.cTue Jan  9 09:54:18 2001
@@ -133,13 +133,10 @@
 void
 release_io_sportster(struct IsdnCardState *cs)
 {
-   int i, adr;
 
byteout(cs->hw.spt.cfg_reg + SPORTSTER_RES_IRQ, 0);
-   for (i=0; i<64; i++) {
-   adr = cs->hw.spt.cfg_reg + i *1024;
-   release_region(adr, 8);
-   }
+
+   release_region(cs->hw.spt.cfg_reg, 8);
 }
 
 void
@@ -185,27 +182,18 @@
 static int __init
 get_io_range(struct IsdnCardState *cs)
 {
-   int i, j, adr;
+   int adr = cs->hw.spt.cfg_reg;
+
+   if ( check_region(adr, 8) ) {
+   printk(KERN_WARNING
+  "HiSax: %s config port %x-%x already in use\n",
+  CardType[cs->typ], adr, adr + 8);
+   return 0;
+   } 

-   for (i=0;i<64;i++) {
-   adr = cs->hw.spt.cfg_reg + i *1024;
-   if (check_region(adr, 8)) {
-   printk(KERN_WARNING
-   "HiSax: %s config port %x-%x already in use\n",
-   CardType[cs->typ], adr, adr + 8);
-   break;
-   } else
-   request_region(adr, 8, "sportster");
-   }
-   if (i==64)
-   return(1);
-   else {
-   for (j=0; j<i; j++) {
-   adr = cs->hw.spt.cfg_reg + j *1024;
-   release_region(adr, 8);
-   }
-   return(0);
-   }
+   request_region(adr, 8, "sportster");
+
+   return 1;
 }
 
 int __init


best regards,
dns

-- 
___
 mailto:[EMAIL PROTECTED]

Re: Journaling: Surviving or allowing unclean shutdown?

2001-01-09 Thread Roger Gammans


On Mon, Jan 08, 2001 at 12:02:49PM +, Stephen C. Tweedie wrote:
> Right.  There are two distinct meanings:
> 
> 1) Do not write to this medium, ever (physical readonly); and
> 
> 2) Do not allow modifications to the filesystem (logical readonly).
> 
> The fact is that the kernel confuses the two, but that just isn't
>[snip]
> We just don't have a way of specifying these two things independently.

Is this a call for a new mount option, or should we just
clutter /dev even further with devices with ro permissions as the
marker?

TTFN
-- 
Roger
 Think of the mess on the carpet. Sensible people do all their
 demon-summoning in the garage, which you can just hose down afterwards.
-- [EMAIL PROTECTED]


Re: kernel network problem ?

2001-01-09 Thread Helge Hafting

Nicolas Noble wrote:
[...]
As others have told already, this is the ECN problem.

> I noticed the same bug. This is very weird, I can send a list of sites
> to which I can't connect anymore.

You have a list?  Send all of them a message stating that they ought
to upgrade their firewalls which cause this problem.  Or they
will lose customers/visitors.  Cisco already has an upgrade for them,
so fixing is dead easy, and they can then boast compatibility with
the latest internet standards.

If they don't care about linux users, tell them that windows eventually
will use ECN too.  They definitely don't want to have an ECN problem when
that happens.

Helge Hafting

Re: [PATCH] advansys.c: include missing restore_flags, etc

2001-01-09 Thread Alan Cox


> >  save_flags(flags);
> >  cli();
> > @@ -9965,7 +9972,7 @@
> >  }
> 
> Err, according to wise ppl on this list, this does not work on
> MIPSes. The flags thing must stay in the same stackframe.

Certainly doesn't work on sparc32, but then it didn't before. Inline it might

Re: [PATCH] advansys.c: include missing restore_flags, etc

2001-01-09 Thread Russell King


Arnaldo Carvalho de Melo writes:
>   Please consider applying, comments in the patch.

Can't the following be fixed properly?

> -STATIC int
> +STATIC unsigned long
>  DvcEnterCritical(void)
>  {
> -int flags;
> +unsigned long flags;
>  
>  save_flags(flags);
>  cli();

Guess what happens here?

return flags;

> @@ -9965,7 +9972,7 @@
>  }
>  
>  STATIC void
> -DvcLeaveCritical(int flags)
> +DvcLeaveCritical(unsigned long flags)
>  {
>  restore_flags(flags);
>  }

The above doesn't work on some architectures.  It's better to use a macro
if you want to separate this out.  ie, something like (davem will have to
okay it tho):

#define DvcEnterCritical()  \
 ({ unsigned long __flags; save_flags(__flags); cli(); __flags; })

#define DvcLeaveCritical(flags) \
 do { restore_flags(flags); } while (0)

This should then ensure that you don't end up with problems associated
with register windows on the sparc or whatever.  Even better would be
to use a spinlock instead of Dvc?Critical.
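
To see the shape of the macro pair outside the kernel, here is a
userspace sketch.  The save_flags(), cli() and restore_flags() stand-ins
below are hypothetical mocks that only toggle a variable; the real
primitives manipulate the CPU interrupt state.  The ({ ... }) form is a
GCC statement expression, so this assumes gcc or a compatible compiler:

```c
/* Mock interrupt state: 1 = interrupts enabled, 0 = disabled.
 * Hypothetical stand-in so the macros can be exercised in userspace. */
static unsigned long fake_irq_state = 1;
#define save_flags(x)     ((x) = fake_irq_state)
#define cli()             (fake_irq_state = 0)
#define restore_flags(x)  (fake_irq_state = (x))

/* The two macros as suggested in the mail.  Because they expand inline,
 * the flags are saved and restored in the caller's own stack frame. */
#define DvcEnterCritical() \
    ({ unsigned long __flags; save_flags(__flags); cli(); __flags; })

#define DvcLeaveCritical(flags) \
    do { restore_flags(flags); } while (0)

/* Run an empty critical section and report whether the mock interrupt
 * state was off while inside it. */
int irqs_off_inside_critical(void)
{
    unsigned long flags = DvcEnterCritical();
    int was_off = (fake_irq_state == 0);
    DvcLeaveCritical(flags);
    return was_off;
}
```

The point of the macro form is only that nothing crosses a function
boundary between the save and the restore.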
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |

Re: [PATCH] hisax/sportster dependency error

2001-01-09 Thread Alan Cox


> > Almost every 10bit decode ISA card is like that. You don't need to do the
> > work. The PCI alloc rules already cover it.
> 
> so, if i understand this correctly, since all offsets actually in use
> are 1024B multiples the following would be sufficient, or more elegant..?

PCI allocation rules handle all of this. PCI I/O is not allocated in the
ranges 0x[1-F][0-3]xx

Alan
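
To illustrate the mirroring discussed in this thread: a card that
decodes only ten address bits cannot tell 0x268 from 0x268 + 1024.  A
minimal sketch in plain C (the helper names are invented for this note
and are not part of the hisax driver or the PCI code):

```c
#include <stdint.h>

/* A card with 10-bit I/O decoding only looks at address bits 0-9. */
#define ISA_DECODE_MASK 0x3ffu

/* The port such a card actually sees when `port` is on the bus. */
uint16_t isa_decoded_port(uint16_t port)
{
    return (uint16_t)(port & ISA_DECODE_MASK);
}

/* Non-zero if both ports select the same register on such a card. */
int isa_ports_alias(uint16_t a, uint16_t b)
{
    return isa_decoded_port(a) == isa_decoded_port(b);
}
```

Every address of the form 0x268 + i * 1024 used by the old loop in
sportster.c decodes to 0x268 here, which is why reserving the 64 mirrors
by hand only matters if something else could be placed on them.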


[testcase] madvise->semaphore deadlock 2.4.0

2001-01-09 Thread Mike Galbraith


Greetings,

While trying to configure ftpsearch, the process hangs while running
its madvise confidence test below.  It appears to be taking a fault
in madvise_fixup_middle():atomic_add(2, &vma->vm_file->f_count) and
immediately deadlocking forever on mm->mmap_sem per IKD.  (Virgin 2.4.0
agrees)

Accesses to /proc afterward (e.g. ps) leave hung processes behind.

kdb> bp sys_madvise
Instruction(i) BP #0 at 0xc0129aa4 (sys_madvise)
is enabled globally adjust 1
kdb> go
Instruction(i) breakpoint #0 at 0xc0129aa4 (adjusted)
0xc0129aa4 sys_madvise:   0xc0129aa4 sys_madviseint3   

Entering kdb (current=0xc4232000, pid 260) due to Breakpoint @ 0xc0129aa4
kdb> bp __down_failed
Instruction(i) BP #1 at 0xc0107c84 (__down_failed)
is enabled globally adjust 1
kdb> go
Instruction(i) breakpoint #1 at 0xc0107c84 (adjusted)
0xc0107c84 __down_failed:   0xc0107c84 __down_failedint3   

Entering kdb (current=0xc4232000, pid 260) due to Breakpoint @ 0xc0107c84
kdb> sr z
SysRq: Suspending trace
kdb> rd
eax = 0x ebx = 0xc4232000 ecx = 0xc7f6963c edx = 0x0010 
esi = 0xc7f69620 edi = 0xc4232000 esp = 0xc4233e58 eip = 0xc0107c84 
ebp = 0xc4233efc xss = 0x0018 xcs = 0x0010 eflags = 0x0296 
xds = 0x0018 xes = 0x0018 origeax = 0x &regs = 0xc4233e24
kdb> bt
EBP   EIP Function(args)
0xc4233efc 0xc0107c84 __down_failed (0xc4232000, 0x2, 0xc0114c00, 0x0, 0x3)
   kernel .text 0xc010 0xc0107c84 0xc0107c9c
   0xc0226571 stext_lock+0x12d
   kernel .text.lock 0xc0226444 0xc0226444 0xc0227840
   0xc0114c77 do_page_fault+0x77 (0xc4233f0c, 0x2, 0xc370bd20, 0x4017, 0x2)
   kernel .text 0xc010 0xc0114c00 0xc0115060
   0xc0109284 error_code+0x34
   kernel .text 0xc010 0xc0109250 0xc010928c
Interrupt registers:
eax = 0x ebx = 0xc370bd20 ecx = 0x4017 edx = 0x0002 
esi = 0xc370bce0 edi = 0xc370bca0 esp = 0xc4233f40 eip = 0xc012964a 
ebp = 0xc4233f50 xss = 0x0018 xcs = 0x0010 eflags = 0x00010202 
xds = 0x0018 xes = 0x0018 origeax = 0x &regs = 0xc4233f0c
   0xc012964a madvise_fixup_middle+0xb6 (0xc370bca0, 0x4016, 0x4017, 0x0)
   kernel .text 0xc010 0xc0129594 0xc01296fc
0xc4233f74 0xc0129789 madvise_behavior+0x8d (0xc370bca0, 0x4016, 0x4017, 0x0)
   kernel .text 0xc010 0xc01296fc 0xc0129798
0xc4233f90 0xc0129a7d madvise_vma+0x35 (0xc370bca0, 0x4016, 0x4017, 0x0)
   kernel .text 0xc010 0xc0129a48 0xc0129aa4
0xc4233fbc 0xc0129b48 sys_madvise+0xa4 (0x4016, 0x1, 0x0, 0x4000e6d0, 0xb86c)
   kernel .text 0xc010 0xc0129aa4 0xc0129b94
more> 
   0xc0109154 system_call+0x3c
   kernel .text 0xc010 0xc0109118 0xc0109158
kdb> go
pid 260 starving for fork.c205
pid 260 starving for fork.c205
pid 260 starving for fork.c205

ksymoops 2.3.5 on i686 2.4.0.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (default)
 -m /lib/modules/2.4.0/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Call Trace: [] [] [] [] [] 
[] [] 
   [] [] [] [] [] [] 
[] [] 
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c010791d <__down+55/9c>
Trace; c0107a68 <__down_failed+8/c>
Trace; c020989d 
Trace; c0112ef4 
Trace; c012b805 <__alloc_pages+dd/2d0>
Trace; c012206c 
Trace; c01220fa 
Trace; c0122260 
Trace; c0113037 
Trace; c0108e80 
Trace; c0125965 
Trace; c0125a9f 
Trace; c0125d60 
Trace; c0125e1a 
Trace; c0108d63 


2 warnings issued.  Results may not be reliable.

#include <stdlib.h>
#include <sys/mman.h>


int main(int argc,char **argv)
{
  char *dummy;
  char *base;
  dummy = malloc(2 * 64 * 1024 );
  base = (char *) (( ((unsigned long) dummy) + 64 * 1024UL - 1 ) & - (64 * 1024UL));
  if (madvise(base,64*1024,MADV_NORMAL))
    exit(1);
  if (madvise(base,64*1024,MADV_RANDOM))
    exit(1);
  if (madvise(base,64*1024,MADV_SEQUENTIAL))
    exit(1);
  if (madvise(base,64*1024,MADV_WILLNEED))
    exit(1);
  if (madvise(base,64*1024,MADV_DONTNEED))
    exit(1);
  exit(0);
}
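
The base computation in the testcase rounds the malloc() result up to
the next 64K boundary.  As a standalone sketch (align_up is a name
chosen for this note, not something taken from ftpsearch), assuming the
alignment is a power of two:

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.  align must be a power
 * of two.  For unsigned values, & ~(align - 1) is the same mask as the
 * & -(align) used in the testcase. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

Rounding up moves the pointer by at most 64K - 1 bytes, which is why the
testcase allocates 2 * 64K: the aligned 64K window always fits inside
the allocation.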


Re: FS callback routines

2001-01-09 Thread Daniel Stodden


"Sean R. Bright" <[EMAIL PROTECTED]> writes:

>   Ok, before I begin, don't shoot me down, but I had an idea for a kernel
> modification and was wondering how feasible the group thought it was.
> 
>   I was writing a user space application to monitor a folder's contents.  The
> folder itself contained 100 folders, and each of those contained 24 folders.
> While writing the code to traverse the directory structure I realized that
> instead of my software figuring out when things change, why not just have
> the fs tell my application when something was updated.  For example, say we
> had a function called watch_fs(), that took an inode reference and a
> function pointer and maybe a bitmask of events to watch for.  When that
> inode (or its children) were changed, why couldn't the fs code call the
> callback function I specified?
> 
>   I have no idea how expensive this would be or if it's even worth it at this
> point.  It also wouldn't be portable at all considering that I know of no
> other OS that does this (could be wrong).
> 
>   Like I said, I am not asking that this be (necessarily) implemented, I am
> just curious as to what the perceived performance ramifications would be if
> it were to be implemented, say, by a virgin kernel developer ;)

you want to have a look at

http://oss.sgi.com/projects/fam/

resp. imon, the corresponding kernel modules. 

this has been around for quite some time now. enlightenment has
been/still is? using it since the earliest incarnations of its file
manager extension efm. (same with kde? not sure..)

i'm wondering whether this could get into the mainstream kernels soon?
i'm not really deep in the filesystem layers, but this sounds to me
like an extremely useful feature.

could anyone comment on section 2 of
http://oss.sgi.com/projects/fam/imon.txt ? would this actually be the
way to do it or is there any better method?


regards,
dns

-- 
___
 mailto:[EMAIL PROTECTED]

Re: Unified power management userspace policy

2001-01-09 Thread Andrew Morton

John Fremlin wrote:
> 
> Hi all!
> 
> At the moment there are two power management drivers in the linux
> kernel (AFAIK). They each have different userspace interfaces --
> /proc/apm and /dev/apmctl and /proc/sys/acpi/events or something. This
> is not altogether bad, but as they do the same thing, it might be nice
> to unify (part) of the interface. In fact this is already done for the
> in kernel interface with pm_send_all.
>

John,

Could you please use call_usermodehelper() in this patch
rather than exec_usermodehelper()?  I want to kill
exec_usermodehelper() sometime.

Plus your code will be simpler - no need to create
your own kernel thread.

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Eric Lammerts



On Tue, 9 Jan 2001, Stefan Traby wrote:
> "rmdir `pwd`" is required to fail (at least under csh, bash, ksh) if the
> path component contains a white space and therefore it can't be a valid
> replacement for Andreas' "rmdir ." which was what Al initially suggested.
>
> Yes, I'm very picky about that; but hey, I don't want to force anyone
> to write GNU/Linux like rms; just valid shell code. :)

Of course you should do rmdir "`pwd`"
But this is a userspace issue.

Eric


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread David S. Miller

   Date: Tue, 9 Jan 2001 11:31:45 +0100
   From: Christoph Hellwig <[EMAIL PROTECTED]>

   Yuck.  A new file_op just to get a few benchmarks right ...  I
   hope the writepages stuff will not be merged in Linus tree (but I
   wish the code behind it!)

It's a "I know how to send a page somewhere via this filedescriptor
all by myself" operation.  I don't see why people need to take
painkillers over this for 2.4.x.  I think f_op->write is stupid, such
a special case file operation just to get a few benchmarks right.
This is the kind of argument I am hearing.

Orthogonal to f_op->write being for specifying a low-level
implementation of sys_write, f_op->writepage is for specifying a
low-level implementation of sys_sendfile.  Can you grok that?

Linus has already seen this.  Originally he had a gripe because an
older revision of the code used to allow multiple pages to be passed
in an array to the writepage(s) operation.  He didn't like that, so I
made it take only one page as he requested.  He had no other major
objections to the infrastructure.

Later,
David S. Miller
[EMAIL PROTECTED]

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Mon, 8 Jan 2001, Rik van Riel wrote:

> I really think the zerocopy network stuff should be ported to kiobuf
> proper.

yep, we talked to Stephen Tweedie about this already, but it involves some
changes in kiovec support and we didn't want to touch too much code for
2.4. In any case, the zerocopy code is 'kiovec in spirit' (uses vectors of
struct page *, offset, size entities), so transition to a finalized kiovec
framework (or whatever other mechanism) is trivial. Right now kiovecs are
*way* too bloated for the purposes of skb fragments.

> The usefulness of the patch you posted is rather .. umm .. limited.
> [...]

i violently disagree :-) The upcoming TUX release is based on David's and
Alexey's cleaned-up zerocopy framework. [thus TUX and zerocopy are
separated.] David's patch adds a *very* scalable implementation of
zerocopy sendfile() and zerocopy sendmsg(), the panacea of fileserver
(webserver) scalability - it can be used by Apache, Samba and other
fileservers. The new zerocopy networking code DMA-s straight out of the
pagecache, natively supports hardware-checksumming and highmem (64-bit DMA
on 32-bit systems) zerocopy as well and multi-fragment DMA - no
limitations. We can saturate a gigabit link with TCP traffic, at about 20%
CPU usage on a 500 MHz x86 UP system. David and Alexey's patch is cool -
check it out!

> Having proper kiobuf support would make it possible to, for example,
> do zerocopy network->disk data transfers and lots of other things.

i used to think that this is useful, but these days it isn't. It's a waste
of PCI bandwidth resources, and it's much cheaper to keep a cache in RAM
instead of doing direct disk=>network DMA *all the time* some resource is
requested.

> Furthermore, by using kiobuf for the network zerocopy stuff there's a
> good chance the networking code will be integrated.

David and Alexey are TCP/IP networking code maintainers. So if you see a
'test this' networking framework patch from them on l-k, it has quite high
chances of being integrated into the networking code :-)

Ingo


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Christoph Hellwig


On Tue, Jan 09, 2001 at 11:23:41AM +0100, Ingo Molnar wrote:
> 
> On Mon, 8 Jan 2001, Rik van Riel wrote:
> 
> > I really think the zerocopy network stuff should be ported to kiobuf
> > proper.
> 
> yep, we talked to Stephen Tweedie about this already, but it involves some
> changes in kiovec support and we didn't want to touch too much code for
> 2.4. In any case, the zerocopy code is 'kiovec in spirit' (uses vectors of
> struct page *, offset, size entities),

Yep.  That is why I was so worried about the writepages file op.
It's rather hackish (only write, looks useful only for networking)
instead of the proposed rw_kiovec fop.

> 
> > The usefulness of the patch you posted is rather .. umm .. limited.
> > [...]
> 
> i violently disagree :-) The upcoming TUX release is based on David's and
> Alexey's cleaned-up zerocopy framework. [thus TUX and zerocopy are
> separated.] David's patch adds a *very* scalable implementation of
> zerocopy sendfile() and zerocopy sendmsg(), the panacea of fileserver
> (webserver) scalability - it can be used by Apache, Samba and other
> fileservers. The new zerocopy networking code DMA-s straight out of the
> pagecache, natively supports hardware-checksumming and highmem (64-bit DMA
> on 32-bit systems) zerocopy as well and multi-fragment DMA - no
> limitations. We can saturate a gigabit link with TCP traffic, at about 20%
> CPU usage on a 500 MHz x86 UP system. David and Alexey's patch is cool -
> check it out!

Yuck.  A new file_op just to get a few benchmarks right ...
I hope the writepages stuff will not be merged in Linus tree
(but I wish the code behind it!)

Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Tue, 9 Jan 2001, Christoph Hellwig wrote:

> > 2.4. In any case, the zerocopy code is 'kiovec in spirit' (uses
> > vectors of struct page *, offset, size entities),

> Yep. That is why I was so worried about the writepages file op.

i believe you misunderstand. kiovecs (in their current form) are simply
too bloated for networking purposes. Due to its nature and nonpersistency,
networking is very lightweight and memory-footprint-sensitive code (as
opposed to eg. block IO code), right now a 'struct skb_shared_info'
[which is roughly equivalent to a kiovec] is 12+4*6 == 36 bytes, which
includes support for 6 distinct fragments (each fragment can be on any
page, any offset, any size). A *single* kiobuf (which is roughly
equivalent to an skb fragment) is 52+16*4 == 116 bytes. 6 of these would
be 696 bytes, for a single TCP packet (!!!). This is simply not something
to be used for lightweight zero-copy networking.
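
The size comparison above, spelled out as C.  The numbers are exactly
the ones quoted in this mail for a 32-bit build; they are not read from
kernel headers, and the function names are invented for this note:

```c
/* skb_shared_info with room for 6 fragments, as quoted: 12 + 4*6. */
unsigned skb_shared_info_bytes(void)
{
    return 12 + 4 * 6;
}

/* A single kiobuf, as quoted: 52 + 16*4. */
unsigned kiobuf_bytes(void)
{
    return 52 + 16 * 4;
}

/* Cost of describing `fragments` packet fragments with one kiobuf each. */
unsigned kiobuf_bytes_for_fragments(unsigned fragments)
{
    return fragments * kiobuf_bytes();
}
```

For six fragments that is 696 bytes against 36, a bit over 19 times the
per-packet bookkeeping.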

so it's easy to say 'use kiovecs', but right now it's simply not
practical. kiobufs are a loaded concept, and i'm not sure whether it's
desirable at all to mix networking zero-copy concepts with
block-IO/filesystem zero-copy concepts. Just to make it even more clear:
although i do believe it to be desirable from an architectural point of
view, i'm not sure at all whether it's possible, based on the experience
we gathered while implementing TCP-zerocopy.

we talked (and are talking) to Stephen about this problem, but it's a
clearly 2.5 kernel issue. Merging to a finalized zero-copy framework will
be easy. (The overwhelming percentage of zero-copy code is in the
networking code itself and is insensitive to any kiovec issues.)

> It's rather hackish (only write, looks useful only for networking)
> instead of the proposed rw_kiovec fop.

i'm not sure what you are trying to say. You mean we should remove
sendfile() as well? It's only write, looks useful mostly for networking. A
substantial percentage of kernel code is useful only for networking :-)

> > zerocopy sendfile() and zerocopy sendmsg(), the panacea of fileserver
> > (webserver) scalability - it can be used by Apache, Samba and other
> > fileservers. The new zerocopy networking code DMA-s straight out of the
> > pagecache, natively supports hardware-checksumming and highmem (64-bit
> > DMA on 32-bit systems) zerocopy as well and multi-fragment DMA - no
> > limitations. We can saturate a gigabit link with TCP traffic, at about
> > 20% CPU usage on a 500 MHz x86 UP system. David and Alexey's patch is
> > cool - check it out!

> Yuck. A new file_op just to get a few benchmarks right ...

no. As David said, it's direct sendfile() support. It's completely
isolated, it's 20 lines of code, it does not impact filesystems, it only
shows up in sendfile(). So i truly don't understand your point. This
interface has gone through several iterations and was actually further
simplified.

Ingo

ps1. "first they say it's impossible, then they ridicule you, then they
 oppose you, finally they say it's self-evident". Looks like, after
 many many years, zero-copy networking for Linux is now finally in
 phase III. :-)

ps2. i'm joking :-)


Re: FS callback routines

2001-01-09 Thread Philipp Matthias Hahn


On Mon, 8 Jan 2001, Sean R. Bright wrote:

> I was writing a user space application to monitor a folder's contents.  The
> folder itself contained 100 folders, and each of those contained 24 folders.
> While writing the code to traverse the directory structure I realized that
> instead of my software figuring out when things change, why not just have
> the fs tell my application when something was updated.  For example, say we
> had a function called watch_fs(), that took an inode reference and a
> function pointer and maybe a bitmask of events to watch for.  When that
> inode (or its children) were changed, why couldn't the fs code call the
> callback function I specified?
RTFM: linux-2.4.0/Documentation/dnotify.txt

BYtE
Philipp
-- 
  / /  (_)__  __   __ Philipp Hahn
 / /__/ / _ \/ // /\ \/ /
//_/_//_/\_,_/ /_/\_\ [EMAIL PROTECTED]
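
For readers who have not used dnotify: it is driven through fcntl() on
an open directory and reports changes as a signal.  A minimal sketch,
assuming Linux with glibc; watch_directory is a name made up for this
note.  Note that dnotify covers only the entries directly inside the
directory, not the whole tree below it, so the 100 x 24 folder layout
from the original question would need one watch per directory:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes F_NOTIFY, F_SETSIG and DN_* in <fcntl.h> */
#endif
#include <fcntl.h>
#include <signal.h>   /* for the signal numbers callers pass in */
#include <unistd.h>

/* Ask the kernel to send `signo` whenever entries in `path` are
 * created, deleted or modified.  The caller should install a handler
 * for `signo` first, since the default action of most signals is to
 * terminate the process.  Returns the directory fd (keep it open while
 * notifications are wanted) or -1 on error. */
int watch_directory(const char *path, int signo)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fcntl(fd, F_SETSIG, signo) < 0 ||
        fcntl(fd, F_NOTIFY,
              DN_CREATE | DN_DELETE | DN_MODIFY | DN_MULTISHOT) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would typically pass a real-time signal such as SIGRTMIN so
that notifications queue instead of merging.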


PROBLEM: loop device doesn't reset its flags

2001-01-09 Thread David Tritscher


[1.]

The loop device doesn't seem to reset its read-only status after it
gets used on a file that is on a read-only filesystem.

[6.]

(Pretty chopped up, but it demonstrates the problem)

# mount
/dev/hda1 on / type reiserfs (rw)
/dev/scd0 on /mnt/cdrom type iso9660 (ro,noexec,nosuid,nodev,sync,unhide)
# list /mnt/cdrom/floppy.img 
-r--r--r--   1 root root  1474560 Dec 16 15:40 /mnt/cdrom/floppy.img
# cp /mnt/cdrom/floppy.img /

# mount -o loop=/dev/loop0 /floppy.img /mnt/disk
# mount
/floppy.img on /mnt/disk type ext2 (rw,loop=/dev/loop0)
# umount /mnt/disk

# mount -o ro,loop=/dev/loop0 /floppy.img /mnt/disk
# mount
/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk

# mount -o rw,loop=/dev/loop0 /floppy.img /mnt/disk
# mount
/floppy.img on /mnt/disk type ext2 (rw,loop=/dev/loop0)
# umount /mnt/disk

(All that above is normal)

# mount -o loop=/dev/loop0 /mnt/cdrom/floppy.img /mnt/disk
# mount
/mnt/cdrom/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk

(Now loop0 is screwed)

# mount -o loop=/dev/loop0 /floppy.img /mnt/disk
mount: floppy.img is write-protected, mounting read-only
# mount
/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk

# mount -o rw,loop=/dev/loop0 /floppy.img /mnt/disk
mount: floppy.img is write-protected, mounting read-only
# mount
/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk

The same behavior as shown above is exhibited by:

losetup /dev/loop1 /mnt/cdrom/floppy.img
losetup -d /dev/loop1

now loop1 thinks it is always read-only.

[7.1.]

Linux dave 2.4.0 #1 i586 unknown
Kernel modules 2.4.0
Gnu C  2.95.2
Gnu Make   3.79.1
Binutils   2.10.1
Linux C Library2.2
Dynamic linker ldd: version 1.9.9
Procps 2.0.7
Mount  2.10r
Net-tools  1.57
Console-tools  0.2.3
Sh-utils   2.0
Modules Loaded 

[X.]

This patch seems to fix the problem on my machine.

--- linux/drivers/block/loop.c.orig	Tue Jan  9 12:16:02 2001
+++ linux/drivers/block/loop.c	Tue Jan  9 12:16:57 2001
@@ -412,13 +412,14 @@
 	error = -EINVAL;
 	inode = file->f_dentry->d_inode;
 
+	lo->lo_flags = 0;
+
 	if (S_ISBLK(inode->i_mode)) {
 		/* dentry will be wired, so... */
 		error = blkdev_get(inode->i_bdev, file->f_mode,
 				   file->f_flags, BDEV_FILE);
 
 		lo->lo_device = inode->i_rdev;
-		lo->lo_flags = 0;
 
 		/* Backed by a block device - don't need to hold onto
 		   a file structure */

Re: Subtle MM bug (really 830MB barrier question)

2001-01-09 Thread Dan Maas


> 08048000-08b5c000 r-xp 00000000 03:05 1130923 /tmp/newmagma/magma.exe.dyn
> 08b5c000-08cc9000 rw-p 00b13000 03:05 1130923 /tmp/newmagma/magma.exe.dyn
> 08cc9000-0bd00000 rwxp 00000000 00:00 0

> Now, subsequent to each memory allocation, only the second number in the
> third line changes.  It becomes 23a78000, then 3b7f0000, and finally
> 3b808000 (after the failed allocation).

OK it's fairly obvious what's happening here. Your program is using its own
allocator, which relies solely on brk() to obtain more memory. On x86 Linux,
brk()-allocated memory (the heap) begins right above the executable and
grows upward - the increasing number you noted above is the top of the heap,
which grows with every brk(). Problem is, the heap can't keep growing
forever - as you discovered, on x86 Linux the upper bound is just below
0x40000000. That boundary is where shared libraries and other memory-mapped
files start to appear.

Note that there is still plenty (~2GB) of address space left, in the region
between the shared libraries and the top of user address space (just under
0xBFFFFFFF). How do you use that space? You need an allocation scheme based
on mmap'ing /dev/zero. As others pointed out, glibc's allocator does just
that.
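
A minimal sketch of such a scheme, assuming an allocator that wants large
zero-filled regions: it asks the kernel for memory with mmap() instead of
extending the heap with brk(). The helper names region_alloc and region_free
are hypothetical, and MAP_ANONYMOUS is used here in place of a private
mapping of /dev/zero; both hand back private zero-filled pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: get `len` bytes of zero-filled memory from mmap()
 * rather than from brk(). Returns NULL on failure. */
void *region_alloc(size_t len)
{
    void *p;

    assert(len > 0);
    /* The kernel picks the address, so the region is not tied to the
     * position of the brk() heap. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: give the region back. Returns 0 on success. */
int region_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

A custom allocator could call region_alloc() whenever brk() fails or for
every request above some size threshold, and keep using brk() for the rest.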

Here's your short answer: ask the authors of your program to either 1)
replace their custom allocator with regular malloc() or 2) enhance their
custom allocator to use mmap. (or, buy some 64-bit hardware =)...)

Dan



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Christoph Hellwig


On Tue, Jan 09, 2001 at 02:31:13AM -0800, David S. Miller wrote:
>Date: Tue, 9 Jan 2001 11:31:45 +0100
>From: Christoph Hellwig <[EMAIL PROTECTED]>
> 
>Yuck.  A new file_op just to get a few benchmarks right ...  I
>hope the writepages stuff will not be merged in Linus' tree (but I
>wish the code behind it would be!)
> 
> It's a "I know how to send a page somewhere via this filedescriptor
> all by myself" operation.  I don't see why people need to take
> painkillers over this for 2.4.x.  I think f_op->write is stupid, such
> a special case file operation just to get a few benchmarks right.
> This is the kind of argument I am hearing.
> 
> Orthogonal to f_op->write being for specifying a low-level
> implementation of sys_write, f_op->writepage is for specifying a
> low-level implementation of sys_sendfile.  Can you grok that?

Sure.  But sendfile is not one of the fundamental UNIX operations...
If there was no alternative to this I would probably have not said
anything, but with the rw_kiovec file op just before the door I don't
see any reason to add this _very_ specific file operation.

An alloc_kiovec before and a free_kiovec after the actual call
and the memory overhead of a kiobuf won't hurt so much that it stands
against a clean interface, IMHO.


> 
> Linus has already seen this.  Originally he had a gripe because an
> older revision of the code used to allow multiple pages to be passed
> in an array to the writepage(s) operation.  He didn't like that, so I
> made it take only one page as he requested.  He had no other major
> objections to the infrastructure.

You get that multiple page call with kiobufs for free...

Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

Re: [PATCH] hashed device lookup (New Benchmarks)

2001-01-09 Thread Andi Kleen

On Mon, Jan 08, 2001 at 04:23:41PM +0100, Ben Greear wrote:
> I don't argue that ifconfig shouldn't be fixed, but the hash speeds up

It has been fixed for months already. There was one stupid algorithm, for
which I was to blame, from when I changed ifconfig to use a device list two
years ago. At that time I didn't think that anybody would ever be crazy
enough to set up 4000 interfaces, so I just chose the simplest list
management. I fixed it when you first complained a few months ago, and now
list insertion works such that the list does not need to be walked fully in
the usual case. It could be optimized more in user space, but it's probably
not worth it.

> ip by about 2X too.  Is that not useful enough?  ip seems to be implemented
> pretty efficient, so if the hash helps it significantly then maybe it
> can help other efficient programs too.  Notice that it is the system
> (ie kernel) time that stays remarkably flat with the hash + ip graph.

But does your benchmark represent anything that real users do frequently?

If you really want to optimize I'm sure there are lots of areas in the kernel
where your efforts are better spent ;) [just run with the kernel profiler on
for a few days on your box and look at all the real hot spots]

BTW, if you just want to optimize ip link ls speed it would probably be enough
to keep a one behind cache that just caches the next member after the last
search. 
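
A rough userspace sketch of the one-behind cache idea, not the kernel's
actual device list code: the list remembers the node that follows the last
successful lookup, so a caller that asks for entries in list order (as a
sequential listing would) hits the cached node without walking the list.
The struct and function names are hypothetical, and the walks counter
exists only to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>

struct dev {
    int ifindex;
    struct dev *next;
};

struct dev_list {
    struct dev *head;
    struct dev *hint;     /* node following the last successful lookup */
    unsigned long walks;  /* nodes visited by full walks (illustration only) */
};

struct dev *dev_lookup(struct dev_list *l, int ifindex)
{
    struct dev *d;

    assert(l != NULL);

    /* Fast path: a sequential scan asks for exactly the cached successor. */
    if (l->hint && l->hint->ifindex == ifindex) {
        d = l->hint;
        l->hint = d->next;
        return d;
    }

    /* Slow path: walk from the head and refresh the hint on a hit. */
    for (d = l->head; d; d = d->next) {
        l->walks++;
        if (d->ifindex == ifindex) {
            l->hint = d->next;
            return d;
        }
    }
    return NULL;
}
```

Any code that removes a node would also have to clear or advance the hint
if it points at that node; that part is omitted here.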

-Andi

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread David S. Miller

   Date: Tue, 9 Jan 2001 12:28:10 +0100
   From: Christoph Hellwig <[EMAIL PROTECTED]>

   Sure.  But sendfile is not one of the fundamental UNIX operations...

It's a fundamental Linux interface and VFS-->networking interface.

   An alloc_kiovec before and a free_kiovec after the actual call
   and the memory overhead of a kiobuf won't hurt so much that it stands
   against a clean interface, IMHO.

This whole exercise is pointless unless it performs well.

The overhead _DOES_ matter, we've tested and profiled all of this with
full specweb99 runs, zerocopy ftp server loads, etc.  Removing one
word of information from anything involved in these code paths makes
enormous differences.  Have you run such tests with your suggested
kiobuf scheme?

Know what I really hate?  People who are talking, "almost done", and
"designing" the "real solution" to a problem and have no code to show
for it.  Ie. a total working implementation.  Often they have not one
line of code to show.

Then the folks who actually get off their lazy asses and make
something real, which works, and in fact exceeded most of our personal
performance expectations, are the ones who are getting told that what
they did was crap.

What was the first thing out of people's mouths?  Not "nice work", but
"I think writepage is ugly and an eyesore, I hope nobody seriously
considers this code for inclusion."  Keep designing... like Linus
says, "show me the code".

Later,
David S. Miller
[EMAIL PROTECTED]

Problem compiling linux-2.4.0 for Athlon/K7

2001-01-09 Thread Walter Mueller


Hello,

If Athlon/K7 is selected as the processor type, I get the following error
messages when compiling:

make -C  kernel CFLAGS="-D__KERNEL__ -I/usr/src/linux-2.4.x/linux-2.4.0/include -Wall 
-Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe 
-mpreferred-stack-boundary=2 -march=i686 -malign-functions=4  -DMODULE -DMODVERSIONS 
-include /usr/src/linux-2.4.x/linux-2.4.0/include/linux/modversions.h" 
MAKING_MODULES=1 modules
make[1]: Entering directory `/usr/src/linux-2.4.x/linux-2.4.0/kernel'
make[1]: Nothing to be done for `modules'.
make[1]: Leaving directory `/usr/src/linux-2.4.x/linux-2.4.0/kernel'
make -C  drivers CFLAGS="-D__KERNEL__ -I/usr/src/linux-2.4.x/linux-2.4.0/include -Wall 
-Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe 
-mpreferred-stack-boundary=2 -march=i686 -malign-functions=4  -DMODULE -DMODVERSIONS 
-include /usr/src/linux-2.4.x/linux-2.4.0/include/linux/modversions.h" 
MAKING_MODULES=1 modules
make[1]: Entering directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers'
make -C block modules
make[2]: Entering directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers/block'
gcc -D__KERNEL__ -I/usr/src/linux-2.4.x/linux-2.4.0/include -Wall -Wstrict-prototypes 
-O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 
-march=i686 -malign-functions=4  -DMODULE -DMODVERSIONS -include 
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/modversions.h   -DEXPORT_SYMTAB -c 
loop.c
In file included from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/irq.h:57,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/asm/hardirq.h:6,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:45,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:296,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/string.h:21,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/fs.h:23,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/capability.h:17,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/binfmts.h:5,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/sched.h:9,
 from loop.c:53:
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h: In function `x86_do_profile':
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h:198: `current' undeclared (first 
use in this function)
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h:198: (Each undeclared identifier 
is reported only once
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h:198: for each function it 
appears in.)
In file included from /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:296,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/string.h:21,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/fs.h:23,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/capability.h:17,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/binfmts.h:5,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/sched.h:9,
 from loop.c:53:
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h: In function 
`raise_softirq':
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:89: `current' undeclared 
(first use in this function)
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h: In function 
`tasklet_schedule':
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:160: `current' undeclared 
(first use in this function)
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h: In function 
`tasklet_hi_schedule':
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:174: `current' undeclared 
(first use in this function)
In file included from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/string.h:21,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/fs.h:23,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/capability.h:17,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/binfmts.h:5,
 from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/sched.h:9,
 from loop.c:53:
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h: In function 
`__constant_memcpy3d':
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:305: `current' undeclared (first 
use in this function)
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h: In function `__memcpy3d':
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:312: `current' undeclared (first 
use in this function)
{standard input}: Assembler messages:
{standard input}:8: Warning: Ignoring changed section attributes for .modinfo
make[2]: *** [loop.o] Error 1
make[2]: Leaving directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers/block'
make[1]: *** [_modsubdir_block] Error 2
make[1]: Leaving directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers'
make: *** [_mod_drivers] Error 2





Re: Benchmarking 2.2 and 2.4 using hdparm and dbench 1.1

2001-01-09 Thread Anton Blanchard


 
> Where is the size defined, and is it easy to modify?

Look in fs/buffer.c:buffer_init()

> I noticed that /proc/sys/vm/freepages is not writable any more.  Is there
> any reason for this?

I am not sure why.

> Hmm...  I'm still using samba 2.0.7.  I'll try 2.2 to see if it
> helps.  What are tdb spinlocks?

samba 2.2 uses tdb which is an SMP safe gdbm like database. By default it
uses byte range fcntl locks to provide locking, but has the option of
using spinlocks (./configure --with-spinlocks). I doubt it would make
a difference on your setup.

> Have you actually compared the same setup with 2.2 and 2.4 kernels and a
> single client transferring a large file, preferably from a slow server
> with little memory?  Most samba servers that people benchmark are fast
> computers with lots of memory.  So far, every major kernel upgrade has
> given me a performance boost, even for slow computers, and I would hate to
> see that trend break for 2.4...

I haven't done any testing on slow hardware and the high end stuff is
definitely performing better in 2.4, but I agree we shouldn't forget
about the slower stuff.

Narrowing down where the problem is would help. My guess is it is a TCP
problem, can you check if it is performing worse in your case? (eg ftp
something against 2.2 and 2.4)

Anton

Re: [testcase] madvise->semaphore deadlock 2.4.0

2001-01-09 Thread Andrew Morton


Mike Galbraith wrote:
> 
> Greetings,
> 
> While trying to configure ftpsearch, the process hangs while running
> its madvise confidence test below.  It appears to be taking a fault
> in madvise_fixup_middle():atomic_add(2, &vma->vm_file->f_count) and
> immediately deadlocking forever on mm->mmap_sem per IKD.  (Virgin 2.4.0
> agrees)
>

This should fix it.

We're still in disagreement with the HPUX 11 manpage though.
HP say that MADV_DONTNEED requires an underlying file,
and thus implies that MADV_WILLNEED doesn't need an
underlying file.  We have it the other way round, which
seems more sensible.
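
The original ftpsearch configure test is not shown in this thread. As a
rough sketch of the kind of call that exercises this path, the following
applies madvise() to the middle page of an anonymous (file-less) mapping,
which makes the kernel split the VMA around that page. The function name is
hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map three anonymous pages and advise only the middle one.
 * Returns 0 if madvise() succeeds, -1 otherwise. */
int advise_middle_of_anon_mapping(int advice)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *p;
    int rc;

    assert(page > 0);
    len = 3 * (size_t)page;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* No file backs this mapping, so the kernel must not assume that
     * vm_file is non-NULL when it splits the VMA. */
    rc = madvise(p + page, (size_t)page, advice);
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}
```

Calling it with MADV_RANDOM or MADV_SEQUENTIAL should simply return 0 on a
kernel that handles file-less mappings correctly.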


--- linux-2.4.0/mm/filemap.c	Fri Jan  5 21:37:20 2001
+++ linux-akpm/mm/filemap.c	Tue Jan  9 23:05:00 2001
@@ -1835,7 +1835,8 @@
 	n->vm_end = end;
 	setup_read_behavior(n, behavior);
 	n->vm_raend = 0;
-	get_file(n->vm_file);
+	if (n->vm_file)
+		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
 	lock_vma_mappings(vma);
@@ -1861,7 +1862,8 @@
 	n->vm_pgoff += (n->vm_start - vma->vm_start) >> PAGE_SHIFT;
 	setup_read_behavior(n, behavior);
 	n->vm_raend = 0;
-	get_file(n->vm_file);
+	if (n->vm_file)
+		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
 	lock_vma_mappings(vma);
@@ -1893,7 +1895,8 @@
 	right->vm_pgoff += (right->vm_start - left->vm_start) >> PAGE_SHIFT;
 	left->vm_raend = 0;
 	right->vm_raend = 0;
-	atomic_add(2, &vma->vm_file->f_count);
+	if (vma->vm_file)
+		atomic_add(2, &vma->vm_file->f_count);
 
 	if (vma->vm_ops && vma->vm_ops->open) {
 		vma->vm_ops->open(left);

Re: [PATCH] advansys.c: include missing restore_flags, etc

2001-01-09 Thread Arnaldo Carvalho de Melo


Em Tue, Jan 09, 2001 at 08:30:07AM +0100, Pauline Middelink escreveu:
> > +STATIC unsigned long
> >  DvcEnterCritical(void)
> >  {
> > -intflags;
> > +unsigned long flags;
> >  
> >  save_flags(flags);
> >  cli();
> > @@ -9965,7 +9972,7 @@
> >  }
> 
> Err, according to wise ppl on this list, this does not work on
> MIPSes. The flags thing must stay in the same stackframe.
> 
> (I know, not your fault, but since you are patching the driver...)

yep, know that, just thought that this beast was only for i386, will submit
another patch, and I think that some other drivers do this as well...

- Arnaldo

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Andries . Brouwer


> I trust your specs said so, however I'm not sure which are the specs
> we should follow for Linux.

> At least for LFS 2.2.x fixage I always followed the SUSv2 specs

We are Linux, and free to do whatever we want.
However, following POSIX makes a large body of software available.
It would be very unwise to deviate from POSIX if it can be avoided.

Now POSIX describes only part of Unix, and for other parts we get
our inspiration from SVID, or X/Open, or SUSv2, or by looking at
what other Unix-like systems do, like *BSD*, Solaris, AIX, etc.
But these sources are often contradictory.

The next version of the POSIX standard (which will simultaneously
be SUSv3) is expected a few months from now. As soon as it exists,
we'll want to follow it, as much as possible. Today it doesn't exist,
but in case of doubt it is reasonable to follow the draft.
(And in case the draft is really ridiculous, there is still time
to file a change request.)

Andries


See http://www.opengroup.org/austin/

Re: Confirmation request about new 2.4.x. kernel limits

2001-01-09 Thread Matti Aarnio

On Mon, Jan 08, 2001 at 11:11:05PM -0500, Venkatesh Ramamurthy wrote:
> 
> > Max. RAM size:  64 GB   (any slowness accessing RAM over 4 GB
> >  with 32 bit machines ?)
>   more than 4GB in RAM is bounce buffered, so there is performance
>   penalty as the data have to be copied into the 4GB RAM area

  The actual memory limit is lower. Your run-of-the-mill Pentium-PAE36 capable
  system has PCI bus(es) for IO, and the address space for them needs to
  stay below 4G so that devices can be accessed at bootup. Thus very likely
  your system doesn't have more than, say, 3 GB of RAM below 4G.

  Pick your processors.  You need XEONs to have L1/L2 caching of memory
  above the 4 GB address (PAE36 mapped physical addresses.)

  For IO on usual systems you have PCI busmasters with a 32-bit address space,
  so those can access only the lowest 4GB of address space. To get a block
  of data into or out of the upper area it needs to be "bounced", that is,
  the CPU must copy it.  Linux 2.4.0 doesn't support 64-bit PCI addresses
  on 32-bit systems (nor on 64-bit Alpha, as I recall.)
  On the other hand, Alpha systems and SPARC systems have IOMMU hardware,
  and we do support that (to some extent), but the 32-bit Intel world doesn't
  have similar things.

  For userspace, if parts of userspace are physically mapped above 4G,
  it might not be very harmful at all -- presuming you have XEONs which
  cache the memory accesses there also.  The libc and similar multiply
  shared objects might as well reside in high memory.   A userspace process
  doesn't see, after all, where each page resides physically.

/Matti Aarnio

Re: Which Bind version..

2001-01-09 Thread Michael H. Warfield

On Mon, Jan 08, 2001 at 12:56:37PM +0500, Mike wrote:
> Hi!!

> I wanna install Bind on my DNS. Which Bind version is most stable and
> secure.

9.0.1 is the latest release in the 9.x series and if you are
interested in "SecureDNS", that's the way to go.  I'm currently running
9.1.0beta2, and it seems rock solid to me.  There is also 8.2.2P7 if
you want to stick with the older 8.x series, but I certainly wouldn't
if I were setting up a new DNS server.  The 4.x series is totally
deprecated at this point.  Personally, I wouldn't use anything less
than 9.0.1 and I currently support over 100 domains on my servers
(my partner runs a hosting service).

> Regards,
> Nauman Ansari

Mike
-- 
 Michael H. Warfield|  (770) 985-6132   |  [EMAIL PROTECTED]
  (The Mad Wizard)  |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9  |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471|  possible worlds.  A pessimist is sure of it!


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Tue, 9 Jan 2001, Christoph Hellwig wrote:

> Sure.  But sendfile is not one of the fundamental UNIX operations...

Neither were eg. kernel-based semaphores. So what? Unix wasn't perfect and
isn't perfect - but it was a (very) good starting point. If you are arguing
against the existence or importance of sendfile() you should re-think,
sendfile() is a unique (and important) interface because it enables moving
information between files (streams) without involving any interim
user-space memory buffer. No original Unix API did this AFAIK, so we
obviously had to add it. It's an important Linux API category.
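
As a small illustration of that point, here is a sketch of a file copy
built on sendfile(): the data moves from one descriptor to the other inside
the kernel and never passes through a user-space buffer. The copy_file
helper is hypothetical, and the sketch assumes a kernel on which out_fd may
be a regular file; on kernels where it must be a socket the call fails and
the helper returns -1.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the file at `src` to `dst` using sendfile().
 * Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    struct stat st;
    off_t left;
    int in_fd, out_fd, rc = -1;

    assert(src != NULL && dst != NULL);

    in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    if (fstat(in_fd, &st) == 0) {
        left = st.st_size;
        rc = 0;
        while (left > 0) {
            /* NULL offset: read from, and advance, in_fd's file position. */
            ssize_t n = sendfile(out_fd, in_fd, NULL, (size_t)left);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {       /* error, or source shrank underneath us */
                rc = -1;
                break;
            }
            left -= n;
        }
    }

    close(in_fd);
    close(out_fd);
    return rc;
}
```

A server would normally pass a connected socket as out_fd instead of a
second file; the loop stays the same.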

> If there was no alternative to this I would probably have not said
> anything, but with the rw_kiovec file op just before the door I don't
> see any reason to add this _very_ specific file operation.

I do think that the kiovec code has to be rewritten substantially before
it can be used for networking zero-copy, so right now we do the least
damage if we do not increase the coverage of kiovec code.

> An alloc_kiovec before and an free_kiovec after the actual call and
> the memory overhead of a kiobuf won't hurt so much that it stands
> against a clean interface, IMHO.

please study the networking portions of the zerocopy patch and you'll see
why this is not desirable. An alloc_kiovec()/free_kiovec() is exactly the
thing we cannot afford in a sendfile() operation. sendfile() is
lightweight, the setup times of kiovecs are not.

basically the current kiovec design does not deal with the realities of
high-speed, featherweight networking. DO NOT talk in hypotheticals. The
code is there, do it, measure it. You might not care about performance, we
do.

another, more theoretical issue is that i think the kernel should not be
littered with multi-page interfaces, we should keep the one "struct page *
at a time" interfaces. Eg. check out how the new zerocopy code generates
perfect MTU sized frames via the ->writepage() interface. No interim
container objects are necessary.

Ingo


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Mon, 8 Jan 2001, David S. Miller wrote:

>All I am asking is that someone lets me know if they make major
>changes to my code so I can keep track of whats happening.
>
> We have not made any major changes to your code, in lieu of this
> not being code which is actually being submitted yet.
>
> If it bothers you that publicly someone has published changes to your
> driver which you disagree with, oh well... :-)

i did tell Jes about our zerocopy work, months ago (and IIRC we even
exchanged emails about technical issues briefly). The changes were first
published in the TUX 1.0 source code last August, and subsequent cleanups
(more than 10 iterations) were published on Alexey's public FTP site:

ftp://ftp.inr.ac.ru/ip-routing/

i think this whole issue got miscommunicated because Jes moved to Canada
exactly when we wrote the fragmented-API changes. I do believe Jes will
like most of our changes though, and i can surely tell that the elegant
and clean code of the Acenic driver made these changes so much easier.
Jes's Acenic driver was the first Linux networking driver in history to
support zero-copy TCP.

Ingo


Re: Subtle MM bug

2001-01-09 Thread Zlatko Calusic

Linus Torvalds <[EMAIL PROTECTED]> writes:

> On 8 Jan 2001, Eric W. Biederman wrote:
> 
> > Zlatko Calusic <[EMAIL PROTECTED]> writes:
> > > 
> > > Yes, but a lot more data on the swap also means degraded performance,
> > > because the disk head has to seek around in the much bigger area. Are
> > > you sure this is all OK?
> > 
> > I don't think we have more data on the swap, just more data has an
> > allocated home on the swap.
> 
> I think Zlatko's point is that because of the extra allocations, we will
> have worse locality (more seeks etc).

Yes that was my concern.

But in the end I'm not sure. I made two simple tests and haven't found
any problems with 2.4.0 mm logic (as opposed to 2.2.17). In fact, the new
kernel was faster in the more interesting (make -j32) test.

Also I have found that the new kernel allocates 4 times more swap space
under some circumstances. That may or may not be alarming, it remains
to be seen.

-- 
Zlatko

Re: kernel network problem ?

2001-01-09 Thread Steven N. Hirsch

On Tue, 9 Jan 2001, Helge Hafting wrote:

> Nicolas Noble wrote:
> [...]
> As others have told already, this is the ECN problem.
> 
> > I noticed the same bug. This is very weird, I can send a list of sites
> > which I can't connect to anymore. 
> 
> You have a list?  Send all of them a message stating that they ought
> to upgrade their firewalls which cause this problem.  Or they
> will lose customers/visitors.  Cisco already have an upgrade for them,
> so fixing is dead easy, and they can then boast compatibility with
> the latest internet standards.  
> 
> If they don't care about linux users, tell them that windows eventually
> will use ECN too.  They definitely don't want to have a ECN problem when
> that happens.

After upgrading to kernel 2.4.0, I found myself unable to retrieve mail
from Adelphia's (2-way cable ISP) POP server.  It took several days to
figure out that _one_ of their routers was configured to block ECN.  After
bringing this to the attention of their network engineers, I was informed
that their policy prohibits making any router changes on the basis of one
trouble report.  The person I spoke with did NOT try to defend their
setup, but it was made clear that they'll do nothing until Windows breaks.

If I were packaging a Linux distribution, I'd be sure to have ECN disabled
by default, FWIW.
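
For reference, ECN negotiation is controlled by the net.ipv4.tcp_ecn
sysctl, exposed as /proc/sys/net/ipv4/tcp_ecn, where 0 turns it off. The
sketch below reads and writes that file from C; the function names are
hypothetical, and writing needs root.

```c
#include <assert.h>
#include <stdio.h>

#define TCP_ECN_PATH "/proc/sys/net/ipv4/tcp_ecn"

/* Returns the current tcp_ecn setting, or -1 if it cannot be read. */
int read_tcp_ecn(void)
{
    FILE *f = fopen(TCP_ECN_PATH, "r");
    int value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Writes a new setting; returns 0 on success, -1 on failure
 * (for example when not running as root). */
int write_tcp_ecn(int value)
{
    FILE *f;
    int ok;

    assert(value >= 0);
    f = fopen(TCP_ECN_PATH, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A distribution that wanted ECN off by default could achieve the same thing
with a single write of 0 to that file from its boot scripts.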

Steve


[PATCH] ad1848.c: include missing restore_flags

2001-01-09 Thread Arnaldo Carvalho de Melo


Alan,

Please apply.

- Arnaldo

--- linux-2.4.0-ac4/drivers/sound/ad1848.c  Thu Aug 24 07:40:05 2000
+++ linux-2.4.0-ac4.acme/drivers/sound/ad1848.c Tue Jan  9 08:55:58 2001
@@ -28,6 +28,7 @@
  *   of irqs. Use dev_id.
  * Christoph Hellwig   : adapted to module_init/module_exit
  * Aki Laukkanen   : added power management support
+ * Arnaldo C. de Melo  : added missing restore_flags in ad1848_resume
  *
  * Status:
  * Tested. Believed fully functional.
@@ -2751,6 +2752,7 @@
 	bits = interrupt_bits[devc->irq];
 	if (bits == -1) {
 		printk(KERN_ERR "MSS: Bad IRQ %d\n", devc->irq);
+		restore_flags(flags);
 		return -1;
 	}
 

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Stephen Landamore


Ingo Molnar wrote:
> On Tue, 9 Jan 2001, Christoph Hellwig wrote:
>
>> Sure.  But sendfile is not one of the fundamental UNIX operations...
>
> Neither were eg. kernel-based semaphores. So what? Unix wasn't
> perfect and isn't perfect - but it was a (very) good starting
> point. If you are arguing against the existence or importance of
> sendfile() you should re-think, sendfile() is a unique (and
> important) interface because it enables moving information between
> files (streams) without involving any interim user-space memory
> buffer. No original Unix API did this AFAIK, so we obviously had to
> add it. It's an important Linux API category.

Ehh, that's not correct. HP-UX was the first to implement sendfile().
Linux (and other commercial unices) then copied the idea...

For the record, sendfile() exists because we (Zeus) asked HP for
it. (So of course we agree that sendfile is important!)

Regards,
Stephen

--
Stephen Landamore, <[EMAIL PROTECTED]>  Zeus Technology
Tel: +44 1223 525000  Universally Serving the Net
Fax: +44 1223 525100  http://www.zeus.com
Zeus Technology, Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND


Re: FS callback routines

2001-01-09 Thread Daniel Phillips


"Michael D. Crawford" wrote:
> 
> Regarding notification when there's a change to the filesystem:
> 
> This is one of the most significant things about the BeOS BFS filesystem, and
> something I'd dearly love to see Linux adopt.  It makes an app very efficient,
> you just get notified when a directory changes and you never waste time polling.
> 
> I think it would require changes to the VFS layer, not just to the filesystems,
> because this is a concept POSIX filesystems do not presently possess.
> 
> The other is indexed filesystem attributes, for example a file can have its
> mimetype in the filesystem, and any application can add an attribute and have it
> indexed.
> 
> There's a method to do boolean queries on indexed attributes, and you can find
> files in an entire filesystem that match a query in a blazingly short time, much
> faster than walking the directory tree.
> 
> If you want to try out the BeOS, there's a free-as-in-beer version at
> http://free.be.com for Pentium PC's.  You can also purchase a version that comes
> for both PC's and certain PowerPC macs.
> 
> There are read-only versions of this for Linux which I believe are under the
> GPL.  The original author is here:
> 
> http://hp.vector.co.jp/authors/VA008030/bfs/
> 
> He refers you to here to get a version that works under 2.2.16:
> 
> http://milosch.net/beos/
> 
> The author's intention was to take it read-write, but it's complex because it is
> a journaling filesystem.
> 
> Daniel Berlin, a BeOS developer modified the Linux BFS driver so it works with
> 2.4.0-test1.  I don't know if it works with 2.4.0.  The web site where it used
> to be posted isn't there anymore, and the laptop where I had it is in for
> repair.  I may have it on a backup, and I'll see if I can track Daniel down.
> 
> While Be, Inc.'s implementation is closed-source, the design of the BFS (_not_
> "befs" as it is sometimes called) is explained in Practical File System Design
> with the Be File System by Dominic Giampolo, ISBN 1-55860-497-9.  Dominic has
> since left Be and I understand works at Google now.

fs/dnotify.c:

   /*
    * Directory notifications for Linux.
    *
    * Copyright (C) 2000 Stephen Rothwell
...

The currently defined events are:

DN_ACCESS   A file in the directory was accessed (read)
DN_MODIFY   A file in the directory was modified (write,truncate)
DN_CREATE   A file was created in the directory
DN_DELETE   A file was unlinked from directory
DN_RENAME   A file in the directory was renamed
DN_ATTRIB   A file in the directory had its attributes
            changed (chmod,chown)

It was done last year, quietly and without fanfare, by Stephen Rothwell:

  http://www.linuxcare.com/about-us/os-dev/rothwell.epl

This may be the most significant new feature in 2.4.0, as it allows us
to take a fundamentally different approach to many different problems. 
Three that come to mind: mail (get your mail instantly without polling);
make (don't rely on timestamps to know when rebuilding is needed, don't
scan huge directory trees on each build); locate (reindex only those
directories that have changed, keep index database current).  As you
noticed, there are many others.
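
For anyone who wants to try this from userspace: as far as I know the
notifications are requested with fcntl(F_NOTIFY) on an open directory
descriptor, with the DN_* events above OR-ed together (see fcntl(2)).
What follows is a minimal sketch under that assumption, not code taken
from fs/dnotify.c. The names watch_directory(), on_dir_event() and
dir_changed are made up for illustration, and routing the event to
SIGRTMIN through F_SETSIG is my choice (the default signal is SIGIO).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* F_NOTIFY, F_SETSIG and DN_* are Linux extensions */
#endif
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the signal handler whenever the watched directory changes. */
static volatile sig_atomic_t dir_changed;

/* Hypothetical handler: only records that something happened. */
static void on_dir_event(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)si;
    (void)ctx;
    dir_changed = 1;
}

/*
 * Ask the kernel to signal us when entries in `path` are created,
 * deleted, modified or renamed.  Returns the directory descriptor
 * (which must stay open for the watch to remain active) or -1.
 */
int watch_directory(const char *path)
{
    struct sigaction sa;
    int fd;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_dir_event;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) < 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* DN_MULTISHOT keeps the watch armed after the first event. */
    if (fcntl(fd, F_SETSIG, SIGRTMIN) < 0 ||
        fcntl(fd, F_NOTIFY, DN_CREATE | DN_DELETE | DN_MODIFY |
                            DN_RENAME | DN_MULTISHOT) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A mail client could call watch_directory() on its spool directory and
sleep in sigsuspend() instead of polling.  The notification only says
that the directory changed, so the caller still has to rescan that one
directory to find out what happened.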

Stephen, it would be very interesting to know more about the development
process you went through and what motivated you to provide this
fundamental facility.

--
Daniel

Re: Subtle MM bug

2001-01-09 Thread Eric W. Biederman

Linus Torvalds <[EMAIL PROTECTED]> writes:

> On 8 Jan 2001, Eric W. Biederman wrote:
> 
> > Zlatko Calusic <[EMAIL PROTECTED]> writes:
> > > 
> > > Yes, but a lot more data on the swap also means degraded performance,
> > > because the disk head has to seek around in the much bigger area. Are
> > > you sure this is all OK?
> > 
> > I don't think we have more data on the swap, just more data has an
> > allocated home on the swap.
> 
> I think Zlatko's point is that because of the extra allocations, we will
> have worse locality (more seeks etc). 
> 
> Clearly we should not actually do any more actual IO. But the sticky
> allocation _might_ make the IO we do be more spread out.

The tradeoff when implemented correctly is that writes will tend to be
more spread out and reads should be better clustered together. 

> To offset that, I think the sticky allocation makes us much better able to
> handle things like clustering etc more intelligently, which is why I think
> it's very much worth it.  But let's not close our eyes to potential
> downsides.

Certainly, keeping our eyes open is a good thing.

But it has been apparent for a long time that, doing allocation the way
we were, we took a performance hit when it came to heavy swapping.  So
I'm relieved that we are now being more aggressive.

From the sounds of it what we are currently doing actually sucks worse
for some heavy loads.  But it still feels like the right direction.

It's been my impression that work loads where we are actively swapping
are a lot different from work loads where we really don't swap, to
the extent that it might make sense to make the actively swapping case
a config option to get our attention in the code.  It would be nice
to have a linux kernel for once that handles heavy swapping (below
the level of thrashing) gracefully. :)

Eric

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Tue, 9 Jan 2001, Stephen Landamore wrote:

> >> Sure.  But sendfile is not one of the fundamental UNIX operations...

> > Neither were eg. kernel-based semaphores. So what? Unix wasnt

> Ehh, that's not correct. HP-UX was the first to implement sendfile().

i dont think we disagree. What i was referring to was the 'original' Unix
idea, the 30 years old one, which did not include sendfile() :-) We never
claimed that sendfile() first came up in Linux [that would be a blatant
lie] - and the Linux API itself was indeed influenced by existing
sendfile()/copyfile() interfaces. (at the time Linus implemented
sendfile() there already existed several similar interfaces.)

> For the record, sendfile() exists because we (Zeus) asked HP for it.

good move :-) [honestly.]

> (So of course we agree that sendfile is important!)

:-) I think sendfile() should also have its logical extension:
receivefile(). I don't know how the HPUX implementation works, but in
Linux, right now it's only possible to sendfile() from a file to a socket.
The logical extension of this is to allow socket->file IO and file->file,
socket->socket IO as well. (the latter one could be interesting for things
like web proxies.)

Ingo


Re: [testcase] madvise->semaphore deadlock 2.4.0

2001-01-09 Thread Mike Galbraith


On Tue, 9 Jan 2001, Andrew Morton wrote:

> Mike Galbraith wrote:
> > 
> > Greetings,
> > 
> > While trying to configure ftpsearch, the process hangs while running
> > it's madvise confidence test below.  It appears to be taking a fault
> > in madvise_fixup_middle():atomic_add(2, &vma->vm_file->f_count) and
> > immediately deadlocking forever on mm->mmap_sem per IKD.  (Virgin 2.4.0
> > agrees)
> >
> 
> This should fix it.

Indeed it does.   (benchmark _that_ OS rags;)

-Mike


Re: kernel network problem ?

2001-01-09 Thread Alan Cox


> trouble report.  The person I spoke with did NOT try to defend their
> setup, but it was made clear that they'll do nothing until Windows breaks.
> 
> If I were packaging a Linux distribution, I'd be sure to have ECN disabled
> by default, FWIW.

Probably the case. However the more people who pester the faulty sites the
better. Did you ask the person how many reports he needed?

I certainly intend to run ECN on my mailhost once I trust 2.4 a bit more.

Alan



Re: Broken tty handling

2001-01-09 Thread Blu3Viper


Drat it, don't you hate it when you get around to reporting a long standing
bug and it's already fixed?

Thank you,
-d


Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Alexander Viro

On Mon, 8 Jan 2001, Stefan Traby wrote:

> On Mon, Jan 08, 2001 at 12:58:20PM -0500, Alexander Viro wrote:
> 
> > Shell equivalent is rmdir `pwd`. Also portable.
> 
> Very portable - not.
> 
> rmdir "`pwd`" !!!

OK, got me on that. Yes, you'll need quoting here. Sorry.
Notice that there are two effects in the game:
* some Unices refuse to rmdir() busy directories. For them
removing the pwd is impossible. Period. You can chdir() away, but
there is no promise that after
        chdir foo
        chdir ..
foo will refer to the directory that used to be your pwd in the middle.
That's pretty obvious - consider the effect of

        chdir foo
        chdir ..                mv foo bar
                                mv baz foo
So unless you have external information about behaviour of other processes
the only way to pinpoint a directory is to keep it opened/pwd/root. Each
of these will keep it busy and Unices that refuse to rmdir() busy ones
will return -EBUSY on that.
On such systems there is _no_ reliable way to remove your current
pwd unless you can guarantee that it won't be renamed away by another process.
No matter what you are doing.

* All Unices are required to refuse rmdir() on pathnames that end in
"." or "..". 2.2 is an exception in that respect - usually it allows such
operation. However, that is _still_ unreliable. rename() called by another
process at the right time will make rmdir(".") return -ENOENT, even though
at any moment "." would resolve to the same directory. Including the window
when rmdir() would fail. Notice that error value is indistinguishable from
the other cases, so blind repeating rmdir(".") while you get -ENOENT is not
a solution (as a matter of fact, it can trivially turn into infinite loop).
All examples mentioned in that thread (HP/UX, Solaris, *BSD) _will_
fail with "rmdir .". Some of them will fail with "rmdir " too -
see discussion of -EBUSY above.

The bottom line: without external information about behaviour of
other processes you can't reliably remove the directory that is your pwd
now. "chdir away and rmdir by the name it used to have" works around the
problem with -EBUSY (on the systems that refuse to remove busy ones) _BUT_
it is still vulnerable to "rename by another process" kind of races.

If you _have_ such external guarantees - trivial wrapper will do
the trick on the systems that allow rmdir() of busy directories and
the same wrapper combined with chdir away will solve the problem for all
systems. There is no reason to put that in the kernel - it will not give
you any additional guarantees.

We _can_ pinpoint the link and do rmdir() on it reliably. We can't
do the same to inode. In principle kernel could do that, but NONE of the
existing Unices (2.2 included) do such things and it would require more
trickery than it's worth.


Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Jesse Pollard


-  Received message begins Here  -

> 
> Hello Al,
> 
> why `rmdir .` is been deprecated in 2.4.x?  I wrote software that depends on
> `rmdir .` to work (it's local software only for myself so I don't care that it
> may not work on unix) and I'm getting flooded by failing cronjobs since I put
> 2.4.0 on such machine.  `rmdir .` makes perfect sense, the cwd dentry remains
> pinned by me until I `cd ..`, when it gets finally deleted from disk.  I'd like
> if we could resurrect such fine feature (adapting userspace is just a few liner
> but that isn't the point). Comments?

Not exactly valid, since a file could be created in that "pinned" directory
after the rmdir...

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Andrew Morton


Ingo Molnar wrote:
> 
> On Tue, 9 Jan 2001, Stephen Landamore wrote:
> 
> > >> Sure.  But sendfile is not one of the fundamental UNIX operations...
> 
> > > Neither were eg. kernel-based semaphores. So what? Unix wasnt
> 
> > Ehh, that's not correct. HP-UX was the first to implement sendfile().
> 
> i dont think we disagree. What i was referring to was the 'original' Unix
> idea, the 30 years old one, which did not include sendfile() :-) We never
> claimed that sendfile() first came up in Linux [that would be a blatant
> lie] - and the Linux API itself was indeed influenced by existing
> sendfile()/copyfile() interfaces. (at the time Linus implemented
> sendfile() there already existed several similar interfaces.)
> 

y'know our pals have patented it?

http://www.delphion.com/details?pn=US05845280__

Re: kernel network problem ?

2001-01-09 Thread Gregory Maxwell


On Tue, Jan 09, 2001 at 01:32:49PM +, Alan Cox wrote:
> > If I were packaging a Linux distribution, I'd be sure to have ECN disabled
> > by default, FWIW.
> 
> Probably the case. However the more people who pester the faulty sites the
> better. Did you ask the person how many reports he needed 
> 
> I certainly intend to run ECN on my mailhost once I trust 2.4 a bit more.
> 
> Alan

Is anyone maintaining an automated sweep of sites that I can complain to all
at once (for each 2.4 ecn system I install of course) rather than finding
them one at a time as my connections fail?

:)

[PATCH] sscape.c: include missing restore_flags

2001-01-09 Thread Arnaldo Carvalho de Melo


Alan,

Please apply.

- Arnaldo

--- linux-2.4.0-ac4/drivers/sound/sscape.c  Mon Jan  8 20:39:30 2001
+++ linux-2.4.0-ac4.acme/drivers/sound/sscape.c Tue Jan  9 09:16:39 2001
@@ -16,6 +16,7 @@
  * Christoph Hellwig   : adapted to module_init/module_exit
  * Bartlomiej Zolnierkiewicz : added __init to attach_sscape()
  * Chris Rankin: Specify that this module owns the coprocessor
+ * Arnaldo C. de Melo  : added missing restore_flags in sscape_pnp_upload_file
  */
 
 #include 
@@ -969,7 +970,10 @@
memcpy(devc->raw_buf, dt, l); dt += l;
sscape_start_dma(devc->dma, devc->raw_buf_phys, l, 0x48);
sscape_pnp_start_dma ( devc, 0 );
-   if (sscape_pnp_wait_dma ( devc, 0 ) == 0) return 0;
+   if (sscape_pnp_wait_dma ( devc, 0 ) == 0) {
+   restore_flags(flags);   
+   return 0;
+   }
}

restore_flags(flags);   

Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumissionpolicy!)

2001-01-09 Thread Blu3Viper


> Actually if you count arp which is also part of ip; ip becomes smaller
> by about 15K.

...i always forget some small detail.

thx

-d


2.4.0 bug in SHM an via-rhine or is it my fault?

2001-01-09 Thread Felix Maibaum


Hi folks!

I searched the kernel archives for information on this going back at
least half a year, but I found only one article on the subject and that
was never replied to:

I'm using a via-rhine chip (DFE-530TX) on a 10 Mbit network, I use 2.4.0
final, Athlon (classic) 1Gig, Abit-KA7 mobo (via KX133), Debian woody.
Whenever I try to get a file on my local network, meaning I get close to
the 10Mbit barrier, the network card hangs up. Traffic just stops.
One ifdown/ifup and everything works fine again (for about 10 seconds).
This problem has persisted for some time now. I thought it would be
fixed in the final, but, alas, it hasn't been. It only happens during high
traffic, too; at about 400k, no problem!


Something new that cropped up in prerelease:

My SHM stopped working!
everything was fine in test12, and after that all I got was "no space
left on device".
Has anything changed that one should know about? I mounted shm like it's
written in the help, and on a friend's Celeron SMP machine it works fine,
I just don't know what I did wrong.

Any ideas on either of the two problems?

TIA

Felix Maibaum



Re: [PATCH] cramfs is ro only, so honour this in inode->mode

2001-01-09 Thread Albert D. Cahalan

Shane Nay writes:

> but the bits are useless in the "normal interpretation" of it,
...
> But then you pull out the write bits,

If you need to steal a bit, grab one that won't hurt.
Take the owner's read bit. (owner may read own files)

Re: [PATCH] More compile warning fixes for 2.4.0

2001-01-09 Thread Albert D. Cahalan


[about labels w/o statements after them]

>> Is this really a kernel bug? This is common idiom in C, so gcc
>> shouldn't warn about it. If it does, it is a bug in gcc IMHO.
>
> No, it is not a common idiom in C.  It has _never_ been valid C.
>
> GCC originally allowed it due to a mistake in the grammar; we
> now warn for it.  Fix your source.

Since neither -ansi nor -std=foo was specified, gcc should just
shut up and be happy. Consider this as another GNU extension.
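
For reference, the construct under discussion: in C89/C99 a label has to
be attached to a statement, so a label sitting directly in front of a
closing brace needs a null statement after it.  A small sketch (the
function name is invented for the example):

```c
#include <stddef.h>

/* Add up the non-negative elements of v[0..n-1]. */
int sum_skipping_negatives(const int *v, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (v[i] < 0)
            goto next;          /* skip this element */
        sum += v[i];
next:
        ;                       /* null statement: without it the label
                                   would be followed only by '}' */
    }
    return sum;
}
```

Adding the lone semicolon keeps the source valid regardless of which
-std= setting or compiler is used.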


Re: Unified power management userspace policy

2001-01-09 Thread John Fremlin



Hi!

 Andrew Morton <[EMAIL PROTECTED]> writes:

> Could you please use call_usermodehelper() in this patch
> rather than exec_usermodehelper()?  I want to kill
> exec_usermodehelper() sometime.

The reason I used exec_usermodehelper is that I wanted to waitpid on
the process to see how it exited. Am I still allowed to do that if it
runs as a child of keventd?

[...]

-- 

http://www.penguinpowered.com/~vii

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Stephen C. Tweedie

Hi,

On Tue, Jan 09, 2001 at 01:01:25AM +0100, Andrea Arcangeli wrote:
> On Mon, Jan 08, 2001 at 03:27:21PM -0800, Linus Torvalds wrote:
> > However, it is against all UNIX standards, and Linux-2.4 will explicitly
> 
> I may be missing something but apparently SuSv2 allows it, you can check here:
> 
>   http://www.opengroup.org/onlinepubs/007908799/xsh/rmdir.html
> 
> Infact SuSv2 doesn't even allow rmdir to return -EINVAL.

SuS always allows implementations to return other errors than the ones
listed:

  Implementations will not generate a different error number from the
  ones described here for error conditions described in this
  specification, but may generate additional errors unless explicitly
  disallowed for a particular function.

See http://www.opengroup.org/onlinepubs/007908799/xsh/errors.html

--Stephen

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Albert D. Cahalan

Alexander Viro writes:

> [...] If you really need to destroy the directory
> that happens to be your pwd - sorry, no reliable way to do that without
> interesting locking. On _any_ UNIX out there. 2.2 included. It will
> happily give you -ENOENT and refuse to perform the action above in
> case if some other process renames your pwd. Yes, for rmdir(".");

Well, this bites.

Locking guess: use a global read-write lock, with the "write" case
being deletion of "." and the "read" case being everything else.
You could have one lock per CPU, with the writer needing to grab all
of them in order. So removal of "." pays the cost.

If the standards gripe, well, rmdot() is a nice name.

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Stephen C. Tweedie


Hi,

On Mon, Jan 08, 2001 at 09:28:33PM +0100, Andrea Arcangeli wrote:
> On Mon, Jan 08, 2001 at 12:58:20PM -0500, Alexander Viro wrote:
> > It's a hell of a pain wrt locking. You need to lock the parent, but it can
> 
> This is a no-brainer and bad implementation, but shows it's obviously right
> wrt locking. (pseudocode, I ignored the uaccess details and all the other not
> relevant things)
> 
>   err = sys_getcwd(buf, PAGE_SIZE)
>   if (!memcmp(path, ".", 2))
>   path = buf
>   err = 2_4_0_sys_rmdir(path)

> Could you enlight me on where's the locking pain?

Do the above while another process is renaming one of your parents and
watch an innocent directory get shot down in flames, or prepare for an
incorrect ENOENT.

--Stephen

Re: Network Performance?

2001-01-09 Thread Tim Sailer


On Mon, Jan 08, 2001 at 07:07:18PM +0100, Erik Mouw wrote:
> I had similar problems two weeks ago. Turned out the connection between
> two switches: one of them was hard wired to 100Mbit/s full duplex, the
> other one to 100Mbit/s half duplex. Just to rule out the obvious...

We checked that first. Both are set the same. No collisions
out of the ordinary.

Tim

-- 
Tim Sailer <[EMAIL PROTECTED]> Cyber Security Operations
Brookhaven National Laboratory  (631) 344-3001

Re: 2.4.0 bug in SHM an via-rhine or is it my fault?

2001-01-09 Thread Nils Philippsen


On Tue, 9 Jan 2001, Felix Maibaum wrote:

> My SHM stopped working!
> everything was fine in test12, and after that all I got was "no space
> left on device".
> Has anything changed that one should know about? I mounted shm like it's
> written in the help, and on a friends celeron SMP machine it works fine,
> I just don't know what I did wrong.

You used a buggy version of powertweak which set kernel.shmall to 0 in
/etc/sysctl.conf. Remove the offending line in /etc/sysctl.conf and either
reboot the machine or "echo 2097152 > /proc/sys/kernel/shmall".
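
A quick way to check for the bad setting before rebooting is to read the
value back.  The sketch below is one way to do that from C.  read_shmall()
is a hypothetical helper, and it assumes the file holds a single decimal
number (the limit is counted in pages, not bytes).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a single unsigned decimal value from `path`, normally
 * /proc/sys/kernel/shmall.  Returns 0 and stores the value in *pages
 * on success, -1 if the file cannot be read or parsed.
 */
int read_shmall(const char *path, unsigned long long *pages)
{
    char buf[64];
    char *end;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    errno = 0;
    *pages = strtoull(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;              /* overflow or no digits at all */
    return 0;
}
```

If the value read back is 0, every shared memory allocation will fail
with "no space left on device", which matches the symptom reported.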

Ciao,
Nils
-- 
 Nils Philippsen / Berliner Straße 39 / D-71229 Leonberg // +49.7152.209647
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
   The use of COBOL cripples the mind; its teaching should, therefore, be
   regarded as a criminal offence.  -- Edsger W. Dijkstra


Re: FS callback routines

2001-01-09 Thread Jesse Pollard


Daniel Phillips <[EMAIL PROTECTED]>:
> "Michael D. Crawford" wrote:
> > 
> > Regarding notification when there's a change to the filesystem:
> > 
> > This is one of the most significant things about the BeOS BFS filesystem, and
> > something I'd dearly love to see Linux adopt.  It makes an app very efficient,
> > you just get notified when a directory changes and you never waste time polling.
> > 
> > I think it would require changes to the VFS layer, not just to the filesystems,
> > because this is a concept POSIX filesystems do not presently possess.
> > 
> > The other is indexed filesystem attributes, for example a file can have its
> > mimetype in the filesystem, and any application can add an attribute and have it
> > indexed.
> > 
> > There's a method to do boolean queries on indexed attributes, and you can find
> > files in an entire filesystem that match a query in a blazingly short time, much
> > faster than walking the directory tree.
> > 
> > If you want to try out the BeOS, there's a free-as-in-beer version at
> > http://free.be.com for Pentium PC's.  You can also purchase a version that comes
> > for both PC's and certain PowerPC macs.
> > 
> > There are read-only versions of this for Linux which I believe are under the
> > GPL.  The original author is here:
> > 
> > http://hp.vector.co.jp/authors/VA008030/bfs/
> > 
> > He refers you to here to get a version that works under 2.2.16:
> > 
> > http://milosch.net/beos/
> > 
> > The author's intention was to take it read-write, but it's complex because it is
> > a journaling filesystem.
> > 
> > Daniel Berlin, a BeOS developer modified the Linux BFS driver so it works with
> > 2.4.0-test1.  I don't know if it works with 2.4.0.  The web site where it used
> > to be posted isn't there anymore, and the laptop where I had it is in for
> > repair.  I may have it on a backup, and I'll see if I can track Daniel down.
> > 
> > While Be, Inc.'s implementation is closed-source, the design of the BFS (_not_
> > "befs" as it is sometimes called) is explained in Practical File System Design
> > with the Be File System by Dominic Giampolo, ISBN 1-55860-497-9.  Dominic has
> > since left Be and I understand works at Google now.
> 
> fs/dnotify.c:
> 
>/*
> * Directory notifications for Linux.
> *
> * Copyright (C) 2000 Stephen Rothwell
> ...
> 
> The currently defined events are:
> 
>   DN_ACCESS   A file in the directory was accessed (read)
>   DN_MODIFY   A file in the directory was modified (write,truncate)
>   DN_CREATE   A file was created in the directory
>   DN_DELETE   A file was unlinked from directory
>   DN_RENAME   A file in the directory was renamed
>   DN_ATTRIB   A file in the directory had its attributes
>   changed (chmod,chown)
> 
> It was done last year, quietly and without fanfare, by Stephen Rothwell:
> 
>   http://www.linuxcare.com/about-us/os-dev/rothwell.epl
> 
> This may be the most significant new feature in 2.4.0, as it allows us
> to take a fundamentally different approach to many different problems. 
> Three that come to mind: mail (get your mail instantly without polling);
> make (don't rely on timestamps to know when rebuilding is needed, don't
> scan huge directory trees on each build); locate (reindex only those
> directories that have changed, keep index database current).  As you
> noticed, there are many others.
> 
> Stephen, it would be very interesting to know more about the development
> process you went through and what motivated you to provide this
> fundamental facility.

It would also be very nice if the security of the feature could be
confirmed. The problem with SGI's implementation is that it becomes
possible to monitor files that you don't own, don't have access to,
or are not permitted to know even exist. For these reasons, we have
disabled the feature.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Andrea Arcangeli


On Tue, Jan 09, 2001 at 07:41:21AM -0600, Jesse Pollard wrote:
> Not exactly valid, since a file could be created in that "pinned" directory
> after the rmdir...

In 2.2.x no file can be created in the pinned directory after the rmdir.

Andrea

Re: Confirmation request about new 2.4.x. kernel limits

2001-01-09 Thread Stephen C. Tweedie

Hi,

On Fri, Jan 05, 2001 at 11:46:04PM +0100, Pavel Machek wrote:
> 
> > Max. file size: 1 TB(?)
> > Max. file system size:  2 TB(?)
> 
> Again, maybe on i386 with ext2.

Actually, the 2TB limit affects all architectures, as we assume that
block indexes fit into 32 bits.  Blocks are passed around as unsigned
longs in some cases, but even on 64-bit machines that doesn't help us
as the limit still persists in the filesystem (32-bit block numbers)
and device drivers (ints and 4-byte sector numbers used when
generating SCSI commands).

Auditing the whole driver path to allow 64-bit block numbers, and
adding the logic to generate the 5th sector address byte in the scsi
command when we're doing 10-byte commands, are all possible extensions
for 2.5.  For now, though, the 2TB device limit is with us for all
architectures and all filesystems on 2.4.
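
The 2TB figure follows from the arithmetic, assuming the usual 512-byte
sector: 2^32 sectors * 512 bytes = 2^41 bytes.  A throwaway helper (the
name is invented here) that does the multiplication:

```c
#include <stdint.h>

/*
 * Largest device size, in bytes, addressable with an index of
 * `index_bits` bits (must be < 64) counting units of `unit_size` bytes.
 * The 512-byte unit used in the discussion is an assumption of this
 * example; the message itself only states the 32-bit index width.
 */
uint64_t max_device_bytes(unsigned index_bits, uint64_t unit_size)
{
    return ((uint64_t)1 << index_bits) * unit_size;
}
```

max_device_bytes(32, 512) gives 2199023255552, i.e. 2 TB.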

--Stephen

Re: Unified power management userspace policy

2001-01-09 Thread Andrew Morton

John Fremlin wrote:
> 
> Hi!
> 
>  Andrew Morton <[EMAIL PROTECTED]> writes:
> 
> > Could you please use call_usermodehelper() in this patch
> > rather than exec_usermodehelper()?  I want to kill
> > exec_usermodehelper() sometime.
> 
> The reason I used exec_usermodehelper is that I wanted to waitpid on
> the process to see how it exited. Am I still allowed to do that if it
> runs as a child of keventd?

Oh foo.  I missed that.

In the patch-which-didn't-make-it, yes, it can be called
synchronously.  Or you can be called back with the exit
code when the subprocess exits.  It does all the waitpid
stuff, the signal management, handles chrootedness, etc.
But that's vapourware now.  

In the current implementation of call_usermodehelper(),
it looks like the commentary is incorrect - it returns
a negative error code or the subprocess's pid, but you
can't wait on that because it's parented by keventd.

Sorry for the noise - stick with what you have now.


Re: Confirmation request about new 2.4.x. kernel limits

2001-01-09 Thread Stephen C. Tweedie

Hi,

On Mon, Jan 08, 2001 at 11:11:05PM -0500, Venkatesh Ramamurthy wrote:
> 
>   > Max. RAM size:  64 GB   (any slowness accessing RAM over 4 GB
>   >                          with 32 bit machines ?)
>   more than 4GB in RAM is bounce buffered, so there is performance
> penalty as the data have to be copied into the 4GB RAM area

Any memory over 1GB is bounce-buffered, but we don't use that memory
for anything other than process data pages or file cache, so only
swapping and disk IO to regular files gets the extra copy.  In
particular, things like network buffers are still all kept in the low
1GB so never need to be buffered.

--Stephen

Re: VM subsystem bug in 2.4.0 ?

2001-01-09 Thread Stephen C. Tweedie


Hi,

On Mon, Jan 08, 2001 at 04:30:10PM -0200, Rik van Riel wrote:
> On Mon, 8 Jan 2001, Linus Torvalds wrote:
> > 
> > The only solution I see is something like a "active_immobile"
> > list, and add entries to that list whenever "writepage()"
> > returns 1 - instead of just moving them to the active list.
> 
> Just marking them with a special "do not deactivate me"
> bit seems to work fine enough. When this special bit is
> set, we simply move the page to the back of the active
> list instead of deactivating.

But again, how do you clear the bit?  Locking is a per-vma property,
not per-page.  I can mmap a file twice and mlock just one of the
mappings.  If you get a munlock(), how are you to know how many other
locked mappings still exist?
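
The situation is easy to set up from userspace.  The sketch below maps
the same file page twice and mlock()s only one of the mappings.
map_twice_lock_one() is an invented name, and the comment about which
mapping carries the lock reflects the per-vma behaviour described above
rather than anything the code itself can observe.

```c
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first page of `fd` twice and lock only the first mapping.
 * Returns 0 if both mappings could be created, -1 otherwise.
 * *locked is set to 1 if mlock() succeeded, 0 if it was refused
 * (for example because of RLIMIT_MEMLOCK).
 */
int map_twice_lock_one(int fd, int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    void *a, *b;

    if (page <= 0)
        return -1;

    a = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED)
        return -1;
    b = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (b == MAP_FAILED) {
        munmap(a, (size_t)page);
        return -1;
    }

    /*
     * Both mappings are backed by the same physical page, but only
     * the vma behind `a` is locked.  A per-page "locked" bit could
     * not tell these two mappings apart.
     */
    *locked = (mlock(a, (size_t)page) == 0);
    if (*locked)
        munlock(a, (size_t)page);

    munmap(b, (size_t)page);
    munmap(a, (size_t)page);
    return 0;
}
```

Extending the sketch to lock both mappings and munlock() only one shows
the bookkeeping problem: the page must stay pinned until the last locked
mapping goes away.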

--Stephen

Re: 2.4.0 bug in SHM an via-rhine or is it my fault?

2001-01-09 Thread Felix Maibaum


Nils Philippsen wrote:

> reboot the machine or "echo 2097152 > /proc/sys/kernel/shmall".

Now that's what I call a quick response. Thanks, it did the trick.


Re: [PATCH] cramfs is ro only, so honour this in inode->mode

2001-01-09 Thread Doug McNaught


"Albert D. Cahalan" <[EMAIL PROTECTED]> writes:

> Shane Nay writes:
> 
> > but the bits are useless in the "normal interpretation" of it,
> ...
> > But then you pull out the write bits,
> 
> If you need to steal a bit, grab one that won't hurt.
> Take the owner's read bit. (owner may read own files)

Er,

bash-2.03$ cd /tmp
bash-2.03$ cat >foo
This is a test.
bash-2.03$ chmod u-r foo
bash-2.03$ cat foo
cat: foo: Permission denied
bash-2.03$ ls -l foo
--w-r--r--   1 doug doug   16 Jan  9 09:16 foo
bash-2.03$ 

This is Linux 2.4.0.
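
The transcript above follows from the classic Unix rule that exactly one
permission class is consulted. A simplified model, assuming no root, no
supplementary groups and no ACLs (the function name is hypothetical):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Simplified model of the classic Unix read check: exactly one class
 * (owner, group, or other) is consulted, chosen in that order.
 * Ignores root, supplementary groups, ACLs and capabilities. */
int may_read(mode_t mode, uid_t file_uid, gid_t file_gid,
             uid_t uid, gid_t gid)
{
    if (uid == file_uid)
        return (mode & S_IRUSR) != 0;   /* owner: only the owner bits count */
    if (gid == file_gid)
        return (mode & S_IRGRP) != 0;
    return (mode & S_IROTH) != 0;
}
```

For the mode shown by ls above (0244), the owner is refused even though
group and other may read, so the owner's read bit is not a free bit.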

-Doug

2.4.0-ac4 lockups

2001-01-09 Thread anders . karlsson





Hi,

I'm currently running ROCK Linux 1.3.11 on a MiTAC 6033 laptop with XFree86
4.0.1, and the rest of the Linux install is quite bleeding edge (I can find
out version numbers for most things if needed). In this box there is a PCMCIA
Token Ring card (IBM Turbo 16/4 PC Card 2), and to drive it, pcmcia-cs-3.1.23.

The problem that is showing its ugly face is that after some prolonged network
activity the system will lock solidly. The magic SysRq keys still work, well,
sort of anyway. Alt-SysRq-s does inspire the system to disk activity.
Alt-SysRq-u doesn't do enough for the disk LED to light up, but Alt-SysRq-b
does reboot the system. Upon reboot I go and fetch a coffee while the system
is fsck'ing the filesystems.

I have had several lockups in the last couple of days. It started in
2.4.0-prer and with the Token Ring card. Some crash messages have related to
virtual memory at invalid addresses, and when I get a good crash message I
will write it down and post it to the list.

Any ideas anyone?

  /Anders



Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Alexander Viro

On Tue, 9 Jan 2001, Jesse Pollard wrote:

> Not exactly valid, since a file could be created in that "pinned" directory
> after the rmdir...

No, it couldn't (if you can show a testcase when it would - please do, you've
found a bug). Moreover, busy directories can be removed in 2.4 quite fine -
it's about pathname, not about the thing being your (or somebody else) pwd.


[PATCH] via-macii.c: restore_flags on failure

2001-01-09 Thread Arnaldo Carvalho de Melo


Hi,

Please consider applying.

- Arnaldo

--- linux-2.4.0-ac4/drivers/macintosh/via-macii.c   Tue Dec 19 11:25:39 2000
+++ linux-2.4.0-ac4.acme/drivers/macintosh/via-macii.c  Tue Jan  9 10:18:17 2001
@@ -9,6 +9,9 @@
  *
  * Rewrite for Unified ADB by Joshua M. Thompson ([EMAIL PROTECTED])
  *
+ * Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
+ * - restore_flags on failure in macii_init - 09/01/2001
+ *
  * 1999-08-02 (jmt) - Initial rewrite for Unified ADB.
  */
  
@@ -147,15 +150,16 @@
 	cli();
 
 	err = macii_init_via();
-	if (err) return err;
+	if (err) goto out;
 
 	err = request_irq(IRQ_MAC_ADB, macii_interrupt, IRQ_FLG_LOCK, "ADB",
 			  macii_interrupt);
-	if (err) return err;
+	if (err) goto out;
 
 	macii_state = idle;
-	restore_flags(flags);
-	return 0;
+	err = 0;
+out:	restore_flags(flags);
+	return err;
 }
 
 /* initialize the hardware */
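
The shape of the fix is the usual single-exit cleanup idiom. A userspace
sketch follows; the model_* helpers are hypothetical stand-ins for
save_flags()/cli()/restore_flags() and only track one integer:

```c
/* Stand-in for the interrupt-enable state; purely illustrative. */
static int irqs_enabled = 1;

static void model_save_and_cli(int *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = 0;
}

static void model_restore(int flags)
{
    irqs_enabled = flags;
}

/* step1_err / step2_err simulate the return values of the two setup calls. */
int model_init(int step1_err, int step2_err)
{
    int flags, err;

    model_save_and_cli(&flags);

    err = step1_err;
    if (err)
        goto out;

    err = step2_err;
    if (err)
        goto out;

    /* success: err is already 0 here */
out:
    model_restore(flags);   /* runs on every path, including failures */
    return err;
}

int model_irqs_enabled(void)
{
    return irqs_enabled;
}
```

With the early "return err" form, either failure would leave
irqs_enabled at 0; with the goto form it is restored on all three paths.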

Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Jesse Pollard

> On Tue, 9 Jan 2001, Jesse Pollard wrote:
> 
> > Not exactly valid, since a file could be created in that "pinned" directory
> > after the rmdir...
> 
> No, it couldn't (if you can show a testcase when it would - please do, you've
> found a bug). Moreover, busy directories can be removed in 2.4 quite fine -
> it's about pathname, not about the thing being your (or somebody else) pwd.

Apologies to all, foot-in-mouth disease

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread David S. Miller

   From: Trond Myklebust <[EMAIL PROTECTED]>
   Date: 09 Jan 2001 14:52:40 +0100

   I don't really want to be chiming in with another 'make it a kiobuf',
   but given that you already have written 'do_tcp_sendpages()' why did
   you make sock->ops->sendpage() take the single page as an argument
   rather than just have it take the 'struct page **'?

It was like that to begin with.  But to do it cleanly you have to pass
in not a vector of "pages" but a vector of "page+offset+len" triplets.

Linus hated it, and I understood why, so I reverted the API to be
single page based.

   I would have thought one of the main interests of doing something
   like this would be to allow us to speed up large writes to the
   socket for ncpfs/knfsd/nfs/smbfs/...

This is what TCP_CORK/MSG_MORE et al. are all for, things get
coalesced perfectly.  Sending in a vector of pages seems nice, but
none of the page cache infrastructure works like this, all of the core
routines work on a page at a time.  It actually simplifies a lot.

The writepage interface optimizes large file writes to a socket just
fine.
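
From user space, the coalescing hint mentioned above looks roughly like
this. MSG_MORE is the real Linux send() flag; the function name is
hypothetical, and the sketch ignores short writes:

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Send a header and a body as two calls, hinting with MSG_MORE that more
 * data follows the header so the stack may merge both into one segment.
 * Returns the total number of bytes queued, or -1 on error. */
ssize_t send_header_then_body(int fd, const void *hdr, size_t hlen,
                              const void *body, size_t blen)
{
    ssize_t a, b;

    a = send(fd, hdr, hlen, MSG_MORE);  /* more data is coming */
    if (a < 0)
        return -1;

    b = send(fd, body, blen, 0);        /* last piece: no flag */
    if (b < 0)
        return -1;

    return a + b;
}
```

Setting TCP_CORK with setsockopt() before a series of writes and
clearing it afterwards is the per-socket form of the same hint.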

Later,
David S. Miller
[EMAIL PROTECTED]


Re: adding a system call

2001-01-09 Thread Mihai Moise



> > What is the procedure for adding a new system call to the Linux kernel?
> 
> hack away, the code's free.  don't expect Linus to accept your 
> changes into the "real" kernel without a VERY good argument.

I know. However the Kernel Hacker's Guide writes about sys.h. After a bit of 
exploring, I found that sys.h has been replaced by something else in later kernels, 
which leaves me wondering where in the kernel I should insert my code, and where the 
dispatcher is located for the other system calls, in case my system call would need 
them.

My system call idea is to allow a superuser process to request an mmap on behalf of a
user process. To see how this would be useful, let us consider svgalib.

Until now, there were two ways to allow an application access to the video array. The 
first was by making it setuid root, but this compromises system security by allowing 
it too many permissions. The second was by having a helper module which allows user 
applications access to the video card. However this allows any remote user to set the 
screen in flames.

With my new system call, a superuser process can set the graphics mode in a safe 
manner and then ask for an mmap of the video array into the application data segment.

Mihai

Re: Change of policy for future 2.2 driver submissions

2001-01-09 Thread Hubert Mantel

Hi,

On Fri, Jan 05, Linus Torvalds wrote:

[...]

> But that's very different from having somebody like RedHat, SuSE or
> Debian make such a kernel part of their standard package. No, I don't
> expect that they'll switch over completely immediately: that would show
> a lack of good judgement. The prudent approach has always been to have
> both a 2.2.19 and a 2.4.0 kernel on there, and ask the user if he wants
> to test the new kernel first.

Right, but now there is a problem: Software RAID. The RAID code of 2.4.0
is not backwards compatible to the one in 2.2.18; if somebody has used
2.4.0 on softraid and discovers some problem, he can not switch back to
some official 2.2 kernel. In order to make it possible to switch between
kernel releases, every vendor now really is forced to integrate the new
RAID0.90 code to their 2.2 kernel. IMHO this code should be integrated to
the next official 2.2 kernel so people can use whatever they want.

>   Linus
  -o)
Hubert Mantel  Goodbye, dots...   /\\
 _\_v

Re: Change of policy for future 2.2 driver submissions

2001-01-09 Thread Alan Cox


> some official 2.2 kernel. In order to make it possible to switch between
> kernel releases, every vendor now really is forced to integrate the new
> RAID0.90 code to their 2.2 kernel. IMHO this code should be integrated to
> the next official 2.2 kernel so people can use whatever they want.

Then people using a newer 2.2 cannot go back to an older 2.2. That's really
far, far worse.

Re: VM subsystem bug in 2.4.0 ?

2001-01-09 Thread Christoph Rohland

Hi Stephen,

On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:
> But again, how do you clear the bit?  Locking is a per-vma property,
> not per-page.  I can mmap a file twice and mlock just one of the
> mappings.  If you get a munlock(), how are you to know how many
> other locked mappings still exist?

It's worse: The issue we are talking about is SYSV IPC_LOCK. This is a
per segment thing. A user can (un)lock a segment at any time. But we
do not have the references to the vmas attached to the segments or to
the pages allocated.

Greetings
Christoph


Re: adding a system call

2001-01-09 Thread David Woodhouse



[EMAIL PROTECTED] said:
>  What is the procedure for adding a new system call to the Linux
> kernel?

First: Convince people that it's necessary. 

--
dwmw2



Re: `rmdir .` doesn't work in 2.4

2001-01-09 Thread Alexander Viro

On Tue, 9 Jan 2001, Albert D. Cahalan wrote:

> Alexander Viro writes:
> 
> > [...] If you really need to destroy the directory
> > that happens to be your pwd - sorry, no reliable way to do that without
> > interesting locking. On _any_ UNIX out there. 2.2 included. It will
> > happily give you -ENOENT and refuse to perform the action above in
> > case if some other process renames your pwd. Yes, for rmdir(".");
> 
> Well, this bites.
> 
> Locking guess: use a global read-write lock, with the "write" case
> being deletion of "." and the "read" case being everything else.
> You could have one lock per CPU, with the writer needing to grab all
> of them in order. So removal of "." pays the cost.

It's _so_ far from the SMP cache issues that it's not even funny. So reference
to brw-locks is completely bogus. What you are proposing is to serialize
rmdir() and rename() (including lookups) wrt rmdir and rename. Globally.
Fun, fun...

> If the standards gripe, well, rmdot() is a nice name.

If anything, frmdir() might be a better name. However, it's really
inconsistent with the whole namespace-modifying stuff. You don't have
flink(fd, newname). frename() and funlink() are not even funny - _which_
link would you want to be renamed/removed?

Filesystem consists of two types of objects - files (and that includes
directories, etc.) and links. Pathname can be evaluated to link and to
file. Namespace syscalls (creat()/mkdir()/mknod()/symlink()/link()/
unlink()/rmdir()/rename()) operate on links. open(), truncate(), stat(),
lstat(), etc. operate on files - completely different can of worms.

2.2 tried (without success) to make rmdir() and some cases of rename() act
on files. Notice that if you have /foo as pwd, "." and "/foo" will evaluate
to the same file, but to different links. That's what it's really about.
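
The "same file, different links" point can be checked from user space by
comparing device and inode numbers. A small sketch; same_file() is a
hypothetical helper built on the standard stat() call:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if both pathnames resolve to the same file (same device and
 * inode), 0 if they do not, and -1 if either stat() call fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With /foo as pwd, same_file(".", "/foo") would return 1 even though the
two pathnames end in different links, which is the distinction the
namespace syscalls care about.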

We could add new syscalls. However, I'm yet to see the real-world situation
where they would be needed enough to warrant their inclusion. And I mean
real-world, not an exercise asking for that functionality. Occam's Razor...


Re: Delay in authentication.

2001-01-09 Thread Chris Meadors

On Mon, 8 Jan 2001, Scott Laird wrote:

>
> Is syslog running correctly?  When syslog screws up, it very frequently
> results in this sort of problem.
>

I would guess that syslog is okay.  I'm getting plenty of entries in my
various logs, along with a few boxes remote logging into this server.

Another interesting thing I have noticed about this delay.  If I remove
the data in the password field from the shadow file ("username::...")
there is no pause during login.

-Chris
-- 
Two penguins were walking on an iceberg.  The first penguin said to the
second, "you look like you are wearing a tuxedo."  The second penguin
said, "I might be..." --David Lynch, Twin Peaks


Re: [PATCH] More compile warning fixes for 2.4.0

2001-01-09 Thread Richard B. Johnson

On Tue, 9 Jan 2001, Albert D. Cahalan wrote:

> [about labels w/o statements after them]
> 
> >> Is this really a kernel bug? This is common idiom in C, so gcc
> >> shouldn't warn about it. If it does, it is a bug in gcc IMHO.
> >
> > No, it is not a common idiom in C.  It has _never_ been valid C.
> >
> > GCC originally allowed it due to a mistake in the grammar; we
> > now warn for it.  Fix your source.
> 
> Since neither -ansi nor -std=foo was specified, gcc should just
> shut up and be happy. Consider this as another GNU extension.
> 

It has to do with "a label at the end of a compound statement..."
This has never been correctly allowed. Many don't realize that
case 'X':
case 'Y':
default:

... are all labels. Modern compilers are now enforcing the rules.
When a 'switch' is a compound statement, it follows the rules for
other compound statements. For instance, you can code (correctly)

switch(a) case 1: a--;

... this, with no braces at all. If a == 1, it gets changed to 0,
otherwise it is untouched. If we need another test, it becomes
a compound statement requiring braces as:

switch(a) { case 1: a--; default: }

Observe that we have tricked the compiler into generating code without
using a ';' denoting the end of a statement. The standards makers don't
like this and are now requiring that the above be coded as:

switch(a) { case 1: a--; default: ; }
                                  ^___ no tricks allowed.

A 'program unit', denoted by {} braces has never required a terminating
semicolon, so putting a ';' at the end of the physical statement
just won't do it in this case.
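
A compilable illustration of the rule, assuming the C standards of the
time (C89/C99), where a label must be followed by a statement. The
function names are made up for the example:

```c
/* A label must be followed by a statement; a lone ';' (the null
 * statement) is enough to satisfy that rule. */
int decrement_if_one(int a)
{
    switch (a) {
    case 1:
        a--;
        break;
    default:
        ;   /* null statement after the label */
    }
    return a;
}

/* The same rule applies to goto labels at the end of a block. */
int clamp_to_zero(int a)
{
    {
        if (a >= 0)
            goto done;
        a = 0;
    done:
        ;   /* label at the end of a compound statement needs this */
    }
    return a;
}
```

Removing either lone ';' leaves a label directly before '}', which is
the construct gcc started warning about.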

Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


RE: Confirmation request about new 2.4.x. kernel limits

2001-01-09 Thread Venkatesh Ramamurthy


> Any memory over 1GB is bounce-buffered, but we don't use that memory
> for anything other than process data pages or file cache, so only
> swapping and disk IO to regular files gets the extra copy.  In
> particular, things like network buffers are still all kept in the low
> 1GB so never need to be buffered.
[Venkatesh Ramamurthy]  If anything over 1GB is bounce-buffered, then
what is the purpose of setting the pci_dev->dma_mask field?  On an IA32
system we set it to 32 1's and on IA64 to 64 1's. - Venkat
>  
> 
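
For reference, the arithmetic behind "32 1's" and "64 1's" can be
sketched as below. The helper names are hypothetical, and this shows
only what such a mask expresses, not how the 2.4 block layer decides
when to bounce:

```c
#include <stdint.h>

/* Build a mask with the low `bits` bits set (bits in 0..64). */
uint64_t ones_mask(unsigned bits)
{
    /* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* A device can address `addr` directly only if no bit outside its
 * mask is set. */
int addr_fits_mask(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}
```

By this test an address at the 1GB mark fits a 32-bit mask, which is why
the mask and the 1GB bounce limit are separate questions.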

Re: [PATCH] hashed device lookup (New Benchmarks)

2001-01-09 Thread Ben Greear


Andi Kleen wrote:
> 
> On Mon, Jan 08, 2001 at 04:23:41PM +0100, Ben Greear wrote:
> > I don't argue that ifconfig shouldn't be fixed, but the hash speeds up
> 
> It's already fixed since months. There was one stupid algorithm, which
> I was to blame for when I changed ifconfig to use a device list two years ago.

The benchmark was run against this one:
[root@candle lanforge]# ifconfig --version
net-tools 1.57
ifconfig 1.40 (2000-05-21)


The latest I could find anywhere. Please tell me the version of a
newer one if it exists.

> > ip by about 2X too.  Is that not useful enough?  ip seems to be implemented
> > pretty efficient, so if the hash helps it significantly then maybe it
> > can help other efficient programs too.  Notice that it is the system
> > (ie kernel) time that stays remarkably flat with the hash + ip graph.
> 
> Just does your benchmark represent anything that real users do frequently ?

I'm going to write something that binds to a raw device, which is something
users (DHCP, for sure) do.  If it does not show any significant improvement,
then I'll drop the issue until many-many interfaces are more common.

> 
> If you really want to optimize I'm sure there are lots of areas in the kernel
> where your efforts are better spent ;) [just run with a the kernel profiler on
> for a few days on your box and look at all the real hot spots]

I was just trying to smooth VLAN's adoption into the kernel by removing the
one linear-lookup that I know of relating to lots of VLANs.  It obviously
isn't horribly important, but it was fun :)
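
The difference between the two lookups can be sketched in userspace C.
Everything here is hypothetical (struct dev_model is not the kernel's
net_device, and the hash is an arbitrary choice); it is only meant to
show a linear walk next to a bucketed one:

```c
#include <stddef.h>
#include <string.h>

#define NAME_HASH_SIZE 64          /* arbitrary power of two for the sketch */

/* Hypothetical device record; not the kernel's struct net_device. */
struct dev_model {
    const char       *name;
    struct dev_model *next;        /* linear list of all devices */
    struct dev_model *hash_next;   /* chain within one hash bucket */
};

static struct dev_model *dev_list;
static struct dev_model *dev_hash[NAME_HASH_SIZE];

/* Simple string hash (multiply by 31); any reasonable hash would do. */
static unsigned name_hash(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h & (NAME_HASH_SIZE - 1);
}

/* Register a device on both the linear list and its hash chain. */
void dev_add(struct dev_model *d)
{
    unsigned b = name_hash(d->name);

    d->next = dev_list;
    dev_list = d;
    d->hash_next = dev_hash[b];
    dev_hash[b] = d;
}

/* O(n): walks every registered device. */
struct dev_model *dev_find_linear(const char *name)
{
    struct dev_model *d;

    for (d = dev_list; d; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Walks only the devices whose names hash to the same bucket. */
struct dev_model *dev_find_hashed(const char *name)
{
    struct dev_model *d;

    for (d = dev_hash[name_hash(name)]; d; d = d->hash_next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

With thousands of VLAN interfaces the linear walk touches every entry on
a miss, while the hashed walk touches roughly n/64 of them in this
configuration.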


> 
> BTW, if you just want to optimize ip link ls speed it would be probably enough
> to keep a one behind cache that just caches the next member after the last
> search.

That is still linear in the kernel...or do you mean cache in the kernel?  At any
rate, I'm more concerned about random access.

> 
> -Andi

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear

[PATCH] dn_keyb.c: restore_flags on failure

2001-01-09 Thread Arnaldo Carvalho de Melo


Alan,

Please consider applying. I don't know who the maintainer is; there are no
references in the driver, CREDITS or MAINTAINERS.

- Arnaldo

--- linux-2.4.0-ac4/drivers/char/dn_keyb.c  Fri Jul 28 06:34:40 2000
+++ linux-2.4.0-ac4.acme/drivers/char/dn_keyb.c Tue Jan  9 10:32:17 2001
@@ -435,15 +435,14 @@
 	for(;length;length--) {
 		keyb_cmds[keyb_cmd_write++]=*(cmd++);
 		if(keyb_cmd_write==keyb_cmd_read)
-			return;
+			goto out;
 		if(keyb_cmd_write==APOLLO_KEYB_CMD_ENTRIES)
 			keyb_cmd_write=0;
 	}
 	if(!keyb_cmd_transmit)  {
 		sio01.BRGtest_cra=5;
 	}
-	restore_flags(flags);
-
+out:	restore_flags(flags);
 }
 
 static struct busmouse apollo_mouse = {

Re: Network Performance?

2001-01-09 Thread Tim Sailer


On Mon, Jan 08, 2001 at 01:40:57PM -0500, Craig I. Hagan wrote:
> > 101 packets transmitted, 101 packets received, 0% packet loss
> > round-trip min/avg/max = 109.6/110.3/112.2 ms
> > 
> > > Does the problem occur in both directions?
> > 
> > Good question. I'll find out.
> > 
> > > Are you _sure_ the window size is being set correctly? How
> > > is it being set?
> > 
> > I'm fairly sure. We echo the value to the file. catting it back
> > shows the correct value. If we go lower than default, it slows
> > down even more.
> 
> what are you setting it to on the solaris machine? what window
> sizes have you tried?
> 
> Your pipe looks like it will have quite a few bits in flight due to its

Yup. That's why the tuning. WAN performance here is very important.

> latency. From my quick guess math, which sucks, it appears that you can fit 1.2
> to 1.5 megabytes on the wire (100mbit machine<-> machine) times 100-120ms wire

Hmm. 100/8 is about 12, no? 

> time. This is a rather large number, so you may want to see what hosts really
> support, perhaps starting with 64k or 128k and work up. Make sure that you have
> window scaling turned on if you go with very large windows.

Yes, we have that enabled too.
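
The estimate being discussed is the bandwidth-delay product. A small
sketch of the arithmetic, taking 100 Mbit/s and the roughly 110 ms
round trip from the ping output above; the function names are
hypothetical, and the 14-bit limit on the shift comes from RFC 1323:

```c
#include <stdint.h>

/* Bandwidth-delay product in bytes: bits/s * RTT, divided by 8. */
uint64_t bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
    return bits_per_sec * rtt_ms / 1000 / 8;
}

/* Smallest TCP window-scale shift (RFC 1323, at most 14) whose scaled
 * 16-bit window covers `bytes`; returns -1 if even shift 14 is too small. */
int window_scale_for(uint64_t bytes)
{
    int shift;

    for (shift = 0; shift <= 14; shift++)
        if ((UINT64_C(65535) << shift) >= bytes)
            return shift;
    return -1;
}
```

bdp_bytes(100000000, 110) comes to 1375000 bytes, inside the quoted 1.2
to 1.5 megabyte guess, and far above the 65535 bytes an unscaled window
can advertise.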

> Also, have you upped your socket buffers to match your window sizes?

We are using straight ftp for the testing.

> Last, solaris tends to have poorly tuned tcp values out of the box, look at
> this link and tune the solaris stack to better reflect reality.
> 
>http://www.google.com/search?q=cache:www.rvs.uni-hannover.de/people/voeckler/tune/EN/tune.html+%2Bwan+%2Bwindow+%2Bscale+%2Bsize+%2Bnetwork&hl=en
> 
> linux tuning has a decent amount of data in the docs section of the kernel
> sources.

I'll take a look. Thanks.

Tim

-- 
Tim Sailer <[EMAIL PROTECTED]> Cyber Security Operations
Brookhaven National Laboratory  (631) 344-3001

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Alan Cox


> designing for them. Eg. if an IO operation (eg. streaming video webcast)
> does a DMA from a camera card to an outgoing networking card, would it be

Most mpeg2 hardware isn't set up for that kind of use. And webcast protocols
like h.263 tend to be software implemented.

Capturing raw video for pre-processing is similar. Right now that's best
done with mmap() on the ring buffer and O_DIRECT I/O, it seems.

Alan


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:

> > please study the networking portions of the zerocopy patch and you'll see
> > why this is not desirable. An alloc_kiovec()/free_kiovec() is exactly the
> > thing we cannot afford in a sendfile() operation. sendfile() is
> > lightweight, the setup times of kiovecs are not.
> >
> Right.  However, kiobufs can be kept around for as long as you want
> and can be reused easily, and even if allocating and freeing them is
> more work than you want, populating an existing kiobuf is _very_
> cheap.

we do have SLAB [which essentially caches structures, on a per-CPU basis]
which i did take into account, but still, initializing a 600+ byte kiovec
is probably more work than the rest of sending a packet! I mean i'd love
to eliminate the 200+ bytes skb initialization as well, it shows up.

> > another, more theoretical issue is that i think the kernel should not be
> > littered with multi-page interfaces, we should keep the one "struct page *
> > at a time" interfaces.
>
> Bad bad bad.  We already have SCSI devices optimised for bandwidth
> which don't approach decent performance until you are passing them 1MB
> IOs, [...]

The fact that we're using single-page interfaces doesn't preclude us from
having nicely clustered requests, this is what IO-plugging is about!

> and even in networking the 1.5K packet limit kills us in some cases
> and we need an interface capable of generating jumbograms.

which cases?

> Perhaps tcp can merge internal 4K requests, [...]

yes, because depending on the application to send properly sized requests
is a futile act IMO. So we do have intelligent buffering and clustering in
basically every kernel subsystem - and we'll continue to have it because
we have no choice - most of Linux's user-visible IO APIs have byte
granularity (which is good btw.). Adding a multi-page interface will IMO
mostly just complicate the design and the implementation. Do you have
empirical (or theoretical) proof which shows that single-page interfaces
cannot perform well?

> but if you're doing udp jumbograms (or STP or VIA), you do need an
> interface which can give the networking stack more than one page at
> once.

nothing prevents the introduction of specialized interfaces - if they feel
like they can get enough traction. I was talking about the normal Linux IO
APIs, read()/write()/sendfile(), which are byte granularity and invoke an
almost mandatory buffering/clustering mechanism in every kernel subsystem
they deal with.

Ingo


Re: Failure building 2.4 while running 2.4. Success in building 2.4 while running 2.2.

2001-01-09 Thread Alessandro Suardi

Silviu Marin-Caea wrote:
> 
> I have RedHat7, glibc-2.2-9, gcc-2.96-69.
> 
> I can build 2.4.0 while running kernel 2.2.16.
> 
> If I try to rebuild 2.4.0 while running the new kernel, I get random
> compiler errors.
> 
> It happens on two machines.  One of them runs 2.4.0-test12, the other
> 2.4.0.  Both of them with the updates above mentioned.
> 
> I know this is a RedHat issue, but it may be useful to know for some.

I know this isn't since I already built 2.4.0-ac2 and -ac3 on this
 laptop and never got any compiler error :)

[asuardi@princess asuardi]$ rpm -q glibc gcc
glibc-2.2-9
gcc-2.96-69

random compiler errors => bad hardware. On two machines ? Yes.

--alessandro  <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Linux:  kernel 2.2.19p6/2.4.0 glibc-2.2 gcc-2.96-69 binutils-2.10.1.0.4
Oracle: Oracle8i 8.1.7.0.0 Enterprise Edition for Linux
motto:  Tell the truth, there's less to remember.

Re: Failure building 2.4 while running 2.4. Success in building 2.4 while running 2.2.

2001-01-09 Thread Alan Cox


> I have RedHat7, glibc-2.2-9, gcc-2.96-69.

Ditto

> If I try to rebuild 2.4.0 while running the new kernel, I get random
> compiler errors.

Now I don't. What hardware are you using ?

> It happens on two machines.  One of them runs 2.4.0-test12, the other
> 2.4.0.  Both of them with the updates above mentioned.

What hardware, what errors?

> I know this is a RedHat issue, but it may be useful to know for some.

It may well be compiler optimisation where the new gcc is optimising out 
something someone forgot in a driver or miscompiling a specific driver. 
One good way to test if it's compiler or kernel triggered would be to rebuild
2.4.0 with egcs (aka kgcc).

I'd like to know what drivers you are running so I can try and duplicate it


Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:"No such process")

2001-01-09 Thread Paul Cassella


On Tue, 9 Jan 2001, Andrew Morton wrote:

> is this still reproducible?  If so can I send you a debugging
> patch to diagnose a bit further?

Yes to both.  If I get a patch in the next hour or so, I can have it
running before I go to work.  Otherwise I won't be able to try it until
this evening.


With the appended patch, I got these logged, and the
application produces the expected error, all with the same timestamp:

tcp.c:1165:tcp_sendmsg: err is unexpectedly -375.
tcp.c:963:tcp_sendmsg: err is unexpectedly -375.
tcp_sendmsg:991: copy = -375, mss_now = 512, skb->len = 887, skb_tailroom(skb) = 521, 
seglen = 37. 

The second message is misleading; err is not -375 at this point, copied
is.

I'm looking at how these were produced, and they seem to be in the
opposite order that the code produces them?

If you're trying to find these in an unpatched file, the first (line 1165
above) printk() is in the err = copied case of do_fault2.  The second is
in the if(err) goto do_fault2 check.  The last is right after this in
tcp_sendmsg.

  if(copy > seglen) 
  copy = seglen;


This is kind of frightening; the printk on line 991 is effectively inside

  if(mss_now - skb->len > 0)

and mss_now seems to be less than skb->len when the printk happens.  My
copy of K&R is at work; could that comparison be being done unsigned
because of skb->len?  I wouldn't think so, but the alternative seems
somewhat worse...
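
On the K&R question: under C's usual arithmetic conversions, an int
minus an unsigned int is computed as unsigned int. A small sketch,
assuming skb->len is an unsigned int; the function names are made up:

```c
/* `len` is unsigned, as skb->len is assumed to be here. */
int has_room_as_written(int mss_now, unsigned int len)
{
    /* int - unsigned int is evaluated as unsigned int, so a "negative"
     * difference wraps to a huge positive value and the test passes. */
    return mss_now - len > 0;
}

int has_room_signed(int mss_now, unsigned int len)
{
    /* Casting first keeps the subtraction and comparison signed. */
    return mss_now - (int)len > 0;
}

int copy_amount(int mss_now, unsigned int len)
{
    /* Converting the wrapped unsigned result back to int is
     * implementation-defined; gcc yields the negative value. */
    return (int)(mss_now - len);
}
```

With mss_now = 512 and len = 887, the unsigned test succeeds while the
value stored into an int comes out as -375, which would match the
logged "copy = -375".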

Most of this patch is to tcp_sendmsg.

diff -ru linux-2.4.0-ac3/net/ipv4/tcp.c linux-2.4.0-ac3-debugging/net/ipv4/tcp.c
--- linux-2.4.0-ac3/net/ipv4/tcp.c  Mon Jan  8 22:41:14 2001
+++ linux-2.4.0-ac3-debugging/net/ipv4/tcp.cMon Jan  8 23:02:03 2001
@@ -451,6 +451,23 @@
 
 #define TCP_PAGES(amt) (((amt)+TCP_MEM_QUANTUM-1)/TCP_MEM_QUANTUM)
 
+#define CHECK_TCP_RET() check_tcp_ret(err, __FILE__, __LINE__, __FUNCTION__)
+
+void check_tcp_ret(int ret, char *file, int line, char *func) {
+  if(ret < 0) {
+   switch(-ret) {
+ case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case ECONNRESET:
+ case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: case ENOMEM:
+ case ENOBUFS: case ENOTCONN: case ECONNREFUSED: case ERESTARTSYS: case EHOSTUNREACH:
+   break;
+
+ default:
+   printk(KERN_ERR "%s:%d:%s: err is unexpectedly %d.\n", file, line, func, ret);
+   }
+  }
+}
+
+
 int tcp_mem_schedule(struct sock *sk, int size, int kind)
 {
int amt = TCP_PAGES(size);
@@ -883,6 +900,8 @@
}
current->state = TASK_RUNNING;
remove_wait_queue(sk->sleep, &wait);
+   if(timeo < 0)
+ printk(KERN_ERR "wait_for_tcp_memory: timeo == %ld\n", timeo);
return timeo;
 }
 
@@ -916,8 +935,10 @@
 
/* Wait for a connection to finish. */
if ((1 << sk->state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
-   if((err = wait_for_tcp_connect(sk, flags, &timeo)) != 0)
-   goto out_unlock;
+ if((err = wait_for_tcp_connect(sk, flags, &timeo)) != 0) {
+   CHECK_TCP_RET();
+   goto out_unlock;
+ }
 
/* This should be in poll */
clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
@@ -938,8 +959,11 @@
while (seglen > 0) {
int copy, tmp, queue_it;
 
-   if (err)
-   goto do_fault2;
+   if (err) {
+ if(copied) check_tcp_ret(copied, __FILE__, __LINE__, __FUNCTION__);
+ else CHECK_TCP_RET();
+ goto do_fault2;
+   }
 
/* Stop on errors. */
if (sk->err)
@@ -948,7 +972,7 @@
/* Make sure that we are established. */
if (sk->shutdown & SEND_SHUTDOWN)
goto do_shutdown;
-   
+
/* Now we need to check if we have a half
 * built packet we can tack some data onto.
 */
@@ -964,6 +988,7 @@
copy = skb_tailroom(skb);
if(copy > seglen)
copy = seglen;
+   if(copy < 0) printk(KERN_ERR "tcp_sendmsg:%d: copy = %d, mss_now = %d, skb->len = %d, skb_tailroom(skb) = %d, seglen = %d.\n", __LINE__, copy, mss_now, skb->len, skb_tailroom(skb), seglen);
if(last_byte_was_odd) {
if(copy_from_user(skb_put(skb, copy),
  from, copy))
@@ -975,6 +1000,7 @@
csum_and_copy_from_user(
from, skb_put(skb, copy),

Re: Failure building 2.4 while running 2.4. Success in building 2.4

2001-01-09 Thread Alan Cox


> I know this isn't since I already built 2.4.0-ac2 and -ac3 on this
>  laptop and never got any compiler error :)
> 
> [asuardi@princess asuardi]$ rpm -q glibc gcc
> glibc-2.2-9
> gcc-2.96-69
> 
> random compiler errors => bad hardware. On two machines ? Yes.

My guess is a bad driver. Two machines with random errors from hardware only
in 2.4 is pushing it - possible but pushing it.

Alan


Re: [rlug] Failure building 2.4 while running 2.4. Success inbuilding 2.4 while running 2.2.

2001-01-09 Thread Eugen


I compiled 2.4.0 while running 2.4.0-test12 and it worked.

> 
> I can build 2.4.0 while running kernel 2.2.16.
> 
> If I try to rebuild 2.4.0 while running the new kernel, I get random
> compiler errors.
> 
> It happens on two machines.  One of them runs 2.4.0-test12, the other
> 2.4.0.  Both of them with the updates above mentioned.
> 
> I know this is a RedHat issue, but it may be useful to know for some.
> 
> -- 
> Systems and Network Administrator - Delta Romania
> Phone +4093-267961
> 
> ---
> Send e-mail to '[EMAIL PROTECTED]' with 'unsubscribe rlug' to 
> unsubscribe from this list.
> 



Eugen


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Stephen C. Tweedie

Hi,

On Tue, Jan 09, 2001 at 03:40:56PM +0100, Ingo Molnar wrote:
> 
> i'd love to first see these kinds of applications (under Linux) before
> designing for them.

Things like Beowulf have been around for a while now, and SGI have
been doing that sort of multimedia stuff for ages.  I don't think that
there's any doubt that there's a demand for this.

> Eg. if an IO operation (eg. streaming video webcast)
> does a DMA from a camera card to an outgoing networking card, would it be
> possible to access the packet data in case of a TCP retransmit? 

I'm not thinking about pci-to-pci as much as pci-to-memory-to-pci
with no memory-to-memory copies.  That's no different to writepage:
doing a zero-copy writepage on a page cache page still gives you the
problem of maintaining retransmit semantics if a user mmaps the file
or writes to it after your initial transmit.

And if you want other examples, we have applications such as Oracle
who want to do raw disk IO in chunks of at least 128K.  Going through
a page-by-page interface for large IOs is almost as bad as the
existing buffer_head-by-buffer_head interface, and we have already
demonstrated that to be a bottleneck in the block device layer.

Jes has also got hard numbers for the performance advantages of
jumbograms on some of the networks he's been using, and you ain't
going to get udp jumbograms through a page-by-page API, ever.

Cheers,
 Stephen

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Stephen Frost

* Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:
> 
> > but it just doesn't apply when you look at some other applications,
> > such as streaming out video data or performing fileserving in a
> > high-performance compute cluster where you are serving bulk data.
> > The multimedia and HPC worlds typically operate on datasets which are
> > far too large to cache, so you want to keep them in memory as little
> > as possible when you ship them over the wire.
> 
> i'd love to first see these kinds of applications (under Linux) before
> designing for them. Eg. if an IO operation (eg. streaming video webcast)
> does a DMA from a camera card to an outgoing networking card, would it be
> possible to access the packet data in case of a TCP retransmit? Basically
> these applications are limited enough in scope to justify even temporary
> 'hacks' that enable them - and once we *see* things in action, we could
> design for them. Not the other way around.

Well, I know I for one use a system that you might have heard
of called 'MOSIX'.  It's a (kinda large) kernel patch with some user-space
tools but allows for migration of processes between machines without
modifying any code.  There are some limitations (threaded applications and
shared memory and whatnot) but it works very well for the rendering work
we use it for.  We use radiance which in general has pretty little inter-
process communication and what it has is done through the filesystem.

Now, the interesting bit here is that the processes can grow to be
pretty large (200M+, up as high as 500M, higher if we let it ;) ) and what
happens with MOSIX is that entire processes get sent over the wire to 
other machines for work.  MOSIX will also attempt to rebalance the load on
all of the machines in the cluster and whatnot so it can often be moving
processes back and forth.

So, anyhow, this is just an FYI in case you weren't aware of it: I
believe more than a few people are using MOSIX these days for similar
applications, and it's available at http://www.mosix.org if you're
curious.

Stephen


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Trond Myklebust


> David S Miller <[EMAIL PROTECTED]> writes:

 >I would have thought one of the main interests of doing
 >something like this would be to allow us to speed up large
 >writes to the socket for ncpfs/knfsd/nfs/smbfs/...

 > This is what TCP_CORK/MSG_MORE et al. are all for, things get
 > coalesced perfectly.  Sending in a vector of pages seems nice,
 > but none of the page cache infrastructure works like this, all
 > of the core routines work on a page at a time.  It actually
 > simplifies a lot.

 > The writepage interface optimizes large file writes to a socket
 > just fine.

OK, but can you eventually generalize it to non-stream protocols
(i.e. UDP)?
After all, it doesn't make sense to differentiate between zero-copy on
stream and non-stream sockets, and Linux NFS, at least, remains
heavily UDP-oriented...

Cheers,
  Trond
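
For readers who have not used the TCP_CORK/MSG_MORE coalescing mentioned in the quoted text, here is a minimal user-space sketch. It is not taken from the zerocopy patch: the helper names loopback_pair() and corked_send_demo() are made up for illustration, and it assumes a Linux host where loopback TCP is available. A header and a body are queued while the socket is corked, then flushed by uncorking.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over loopback. */
static int loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0)
        goto fail;
    if (connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(*client);
        goto fail;
    }
    *server = accept(lfd, NULL, NULL);
    if (*server < 0) {
        close(*client);
        goto fail;
    }
    close(lfd);
    return 0;
fail:
    close(lfd);
    return -1;
}

/* Queue two writes under TCP_CORK and return the bytes the peer received,
 * or -1 on error. */
int corked_send_demo(void)
{
    static const char hdr[] = "HDR:";
    static const char body[] = "payload";
    char buf[64];
    int on = 1, off = 0, total = 0, cfd, sfd;
    ssize_t n;

    if (loopback_pair(&cfd, &sfd) < 0)
        return -1;

    /* Cork: partial frames are held back until the socket is uncorked. */
    if (setsockopt(cfd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0 ||
        /* MSG_MORE: per-call hint that more data will follow. */
        send(cfd, hdr, sizeof(hdr) - 1, MSG_MORE) < 0 ||
        send(cfd, body, sizeof(body) - 1, 0) < 0 ||
        /* Uncork: push out whatever has been queued. */
        setsockopt(cfd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
        total = -1;
    close(cfd); /* gives the reader EOF */

    while (total >= 0 && (n = recv(sfd, buf, sizeof(buf), 0)) > 0)
        total += (int)n;
    close(sfd);
    return total;
}
```

Since TCP is a byte stream, the receiver cannot tell how the data was framed; the sketch only shows the calling sequence. Whether the two writes really left as one segment would have to be checked with a packet capture.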

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Stephen C. Tweedie

Hi,

On Tue, Jan 09, 2001 at 11:23:41AM +0100, Ingo Molnar wrote:
> 
> > Having proper kiobuf support would make it possible to, for example,
> > do zerocopy network->disk data transfers and lots of other things.
> 
> i used to think that this is useful, but these days it isn't. It's a waste
> of PCI bandwidth resources, and it's much cheaper to keep a cache in RAM
> instead of doing direct disk=>network DMA *all the time* some resource is
> requested.

No.  I'm certain you're right when talking about things like web
serving, but it just doesn't apply when you look at some other
applications, such as streaming out video data or performing
fileserving in a high-performance compute cluster where you are
serving bulk data.  The multimedia and HPC worlds typically operate on
datasets which are far too large to cache, so you want to keep them in
memory as little as possible when you ship them over the wire.

Cheers,
 Stephen

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Stephen C. Tweedie

Hi,

On Tue, Jan 09, 2001 at 01:04:49PM +0100, Ingo Molnar wrote:
> 
> On Tue, 9 Jan 2001, Christoph Hellwig wrote:
> 
> please study the networking portions of the zerocopy patch and you'll see
> why this is not desirable. An alloc_kiovec()/free_kiovec() is exactly the
> thing we cannot afford in a sendfile() operation. sendfile() is
> lightweight, the setup times of kiovecs are not.
> 
Right.  However, kiobufs can be kept around for as long as you want
and can be reused easily, and even if allocating and freeing them is
more work than you want, populating an existing kiobuf is _very_
cheap.

> another, more theoretical issue is that i think the kernel should not be
> littered with multi-page interfaces, we should keep the one "struct page *
> at a time" interfaces.

Bad bad bad.  We already have SCSI devices optimised for bandwidth
which don't approach decent performance until you are passing them 1MB
IOs, and even in networking the 1.5K packet limit kills us in some
cases and we need an interface capable of generating jumbograms.
Perhaps tcp can merge internal 4K requests, but if you're doing udp
jumbograms (or STP or VIA), you do need an interface which can give
the networking stack more than one page at once.

--Stephen
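
A user-space analogy for the datagram point above, offered only as a sketch: it uses an AF_UNIX datagram socketpair as a stand-in for UDP and has nothing to do with the in-kernel sendpage() path. The function names and the 4096-byte CHUNK are assumptions for illustration. With datagram semantics, a vectored sendmsg() is what lets two separate buffers travel as one datagram; sending them one at a time yields two.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK 4096 /* stand-in for one page */

/* Hand both buffers to the socket in ONE call; return the size of the
 * first datagram the peer sees, or -1 on error. */
long gathered_datagram_size(void)
{
    static char a[CHUNK], b[CHUNK], rx[4 * CHUNK];
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) },
        { .iov_base = b, .iov_len = sizeof(b) },
    };
    struct msghdr msg;
    long got = -1;
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    memset(a, 'A', sizeof(a));
    memset(b, 'B', sizeof(b));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    if (sendmsg(sv[0], &msg, 0) == 2 * CHUNK)
        got = (long)recv(sv[1], rx, sizeof(rx), 0);
    if (got == 2 * CHUNK)
        assert(rx[0] == 'A' && rx[CHUNK] == 'B'); /* both chunks, in order */
    close(sv[0]);
    close(sv[1]);
    return got;
}

/* One buffer per call: each send() becomes its own datagram, so the
 * first recv() returns a single chunk. */
long chunk_at_a_time_datagram_size(void)
{
    static char a[CHUNK], b[CHUNK], rx[4 * CHUNK];
    long got = -1;
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (send(sv[0], a, sizeof(a), 0) == CHUNK &&
        send(sv[0], b, sizeof(b), 0) == CHUNK)
        got = (long)recv(sv[1], rx, sizeof(rx), 0);
    close(sv[0]);
    close(sv[1]);
    return got;
}
```

A stream socket could merge the two smaller writes after the fact, which is the TCP case conceded above; a datagram socket cannot, so the caller has to present all the data in one request.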

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Alan Cox


> Bad bad bad.  We already have SCSI devices optimised for bandwidth
> which don't approach decent performance until you are passing them 1MB
> IOs, and even in networking the 1.5K packet limit kills us in some

Even low-end, cheap RAID cards like the AMI MegaRAID dearly want 128K writes.
It's quite a difference on them.


Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Trond Myklebust

> David S Miller <[EMAIL PROTECTED]> writes:

 > I've put a patch up for testing on the kernel.org mirrors:

 > /pub/linux/kernel/people/davem/zerocopy-2.4.0-1.diff.gz

.

 > Finally, regardless of networking card, there should be a
 > measurable performance boost for NFS clients with this patch
 > due to the delayed fragment coalescing.  KNFSD does not take
 > full advantage of this facility yet.

Hi David,

I don't really want to be chiming in with another 'make it a kiobuf',
but given that you already have written 'do_tcp_sendpages()' why did
you make sock->ops->sendpage() take the single page as an argument
rather than just have it take the 'struct page **'?

I would have thought one of the main interests of doing something like
this would be to allow us to speed up large writes to the socket for
ncpfs/knfsd/nfs/smbfs/...
After all, in both the case of the client WRITE requests and the
server READ responses, we end up with a set of several pages that just
need to be pushed down the network without further ado. Unless I
misunderstood the code, it seems that do_tcp_sendpages() fits the bill
nicely...

Cheers,
  Trond

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Ingo Molnar

On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:

> > i used to think that this is useful, but these days it isn't. It's a waste
> > of PCI bandwidth resources, and it's much cheaper to keep a cache in RAM
> > instead of doing direct disk=>network DMA *all the time* some resource is
> > requested.
>
> No.  I'm certain you're right when talking about things like web
> serving, [...]

yep, i was concentrating on fileserving load.

> but it just doesn't apply when you look at some other applications,
> such as streaming out video data or performing fileserving in a
> high-performance compute cluster where you are serving bulk data.
> The multimedia and HPC worlds typically operate on datasets which are
> far too large to cache, so you want to keep them in memory as little
> as possible when you ship them over the wire.

i'd love to first see these kinds of applications (under Linux) before
designing for them. Eg. if an IO operation (eg. streaming video webcast)
does a DMA from a camera card to an outgoing networking card, would it be
possible to access the packet data in case of a TCP retransmit? Basically
these applications are limited enough in scope to justify even temporary
'hacks' that enable them - and once we *see* things in action, we could
design for them. Not the other way around.

Ingo


Re: VM subsystem bug in 2.4.0 ?

2001-01-09 Thread Stephen C. Tweedie


Hi,

On Tue, Jan 09, 2001 at 03:53:55PM +0100, Christoph Rohland wrote:
> 
> On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:
> > But again, how do you clear the bit?  Locking is a per-vma property,
> > not per-page.  I can mmap a file twice and mlock just one of the
> > mappings.  If you get a munlock(), how are you to know how many
> > other locked mappings still exist?
> 
> It's worse: The issue we are talking about is SYSV IPC_LOCK.

The issue is locked VA pages.  SysV is just one of the ways in which
it can happen: the solution has got to address both that and
mlock()/mlockall().

> This is a
> per segment thing. A user can (un)lock a segment at any time. But we
> do not have the references to the vmas attached to the segments

Why not?  Won't the address space mmap* lists give you this?

--Stephen
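
The "mmap a file twice and mlock just one of the mappings" case quoted above can be reproduced from user space. The following is a minimal sketch, not kernel code: lock_one_of_two_mappings() is a made-up name, and it assumes a temporary file that supports shared mappings. The mlock() may fail under a low RLIMIT_MEMLOCK, which the sketch reports rather than treats as an error.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one file page twice and mlock() only the first mapping.
 * Returns 0 if both mappings show the same contents, -1 on failure.
 * *locked is set to 1 if mlock() succeeded, else 0. */
int lock_one_of_two_mappings(int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    char *m1 = MAP_FAILED, *m2 = MAP_FAILED;
    FILE *f = tmpfile();
    int rc = -1;

    *locked = 0;
    if (f == NULL)
        return -1;
    if (page <= 0 || ftruncate(fileno(f), page) < 0)
        goto out;

    m1 = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    m2 = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    if (m1 == MAP_FAILED || m2 == MAP_FAILED)
        goto unmap;
    assert(m1 != m2); /* two distinct VMAs over the same file page */

    /* The lock is requested through the first mapping only. */
    *locked = (mlock(m1, page) == 0);

    /* Both mappings still refer to the same underlying page. */
    m1[0] = 'x';
    rc = (m2[0] == 'x') ? 0 : -1;

    /* munlock() names an address range; it says nothing about m2. */
    if (*locked)
        munlock(m1, page);
unmap:
    if (m1 != MAP_FAILED)
        munmap(m1, page);
    if (m2 != MAP_FAILED)
        munmap(m2, page);
out:
    fclose(f);
    return rc;
}
```

The point of the sketch is the last comment: the unlock is expressed in terms of one mapping, so a per-page "locked" bit alone cannot say whether another locked mapping of the same page still exists.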

Re: FS callback routines

2001-01-09 Thread Daniel Phillips


Jesse Pollard wrote:
> Daniel Phillips <[EMAIL PROTECTED]>:
> > This may be the most significant new feature in 2.4.0, as it allows us
> > to take a fundamentally different approach to many different problems.
> > Three that come to mind: mail (get your mail instantly without polling);
> > make (don't rely on timestamps to know when rebuilding is needed, don't
> > scan huge directory trees on each build); locate (reindex only those
> > directories that have changed, keep index database current).  As you
> > noticed, there are many others.
> > ...
> 
> It would also be very nice if the security of the feature could be
> confirmed. The problem with SGI's implementation is that it becomes
> possible to monitor files that you don't own, don't have access to,
> or are not permitted to know even exist.

To receive notification about events in a given directory you have to be
able to open it.  Is this adequate for your needs?

> For these reasons, we have disabled the feature.

It's nice to have that option, isn't it? ;-)

--
Daniel
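
For anyone wanting to try this, here is a minimal sketch. It assumes the feature under discussion is the directory notification interface exposed through fcntl(F_NOTIFY), which the thread does not name explicitly, and that the running kernel has it enabled. dir_notify_demo() is a made-up name. Note that the watch is registered on a descriptor obtained by open(), so a directory you cannot open cannot be watched.

```c
#define _GNU_SOURCE /* F_NOTIFY, F_SETSIG and DN_* are Linux extensions */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Watch a fresh temporary directory for file creation.
 * Returns 1 if a notification arrived, 0 if the facility is unavailable
 * or nothing arrived within two seconds, -1 on setup failure. */
int dir_notify_demo(void)
{
    char dir[] = "/tmp/dnotify-XXXXXX";
    char path[64];
    struct timespec timeout = { 2, 0 };
    sigset_t set, old;
    siginfo_t info;
    int sig = SIGRTMIN, result = 0, fd, filefd;

    if (mkdtemp(dir) == NULL)
        return -1;
    fd = open(dir, O_RDONLY); /* fails here if we may not open the directory */
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }

    /* Block the signal so it can be collected synchronously below. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_BLOCK, &set, &old);

    if (fcntl(fd, F_SETSIG, sig) == 0 &&
        fcntl(fd, F_NOTIFY, DN_CREATE) == 0) {
        snprintf(path, sizeof(path), "%s/new", dir);
        filefd = open(path, O_CREAT | O_WRONLY, 0600); /* triggers the event */
        if (filefd >= 0)
            close(filefd);
        /* si_fd identifies which watched directory changed. */
        if (sigtimedwait(&set, &info, &timeout) == sig && info.si_fd == fd)
            result = 1;
        unlink(path);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    close(fd);
    rmdir(dir);
    return result;
}
```

The sketch registers a one-shot watch; a long-running watcher would add DN_MULTISHOT and rescan the directory on each signal, since the notification says only that something changed, not what.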
