Re: unresolved symbols on SPARC with depmod -ae

2001-04-05 Thread David S. Miller


Jeff Layton writes:
 > Anyway here's what I get, should I be concerned about this?
...
 > caladan:~# /sbin/depmod -ae -F /boot/System.map-2.4.2
 > depmod: *** Unresolved symbols in
 > /lib/modules/2.4.2/kernel/drivers/block/loop.o
 > depmod: .div
 > depmod: .urem
 > depmod: .umul
 > depmod: .udiv
 > depmod: .rem
 > depmod: *** Unresolved symbols in

Try to load one of the modules which show the problem, does
it work?  If so, it is a bug in depmod's handling of these
".foo" symbols.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: asm/unistd.h

2001-04-05 Thread David S. Miller


Sardañ[EMAIL PROTECTED], Eliel writes:
 > I'm taking a look at the linux code and I don't understand how do you
 > programm...mmm (?) may be i'm a stupid why in include/asm/unistd.h in some
 > macros you use this:

Two reasons:

1) Empty statements give a warning from the compiler so
   this is why you see "#define FOO do { } while(0)"
2) It gives you a basic block in which to declare local
   variables.
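
A minimal sketch of both points.  SWAP_INTS, NOP and order_pair are
made-up names for illustration, they are not taken from unistd.h:

```c
/* Hypothetical macro: the do { } while (0) wrapper gives a block in
 * which a local variable can be declared, and the whole expansion
 * still behaves like a single statement. */
#define SWAP_INTS(a, b)		\
do {				\
	int tmp_ = (a);		\
	(a) = (b);		\
	(b) = tmp_;		\
} while (0)

/* Hypothetical no-op macro: it expands to a real statement instead of
 * to nothing, so "NOP();" never leaves a bare ";" behind. */
#define NOP()	do { } while (0)

/* Puts the smaller value in *lo and the larger one in *hi. */
static void order_pair(int *lo, int *hi)
{
	if (*lo > *hi)
		SWAP_INTS(*lo, *hi);	/* one statement, so the else is fine */
	else
		NOP();
}
```

Calling order_pair() on the pair (5, 3) leaves it as (3, 5).  With a
plain "{ ... }" body instead of the do/while wrapper, the ";" after
SWAP_INTS(...) would end the if statement and the else would not
compile.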

Later,
David S. Miller
[EMAIL PROTECTED]



Re: asm/unistd.h

2001-04-05 Thread David S. Miller


Steve Grubb writes:
 > It would seem to me that after hearing how the macros are used in practice,
 > wouldn't turning them into inline functions be an improvement? This is
 > something gcc supports, it accomplishes the same thing, and has the added
 > advantage of type checking.
 > http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html#SEC92

Two reasons:

1) Sometimes I don't want any type checking because it would create
   the necessity of adding a new include to a file --> a circular
   dependency to resolve.  Macros hide the types except in the
   cases where they are actually invoked :-)

2) Historically GCC was very bad with code generation with inline
   functions, so at that time the GCC manual statement "inline
   functions are just like a macro" was technically false :-)

   Yes, I know this is much different in today's gcc tree, but
   there hasn't been a gcc release in over 2 years so...
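
To illustrate point 1, here is a small sketch with invented names
(struct widget, WIDGET_COUNT, total_count), not code from the kernel
tree:

```c
/* Only a forward declaration is visible here, as it might be in a
 * header that cannot include the full definition. */
struct widget;

/* The macro body mentions ->count, but the compiler only looks at it
 * where the macro is actually expanded. */
#define WIDGET_COUNT(w)	((w)->count)

/* An inline function doing the same thing would not compile at this
 * point, because struct widget is still an incomplete type:
 *
 *   static inline int widget_count(struct widget *w) { return w->count; }
 */

/* Later (for example in the .c file) the full type becomes known... */
struct widget {
	int count;
};

/* ...and only now is the macro expanded and type checked. */
static int total_count(struct widget *a, struct widget *b)
{
	return WIDGET_COUNT(a) + WIDGET_COUNT(b);
}
```

The header that defines WIDGET_COUNT never needs the include which
provides struct widget; only the files that invoke the macro do.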

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4.3 tcp window id causes problems talking to windows clients

2001-04-06 Thread David S. Miller


Kevin Stone writes:
 > Is there any plan to include the zerocopy patches into the stock kernel? 
 >   The win2k dial-up/window id problem is really a showstopper but hasn't 
 > generated much traffic on lkml or the digests. 

I submitted the patch to Linus, it will likely go into 2.4.4
but if not I'll submit the ID patch separately.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [CHECKER] __init functions called by non-__init

2001-04-06 Thread David S. Miller


Rusty Russell writes:
 > It's incredibly poor taste, though, and if we ever implement __init
 > dropping for modules (Keith?),

Jakub Jelinek implemented this about 2 years ago, right before
we hit 2.2.x, Linus thought it was too late at the time so
we dropped that work from our trees.

It was really good at finding __init bugs though...

Later,
David S. Miller
[EMAIL PROTECTED]



Re: goodbye

2001-04-08 Thread David S. Miller


Rik van Riel writes:
 > Anyway, since linux-kernel has chosen to not receive email from me

Funny how this posting went through then...

If it is specifically when you are sending mail from some other place,
state so, don't make blanket statements which obviously are not wholly
true.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [CHECKER] amusing copy_from_user bug

2001-04-10 Thread David S. Miller


Petru Paler writes:
 > On Tue, Apr 10, 2001 at 06:41:28AM -0400, Jakub Jelinek wrote:
 > > some architectures don't care at all, because verify_area is a noop
 > > (sparc64).
 > 
 > Why (and how) is this?

On sparc64, the user lives in an entirely different address space.
The user cannot even generate addresses in kernel space.  Basically,
addresses are prefix'd by an 8-bit tag called an ASI (Address Space
Identifier), which tells the cpu which TLB context to use etc.
When running in user space or accessing user space in kernel mode
we make the cpu use the special userspace ASI.

In fact the user can be given the complete 32-bit or 64-bit virtual
address space, the kernel takes up none of it.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [race][RFC] d_flags use

2001-04-12 Thread David S. Miller


Alexander Viro writes:
 > If nobody objects I'll go for test_bit/set_bit/clear_bit here.

Be sure to make d_flags an unsigned long when you do this! :-)
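
A rough user-space sketch of why the type matters.  The helpers below
are non-atomic stand-ins with invented names; the real kernel
operations are atomic and architecture specific.  The assumption
shown is that the bit operations read and write whole unsigned long
words:

```c
#include <limits.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-ins for set_bit/clear_bit/test_bit.  Each one
 * accesses a full unsigned long, whatever the bit number is. */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static void sketch_clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

static int sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

/* Hypothetical dentry-like structure: because flags is a full
 * unsigned long, the helpers never touch the field next to it. */
struct sketch_dentry {
	unsigned long flags;
	int count;
};
```

If flags were a narrower type, a word-sized access on a 64-bit
machine would also cover whatever sits after it in the structure.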

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [CFT][PATCH] Re: Fwd: Re: memory usage - dentry_cache

2001-04-12 Thread David S. Miller


Alexander Viro writes:
 > OK, how about wider testing? Theory: prune_dcache() goes through the
 > list of immediately killable dentries and tries to free given amount.
 > It has a "one warning" policy - it kills dentry if it sees it twice without
 > lookup finding that dentry in the interval. Unfortunately, as implemented
 > it stops when it had freed _or_ warned given amount. As the result, memory
 > pressure on dcache is less than expected.

The reason the code is how it is right now is there used to be a bug
where that goto spot would --count but not check against zero, making
count possibly go negative and then you'd be there for a _long_ time
:-)

Just a FYI...
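
A simplified model of that old bug.  This is not the prune_dcache()
code; the function and its arguments are invented to show the effect
of decrementing the budget without testing it:

```c
/* referenced[i] != 0 means "seen recently, spare it this time" (the
 * skip path); 0 means "free it now".  Both paths use up budget.
 * check_on_skip selects whether the skip path tests the budget too.
 * Returns how many entries were freed. */
static int prune(const int *referenced, int n, int count, int check_on_skip)
{
	int freed = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (referenced[i]) {
			--count;
			/* Old bug: without this test count can pass zero. */
			if (check_on_skip && count == 0)
				break;
			continue;
		}
		freed++;
		if (--count == 0)
			break;
	}
	return freed;
}
```

With the list { 1, 0, 0, 0, 0 } and a budget of 1, the unchecked
variant takes count to zero on the skip path, then to -1, -2, ... on
the free path, and never sees it equal zero again: it frees 4 entries
instead of stopping.  The checked variant stops right away and frees
nothing, which is the reduced pressure Al describes.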

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Kernel 2.5 Workshop RealVideo streams -- next time, please get better audio.

2001-04-16 Thread David S. Miller


Miles Lane writes:
 > There is one major shortcoming of the recordings.
 > Usually, only the comments of the presenter(s)
 > can be heard.

The problem is that nobody wants to wait for one of the microphones to
go across the entire room before they can begin speaking, this is what
was happening.  Sometimes there was a dialogue going on between three
people sitting at tables, there were 2 microphones to go around...

One solution I've seen sort of work is to have 2 standing fixed
microphones in the aisles, but this only really functions correctly
for a Q&A type session after a presentation.

It does not work in a relaxed "people sit at tables and comment
at arbitrary points in time during a talk" setting such as the
kernel summit.  Besides putting a microphone at every table (which
isn't all that practical honestly) I can't come up with a solution.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Possible problem with zero-copy TCP and sendfile()

2001-04-17 Thread David S. Miller


Jesse S Sipprell writes:
 > A patch will be coming out soon, as it is a fairly trivial fix.

Thank you for tracking this down.

One more subtle note, for the case of error handling.  There is a
change to sendfile() in the zerocopy patches which causes sendfile()
to act more like sendmsg() when errors occur.

Specifically, sendmsg() works roughly like the following when an
error happens:

handle_error:
        if (sent_something)
                return how_much_we_sent;
        else
                return ERROR_CODE;

So when an error happens, and the kernel was able to send some of
the data, you see something like this in the trace:

sendmsg() = N
...
sendmsg() = ERROR_CODE

sendfile() used to act differently, and this made it difficult to
directly transform a sendmsg()+local_buffer based server into a
sendfile() one because the error handling was so different.

Previously, sendfile() wouldn't give you the partial transfer length,
you'd just get the error _regardless_ of whether any data was sent
successfully during that call.  Alexey, myself, and others considered
this behavior bogus and inconsistent.  So it was changed.

The long and short of it is that sendfile() now acts just like
sendmsg() when errors happen mid-send.
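
The return convention can be modelled in user space roughly as
follows.  toy_link, toy_send and finish_send are invented for this
sketch; none of it is the kernel code:

```c
#include <stddef.h>
#include <sys/types.h>

/* The rule described above: report the bytes already sent if there
 * are any, otherwise report the error (a negative code here). */
static ssize_t finish_send(size_t sent, int err)
{
	if (sent > 0)
		return (ssize_t)sent;
	return (ssize_t)err;
}

/* Toy transport: accepts at most `room` more bytes, then fails. */
struct toy_link {
	size_t room;
	int err;	/* negative code reported once room is used up */
};

static ssize_t toy_send(struct toy_link *link, size_t len)
{
	size_t sent = len < link->room ? len : link->room;

	link->room -= sent;
	/* Not everything fit, so an error happened during this call. */
	if (sent < len)
		return finish_send(sent, link->err);
	return (ssize_t)sent;
}
```

With room for 100 bytes and an error code of -32, sending 300 bytes
returns 100 and the next call returns -32, the same "N, then
ERROR_CODE" pattern as in the trace above.  A caller therefore has to
advance its offset by the partial count and expect the error on the
following call.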

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Possible problem with zero-copy TCP and sendfile()

2001-04-17 Thread David S. Miller


Jesse S Sipprell writes:
 > On error, -1 is returned in the usual fashion and offset is purported to be
 > updated to point to the next byte following the last one sent.
 > 
 > Will the zerocopy patches break this?

No, they should not.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Very bad behavior of kswapd

2001-04-18 Thread David S. Miller


Rik van Riel writes:
 > > Watch top: when this program needs the memory that kswapd keep
 > > in cache they go both at 100% cpu (on SMP) but still the size of
 > > the program only grows at about 100KB/s, why is kswapd releasing
 > > it so slowly and taking so much CPU ?
 > 
 > Because kswapd still has to scan all the (unfreeable) memory
 > of the big process to determine it isn't freeable.

This is not the only badly performing case actually.  When one of one's
swap partitions gets close to full, kswapd basically sits endlessly in
get_swap_page() due to all the broken linear scan algorithms.  I've
tried to fix some of this with the patch below.

This may not be the case Laurent is seeing but it is a problem that
needs fixing.

--- ../vanilla/linux/include/linux/swap.h   Fri Apr 13 17:08:16 2001
+++ include/linux/swap.h   Sat Apr 14 00:10:02 2001
@@ -43,8 +43,22 @@
 
 #define SWAP_CLUSTER_MAX 32
 
-#define SWAP_MAP_MAX   0x7fff
-#define SWAP_MAP_BAD   0x8000
+#define SWAPFILE_CLUSTER 256
+
+struct swap_cluster_struct {
+   struct list_head list;
+   int nr_free;    /* 0 --> SWAPFILE_CLUSTER */
+   unsigned int start_offset;
+};
+
+#define SWAP_MAP_MAX   0x7fff
+#define SWAP_MAP_BAD   0x8000
+
+struct swap_map_struct {
+   struct list_head list;
+   unsigned int offset;
+   unsigned int count;
+};
 
 struct swap_info_struct {
unsigned int flags;
@@ -52,11 +66,15 @@
spinlock_t sdev_lock;
struct dentry * swap_file;
struct vfsmount *swap_vfsmnt;
-   unsigned short * swap_map;
-   unsigned int lowest_bit;
-   unsigned int highest_bit;
-   unsigned int cluster_next;
-   unsigned int cluster_nr;
+   struct swap_map_struct * swap_maps;
+   struct list_head swap_map_free_list;
+
+   struct swap_cluster_struct * swap_clusters;
+   struct list_head swap_cluster_free_list;
+
+   struct swap_cluster_struct * curr_cluster;
+   unsigned int cluster_offset_next;
+
int prio;   /* swap priority */
int pages;
unsigned long max;
--- ../vanilla/linux/mm/swapfile.c  Thu Mar 22 09:22:15 2001
+++ mm/swapfile.c   Sat Apr 14 01:07:40 2001
@@ -24,62 +24,60 @@
 
 struct swap_info_struct swap_info[MAX_SWAPFILES];
 
-#define SWAPFILE_CLUSTER 256
-
-static inline int scan_swap_map(struct swap_info_struct *si, unsigned short count)
+static unsigned int scan_swap_map(struct swap_info_struct *si, unsigned int count)
 {
-   unsigned long offset;
-   /* 
-* We try to cluster swap pages by allocating them
-* sequentially in swap.  Once we've allocated
-* SWAPFILE_CLUSTER pages this way, however, we resort to
-* first-free allocation, starting a new cluster.  This
-* prevents us from scattering swap pages all over the entire
-* swap partition, so that we reduce overall disk seek times
-* between swap pages.  -- sct */
-   if (si->cluster_nr) {
-   while (si->cluster_next <= si->highest_bit) {
-   offset = si->cluster_next++;
-   if (si->swap_map[offset])
-   continue;
-   si->cluster_nr--;
-   goto got_page;
-   }
-   }
-   si->cluster_nr = SWAPFILE_CLUSTER;
+   struct swap_map_struct *map;
+   struct list_head *head, *tmp;
 
-   /* try to find an empty (even not aligned) cluster. */
-   offset = si->lowest_bit;
- check_next_cluster:
-   if (offset+SWAPFILE_CLUSTER-1 <= si->highest_bit)
-   {
-   int nr;
-   for (nr = offset; nr < offset+SWAPFILE_CLUSTER; nr++)
-   if (si->swap_map[nr])
-   {
-   offset = nr+1;
-   goto check_next_cluster;
-   }
-   /* We found a completly empty cluster, so start
-* using it.
+   /* Any swap entries left at all? */
+   if (list_empty(&si->swap_map_free_list))
+   return 0;
+
+get_from_cluster:
+
+   /* Currently allocating from a cluster? */
+   if (si->curr_cluster != NULL) {
+   struct swap_cluster_struct *cluster = si->curr_cluster;
+   unsigned int offset = si->cluster_offset_next;
+
+   /* Note that this test cannot be made with cluster->nr_free
+* because it is possible for a swap entry to be freed before
+* we are done allocating from this cluster.
 */
-   goto got_page;
+   if (si->cluster_offset_next++ == SWAPFILE_CLUSTER)
+   si->curr_cluster = NULL;
+
+   cluster->nr_free--;
+
+   map = &si->swap_maps[offset];
+   goto finish_alloc;
}
-   /* No luck, so now go finegrined

Re: [PATCH] IP forwarded checksum, kernel 2.2.18-19

2001-04-18 Thread David S. Miller


Martin Gadbois writes:
 > Hi there!
 > I realized that some tests were failing due to dropped IP packets. I
 > traced and discovered the following:

Thanks, I've put your patch into my 2.2.x source and will
push this to Alan once he starts doing 2.2.20pre patches.

Later,
David S. Miller
[EMAIL PROTECTED]



RE: Let init know user wants to shutdown

2001-04-18 Thread David S. Miller


Grover, Andrew writes:
 > IMHO an abstracted interface at this point is overengineering.

ACPI is the epitome of overengineering.

An abstracted interface would allow simpler systems to avoid all of
the bloated garbage ACPI brings with it.  Sorry, Alan hit it right on
the head, ACPI is not much more than keeping speedstep proprietary.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: sk->state_chage is not called for listening sockets

2001-04-19 Thread David S. Miller


Pete Zaitcev writes:
 > With that in mind, would the following chage have any ill effects?
 > It does not seem to break anything obvious, but I am worried about
 > a performance degradation for some retarded benchmark.
 > 
 > diff -u -U 4 linux-2.4.3/net/ipv4/tcp_input.c linux-2.4.3-nfs/net/ipv4/tcp_input.c
 > --- linux-2.4.3/net/ipv4/tcp_input.c Fri Feb  9 11:34:13 2001
 > +++ linux-2.4.3-nfs/net/ipv4/tcp_input.c Thu Apr 12 23:23:59 2001

I've applied this patch, thanks.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] generic rw_semaphores, compile warnings patch

2001-04-19 Thread David S. Miller


D.W.Howells writes:
 > This patch (made against linux-2.4.4-pre4) gets rid of some warnings obtained 
 > when using the generic rwsem implementation.

Have a look at pre5, this is already fixed.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: ANNOUNCE New Open Source X server

2001-04-18 Thread David S. Miller


James Simmons writes:
 > The Linux GFX project grew out the need for a higher performance X
 > server that has a much faster developement cycle. In the last few years
 > the graphics card and multimedia environments have grow at such a rate 
 > the current X solutions can no longer keep pace nor do they focus on
 > producing high performance X servers specifically for linux. Also the
 > community has demanded for specific functionality which has never come to
 > light. 

And this specific functionality is?

I think this is not a worthwhile project at all.  The X tree, its
associated protocols and APIs, are complicated enough as it is, and
the xfree86 project has some of the most talented and capable people
in this area.  It would be a step backwards to do things outside of
xfree86 development.

If the issue is that "things don't happen fast enough in the xfree86
tree", why not lend them a hand and submit patches to them instead
of complaining?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] generic rw_semaphores, compile warnings patch

2001-04-20 Thread David S. Miller


David Howells writes:
 > There's also a missing "struct rw_semaphore;" declaration in linux/rwsem.h. It
 > needs to go in the gap below "#include ". Otherwise the
 > declarations for the contention handling functions will give warnings about
 > the struct being declared in the parameter list.

Indeed, I didn't see this in my setup on sparc64 for some reason.

Later,
David S. Miller
[EMAIL PROTECTED]




Re: [PATCH] Longstanding elf fix (2.4.3 fix)

2001-04-22 Thread David S. Miller


Eric W. Biederman writes:
 > In building a patch for 2.4.3 I also discovered that we are not taking 
 > the mmap_sem around do_brk in the exec paths.

Does that really matter?  Who else can get at the address space?  We
are a singly referenced address space at that point... perhaps ptrace?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: All architecture maintainers: pgd_alloc()

2001-04-22 Thread David S. Miller


Russell King writes:
 > There are various options here:
 > 
 > 1. Either I can fix up all architectures, and send a patch to this list, or

Fixup all the architectures and send this and the ARM bits to Linus.

I really would wish folks would not choose Alan as the first place
to send the patch.  I'm not directly accusing anyone of it, but it
does appear that often AC is used as a "back door" to get a change
in.  While this scheme works most of the time, it often unnecessarily
overworks Alan which I think is unfair.

Sending it to Linus first also eliminates 2 levels of indirection
each time Linus wants something done differently in the change.

person --> alan --> linus --> needs change

alan BCC's person, person codes new version

person --> alan --> linus --> etc. etc.

Sure Alan could fix it up himself, but...

My main point is that for changes like this, sending stuff to Alan
first is often an ineffective mechanism.  If someone were to reply to
this "Linus is hard to push changes to, or takes too long" my reply
is "if this is really the problem, should the burden be
entirely placed on Alan's shoulders?"

The AC patches are huge, but they have substantially decreased in size
during the recent 2.4.4-preX series.  And sure, Alan makes conscious
decisions to apply patches and eventually work to push them to Linus,
but honestly people should consider ways to help decrease his load.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Linux 2.4.3ac13

2001-04-23 Thread David S. Miller


Alan Cox writes:
 > 2.4.3-ac13
 > oSwitch to NOVERS symbols for rwsem  (me)
 >  | Called from asm blocks so they can't be versioned

Yes they most certainly can be versioned inside of an asm.  Use the
"i" constraint, we've been doing this on sparc64 for ages.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.2.19pre10 doesn't compile on alphas (sunrpc)

2001-02-12 Thread David S. Miller


Alan Cox writes:
 > I suspect adding 
 > 
 > #define BUG()  __asm__ __volatile__("call_pal 129 # bugchk")
 > 
 > to include/asm-alpha/page.h will do the right thing, since it works on 2.4

You have to add a few bits to arch/alpha/kernel/traps.c
I could be wrong though...

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [UPDATE] zerocopy patch against 2.4.2-pre2

2001-02-12 Thread David S. Miller


Andrew Morton writes:
 > Changing the memory copy function did make some difference
 > in my setup.  But the performance drop on send(8k) is only approx 10%,
 > partly because I changed the way I'm testing it - `cyclesoak' is
 > now penalised more heavily by cache misses, and amount of cache
 > missing which networking causes cyclesoak is basically the same,
 > whether or not the ZC patch is applied.

Ok ok ok, but are we at the point where there are no sizable "over the
wire" performance anomalies anymore?  That is what is important, what
are the localhost bandwidth measurements looking like for you now
with/without the patch applied?

I want to reach a known state where we can conclude "over the wire is
about as good or better than before, but there is a cpu/cache usage
penalty from the zerocopy stuff".

This is important.  It lets us get to the next stage which is to
use your tools, numbers, and some profiling to see if we can get
some of that cpu overhead back.

Later,
David S. Miller
[EMAIL PROTECTED]




[UPDATE] zerocopy + powder rule

2001-02-12 Thread David S. Miller


The only change is to update things to 2.4.2-pre3:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p3-1.diff.gz

All the reports I am getting now appear to be consistent,
and they all basically show me that:

1) There are no known bugs (as in things that crash the
   kernel or corrupt data)

2) The loopback etc. raw performance anomalies have been
   killed by the P-II Mendocino unaligned memcpy workaround.

3) The acenic/gbit performance anomalies have been cured
   by reverting the PCI mem_inval tweaks.

4) The zerocopy patches have a small yet non-negligible
   cpu usage cost for normal write/send/sendmsg.

If this truly is the current state of affairs, then I am
pretty happy as this is where I wanted things to be when
I first began to publish these zerocopy diffs.  The next
step is to begin profiling things heavily to see if we
can get back some of that extra cpu usage the paged SKBs
afford us.

Due to the powder rule (Lake Tahoe received 6 or so feet of snow this
past weekend) I will be a bit quiet until Friday night.  However, I'll
be doing my own profiling of the zerocopy stuff on my laptop while I'm
up there.

Later,
David Snowboard Miller
[EMAIL PROTECTED]




Re: [PATCH] starfire reads irq before pci_enable_device.

2001-02-17 Thread David S. Miller


Jeff Garzik writes:
 > And in another message, On Mon, 12 Feb 2001, David S. Miller wrote:
 > > 3) The acenic/gbit performance anomalies have been cured
 > >by reverting the PCI mem_inval tweaks.
 > 
 > 
 > Just to be clear, acenic should or should not use MWI?
 > 
 > And can a general rule be applied here?  Newer Tulip hardware also
 > has the ability to enable/disable MWI usage, IIRC.

I think this is an Acenic specific issue.  The second processor on the
Acenic board is only there to work around bugs in their DMA
controller.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: MTU and 2.4.x kernel

2001-02-18 Thread David S. Miller


[EMAIL PROTECTED] writes:
 > A. Datagram protocols do not work with mtus not allowing to send
 >512 byte frames (even DNS).

This smells bad.  Datagram protocol send sizes are only limited by
socket buffer size, nothing more.  Fragmentation makes it work.

If you are really talking about side effects of UDP path-mtu, then I
will turn off UDP path-mtu by default in 2.4.x because it is obviously
very broken either conceptually or in our implementation. :-)

Later,
David S. Miller
[EMAIL PROTECTED]



[UPDATE] Zerocopy BETA 1, against 2.4.2-pre4

2001-02-18 Thread David S. Miller


I'm calling this "BETA 1" because I currently feel that all
performance and other issues have been addressed and that the
patch is up for serious consideration for inclusion into a
future 2.4.x release:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p4-1.diff.gz

Besides merging to 2.4.2-pre4 the main change in this release is
a totally revamped paged-SKB sendmsg implementation by Alexey.
I truly believe now that bandwidth/latency is back to where we
were before the zerocopy patches, and preliminary testing done
by Andrew Morton supports this.  (actually, in my own testing,
latency over loopback seems to have improved)

Some verbose TCP debugging is enabled in this release, most of
the messages are harmless 99% of the time.  If these messages
bother you just set "FASTRETRANS_DEBUG" back to "1" in
include/net/tcp.h

Thanks.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-21 Thread David S. Miller


Ookhoi writes:
 > We have exactly the same problem but in our case it depends on the
 > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
 > header compression turned on, 3, a free internet access provider in
 > Holland called 'Wish' (which seemes to stand for 'I Wish I had a faster
 > connection').
 > If we remove one of the three conditions, the connection is oke. It is
 > only tcp which is affected.
 > A packet on its way from linux server to windows client seems to get
 > dropped once and retransmitted. This makes the connection _very_ slow.

:-( I hate these buggy systems.

Does this patch below fix the performance problem and are the windows
clients win2000 or win95?

--- include/net/ip.h.~1~   Mon Feb 19 00:12:31 2001
+++ include/net/ip.h   Wed Feb 21 02:56:15 2001
@@ -190,9 +190,11 @@
 
 static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
 {
+#if 0
if (iph->frag_off&__constant_htons(IP_DF))
iph->id = 0;
else
+#endif
__ip_select_ident(iph, dst);
 }
 



Re: Problem with 2.2.19pre9 (Connection closed.)

2001-02-21 Thread David S. Miller


Alan Cox writes:
 > Dave - any ideas, shall we back it out and work on it for 2.2.20 ?

The one change which is probably causing this is non-critical,
so let me study things quickly tonight and if I come up with
nothing I'll show you what you can revert safely.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-21 Thread David S. Miller


Jordan Mendelson writes:
 > Now, if it didn't have the side effect of dropping packets left and
 > right after ~4000 open connections (simultaneously), I could finally
 > move our production system to 2.4.x.

There is no reason my patch should have this effect.

All of this is what appears to be a bug in Windows TCP header
compression, if the ID field of the IPv4 header does not change then
it drops every other packet.

The change I posted as-is, is unacceptable because it adds unnecessary
cost to a fast path.  The final change I actually use will likely
involve using the TCP sequence numbers to calculate an "always
changing" ID number in the IPv4 headers to placate these broken
windows machines.
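
As a rough illustration of the idea only (this is not the change that
was eventually merged), an ID derived from the TCP sequence number
could look like the sketch below.  The function name ident_from_seq()
is hypothetical, and taking the low 16 bits is an assumption: the
result changes between data segments as long as consecutive sequence
numbers differ by less than 65536, but pure ACKs that carry the same
sequence number would repeat the ID.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons() */

/*
 * Hypothetical helper: derive a 16-bit IPv4 ID from a TCP sequence
 * number (host byte order) and return it in network byte order.
 */
uint16_t ident_from_seq(uint32_t seq)
{
	return htons((uint16_t)(seq & 0xffff));
}
```

Two back-to-back 1460-byte segments, say at sequence numbers 1000 and
2460, get different IDs, which is all the broken decompressor needs.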

Later,
David S. Miller
[EMAIL PROTECTED]



[UPDATE] Zerocopy BETA 2 against 2.4.2 final.

2001-02-21 Thread David S. Miller


Usual place:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-1.diff.gz

Besides merging to the 2.4.2-final release there are two bug fixes:

1) New TCP receive queue collapser could trigger assertion failures
   in tcp_recvmsg(), reason: uninitialized skb->used field in fresh
   SKB allocated for collapsing.

2) IP header IDs are generated differently on big vs. little endian
   systems, added htons() to fix.

Some have asked why this isn't pushed to Alan for his AC patches yet.
The reason is that I want to fully resolve the final few performance
issues that remain (1.5K mtu on gbit still has some warts).  Once
those are cleared and everyone involved is satisfied that there are no
performance regressions against vanilla 2.4.2, I will ask Alan to
consider including it.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] drivers/net/sunhme.c, unbalanced and unchecked ioremap()

2001-02-22 Thread David S. Miller


Andrey Panin writes:
 > I found that sunhme.c doesn't check ioremap() return value and doesn't
 > call iounmap() on module unload. Attached patch (for 2.4.1-ac20) should fix it, 
 > compiles cleanly, but untested (I have no such hardware).

Thanks, I've applied this patch.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [Patch] 2.4.2: af_unix.c warnings

2001-02-22 Thread David S. Miller


Russell King writes:
 > The following patch fixes these warnings:

Thanks, applied.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: ipv4: 2.4.2: unused static variables

2001-02-22 Thread David S. Miller


Russell King writes:
 > With CONFIG_SYSCTL=n, I get the following warnings:
 > 
 > sysctl_net_ipv4.c:50: warning: `tcp_retr1_max' defined but not used
 > sysctl_net_ipv4.c:52: warning: `ip_local_port_range_min' defined but not used
 > sysctl_net_ipv4.c:53: warning: `ip_local_port_range_max' defined but not used
 > 
 > These are defined static in sysctl_net_ipv4.c, and appear to only be
 > exported via procfs.  In other words, you can set them to whatever you
 > like and the IPv4 stack couldn't care less.
 > 
 > Why do we have them?  If they're not used, can we either eliminate them,
 > or else move their definition within the '#ifdef CONFIG_SYSCTL' to
 > eliminate the warning?

They aren't set to anything because they are not sysctl
"values", they are sysctl "limits".  Ie. they tell the sysctl
layer what legal range the user's setting of a particular sysctl
must reside in.

The fix is to enclose these things in CONFIG_SYSCTL, which I have
done in my tree, thanks for bringing this to my attention.
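
A minimal user-space sketch of the value/limit distinction, and of
enclosing the limits in the config option, might look like this.  The
variable names echo the warnings quoted above, but the numbers,
set_local_port_range(), and the CONFIG_SYSCTL define are illustrative
assumptions, not the kernel's code.

```c
#include <stddef.h>

#define CONFIG_SYSCTL 1	/* assumption: stands in for the kernel config option */

/* The sysctl "value": this is what the rest of the stack would read. */
int local_port_range[2] = { 1024, 4999 };

#ifdef CONFIG_SYSCTL
/* The sysctl "limits": only the range check below ever looks at them. */
static int ip_local_port_range_min[] = { 1, 1 };
static int ip_local_port_range_max[] = { 65535, 65535 };

/*
 * Hypothetical stand-in for the sysctl layer: accept a new setting
 * only if every element lies within [min, max].  Returns 0 on
 * success and -1 if the setting is rejected.
 */
int set_local_port_range(const int newval[2])
{
	size_t i;

	for (i = 0; i < 2; i++) {
		if (newval[i] < ip_local_port_range_min[i] ||
		    newval[i] > ip_local_port_range_max[i])
			return -1;
	}
	local_port_range[0] = newval[0];
	local_port_range[1] = newval[1];
	return 0;
}
#endif /* CONFIG_SYSCTL */
```

With the option undefined, neither the limits nor their only user are
compiled, so there is nothing left to warn about.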

Later,
David S. Miller
[EMAIL PROTECTED]



[UPDATE] zerocopy BETA 3

2001-02-22 Thread David S. Miller


Usual spot:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-2.diff.gz

Changes since last installment:

1) More errors in TCP receive queue collapser are discovered and
   fixed.
2) Several URG handling details on receive side are made more
   consistent and sane.
3) Workaround for win2000/95 VJ header compression bugs is
   implemented.
4) Update to latest 3c59x driver from Andrew, this should cure some
   link type detection problems.
5) IP conntrack fix from Rusty.

Please test.  To my knowledge the only issues remaining now are the
gbit performance issues, which are being discussed by Pekka and
Alexey.

Later,
David S. Miller
[EMAIL PROTECTED]



A plea for help, forwarded message from postmaster@morotsmedia.se

2001-02-23 Thread David S. Miller


Unless someone can tell me who is the recipient on the linux-kernel
list generating these bogus virus bounces back to me, I am going
to have no choice but to unsubscribe the entire *.se domain to
try and get rid of this guy.

Thanks.




Your mail was recieved, but looked like it might contain a virus and was
not delivered.

Please do not respond to this mail, it is only an autoreply.




Re: A plea for help, forwarded message from postmaster@morotsmedia.se

2001-02-23 Thread David S. Miller


Mohammad A. Haque writes:
 > From autoreply headers...
 >  Message-Id: <[EMAIL PROTECTED]>
 >  From: [EMAIL PROTECTED]
 >  Sender: [EMAIL PROTECTED] 
 > 
 > Other posts from jborg...
 >  From: Jakob Borg <[EMAIL PROTECTED]>
 >  ..
 >  -- 
 >  Jakob Borg          mailto:[EMAIL PROTECTED]   (personal)
 >  UNIX/network admin  mailto:[EMAIL PROTECTED]   (development)
 >  systems programmer  mailto:[EMAIL PROTECTED]   (work)
 >                      http://jakob.borg.pp.se/

Thanks a lot, he's gone.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: possible bug x86 2.4.2 SMP in IP receive stack

2001-02-25 Thread David S. Miller


Sounds like a bug wrt. SKB allocations in the Myrinet driver.

You're the author of most of that code, so I'm sure you're the
best one to audit it :-)

Later,
David S. Miller
[EMAIL PROTECTED]



RE: UDP attack? How to suppress kernel msgs?

2001-02-25 Thread David S. Miller


This should fix your problem:

--- include/net/sock.h.~1~  Thu Feb 22 21:12:12 2001
+++ include/net/sock.h  Sun Feb 25 21:26:16 2001
@@ -1279,7 +1279,7 @@
  * Enable debug/info messages 
  */
 
-#if 0
+#if 1
 #define NETDEBUG(x)	do { } while (0)
 #else
 #define NETDEBUG(x)	do { x; } while (0)
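
For anyone unsure what the switch does, here is a small user-space
sketch of the same macro pair.  NETDEBUG_SILENT, netdebug_hits and
udp_length_ok() are hypothetical names made up for the example; only
the two NETDEBUG definitions mirror the patch.

```c
#include <stdio.h>

/* Assumption: 1 compiles the messages out, like "#if 1" in the patch. */
#define NETDEBUG_SILENT 1

#if NETDEBUG_SILENT
#define NETDEBUG(x)	do { } while (0)
#else
#define NETDEBUG(x)	do { x; } while (0)
#endif

int netdebug_hits;	/* counts debug statements that actually ran */

/* Hypothetical length check: returns 1 if the packet is long enough. */
int udp_length_ok(unsigned int claimed, unsigned int actual)
{
	/* The do { } while (0) form keeps this unbraced if/else legal. */
	if (claimed > actual)
		NETDEBUG(netdebug_hits++; printf("UDP: short packet %u/%u\n",
						 claimed, actual));
	else
		return 1;
	return 0;
}
```

With NETDEBUG_SILENT set to 1 the argument is discarded by the
preprocessor, so a flood of bad packets costs neither a printf() nor
a counter increment.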



Re: [UPDATE] zerocopy BETA 3

2001-02-25 Thread David S. Miller


Chris Wedgwood writes:
 > --- linux-2.4.2/include/net/ip.h Sun Feb 25 01:15:19 2001
 > +++ linux-2.4.2+zc-2/include/net/ip.h Sun Feb 25 01:53:52 2001

You need the part that adds "id" to the sock struct too.
This won't build "as-is".

Besides, I'd like people to have to test the zerocopy stuff
for me, they'll get the ID fix if they do that :-)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: New net features for added performance

2001-02-26 Thread David S. Miller


Jeff Garzik writes:
 > 1) Rx Skb recycling.
 ...
 > Advantages:  A de-allocation immediately followed by a reallocation is
 > eliminated, less L1 cache pollution during interrupt handling. 
 > Potentially less DMA traffic between card and host.
 ...
 > Disadvantages?

It simply cannot work.  As Alexey stated, in normal circumstances
netif_rx() queues until the user reads the data.  This is the whole
basis of our receive packet processing model within softint/user
context.

Secondly, I can argue that skb recycling can give _worse_ cache
performance.  If the next use and access by the card to the
skb data is deferred, this gives the cpu a chance to displace those
lines in its cache naturally instead of being forced to do so
quickly when the device touches that data.

If the device forces the cache displacement, those cache lines become
empty until filled with something later (smaller utilization of total
cache contents) whereas natural displacement puts useful data into
the cache at the time of the displacement (larger utilization of total
cache contents).

It is an NT/windows driver API rubbish idea, and it is full crap.

 > 2) Tx packet grouping.
 ...
 > Disadvantages?

See Torvalds vs. world discussion on this list about API entry points
which pass multiple pages at a time versus simpler ones which pass
only a single page at a time. :-)

 > 3) Slabbier packet allocation.
 ...
 > Disadvantages?  Doing this might increase cache pollution due to
 > increased code and data size, but I think the hot path is much improved
 > (dequeue a properly sized, initialized, skb-reserved'd skb off a list)
 > and would help mitigate the impact of sudden bursts of traffic.

I don't know what I think about this one, but my hunch is that it will
lead to worse data packing via such an allocator.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [UPDATE] zerocopy.. While working on ip.h stuff

2001-02-26 Thread David S. Miller


Benjamin C.R. LaHaise writes:
 > Since the ip header fits in the cache of some CPUs (like the P4),
 > this is becoming a cheaper operation than ever before.

At gigapacket rates, it becomes an issue.  This guy is talking about
tinkering with new IP _options_, not just the header.  So even if the
IP header itself fits totally in a cache line, the options afterwards
likely will not and thus require another cache miss.
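
To put rough numbers on that: a base IPv4 header is 20 bytes and
options can add up to 40 more.  The helper below is only a
back-of-the-envelope sketch; cache_lines_touched() is a made-up name
and the 32-byte line size used in the note after it is an assumption
that varies by CPU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of cache lines touched by `len` bytes starting at byte
 * offset `start`, for a line size of `line` bytes (must be non-zero).
 */
size_t cache_lines_touched(size_t start, size_t len, size_t line)
{
	assert(line != 0);
	if (len == 0)
		return 0;
	return (start + len - 1) / line - start / line + 1;
}
```

With 32-byte lines and a line-aligned header, the 20-byte base header
touches one line, while a full 60-byte header with options touches
two, hence the extra miss.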

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [UPDATE] zerocopy.. While working on ip.h stuff

2001-02-26 Thread David S. Miller


Michael Peddemors writes:
 > A few things.. why is ip.h not part of the linux/include/net rather than 
 > linux/include/linux hierarchy?

Exported to older userlands...

 > Defined items that are not used anywhere in the source..
 > Can any of them be deleted now?
 > 

So what, userland makes use of them :-)

 > Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave :) and 
 > was looking at 4.2.2.6 where it mentions that a router MUST implement the End 
 > of Option List option..  Haven't figured out where that is implemented yet..

egrep "IPOPT_END" net/ipv4/ip_options.c

You just aren't looking hard enough.

 > Also was trying to figure out some things. 
 > I want to create a new ip_option for use in some DOS protection experiments.
 > I have a whole 40 bytes (+/-) to share...  Now although I don't see anything 
 > explicitly prohibiting the use of unused IP Header option space, I know that 
 > it really was designed for use by the sending parties, and not routers in 
 > between.. Has anyone seen any RFC that explicitly says I MUST NOT?

Not to my knowledge.  Routers already change the time to live field,
so I see no reason why they can't do smart things with special IP
options either (besides efficiency concerns :-).
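
For readers following along, the option list is a simple
type/length/data walk that stops at End of Option List.  The sketch
below is a stand-alone illustration, not the code in
net/ipv4/ip_options.c; find_ip_option() is a hypothetical helper and
the two option numbers are the ones from RFC 791.

```c
#include <stddef.h>
#include <stdint.h>

#define IPOPT_END	0	/* End of Option List (RFC 791) */
#define IPOPT_NOOP	1	/* No Operation (RFC 791) */

/*
 * Walk an IPv4 options area of `len` bytes and return the offset of
 * the first option of type `type`, or -1 if it is absent or the list
 * is malformed.
 */
int find_ip_option(const uint8_t *opts, size_t len, uint8_t type)
{
	size_t off = 0;

	while (off < len) {
		uint8_t t = opts[off];

		if (t == type)
			return (int)off;
		if (t == IPOPT_END)
			break;		/* nothing after this is an option */
		if (t == IPOPT_NOOP) {
			off++;		/* single-byte option */
			continue;
		}
		/* Every other option is type, length, data. */
		if (off + 1 >= len || opts[off + 1] < 2 ||
		    off + opts[off + 1] > len)
			return -1;	/* truncated or bogus length */
		off += opts[off + 1];
	}
	return -1;
}
```

An experimental option would be found the same way, which is also
why every router on the path pays for the walk.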

Later,
David S. Miller
[EMAIL PROTECTED]



Re: New net features for added performance

2001-02-26 Thread David S. Miller


Andi Kleen writes:
 > 4) Better support for aligned RX by only copying the header

Andi you can make this now:

1) Add new "post-header data pointer" field in SKB.
2) Change drivers to copy into aligned headroom as
   you mention, and they set this new post-header
   pointer as appropriate.  For normal drivers without
   alignment problem, generic code sets the pointer up
   just like it does the rest of the SKB header pointers
   now.
3) Enforce correct usage of it in all the networking :-)

I would definitely accept such a patch for the 2.5.x
series.  It seems to be a nice idea and I currently see
no holes in it.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-26 Thread David S. Miller


Simon Kirby writes:
 > Has such a patch gone in to the kernel yet?

Yep, it is in both the zerocopy and AC patches. (Linus is
away at the moment)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: New net features for added performance

2001-02-26 Thread David S. Miller


Jeff Garzik writes:
 > I only want to know if more are coming, not actually pass multiples..

Ok, then my only concern is that the path from "I know more is coming"
down to hard_start_xmit invocation is long.  It would mean passing a
new piece of state a long distance inside the stack from SKB origin to
device.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: New net features for added performance

2001-02-26 Thread David S. Miller


Andi Kleen writes:
 > Or did I misunderstand you?

What is wrong with making methods, keyed off of the ethernet protocol
ID, that can do the "I know where/how-long headers are" stuff for that
protocol?  Only cards with the problem call into this function vector
or however we arrange it, and then for those that don't have these
problems at all we can make NULL a special value for this
"post-header" pointer.

You can pick some arbitrary number, sure, that is another way to
do it.  Such a size would need to be chosen very carefully though.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: RFC: vmalloc improvements

2001-02-26 Thread David S. Miller


Reto Baettig writes:
 > The RPC server needs lots of 2MB receive buffers which are
 > allocated using vmalloc because the NIC has its own pagetables.

Why not just allocate the pages separately and keep track of
where they are?  Since the NIC has all the page tabling facilities
on its end, the cpu side is just a software issue.  You can keep
an array of pages however large you need to keep track of that.

vmalloc() was never meant to be used on this level and doing
so is asking for trouble (it's also deadly expensive on SMP due
to the cross-cpu tlb invalidates using vmalloc() causes).
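
The bookkeeping is small.  Below is a user-space sketch of the
array-of-pages idea, with malloc-style allocation standing in for
whatever page allocator the driver would really use; struct
page_array, its helpers and the 4096-byte page size are all
assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUF_PAGE_SIZE	4096u	/* assumption: illustrative page size */

/* A large logical buffer backed by separately allocated pages. */
struct page_array {
	size_t	  npages;
	uint8_t	**pages;
};

int page_array_alloc(struct page_array *pa, size_t bytes)
{
	size_t i;

	pa->npages = (bytes + BUF_PAGE_SIZE - 1) / BUF_PAGE_SIZE;
	pa->pages = calloc(pa->npages, sizeof(*pa->pages));
	if (!pa->pages)
		return -1;
	for (i = 0; i < pa->npages; i++) {
		pa->pages[i] = calloc(1, BUF_PAGE_SIZE);
		if (!pa->pages[i]) {
			while (i--)
				free(pa->pages[i]);
			free(pa->pages);
			return -1;
		}
	}
	return 0;
}

/* Translate a logical byte offset into a pointer inside the right page. */
uint8_t *page_array_at(const struct page_array *pa, size_t off)
{
	if (off / BUF_PAGE_SIZE >= pa->npages)
		return NULL;
	return pa->pages[off / BUF_PAGE_SIZE] + off % BUF_PAGE_SIZE;
}

void page_array_free(struct page_array *pa)
{
	size_t i;

	for (i = 0; i < pa->npages; i++)
		free(pa->pages[i]);
	free(pa->pages);
}
```

A 2MB receive buffer becomes 512 independent pages, and no
contiguous virtual mapping (and so no vmalloc()) is involved.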

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Linux 2.4.1 network (socket) performance

2001-02-26 Thread David S. Miller


Richard B. Johnson writes:
 > > unix socket sends eat into memory reserved for atomic allocs.

OK (Manfred is being quoted here, to be clear).

I'm still talking with Alexey about how to fix this, I might just
prefer killing this fallback mechanism of sock_alloc_send_skb and then
make AF_UNIX act just like everyone else.

This was always just a performance hack, and one which makes less
and less sense as time goes on.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: rsync over ssh on 2.4.2 to 2.2.18

2001-02-27 Thread David S. Miller


Russell King writes:
 > Please note: although I am using 2.2.15pre13, it is _not_ the cause of
 > this problem

How do you know this?  There are so many deadly TCP bugs fixed
since 2.2.15pre13 I don't know how you can assert this.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: rx_copybreak value for non-i386 architectures

2001-02-27 Thread David S. Miller


Jun Sun writes:
 > I notice that many net drivers set rx_copybreak to 1518 (the max packet size)
 > for non-i386 architectures.  Once I thought I understood it and it seems
 > related to cache line alignment.  However, I am not sure exactly about the
 > reason now.  Can someone enlighten me a little bit?

Most non-x86 architectures take a large hit for unaligned accesses.
If the ethernet chip cannot land the beginning of the packet at an
arbitrary byte offset (a modulo 2 offset for ethernet is needed for an
aligned IP header) then the rx_copybreak is set to the ethernet MTU
so that all packets get copied into new buffers where they can have
their header aligned.
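
The arithmetic behind the "modulo 2" remark: the ethernet header is
14 bytes, so starting the frame 2 bytes into an aligned buffer puts
the IP header at offset 16.  Here is a user-space sketch of the copy
path, with malloc() standing in for skb allocation; rx_copy_aligned()
and RX_ALIGN_PAD are hypothetical names, not driver code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_HLEN	14	/* ethernet header length */
#define RX_ALIGN_PAD	2	/* assumption: pad so 14 + 2 is a multiple of 4 */

/*
 * Copy a received frame into a freshly allocated buffer, offset by
 * two bytes so the IP header following the ethernet header is 4-byte
 * aligned.  Returns the start of the frame inside the new buffer, or
 * NULL on allocation failure.  *base receives the pointer to free().
 */
uint8_t *rx_copy_aligned(const uint8_t *frame, size_t len, uint8_t **base)
{
	uint8_t *buf = malloc(len + RX_ALIGN_PAD);

	if (!buf)
		return NULL;
	memcpy(buf + RX_ALIGN_PAD, frame, len);
	*base = buf;
	return buf + RX_ALIGN_PAD;
}
```

Setting rx_copybreak to the maximum frame size simply means every
packet takes this copy path instead of only the small ones.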

Later,
David S. Miller
[EMAIL PROTECTED]




Re: [RFC] pci_dma_set_mask()

2001-02-28 Thread David S. Miller


Zach Brown writes:
 > > extremely minor nit that I think pci_set_dma_mask should return ENODEV
 > > or EIO or something on error, and zero on success.
 > 
 > I agree, though I'd like to leave the decision up to people who live and
 > breathe this stuff.
 > 
 > please feel free to make minor adjustments and submit :)

Jeff/Zach, I agree, I'm fully for such a patch, but please update the
documentation!  It is the most important part of the patch.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: The IO problem on multiple PCI busses

2001-03-01 Thread David S. Miller


Dan Malek writes:
 > "David S. Miller" wrote:
 > 
 > > I played around with something akin to this, and some of the necessary
 > > Xfree86-4.0.x hackery needed, some time ago.  But I never finished
 > > this.
 > 
 > Sounds pretty sweet.  How about we finish it?  Any complaints (well
 > reasonable ones :-) or concerns that came out of discussions or
 > your testing we need to consider?

There is only one sticking point, and that is how to convey to the
mmap() call whether you want I/O or Memory space.  In the end, my
analysis came up with basically an ioctl() on the same PCI device
node to set this, and you could keep track of this state in the
filp private area.

I thought originally you could do this with the lower bits of the
mmap() offset, but that won't work in 2.4.x because they are stripped
out and you only get a page number by the time the driver mmap
call runs.
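
A two-line illustration of why the low bits are useless as a flag.
SKETCH_PAGE_SHIFT and offset_to_pgoff() are names made up here, and
the 4K page size is an assumption (sparc64 uses 8K pages).

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

/* What the driver's mmap method gets: a page number, not a byte offset. */
uint64_t offset_to_pgoff(uint64_t byte_offset)
{
	return byte_offset >> SKETCH_PAGE_SHIFT;
}
```

An offset of 0x2000 and the same offset with a hypothetical "I/O
space" flag in bit 0 both arrive as page number 2, so the driver
cannot tell them apart.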

I really like this solution because it does not involve any new
syscalls to be added to glibc and/or the Xfree86 arch/os specific
code.  Just opening files, mmap, and an ioctl number or two.  All
of this can be shared between ports.

As a side note, Alpha has a special PCI syscall to get the "PCI
controller number" a given PCI device is behind.  We could add
another ioctl number which does the same thing on /proc/bus/pci/*/*
nodes.  This way sparc64 and Alpha could have the same user visible
API for this as well.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Kernel is unstable

2001-03-01 Thread David S. Miller


Andrea Arcangeli writes:
 > If it happened to be buggy it didn't look unfixable from a design standpoint
 > and I think it was a very worthwhile feature, not just for memory but also to
 > avoid growing the size of the avl that we would have to pay later all the time
 > at each page fault.

Linus didn't find it to be such a gain, and in fact the one
place that does gain from such merging (sys_brk()) does the
merging by hand :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: The IO problem on multiple PCI busses

2001-03-01 Thread David S. Miller


Benjamin Herrenschmidt writes:
 > Also, the problem of finding where the legacy ISA IOs of a given PCI bus
 > are is a bit different than simply mmap'ing a BAR. Some video cards
 > require some access to their VGA IOs without having a BAR covering them,
 > in some cases it's necessary to switch the chip from VGA to MMIO mode.

Many platforms, sparc64 included, do not have an ISA IO space nor do
they provide VGA accesses at all.

If things such as XFree86 are coded for such platforms to not require
VGA accesses (the 'ati' driver is already like this when certain
build time defines are set), this could become a non-issue in this
case.

 > So what would be a preferred way ? Create that fake ISA bus number and
 > provide functions for looking them up, getting their IO and mem bases,
 > and eventually mapping PCI busses to ISA busses ? Or does someone have a
 > better idea ? The goal is to try not to change the semantics of inb/outb
 > and friends so that most legacy drivers can still work using the
 > "default" IO bus if they are not upgraded to the new scheme.

There is no 'fake' ISA bus number you need.  There is a 'real' one,
the one on which the PCI-->ISA bridge lives, why not use that one
:-)

Then you could find such an ISA bridge, open that PCI device, then
finally perform the PCI_IOCTL_GETIOBASE thingy on it, but I don't like
this get-iobase idea at all, see my next email in this thread for why.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] pci_set_dma_mask() + doc :)

2001-03-01 Thread David S. Miller


Zach Brown writes:
 > please feel free to flame or apply, I'm not sure I'm really fond of the
 > code example..

Seems fine to me.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: The IO problem on multiple PCI busses

2001-03-01 Thread David S. Miller


Benjamin Herrenschmidt writes:
 > I'm, of course open to any comments about this (in fact, I'd really like
 > some feedback). One thing is that we also need to find a way to pass
 > those infos to userland. Currently, we implement an arch-specific syscall
 > that allows retrieving the IO physical base of a given PCI bus. That may
 > be enough, but we may also want something that matches more closely what we
 > do in the kernel.

Same problem on sparc64.  Using a special PCI syscall is fine, _if_ we
all end up using the same one.  However, I would prefer another
mechanism...

I think a cleaner scheme is to allow mmap() on
/proc/bus/pci/${BUS}/${DEVICE} nodes, that is much cleaner and solves
transparently any "different word size between userland and kernel"
issues (specifically 32-bit userlands executing on 64-bit kernels).

I played around with something akin to this, and some of the necessary
Xfree86-4.0.x hackery needed, some time ago.  But I never finished
this.

Later,
David S. Miller
[EMAIL PROTECTED]




Re: The IO problem on multiple PCI busses

2001-03-01 Thread David S. Miller


Dan Malek writes:
 > It actually caused me to think of something else... I have cards
 > with multiple memory and I/O spaces (rare, but I have them).

So what?  All such BARs within mem/io space are part of unique
regions of the total MEM/IO space.

Thus you can pass non-conflicting offset/size pairs, based upon the
BAR value of interest, to mmap and everything is fine.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: The IO problem on multiple PCI busses

2001-03-01 Thread David S. Miller


Grant Grundler writes:
 > A nice side effect of this bloat is it will discourage use of I/O
 > Port space. That's good for everyone, AFAICT. (I know some devices
 > *only* support I/O port space and I personally don't care about
 > them. If someone who does care about one wants to talk to me about
 > it...fine...I'll help)

There is another case you are ignoring.  Some devices support memory
space as well as I/O space, but only operate reliably when their
I/O space window is used to access it.

It just sounds to me like the hppa pci controllers are crap,
especially the GSC one.  At least the rope one does something
reasonable when you have a 64-bit kernel.  The horrors you've told me
about the IOMMUs and stream-caches on these chips further confirms my
theory :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: The IO problem on multiple PCI busses

2001-03-01 Thread David S. Miller


Benjamin Herrenschmidt writes:
 > Also, an ioctl to retrieve the iobase would be useful too

No, the whole point of my suggested mmap() interface is to
_ENTIRELY_ eliminate any reason for the user to even see
what the physical addressing of the machine looks like.

If you start pushing iobases to the user, you break this.

I do not want an interface where the user still has to do
grotty stuff like mmap() on /dev/{mem,kmem}, this was the
core of the problem I had with the syscall idea, don't bring
it back.

Make mmap()'s on a PCI-->ISA bridge do something special, for
example.

The user doesn't need to know anything about physical addressing of
the machine, it all can and should be abstracted away.  This is why I
really detest the XFree86 PCI bus probing layer, it should not need to
poke around at so much of the config space information of devices :-(

It is the reason why, at least still today in XFree86 CVS, it simply
cannot cope with multiple PCI controllers in a machine because it
assumes a flat MEM/IO space.  They know about the problem and are
working on fixes, but my point is that making this overly knowledgeable
PCI prober in the first place is what created these problems.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Q: explicit alignment control for the slab allocator

2001-03-01 Thread David S. Miller


Manfred, why are you changing the cache alignment to
SMP_CACHE_BYTES?  If you read the original SLAB papers
and other documents, the code intends to color the L1
cache not the L2 or subsidiary caches.
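
For reference, colouring just means starting each new slab's objects
at a different multiple of the cache line size, cycling through
whatever slack the slab has.  The sketch below is pared down for
illustration and is not the allocator's code; struct colour_state and
both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down cache descriptor; names are illustrative. */
struct colour_state {
	size_t colour;		/* number of distinct colours available */
	size_t colour_off;	/* distance between colours (L1 line size) */
	size_t colour_next;	/* colour to hand to the next slab */
};

void colour_init(struct colour_state *c, size_t left_over, size_t l1_line)
{
	assert(l1_line != 0);
	c->colour = left_over / l1_line;
	if (c->colour == 0)
		c->colour = 1;	/* no slack: every slab starts at offset 0 */
	c->colour_off = l1_line;
	c->colour_next = 0;
}

/* Byte offset at which the next slab's first object should start. */
size_t colour_next_offset(struct colour_state *c)
{
	size_t offset = c->colour_next * c->colour_off;

	if (++c->colour_next >= c->colour)
		c->colour_next = 0;
	return offset;
}
```

With 100 bytes of slack and a 32-byte L1 line, successive slabs start
at 0, 32, 64, 0, and so on.  A larger alignment unit leaves fewer
colours to cycle through for the same slack.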

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.2 TCP window shrinking

2001-03-02 Thread David S. Miller


Jim Woodward writes:
 > This has probably been covered but I saw this message in my logs and
 > wondered what it meant?
 > 
 > TCP: peer xxx.xxx.1.11:41154/80 shrinks window 2442047470:1072:2442050944.
 > Bad, what else can I say?
 > 
 > Is it potentially bad? - I've only ever seen it twice with 2.4.x

We need desperately to know exactly what OS the xxx.xxx.1.11 machine
is running.  Because you've commented out the first two octets, I
cannot check this myself using nmap.
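
For those wondering how to read the three numbers in that message,
the sketch below assumes (and this is an assumption, not something
stated in the report) that they are the first unacknowledged
sequence number, the window the peer advertised, and the next
sequence number to be sent.  Both function names are made up for the
example.

```c
#include <stdint.h>

/* True if sequence number a is later than b, allowing for wraparound. */
int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Assumption: snd_una, snd_wnd and snd_nxt are the three logged
 * values.  The peer has shrunk its window if data already sent lies
 * beyond the right edge it now advertises.
 */
int peer_shrank_window(uint32_t snd_una, uint32_t snd_wnd, uint32_t snd_nxt)
{
	return seq_after(snd_nxt, snd_una + snd_wnd);
}
```

Under that reading, 2442047470 + 1072 = 2442048542, which is 2402
bytes short of 2442050944, so the check fires for the logged values.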

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: The IO problem on multiple PCI busses

2001-03-02 Thread David S. Miller


Benjamin Herrenschmidt writes:
 > What I call ISA IOs here doesn't necessarily mean there's an ISA bridge
 > on the PCI.

Ok.

 > On PPC, we don't have an "IO" space either, all we have is a range of
 > memory addresses that will cause IO cycles to happen on the PCI bus.

This is precisely what the "next MMAP is XXX space" ioctl I've
suggested is for.  I think I've addressed this concern in my
proposal already.  Look:

	fd = open("/proc/bus/pci/${BUS}/${DEV}", ...);
	if (fd < 0)
		return -errno;
	err = ioctl(fd, PCI_MMAP_IO, 0);
	if (err < 0) {
		close(fd);
		return -errno;
	}
	ptr = mmap(NULL, pdev->bar[3].size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, pdev->bar[3].start);

Something like that.

 > Without that, we need to create new versions of inb/outb that take a bus
 > number.

No, don't do this, it is evil.  Use mappings, specify the device
related info somehow when creating the mapping (in the userspace
variant you do this by opening a specific device to mmap, in the
kernel variant you can encode the bus/dev/etc. info in the device's
resource and decode this at ioremap() time, see?).
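
The userspace side of the open/ioctl/mmap sequence above can be sketched as
a small helper.  This is only an illustration under assumptions: the names
pci_proc_path and pci_map_window are hypothetical, and the "next mmap is XXX
space" request is passed in by the caller rather than hard-coded, since
PCI_MMAP_IO is only the name used in the proposal.  The one change from the
sketch in the mail is that errno is saved before close(), because close()
may overwrite it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Build "/proc/bus/pci/BB/DD.F"; returns 0 on success, -1 if buf is too small. */
static int pci_proc_path(char *buf, size_t len,
			 unsigned bus, unsigned dev, unsigned fn)
{
	int n = snprintf(buf, len, "/proc/bus/pci/%02x/%02x.%x", bus, dev, fn);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Select the address space with an ioctl, then mmap a window of the device.
 * select_req is the "next mmap is XXX space" request; its value is an
 * assumption supplied by the caller.
 * Returns the mapping, or NULL with errno set.
 */
static void *pci_map_window(const char *path, unsigned long select_req,
			    off_t start, size_t size)
{
	void *ptr;
	int saved;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;
	if (ioctl(fd, select_req, 0) < 0) {
		saved = errno;		/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return NULL;
	}
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, start);
	saved = errno;
	close(fd);			/* an established mapping outlives the descriptor */
	errno = saved;
	return ptr == MAP_FAILED ? NULL : ptr;
}
```

A caller would build the path for one specific device, pass the BAR start
and size as the window, and munmap() the result when done.  Later mainline
kernels define PCIIOC_MMAP_IS_IO in <linux/pci.h> for a request of this kind,
but whether a given architecture supports it has to be checked separately.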

Later,
David S. Miller
[EMAIL PROTECTED]




Re: The IO problem on multiple PCI busses

2001-03-02 Thread David S. Miller


Benjamin Herrenschmidt writes:
 > There is still the need, in the ioctl we use the "select" what need to be
 > mapped by the next mmap, to ask for the "legacy IO range of the bus where
 > the card reside" (if it exist of course). That would be the 0-64k (or less,
 > actually a couple of pages would probably be enough) that generates IO cycles
 > in the "low" addresses used for VGA registers on the card.

As I've stated in another email, this is perfectly fine and is
precisely the kind of thing implied by my original proposal
in this thread.

You can even have arch-specific "next mmap is" ioctl values to do
"special things".

The generic part of the ioctl()/mmap() support added to the PCI driver
won't care about these ioctls all that much; the
include/asm/pcimmap.h header will deal with all such details.  This
header is also where the physical address is computed and where the
page table mappings are actually created.  The generic PCI code will only
provide the skeletal parts of the mmap() method and call into the
arch-specific hooks coded in asm/pcimmap.h.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: TCP Congestion Window Bug?

2001-03-03 Thread David S. Miller


Mark Reginald James writes:
 > TCP only sends a packet if:
 > 
 >  tcp_packets_in_flight(tp) < tp->snd_cwnd
 > 
 >  (function tcp_snd_test in include/net/tcp.h)
 > 
 > but regards transmission as application-limited if
 > 
 >  tp->packets_out < tp->snd_cwnd
 > 
 >  (function tcp_cwnd_validate in include/net/tcp.h)
 > 
 > So the kernel _always_ thinks the connection is
 > application-limited

Why?  After the final "send a packet if" test, tp->packets_out will be
incremented and thus be equal to tp->snd_cwnd, marking the connection
as _not_ application limited.
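
The ordering argument above can be checked with a toy model.  This is a
sketch under a simplifying assumption: with no SACKed, lost, or retransmitted
segments, packets in flight equals packets_out.  The toy_ names are
hypothetical and only mirror the two tests quoted in the mail.

```c
#include <stdbool.h>

/* Toy model: only the two counters the argument is about. */
struct toy_tcp {
	unsigned packets_out;	/* segments sent and not yet acked */
	unsigned snd_cwnd;	/* congestion window, in segments */
};

/* Assumption: nothing SACKed, lost, or retransmitted, so in flight == packets_out. */
static unsigned toy_in_flight(const struct toy_tcp *tp)
{
	return tp->packets_out;
}

/* The quoted send test: there is room only while in_flight < cwnd. */
static bool toy_snd_test(const struct toy_tcp *tp)
{
	return toy_in_flight(tp) < tp->snd_cwnd;
}

/* Send as many of the queued segments as the window allows. */
static unsigned toy_push(struct toy_tcp *tp, unsigned queued)
{
	unsigned sent = 0;

	while (sent < queued && toy_snd_test(tp)) {
		tp->packets_out++;	/* incremented after the test passes */
		sent++;
	}
	return sent;
}

/* The quoted application-limited test. */
static bool toy_app_limited(const struct toy_tcp *tp)
{
	return tp->packets_out < tp->snd_cwnd;
}
```

With a window of 4 and 10 segments queued, the last successful send test
leaves packets_out at 4, so the connection is not application limited; with
only 2 segments queued it is.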

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] tiny MM performance and typo patches for 2.4.2

2001-03-04 Thread David S. Miller


Ulrich Kunitz writes:
 > patch-uk6  In 2.4.x _page_hashfn divides struct address_space pointer
 >  with a parameter derived from the size of struct
 >  inode. Deriving this parameter from the size of struct
 >  address_space makes more sense -- at least for me.

The address_space is 99% of the time (unless swapping, and in that
case the address is constant :-)) inside of an inode struct so this
change actually makes the hash worse.  I looked at this one time
myself...

Later,
David S. Miller
[EMAIL PROTECTED]



Re: SLAB vs. pci_alloc_xxx in usb-uhci patch

2001-03-05 Thread David S. Miller


Russell King writes:
 > A while ago, I looked at what was required to convert the OHCI driver
 > to pci_alloc_consistent, and it turns out that the current interface is
 > highly sub-optimal.  It looks good on the face of it, but it _really_
 > does need sub-page allocations to make sense for USB.
 > 
 > At the time, I didn't feel like creating a custom sub-allocator just
 > for USB, and since then I haven't had the inclination nor motivation
 > to go back to trying to get my USB mouse or iPAQ communicating via USB.
 > (I've not used this USB port for 3 years anyway).

Gerard Roudier wrote for the sym53c8xx driver the exact thing
UHCI/OHCI need for this.

I think people are pissing their pants over the pci_alloc_consistent
interface for no reason.  It gives PAGE

Re: So, what about kwhich on RH6.2?

2001-01-03 Thread David S. Miller

   Date: Wed, 03 Jan 2001 22:08:33 -0800
   From: Pete Zaitcev <[EMAIL PROTECTED]>

   Are we going to use Miquel's patch? I cannot build fresh 2.2.x on
   plain RH6.2 without it. The 2.2.19-pre6 comes out without it.  Or
   is "install new bash" the official answer? Alan?

I do not understand, I just got a working 2.2.19-pre6 build on one of
my 6.2 Sparc64 systems, what kind of failure do you see?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [RFC, PATCH] TLB flush changes for S/390

2001-01-03 Thread David S. Miller

   From: Ulrich Weigand <[EMAIL PROTECTED]>
   Date: Mon, 1 Jan 2001 23:15:26 +0100 (MET)

* Is there some reason why ptep_test_and_clear_young should
  *not*, after all, flush the TLB?

Yes, because the accuracy of that state bit is not required to
be 100% perfect.  Less SMP tlb flushing traffic from vmscan
runs is desirable, thus no flush.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4.0 on sparc64 build problems

2001-01-05 Thread David S. Miller


The sparc64 config should never allow you to build the amd7930 and
dbri sbus sound drivers, that is a bug, and I'll fix that.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: PROBLEM: 2.4.0 Kernel Fails to compile when CONFIG_IP_NF_FTP is selected

2001-01-05 Thread David S. Miller


You need to enable both CONNTRACK and full NAT in your configuration.

Rusty, why doesn't the Config stuff just enforce this if it
is necessary when enabling FTP support etc.?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Error building 2.4.0-prerelease

2001-01-05 Thread David S. Miller


The netfilter configuration allowed you to illegally specify
FTP support as non-modular, yet NAT support modular.  That
cannot work.  I would suggest changing NAT support to be
non-modular if you want FTP support non-modular.

Rusty, I think this is another case where the netfilter config
should be more stringent and disallow illegal combinations such
as this one.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4.0 on sparc64 build problems

2001-01-05 Thread David S. Miller

   Date: Fri, 5 Jan 2001 16:00:21 -0800
   From: Joshua Uziel <[EMAIL PROTECTED]>

   Basically, those two should be removed from the config options for
   sparc64... and in the meantime, you should build without 'em. :)

Note that 2.2.x has this exact fix already, and that 2.2.x fix
came from a similar bug report from Horst von Brand :-)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: reset_xmit_timer errors with 2.4.0

2001-01-05 Thread David S. Miller

   Date: Fri, 5 Jan 2001 19:22:39 +0100
   From: Arkadiusz Miskiewicz <[EMAIL PROTECTED]>

   On/Dnia Fri, Jan 05, 2001 at 06:52:52AM -0800, Patrick Michael Kane wrote
   > With 2.4.0 installed, I've started to see the following errors:
   > 
   > reset_xmit_timer sk=cfd889a0 1 when=0x3b4a, caller=c01e0748
   > reset_xmit_timer sk=cfd889a0 1 when=0x3a80, caller=c01e0748
   >

   the same problem here

Does the following patch fix this for people?

--- net/ipv4/tcp_input.c.~1~	Wed Dec 13 10:31:48 2000
+++ net/ipv4/tcp_input.c	Fri Jan  5 17:01:53 2001
@@ -1705,7 +1705,7 @@
 
 		if ((__s32)when < (__s32)tp->rttvar)
 			when = tp->rttvar;
-		tcp_reset_xmit_timer(sk, TCP_TIME_RETRANS, when);
+		tcp_reset_xmit_timer(sk, TCP_TIME_RETRANS, min(when, TCP_RTO_MAX));
 	}
 }
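
What the one-line change does can be shown in isolation.  This is a minimal
sketch, not the kernel code: TOY_RTO_MAX is a hypothetical stand-in for
TCP_RTO_MAX, set to 12000 ticks on the assumption of 120 seconds at HZ=100,
and toy_retrans_timeout is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's TCP_RTO_MAX, in timer ticks. */
#define TOY_RTO_MAX 12000u

static uint32_t toy_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * when:   tentative timeout, possibly wrapped around or oversized
 * rttvar: lower bound for the timeout
 */
static uint32_t toy_retrans_timeout(uint32_t when, uint32_t rttvar)
{
	/* Signed comparison as in the patch context: a wrapped value counts as negative. */
	if ((int32_t)when < (int32_t)rttvar)
		when = rttvar;
	/* The fix: never hand the timer a value above the maximum RTO. */
	return toy_min_u32(when, TOY_RTO_MAX);
}
```

Under that assumption the reported when=0x3b4a (15178 ticks) exceeds the
maximum and would be clamped to 12000, while ordinary values pass through
unchanged.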



Re: 2.4.0 TCP SYN problem

2001-01-05 Thread David S. Miller

   From: Marek Gresko <[EMAIL PROTECTED]>
   Date: Fri, 5 Jan 2001 18:16:34 +0100

   When I initiate connection from Solaris machine everything goes OK. 
   TCP/SYN,ACK segments are OK.

   Can anyone help me?

Does:

bash# echo "0" >/proc/sys/net/ipv4/tcp_ecn

Fix the problem?  If so, please send a bug report to Sun telling them
that they improperly discard IP packets using ECN.
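
For anyone who wants to flip the setting from a test program instead of a
shell, a minimal sketch follows.  write_setting and read_setting are
hypothetical helper names; they are plain file I/O and know nothing about
sysctls, so they work on any procfs-style file that holds one integer.

```c
#include <stdio.h>

/* Write a short string to a procfs-style file. Returns 0 on success, -1 on failure. */
static int write_setting(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes, so a rejected write shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}

/* Read the first integer from a procfs-style file. Returns 0 on success, -1 on failure. */
static int read_setting(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling write_setting("/proc/sys/net/ipv4/tcp_ecn", "0\n") as root is the
equivalent of the echo above; reading the value first makes it easy to
restore it after the test.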

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumission policy!)

2001-01-06 Thread David S. Miller


Unified diffs only please... Thanks.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [patch] single copy pipe rewrite

2001-01-06 Thread David S. Miller

   Date: Sun, 07 Jan 2001 00:25:16 +0100
   From: Manfred <[EMAIL PROTECTED]>

   Last march David Miller proposed using kiobuf for these data
   transfers, I've written a new patch for 2.4.

   (David's original patch contained 2 bugs: it doesn't protect
   properly against multiple writers and it causes a BUG() in
   pipe_read() when data is stored in both the kiobuf and the normal
   buffer)

A couple months ago David posted a revised version of his patch which
fixed both these and some other problems.  Most of the fixes were done
by Alexey Kuznetsov.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: ip_conntrack locks up hard on 2.4.0 after about 10 hours

2001-01-06 Thread David S. Miller

   Date: Sat, 06 Jan 2001 10:37:54 -0500
   From: safemode <[EMAIL PROTECTED]>

   Jan  6 06:18:10 icebox kernel: reset_xmit_timer sk=c17fd040 1
   when=0x5d9e, caller=c01a6bf1

I posted a fix for this on Linux-kernel yesterday, had you tested it
you would have seen at least this part of your problem report go away.
I'm reposting the fix for your convenience:

--- net/ipv4/tcp_input.c.~1~	Wed Dec 13 10:31:48 2000
+++ net/ipv4/tcp_input.c	Fri Jan  5 17:01:53 2001
@@ -1705,7 +1705,7 @@
 
 		if ((__s32)when < (__s32)tp->rttvar)
 			when = tp->rttvar;
-		tcp_reset_xmit_timer(sk, TCP_TIME_RETRANS, when);
+		tcp_reset_xmit_timer(sk, TCP_TIME_RETRANS, min(when, TCP_RTO_MAX));
 	}
 }



Re: PROBLEM: 2.4.0 Kernel Fails to compile when CONFIG_IP_NF_FTP is selected

2001-01-06 Thread David S. Miller

   From: Rusty Russell <[EMAIL PROTECTED]>
   Date: Sat, 06 Jan 2001 13:40:35 +1100

   CONFIG_IP_NF_FTP controls BOTH the ftp connection tracking and NAT
   code.  The correct fix is below (untested, but you get the idea).

I've applied this, it seems fine.  (I've also adapted it to the
pending IRC stuff, so you don't need to send me a fix for that
under separate cover).

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [patch] single copy pipe rewrite

2001-01-06 Thread David S. Miller

   Date: Sun, 07 Jan 2001 01:36:22 +0100
   From: Manfred <[EMAIL PROTECTED]>

   Do you still have that patch?

I think so, see below.

   Was it posted to linux-kernel?

Yes, it was.

I just found a copy, enjoy:

diff -ur ../vger3-001101/linux/fs/pipe.c linux/fs/pipe.c
--- ../vger3-001101/linux/fs/pipe.c Sat Oct 14 18:38:24 2000
+++ linux/fs/pipe.c Wed Nov  1 21:39:53 2000
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -22,6 +24,18 @@
  * -- Julian Bradfield 1999-06-07.
  */
 
+#define PIPE_UMAP(inode)	((inode).i_pipe->umap)
+#define PIPE_UMAPOFF(inode)	((inode).i_pipe->umap_offset)
+#define PIPE_UMAPLEN(inode)	((inode).i_pipe->umap_length)
+
+#define PIPE_UMAP_EMPTY(inode) \
+	((PIPE_UMAP(inode) == NULL) || \
+	 (PIPE_UMAPOFF(inode) >= PIPE_UMAPLEN(inode)))
+
+#define PIPE_EMPTY(inode)	\
+	((PIPE_LEN(inode) == 0) && PIPE_UMAP_EMPTY(inode))
+
+
 /* Drop the inode semaphore and wait for a pipe event, atomically */
 void pipe_wait(struct inode * inode)
 {
@@ -36,6 +50,65 @@
 }
 
 static ssize_t
+pipe_copy_from_kiobuf(char *buf, size_t count, struct kiobuf *kio, int kio_offset)
+{
+	struct page **cur_page;
+	unsigned long cur_offset, remains_this_page;
+	char *cur_buf;
+	int kio_remains;
+
+	kio_remains = kio->length;
+	cur_page = kio->maplist;
+	cur_offset = kio->offset;
+	while (kio_offset > 0 && kio_remains > 0) {
+		remains_this_page = PAGE_SIZE - cur_offset;
+		if (kio_offset < remains_this_page) {
+			cur_offset += kio_offset;
+			kio_remains -= kio_offset;
+			break;
+		}
+		kio_offset -= remains_this_page;
+		kio_remains -= remains_this_page;
+		cur_offset = 0;
+		cur_page++;
+	}
+
+	cur_buf = buf;
+	while (kio_remains > 0) {
+		unsigned long kvaddr;
+		int err;
+
+		remains_this_page = PAGE_SIZE - cur_offset;
+		if (remains_this_page > count)
+			remains_this_page = count;
+		if (remains_this_page > kio_remains)
+			remains_this_page = kio_remains;
+
+		kvaddr = kmap(*cur_page);
+		err = copy_to_user(cur_buf, (void *)(kvaddr + cur_offset),
+				   remains_this_page);
+		kunmap(*cur_page);
+
+		if (err)
+			return -EFAULT;
+
+		cur_buf += remains_this_page;
+		count -= remains_this_page;
+		if (count <= 0)
+			break;
+
+		kio_remains -= remains_this_page;
+		if (kio_remains <= 0)
+			break;
+
+		cur_offset = 0;
+		cur_page++;
+	}
+
+	return cur_buf - buf;
+}
+
+static ssize_t
 pipe_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
 {
 	struct inode *inode = filp->f_dentry->d_inode;
@@ -84,29 +157,44 @@
 
 	/* Read what data is available.  */
 	ret = -EFAULT;
-	while (count > 0 && (size = PIPE_LEN(*inode))) {
-		char *pipebuf = PIPE_BASE(*inode) + PIPE_START(*inode);
-		ssize_t chars = PIPE_MAX_RCHUNK(*inode);
-
-		if (chars > count)
-			chars = count;
-		if (chars > size)
-			chars = size;
+	if (PIPE_UMAP(*inode)) {
+		ssize_t chars;
 
-		if (copy_to_user(buf, pipebuf, chars))
+		chars = pipe_copy_from_kiobuf(buf, count,
+					      PIPE_UMAP(*inode),
+					      PIPE_UMAPOFF(*inode));
+		if (chars < 0)
 			goto out;
 
 		read += chars;
-		PIPE_START(*inode) += chars;
-		PIPE_START(*inode) &= (PIPE_SIZE - 1);
-		PIPE_LEN(*inode) -= chars;
 		count -= chars;
 		buf += chars;
-	}
+		PIPE_UMAPOFF(*inode) += chars;
+	} else {
+		while (count > 0 && (size = PIPE_LEN(*inode))) {
+			char *pipebuf = PIPE_BASE(*inode) + PIPE_START(*inode);
+			ssize_t chars = PIPE_MAX_RCHUNK(*inode);
 
-	/* Cache behaviour optimization */
-	if (!PIPE_LEN(*inode))
-		PIPE_START(*inode) = 0;
+			if (chars > count)
+				chars = count;
+			if (chars > size)
+				chars = size;
+
+			if (copy_to_user(buf, pipebuf, chars))
+				goto out;
+
+			read += chars;
+			PIPE_START(*inode) += chars;
+			PIPE_START(*inode) &= (PIPE_SIZE - 1);
+
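
The page-walking logic of pipe_copy_from_kiobuf in the patch above can be
modeled in userspace.  This is a sketch under stated assumptions, not the
kernel routine: struct toy_iobuf is a hypothetical stand-in for a kiobuf,
TOY_PAGE_SIZE is deliberately tiny so the walk is easy to trace, and memcpy
replaces kmap/copy_to_user.  It computes the page and in-page offset from one
running byte position instead of keeping separate cursors.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8		/* hypothetical tiny page size */

/* Hypothetical stand-in for a kiobuf: a list of pages plus a byte window. */
struct toy_iobuf {
	const char (*pages)[TOY_PAGE_SIZE];	/* array of fixed-size pages */
	size_t offset;				/* start of data inside the first page */
	size_t length;				/* total bytes of data */
};

/*
 * Copy up to count bytes into dst, starting skip bytes into the data.
 * Returns the number of bytes copied.
 */
static size_t toy_copy_from_iobuf(char *dst, size_t count,
				  const struct toy_iobuf *io, size_t skip)
{
	size_t copied = 0;
	size_t remaining, pos;

	if (skip >= io->length)
		return 0;
	remaining = io->length - skip;
	pos = io->offset + skip;	/* byte position counted from page 0 */

	while (count > 0 && remaining > 0) {
		size_t page = pos / TOY_PAGE_SIZE;
		size_t off = pos % TOY_PAGE_SIZE;
		size_t chunk = TOY_PAGE_SIZE - off;	/* bytes left in this page */

		if (chunk > count)
			chunk = count;
		if (chunk > remaining)
			chunk = remaining;

		memcpy(dst + copied, &io->pages[page][off], chunk);
		copied += chunk;
		pos += chunk;
		count -= chunk;
		remaining -= chunk;
	}
	return copied;
}
```

The skip argument plays the role of kio_offset: a reader that consumed part
of the data earlier resumes from that point, possibly in the middle of a
later page.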

Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumission policy!)

2001-01-06 Thread David S. Miller

   Date: Sat, 06 Jan 2001 21:06:54 -0700
   From: Ben Greear <[EMAIL PROTECTED]>

   "David S. Miller" wrote:
   > 
   > Unified diffs only please... Thanks.

   Hrm, here's one with a -u option, this what you're looking for?

Yes, thanks a lot.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumission policy!)

2001-01-06 Thread David S. Miller


   On Sat, Jan 06, 2001 at 02:33:27PM -0700, Ben Greear wrote:

   I'm hoping that I can get a few comments on this code.  It was
   added to (significantly) speed up things like 'ifconfig -a' when
   running with 4000 or so VLAN devices.  It should also help other
   instances with lots of (virtual) devices, like FrameRelay, ATM,
   and possibly virtual IP interfaces.  It probably won't help
   'normal' users much, and in it's final form, should probably be a
   selectable option in the config process.

Ben, if ifconfig uses /proc/net/dev to list devices, how can your
changes speed up ifconfig?  Andi mentioned in another email how he has
fixed the quadratic behavior in ifconfig, you should check if it fixes
your problem.  Jamal has suggested dumping ifconfig and making a dummy
"ifconfig" which just wrappers around "ip".  I like this idea the most.

Really, what I'm concerned about is what calls dev_get_by_{name,index}
so often and in such critical places that optimizing it makes any
sense?

I don't mind optimizing stuff like this where needed, in fact I'm the
most guilty of this, check out the complex TCP hash tables we have :-)
But if it's only a problem because of poorly implemented user
applications, let's fix the apps instead of adding the complexity to
the kernel.
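
One way a listing tool can end up quadratic, and why that is an application
problem, can be shown with a toy model.  This is a sketch under assumptions:
it does not reproduce what any real ifconfig does, the toy_ names are
hypothetical, and it only counts work for a listing that performs a fresh
linear by-name lookup per device versus one that walks the list once.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a network device: just a name. */
struct toy_dev {
	char name[32];
};

/* Name the devices vlan0, vlan1, ... in list order. */
static void toy_fill(struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		snprintf(devs[i].name, sizeof devs[i].name, "vlan%zu", i);
}

/* Linear search by name; *cmps counts the string comparisons made. */
static const struct toy_dev *toy_find(const struct toy_dev *devs, size_t n,
				      const char *name, unsigned long *cmps)
{
	for (size_t i = 0; i < n; i++) {
		(*cmps)++;
		if (strcmp(devs[i].name, name) == 0)
			return &devs[i];
	}
	return NULL;
}

/* Listing that does a fresh by-name lookup per device: n(n+1)/2 comparisons. */
static unsigned long toy_list_by_lookup(const struct toy_dev *devs, size_t n)
{
	unsigned long cmps = 0;

	for (size_t i = 0; i < n; i++)
		toy_find(devs, n, devs[i].name, &cmps);
	return cmps;
}

/* Listing that walks the list once: n visits and no lookups. */
static unsigned long toy_list_by_walk(const struct toy_dev *devs, size_t n)
{
	unsigned long visits = 0;

	for (size_t i = 0; i < n; i++) {
		(void)devs[i].name;	/* a real tool would print the entry here */
		visits++;
	}
	return visits;
}
```

For 100 devices the lookup-per-device listing makes 5050 comparisons and the
single walk makes 100 visits; at 4000 devices the gap is roughly 8 million
against 4000.  A hash table would shrink each lookup, but the single walk
needs no lookups at all.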

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumission policy!)

2001-01-06 Thread David S. Miller

   Date:   Sat, 6 Jan 2001 23:00:10 -0500 (EST)
   From: jamal <[EMAIL PROTECTED]>

   I think someone should just flush ifconfig down some toilet. a wrapper
   around "ip" to to give the same look and feel as ifconfig would be a good
   thing so that some stupid program that depends on ifconfig look and feel
   would be a good start.

I could not agree more.  This reminds me to do something I could not
justify before, making netlink be enabled in the kernel and
non-configurable.

I could almost, but not quite, justify it right now just because "ip"
is becoming standard and needs it.

   Not to stray from the subject, Ben's effort is still needed. I think real
   numbers are useful instead of claims like it "displayed faster"

See my previous email, if it's just slow because of some poorly coded
version of ifconfig, it does not justify the patch.  If only a
forcefully created "benchmark" can show some performance problem, that
is not an acceptable reason to champion this patch.  Ok?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [little bit OT] ip _IS_ _NOT_ ifconfig and route ! (was Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumission policy!))

2001-01-07 Thread David S. Miller

   Date: Sun, 7 Jan 2001 11:40:10 +0000 (UTC)
   From: "Henning P. Schmiedehausen" <[EMAIL PROTECTED]>

   As long as "man ip" on my machines returns "ip(7) - ip - Linux IPv4
   protocol implementation", using "ip" exclusively instead of
   ifconfig and route is IMHO not an option for anyone else than
   bleeding edge hackers and linux gurus.

As long as "man printf" gives me that damn shell command manpage, I
will not use printf in my C applications. :-)  Yes, I do
understand, "ip" needs some more documentation perhaps.

Nobody has suggested getting rid of ifconfig, rather we have suggested
to implement it in terms of "ip" because, as you even mention, "ip" is
powerful and can do everything ifconfig can do thus ifconfig can be
implemented as a wrapper on top of "ip".

Nobody has suggested to use "ip" exclusively, you will not invoke "ip"
with the suggestion I am making.  Ifconfig indirectly will, but you
won't even notice nor should you care.  They will be packaged
together, so even that won't be an issue.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumission policy!)

2001-01-07 Thread David S. Miller

   Date:   Mon, 8 Jan 2001 01:13:08 +1300
   From: Chris Wedgwood <[EMAIL PROTECTED]>

   OK, I'm a liar -- bind does handle this. Cool.

Standard BSD allows it, what do you expect :-)

   This is good news, because it means there is a precedent for multiple
   addresses on a single interface so we can kill the :
   syntax in favor of the above which is cleaner of more accurately
   represents what is happening.

If this is really true, 2.5.x is an appropriate time to make
this, no sooner.

Later,
David S. Miller
[EMAIL PROTECTED]



[PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-07 Thread David S. Miller


I've put a patch up for testing on the kernel.org mirrors:

/pub/linux/kernel/people/davem/zerocopy-2.4.0-1.diff.gz

It provides a framework for zerocopy transmits and delayed
receive fragment coalescing.  TUX-1.01 uses this framework.

Zerocopy transmit requires some driver support, things run
as they did before for drivers which do not have the support
added.  Currently sg+csum driver support has been added to
Acenic, 3c59x, sunhme, and loopback drivers.  We had eepro100
support coded at one point, but it was removed because we didn't know
how to identify the cards which support hw csum assist vs. ones
which could not.

I would like people to test this hard and report bugs they may
discover.  _PLEASE_ try to see if 2.4.0 without this patch produces
the same problem, and if so report it as a 2.4.0 bug _not_ as a
bug in the zerocopy patch.  Thank you.

In particular, I am interested in hearing about any new breakage
caused by the zerocopy patches when using netfilter.  When reporting
bugs, please note what networking cards you are using as whether the
card actually is using hw csum assist and sg support is an important
data point.

Finally, regardless of networking card, there should be a measurable
performance boost for NFS clients with this patch due to the delayed
fragment coalescing.  KNFSD does not take full advantage of this
facility yet.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH: "No such process")

2001-01-07 Thread David S. Miller

   Date: Sun, 7 Jan 2001 23:55:28 -0600 (CST)
   From: Paul Cassella <[EMAIL PROTECTED]>

   [1.] One line summary of the problem:

   write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}

What you describe I can only say is "impossible".

There are only four cases when _ANY_ part of the ipv4 networking stack
can return ESRCH.  These four cases are:

1) Adding a route
2) Deleting a route
3) Adding a FIB routing rule
4) Removing a FIB routing rule

None of them can occur via TCP socket writes (only netlink socket
operations or socket control calls).

Therefore I suspect you are instead seeing some form of memory
corruption or similar.  Really, please search the networking code for
ESRCH usage and you will see.
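
When chasing a report like this, it helps to rule out errno being clobbered
between the failing call and the point where it is printed.  A minimal
sketch, with checked_write as a hypothetical helper name: it records errno
immediately after write() fails, before any other library call can change
it.

```c
#include <errno.h>
#include <unistd.h>

/*
 * write() wrapper that records errno right at the point of failure.
 * Returns what write() returned; *err is 0 on success, otherwise the
 * (positive) errno value.
 */
static ssize_t checked_write(int fd, const void *buf, size_t len, int *err)
{
	ssize_t n = write(fd, buf, len);

	*err = (n < 0) ? errno : 0;	/* capture before any other libc call */
	return n;
}
```

Userspace errno values are positive (ESRCH is 3 on Linux); a negative value
such as -3 is the kernel-internal return convention, so seeing one in
application output is itself worth noting in the bug report.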

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] hashed device lookup (New Benchmarks)

2001-01-07 Thread David S. Miller

   Date: Mon, 08 Jan 2001 01:12:21 -0700
   From: Ben Greear <[EMAIL PROTECTED]>

   http://grok.yi.org/~greear/hashed_dev.png
   (If you can't get to it, let me know and I'll email it to you...some
cable modem networks have I firewalled.)

It just seems that this shows that the implementation of ifconfig can
be improved, since "ip" can do the same thing several orders of
magnitude better (ie. non-quadratic system time complexity).

This is the argument I started with when this thread began, so my
position hasn't changed, it has in fact been well supported by your
tests :-)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH: "No such process")

2001-01-07 Thread David S. Miller

   Date: Mon, 8 Jan 2001 01:16:27 -0600 (CST)
   From: Paul Cassella <[EMAIL PROTECTED]>

   Would it be more helpful if I were to check something like

 socki_lookup(file->f_dentry->f_inode)->ops == tcp_prot

   instead?

No, helpful would be for you to present us with a test case program
and the network device configuration you are using.  Are you using
netfilter?  Are you using tunneling, these sorts of things.

Basically, the things we would need to know to be able to
duplicate your precise setup here locally in hopes of triggering the
problem ourselves.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-08 Thread David S. Miller

   Date: Mon, 8 Jan 2001 11:39:15 +0100
   From: Christoph Hellwig <[EMAIL PROTECTED]>

   don't you think the writepage file operation is rather hackish?

Not at all, it's simply direct sendfile support.  It does
not try to be any fancier than that.

Later,
David S. Miller
[EMAIL PROTECTED]



[Linux-IrDA]Re: Delay in authentication.

2001-01-08 Thread David S. Miller

   Date: Mon, 08 Jan 2001 18:39:34 +0500
   From: Ansari <[EMAIL PROTECTED]>

   I just installed Redhat 6.0. When i run "su" command it takes much
   time to apper passwd prompt.  Its also taking much time in
   authentication after entering the password.

This definitely seems like the classic "/etc/nsswitch.conf is told to
look for YP servers and you are not using YP", so have a look and fix
nsswitch.conf if this is in fact the problem.

Later,
David S. Miller
[EMAIL PROTECTED]
___
Linux-IrDA mailing list  -  [EMAIL PROTECTED]
http://www.pasta.cs.UiT.No/mailman/listinfo/linux-irda



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-08 Thread David S. Miller

   From: Jes Sorensen <[EMAIL PROTECTED]>
   Date: 08 Jan 2001 23:32:48 +0100

   All I am asking is that someone lets me know if they make major
   changes to my code so I can keep track of whats happening.

We have not made any major changes to your code, given that this is
not code which is actually being submitted yet.

If it bothers you that publicly someone has published changes to your
driver which you disagree with, oh well... :-)

This "please check things out" phase is precisely what you are
asking of us, it is how we are saying "here is what we need to
do with your driver, please comment".

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-08 Thread David S. Miller

   Date: Mon, 8 Jan 2001 16:05:23 -0200 (BRDT)
   From: Rik van Riel <[EMAIL PROTECTED]>

   I really think the zerocopy network stuff should be ported
   to kiobuf proper.

That is how it could be done in 2.5.x, sure.

But this patch is intended for 2.4.x so "minimum impact"
applies.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-08 Thread David S. Miller

   From: Jes Sorensen <[EMAIL PROTECTED]>
   Date: 08 Jan 2001 22:56:48 +0100

   I don't think it's too much to ask that one actually tries to
   communicate with an author of a piece of code before making such
   major changes and submitting them opting for inclusion in the
   kernel.

Jes, I have not submitted this for inclusion into the kernel.

This is the "everyone, including driver authors, take a look"
part of the development process.

We _had_ to change some drivers to show how to support this
new SKB api for transmit sg+csum support.  If you can think of
a way for us to effectively do this work without changing at least a
few drivers as examples (and proof of concept), please let us know.

In the process we hit real bugs in your driver, and tried to deal
with them as best we could so that we could continue testing and
debugging our own code.

As a side note, as much as you may hate some of Alexey's changes to
your driver, several things he does fixes long standing real bugs in
the Acenic driver that you've been papering over with workarounds
for quite some time.  I would even go so far as to say that in many
regards Alexey understands the Acenic much better than you, and you
would be wise to work with Alexey and not against him.  Thanks.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Delay in authentication.gy

2001-01-08 Thread David S. Miller

   Date: Mon, 8 Jan 2001 22:01:26 +0000 (GMT)
   From: Alan Cox <[EMAIL PROTECTED]>

   > Solaris and other systems act identically.

   And have identical bad problems with auth failures.

Actually, I believe their sunrpc library uses an extended error
facility via the streams APIs that works similar to what is available
under Linux to solve this problem.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Delay in authentication.

2001-01-08 Thread David S. Miller

   Date: Mon, 08 Jan 2001 15:24:55 -0600
   From: "M.H.VanLeeuwen" <[EMAIL PROTECTED]>

   Was this behavior intentionally changed and why?

   Looks like 2.2.X gives ECONNREFUSED, but 2.4.X doesn't and times out.

It was intentionally changed because there is no way for the "ICMP
port unreachable" message coming back to be uniquely matched to that
UDP socket.  It can reset sockets illegally in high load scenarios.

Solaris and other systems act identically.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-08 Thread David S. Miller

   Date: Mon, 8 Jan 2001 17:43:56 -0500
   From: Stephen Frost <[EMAIL PROTECTED]>

   Perhaps you missed it, but I believe Dave's intent is for
   this to only be a proof-of-concept idea at this time.

Thank you Stephen, this is the point Jes continues to miss.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread David S. Miller

   Date: Tue, 9 Jan 2001 11:31:45 +0100
   From: Christoph Hellwig <[EMAIL PROTECTED]>

   Yuck.  A new file_op just to get a few benchmarks right ...  I
   hope the writepages stuff will not be merged in Linus' tree (but I
   wish the code behind it!)

It's a "I know how to send a page somewhere via this filedescriptor
all by myself" operation.  I don't see why people need to take
painkillers over this for 2.4.x.  I think f_op->write is stupid, such
a special case file operation just to get a few benchmarks right.
This is the kind of argument I am hearing.

Orthogonal to f_op->write being for specifying a low-level
implementation of sys_write, f_op->writepage is for specifying a
low-level implementation of sys_sendfile.  Can you grok that?
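
To make the analogy concrete, here is a small user-space sketch of an
operations table with both hooks.  Nothing in it is the kernel's or the
patch's actual code: every sketch_* and sink_* name, the signatures, and
the 4096-byte page size are assumptions chosen for illustration, and the
mock copies data where the real path is meant to avoid copying.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for the sketch */

struct sketch_file;

/* Hypothetical ops table: one hook per system call it backs. */
struct sketch_file_ops {
    ssize_t (*write)(struct sketch_file *f, const char *buf, size_t len);
    ssize_t (*writepage)(struct sketch_file *f, const char *page,
                         size_t offset, size_t len);
};

struct sketch_file {
    const struct sketch_file_ops *ops;
    char sink[2 * SKETCH_PAGE_SIZE];    /* stands in for the device */
    size_t used;
};

/* Append to the sink, truncating if it would overflow. */
static ssize_t sink_append(struct sketch_file *f, const char *src, size_t len)
{
    if (len > sizeof(f->sink) - f->used)
        len = sizeof(f->sink) - f->used;
    memcpy(f->sink + f->used, src, len);
    f->used += len;
    return (ssize_t)len;
}

static ssize_t sink_write(struct sketch_file *f, const char *buf, size_t len)
{
    return sink_append(f, buf, len);
}

/* Takes exactly one page per call, as described in the message. */
static ssize_t sink_writepage(struct sketch_file *f, const char *page,
                              size_t offset, size_t len)
{
    if (offset >= SKETCH_PAGE_SIZE)
        return 0;
    if (len > SKETCH_PAGE_SIZE - offset)
        len = SKETCH_PAGE_SIZE - offset;
    return sink_append(f, page + offset, len);
}

const struct sketch_file_ops sink_ops = {
    .write = sink_write,
    .writepage = sink_writepage,
};

/* Stand-in for sys_write: dispatches to ops->write. */
ssize_t sketch_write(struct sketch_file *f, const char *buf, size_t len)
{
    return f->ops->write ? f->ops->write(f, buf, len) : -1;
}

/* Stand-in for sys_sendfile: walks the source one page at a time and
 * hands each page to ops->writepage. */
ssize_t sketch_sendfile(struct sketch_file *out, const char *src, size_t count)
{
    size_t done = 0;

    if (!out->ops->writepage)
        return -1;
    while (done < count) {
        size_t chunk = count - done;
        ssize_t n;

        if (chunk > SKETCH_PAGE_SIZE)
            chunk = SKETCH_PAGE_SIZE;
        n = out->ops->writepage(out, src + done, 0, chunk);
        if (n <= 0)
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

The point of the split is that a file type which knows how to push a
page out by itself fills in writepage, and the sendfile path never has
to go through the buffer-oriented write hook.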

Linus has already seen this.  Originally he had a gripe because an
older revision of the code allowed multiple pages to be passed
in an array to the writepage(s) operation.  He didn't like that, so I
made it take only one page as he requested.  He had no other major
objections to the infrastructure.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread David S. Miller

   Date: Tue, 9 Jan 2001 12:28:10 +0100
   From: Christoph Hellwig <[EMAIL PROTECTED]>

   Sure.  But sendfile is not one of the fundamental UNIX operations...

It's a fundamental Linux interface and VFS-->networking interface.

   An alloc_kiovec before and a free_kiovec after the actual call
   and the memory overhead of a kiobuf won't hurt so much that it stands
   against a clean interface, IMHO.

This whole exercise is pointless unless it performs well.

The overhead _DOES_ matter, we've tested and profiled all of this with
full specweb99 runs, zerocopy ftp server loads, etc.  Removing one
word of information from anything involved in these code paths makes
enormous differences.  Have you run such tests with your suggested
kiobuf scheme?

Know what I really hate?  People who are talking, "almost done", and
"designing" the "real solution" to a problem and have no code to show
for it, i.e. no complete working implementation.  Often they have not
one line of code to show.

Then the folks who actually get off their lazy asses and make
something real, which works, and in fact exceeded most of our personal
performance expectations, are the ones who are getting told that what
they did was crap.

What was the first thing out of people's mouths?  Not "nice work", but
"I think writepage is ugly and an eyesore, I hope nobody seriously
considers this code for inclusion."  Keep designing... like Linus
says, "show me the code".

Later,
David S. Miller
[EMAIL PROTECTED]


