Re: help on tg3 polling extension

2005-07-06 Thread David S. Miller
From: "Qinghua(Kevin) Ye" <[EMAIL PROTECTED]>
Date: Wed, 6 Jul 2005 13:15:40 -0600

> Yes, it wastes CPU cycles if other processes are running. However, since it
> is a dedicated router, that should not be a problem. Processing packets
> is the only task it is supposed to do.

Linux is a general purpose operating system.

Even as a dedicated router, a router daemon still has to execute
in userspace to do BGP etc. signaling with routing peers.  The
administrator also might want to run diagnostic tools to monitor
the network.

You cannot spin polling on the device, it's simply unacceptable
to starve out userspace and the rest of the system like that.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: help on tg3 polling extension

2005-07-06 Thread David S. Miller
From: "Qinghua(Kevin) Ye" <[EMAIL PROTECTED]>
Date: Wed, 6 Jul 2005 14:12:29 -0600

> Yes, you are right. Click actually releases the CPU to the OS at intervals.
> Other processes get to run during these intervals.

It is not Click's right to make this kind of decision, that is what
we have the process scheduler for.

> The goal of polling extension is to reduce the interrupt overhead and
> improve the throughput, especially for small packets. NAPI does solve this
> problem to some extent.

And the extent to which NAPI does not solve this problem is???

Please propose something that solves this problem better and still
respects the other processes and resources in the system.

> If I do not use polling, how can I make use of all the CPUs to process packets?
> Can I make all of the CPUs run SOFTIRQ and IRQ code simultaneously? It seems
> there is only one ksoftirqd process busy with packet processing, while the
> other ksoftirqd is idle in my system.

There is one ksoftirqd for each cpu in the system.  All the network
card interrupts are arriving at that one cpu on your machine, so
the other ksoftirqd doesn't have any work to do.

If ksoftirqd is running very often, this means that network processing
is consuming an enormous amount of your cpu.  So the work gets pushed
into a process, and thus the packet processing is properly shared with
other processes on the system and nobody is starved out.


Re: help on tg3 polling extension

2005-07-06 Thread David S. Miller
From: "Qinghua(Kevin) Ye" <[EMAIL PROTECTED]>
Date: Wed, 6 Jul 2005 15:57:00 -0600

> In my SMP platform, there are no other processes running. The CPU usage
> is 100% and 0%. How could I make NIC interrupts not arrive at only one CPU,
> or balance the interrupts between the two CPUs?

This doesn't work.  If you try to split up the work for one network
card amongst multiple cpus, you'll get SMP cache line movements for
shared data between the processors and performance will go down.


Re: [PATCH 0/2] Add Networking menu and clean up net/Kconfig

2005-07-07 Thread David S. Miller
From: Sam Ravnborg <[EMAIL PROTECTED]>
Date: Wed, 6 Jul 2005 23:06:53 +0200

> When (if) accepted I expect someone (Dave?) from netdev to push this
> onwards.

I can't apply these patches because they will break several
platforms that don't use drivers/Kconfig, my workstation
(sparc64) would be one of those platforms :)


Re: tg3 polling extension

2005-07-07 Thread David S. Miller
From: "Qinghua(Kevin) Ye" <[EMAIL PROTECTED]>
Date: Thu, 7 Jul 2005 16:04:40 -0600

> I did some small tests showing that polling can improve the packet processing
> throughput a bit. I still need to do more tests. Could anyone give me some
> information about the locking scheme of the RX and TX procedures? I would
> appreciate it very much. Thanks.

It depends, the locking changed significantly in the current
2.6.13-rcX version of the driver.  But before that:

1) ->hard_start_xmit() needs to hold the tx_lock with hard IRQs
   disabled, as does tg3_tx().  It uses NETIF_F_LLTX locking,
   thus the callers do not grab netdev->xmit_lock and thus do
   not guarantee atomic invocation of the driver's
   ->hard_start_xmit method.

2) Interrupt processing needs to hold ->lock with hard IRQs
   disabled.  As does any code which wants to reprogram the
   hardware.

3) tg3_rx() runs without locks held because ->poll() calls are
   guaranteed to be atomic.

4) Any piece of code which wants to significantly reprogram the
   tg3 chip must:

   a) shut down ->poll() processing by doing tg3_netif_stop()
   b) grab ->lock with HW irqs disabled
   c) grab ->tx_lock

   The unlocking afterwards must be done in the precise reverse
   order.

You could have figured this out by simply reading the driver
and looking at how the locks are used.  I merely translated the
code into English, and also there is a big fat comment at the
top of "struct tg3" describing how the locks are used.  I did
not put that comment there for my health. :-)

You also can turn on spinlock debugging to try and figure out
any SMP hang problems you might be seeing as well.
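
The pre-2.6.13 lock ordering in rule 4 above can be sketched in
userspace.  This is only an illustration, not driver code: struct
demo_nic and the demo_* helpers are hypothetical names, pthread mutexes
stand in for the spinlocks, and a plain flag stands in for
tg3_netif_stop().  Hard IRQ disabling has no userspace equivalent and
is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state (not struct tg3). */
struct demo_nic {
    pthread_mutex_t lock;     /* models ->lock */
    pthread_mutex_t tx_lock;  /* models ->tx_lock */
    bool poll_enabled;        /* models whether ->poll() may run */
    int reprogram_count;      /* counts "significant reprogramming" events */
};

static void demo_nic_init(struct demo_nic *nic)
{
    pthread_mutex_init(&nic->lock, NULL);
    pthread_mutex_init(&nic->tx_lock, NULL);
    nic->poll_enabled = true;
    nic->reprogram_count = 0;
}

/* Steps a), b), c) from the list, in that order. */
static void demo_full_lock(struct demo_nic *nic)
{
    nic->poll_enabled = false;          /* a) stop ->poll() processing */
    pthread_mutex_lock(&nic->lock);     /* b) take ->lock */
    pthread_mutex_lock(&nic->tx_lock);  /* c) take ->tx_lock */
}

/* Unlock in the precise reverse order: c), b), a). */
static void demo_full_unlock(struct demo_nic *nic)
{
    pthread_mutex_unlock(&nic->tx_lock);
    pthread_mutex_unlock(&nic->lock);
    nic->poll_enabled = true;
}

/* Any "significant reprogramming" happens only between the two calls. */
static int demo_reprogram(struct demo_nic *nic)
{
    int count;

    demo_full_lock(nic);
    count = ++nic->reprogram_count;
    demo_full_unlock(nic);
    return count;
}
```

Keeping the acquire and release sequences in one pair of helpers means
every caller gets the same ordering, which is the property the rules
above are protecting.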


Re: [patch 1/1] net: fix sparse warnings

2005-07-07 Thread David S. Miller
From: [EMAIL PROTECTED]
Date: Thu, 07 Jul 2005 23:30:26 +0200

> From: Victor Fusco <[EMAIL PROTECTED]>
> 
> Fix the sparse warning "implicit cast to nocast type"
> 
> Signed-off-by: Victor Fusco <[EMAIL PROTECTED]>
> Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

Applied, thanks a lot.


Re: tg3 polling extension

2005-07-07 Thread David S. Miller
From: "Qinghua(Kevin) Ye" <[EMAIL PROTECTED]>
Date: Thu, 7 Jul 2005 17:43:06 -0600

> So tg3_tx() and tg3_start_xmit() do not include any code that reprograms
> the hardware?

Right, they just process the TX ring.

> What kinds of code can be classified as reprogramming the hardware? Should
> the tw32_t/rx_mbox and tw32_mailbox operations be classified into this
> category?

Anything other than normal packet processing.  The MBOX writes
used to process the packets in the RX ring would not be considered
reprogramming of the chip.

> Another question is about flushing the status block to host memory. In
> your original code, this is done by
> tr32(MAILBOX_INTERRUPT_0+TG3_64BIT_REG_LOW).

This readback is necessary to flush out any posted PCI writes
to chip registers in most circumstances.  The one exception is
tg3_restart_ints() and normal interrupt handling.

The current 2.6.13-rcX tg3.c driver is totally revamped wrt.
interrupt processing, PIO flushes, and locking in general.
You may wish to check it out.


Re: [PATCH 2.6.12.1 5/12] S2io: Performance improvements

2005-07-07 Thread David S. Miller
From: "Raghavendra Koushik" <[EMAIL PROTECTED]>
Date: Thu, 7 Jul 2005 18:06:19 -0700

> wmb() is to ensure ordered PIO writes.

wmb() does no such thing.  It only has influence on
load and store instructions done by the local processor,
it has no effect on what the PCI bus may do with PIO
writes (ie. post them).

If you need a PIO to complete in a specific order, you
have to read it back.  If you need PIO operations to occur
in a specific order wrt. cpu memory operations, mmiowb()
is what you need to use.
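
A toy model may help show why a read-back is what forces a posted PIO
write to complete.  Everything here is hypothetical (struct demo_bus,
demo_writel(), demo_readl()); it simulates a bridge that queues writes
and only drains them when a read arrives.  In a real driver the
equivalent pattern would be a writel() followed by a readl() from the
same device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NREGS 4
#define DEMO_QLEN  8

/* Hypothetical bridge + device: writes are posted, reads are not. */
struct demo_bus {
    uint32_t regs[DEMO_NREGS];   /* what the device has actually seen */
    struct {
        size_t reg;
        uint32_t val;
    } posted[DEMO_QLEN];         /* writes still sitting in the bridge */
    size_t nposted;
};

/* Drain posted writes to the device, in the order they were issued. */
static void demo_flush(struct demo_bus *bus)
{
    for (size_t i = 0; i < bus->nposted; i++)
        bus->regs[bus->posted[i].reg] = bus->posted[i].val;
    bus->nposted = 0;
}

/* A write returns immediately; the device may not have seen it yet. */
static void demo_writel(struct demo_bus *bus, size_t reg, uint32_t val)
{
    if (bus->nposted == DEMO_QLEN)
        demo_flush(bus);         /* queue full: the bridge must drain */
    bus->posted[bus->nposted].reg = reg;
    bus->posted[bus->nposted].val = val;
    bus->nposted++;
}

/* A read cannot complete until all earlier posted writes have landed. */
static uint32_t demo_readl(struct demo_bus *bus, size_t reg)
{
    demo_flush(bus);
    return bus->regs[reg];
}
```

In this model no CPU-side barrier could change what the device has
seen, because the queue lives in the bridge; only the read drains it.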


[INCOMPLETE PATCH]: killing skb->list

2005-07-07 Thread David S. Miller

I got inspired earlier today, and found that it seemed
it might be easy to kill off the 'list' member of
struct sk_buff without changing sk_buff_head at all.

I got very far.  Nearly every single piece of code was
easy to change to pass in an explicit SKB list instead
of skb->list to the SKB queue management functions.

The big exception was SCTP.  I can't believe after being
in the kernel for several years it has all of this complicated
list handling, SKB structure overlaying, and casting all over
the place.  It was a big downer after a very positive day of
coding.

First, it casts "struct sctp_chunk *" pointers to
"struct sk_buff *" so that it can "borrow" the SKB list
handling functions.  I just copied over the skb_*() routines
it used in this way to be sctp_chunk_*(), and used them
throughout and eliminated the ugly casts.  This can be
simplified a lot further, since it really doesn't care about
'qlen'.  In fact, what it wants is just the most basic list
handling, a la linux/list.h.  So just sticking a list_head
into sctp_chunk and replacing sctp_chunk_list with a list_head
as well should do the trick.
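
As a rough userspace sketch of that idea (not the SCTP code itself):
embed a list node in the chunk and recover the chunk from the node with
a container_of-style macro, so no cast between unrelated structures is
needed.  All demo_* names are hypothetical re-implementations that only
mimic the shape of linux/list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

static void demo_list_init(struct demo_list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int demo_list_empty(const struct demo_list_head *h)
{
    return h->next == h;
}

static void demo_list_add_tail(struct demo_list_head *n,
                               struct demo_list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink and re-initialise, so the node reads as "on no list". */
static void demo_list_del_init(struct demo_list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    demo_list_init(n);
}

#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical chunk: the list node is an ordinary embedded member. */
struct demo_chunk {
    struct demo_list_head list;
    int id;
};

/* Pop the first chunk from a queue, or return NULL if it is empty. */
static struct demo_chunk *demo_chunk_dequeue(struct demo_list_head *queue)
{
    struct demo_list_head *first;

    if (demo_list_empty(queue))
        return NULL;
    first = queue->next;
    demo_list_del_init(first);
    return demo_container_of(first, struct demo_chunk, list);
}
```

Note there is no 'qlen' anywhere, which matches the observation that
the chunk queues only need the most basic list handling.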

Some of the rest of the SCTP stuff was transformable with not
too much effort.

But then I really got stymied by the reassembly and partial queue
handling.  These SCTP ulp event things make a layer of abstraction to
the skb_unlink() point such that you can't know what list the SKB is
on.  One way to deal with this is to store the list pointer in
the event struct, and that's likely what will happen at first.  This
isn't trivial because you have to make sure the assignment is done
at all of the receive packet list insertion points, some places
even use sk_buff_head lists on the local stack making this chore
even more "exciting" :(

But I didn't try to do that properly for now, and SCTP needs to
be disabled in the config to play with this patch below.

Another case that needs some careful study and review by others
is the usbnet.c driver.  Man, that is another piece of networking
code that could use some serious cleanups.  I bet it would be
a simpler driver if it did things NAPI style too.  But of particular
concern is all of the SKB data area mangling it does to work around
restrictions in various USB net device implementations.

This patch goes on top of the skb_queue_empty() diff I sent earlier
today.  I know I may have missed some skb_unlink() et al. fixups
for things that I didn't enable in my config, so patches to
cure that would be appreciated.  It's usually a very simplistic
transformation.  Even TCP, my biggest fear, only needed some minor
modifications to tcp_collapse() and the rest was straightforward.

Frankly, other than the SCTP parts, this is not very invasive at all.
But it needs a lot of testing and review before I'd feel comfortable
sending it along.

diff --git a/drivers/bluetooth/bfusb.c b/drivers/bluetooth/bfusb.c
--- a/drivers/bluetooth/bfusb.c
+++ b/drivers/bluetooth/bfusb.c
@@ -158,7 +158,7 @@ static int bfusb_send_bulk(struct bfusb 
if (err) {
BT_ERR("%s bulk tx submit failed urb %p err %d", 
bfusb->hdev->name, urb, err);
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
usb_free_urb(urb);
} else
atomic_inc(&bfusb->pending_tx);
@@ -212,7 +212,7 @@ static void bfusb_tx_complete(struct urb
 
read_lock(&bfusb->lock);
 
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
skb_queue_tail(&bfusb->completed_q, skb);
 
bfusb_tx_wakeup(bfusb);
@@ -253,7 +253,7 @@ static int bfusb_rx_submit(struct bfusb 
if (err) {
BT_ERR("%s bulk rx submit failed urb %p err %d",
bfusb->hdev->name, urb, err);
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
kfree_skb(skb);
usb_free_urb(urb);
}
@@ -398,7 +398,7 @@ static void bfusb_rx_complete(struct urb
buf   += len;
}
 
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
kfree_skb(skb);
 
bfusb_rx_submit(bfusb, urb);
diff --git a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c
--- a/drivers/ieee1394/ieee1394_core.c
+++ b/drivers/ieee1394/ieee1394_core.c
@@ -678,7 +678,7 @@ static void handle_packet_response(struc
 return;
 }
 
-   __skb_unlink(skb, skb->list);
+   __skb_unlink(skb, &host->pending_packet_queue);
 
if (packet->state == hpsb_queued) {
packet->sendtime = jiffies;
@@ -986,7 +986,7 @@ void abort_timedouts(unsigned long __opa
packet = (struct hpsb_packet *)skb->data;
 
if (time_before(packet->sendtime + expire, jiffies)) {
-   __skb_unlink(skb, skb->list);
+   __skb_unlink(skb, &host->pending_packet_queue);
pa

Re: Seekable Sockets

2005-07-08 Thread David S. Miller
From: Chase Douglas <[EMAIL PROTECTED]>
Date: Fri, 08 Jul 2005 16:12:12 -0500

> This can be useful for programs such as mpi. In mpi, a server receives
> results of computations from clients. However, the server cannot control who
> sends data when. If the server needs data from client A to know how to
> process the data from client B, then the server will want data from client A
> first. Currently, if data from Client B comes first, then the mpi library
> will copy the data into the library in userspace, then copy the data from
> client A into the server program, and then copy the data from client B from
> its own library buffer into the server program. If the socket is seekable,
> then if data from client B comes first, we can seek past it and grab the
> data from client A and copy it directly to the server program, then copy the
> data from client B directly into the server program, saving a copy from
> userspace to userspace (and possibly an allocation in userspace in the mpi
> library). Other uses can also be found.

It seems more logical to use a different socket for each client to
solve this problem.

You're trying to multiplex a single TCP connection, and that's what
multiple TCP connections are for.

This whole seekable socket idea seems quite foolhardy, and seems to
serve only to help misdesigned userspace applications.


Re: [INCOMPLETE PATCH]: killing skb->list

2005-07-08 Thread David S. Miller
From: Sridhar Samudrala <[EMAIL PROTECTED]>
Date: Fri, 08 Jul 2005 15:47:56 -0700

> I guess we could use the generic lists rather than skb list. But
> your sctp_chunk_list looks fine for now except for a minor
> bug in __sctp_chunk_dequeue(). You missed resetting result->list
> to NULL.

Thanks for the patch.

Today I tried to actually move over to generic lists for
the chunk stuff.

I got really close (see patch below), but the last snag I hit
was the backlog processing.  The SCTP stack sneakily passes
sctp_chunk pointers into sk_add_backlog().

I would never have spotted this without explicitly looking
for all the ugly "struct sctp_chunk *" and "struct sk_buff *"
casts.

I'll see if I can figure out a way to deal with this cleanly.

Oh yeah, also, I think the list_empty() if statement in
sctp_free_chunk() can be turned entirely into a BUG_ON().
Every caller should be unlinking the chunk from whatever list
it is on beforehand.

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -592,13 +592,8 @@ int sctp_chunk_abandoned(struct sctp_chu
  * each chunk as well as a few other header pointers...
  */
 struct sctp_chunk {
-   /* These first three elements MUST PRECISELY match the first
-* three elements of struct sk_buff.  This allows us to reuse
-* all the skb_* queue management functions.
-*/
-   struct sctp_chunk *next;
-   struct sctp_chunk *prev;
-   struct sk_buff_head *list;
+   struct list_head list;
+
atomic_t refcnt;
 
/* This is our link to the per-transport transmitted list.  */
@@ -717,7 +712,7 @@ struct sctp_packet {
__u32 vtag;
 
/* This contains the payload chunks.  */
-   struct sk_buff_head chunks;
+   struct list_head chunk_list;
 
/* This is the overhead of the sctp and ip headers. */
size_t overhead;
@@ -974,7 +969,7 @@ struct sctp_inq {
/* This is actually a queue of sctp_chunk each
 * containing a partially decoded packet.
 */
-   struct sk_buff_head in;
+   struct list_head in_chunk_list;
/* This is the packet which is currently off the in queue and is
 * being worked on through the inbound chunk processing.
 */
@@ -1017,7 +1012,7 @@ struct sctp_outq {
struct sctp_association *asoc;
 
/* Data pending that has never been transmitted.  */
-   struct sk_buff_head out;
+   struct list_head out_chunk_list;
 
unsigned out_qlen;  /* Total length of queued data chunks. */
 
@@ -1025,7 +1020,7 @@ struct sctp_outq {
unsigned error;
 
/* These are control chunks we want to send.  */
-   struct sk_buff_head control;
+   struct list_head control_chunk_list;
 
/* These are chunks that have been sacked but are above the
 * CTSN, or cumulative tsn ack point.
@@ -1672,7 +1667,7 @@ struct sctp_association {
 *  which already resides in sctp_outq.  Please move this
 *  queue and its supporting logic down there.  --piggy]
 */
-   struct sk_buff_head addip_chunks;
+   struct list_head addip_chunk_list;
 
/* ADDIP Section 4.1 ASCONF Chunk Procedures
 *
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -203,7 +203,7 @@ static struct sctp_association *sctp_ass
 */
asoc->addip_serial = asoc->c.initial_tsn;
 
-   skb_queue_head_init(&asoc->addip_chunks);
+   INIT_LIST_HEAD(&asoc->addip_chunk_list);
 
/* Make an empty list of remote transport addresses.  */
INIT_LIST_HEAD(&asoc->peer.transport_addr_list);
diff --git a/net/sctp/input.c b/net/sctp/input.c
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -308,6 +308,7 @@ int sctp_backlog_rcv(struct sock *sk, st
/* One day chunk will live inside the skb, but for
 * now this works.
 */
+#error this does not work -DaveM
chunk = (struct sctp_chunk *) skb;
inqueue = &chunk->rcvr->inqueue;
 
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -50,7 +50,7 @@
 /* Initialize an SCTP inqueue.  */
 void sctp_inq_init(struct sctp_inq *queue)
 {
-   skb_queue_head_init(&queue->in);
+   INIT_LIST_HEAD(&queue->in_chunk_list);
queue->in_progress = NULL;
 
/* Create a task for delivering data.  */
@@ -62,11 +62,13 @@ void sctp_inq_init(struct sctp_inq *queu
 /* Release the memory associated with an SCTP inqueue.  */
 void sctp_inq_free(struct sctp_inq *queue)
 {
-   struct sctp_chunk *chunk;
+   struct sctp_chunk *chunk, *tmp;
 
/* Empty the queue.  */
-   while ((chunk = (struct sctp_chunk *) skb_dequeue(&queue->in)) != NULL)
+   list_for_each_entry_safe(chunk, tmp, &queue->in_chunk_list, list) {
+   list_del_init(&chunk->list);
sctp_chunk_free(chunk);

[PATCH]: Make SCTP use list_head for all chunk lists

2005-07-08 Thread David S. Miller
From: "David S. Miller" <[EMAIL PROTECTED]>
Date: Fri, 08 Jul 2005 16:27:56 -0700 (PDT)

> I'll see if I can figure out a way to deal with this cleanly.

I figured out a way.  Sridhar can you give this patch below
a test?

I use the control block to store the chunk pointer, then
pass skb's around.  Its use is confined to sctp_rcv()
and sctp_backlog_rcv(), so I didn't put it into any of the
SCTP header files; it can stay local to sctp/input.c

BTW, the rest of the SCTP input path should be audited to make sure
any other use of the SKB control block on input does not spam the
ipv4/ipv6 parameter area (struct inet_skb_parm and struct
inet6_skb_parm).  That must be preserved on input (unless you
unshare the SKB of course).  That's why TCP's skb control block
(in net/tcp.h) uses this header as well.
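
The layout rule described here can be sketched outside the kernel.
This is an assumption-laden model, not the patch: struct demo_skb,
struct demo_ip_parm and the accessors are hypothetical, the 16-byte IP
parameter area is an invented size, and the 40-byte control block size
is the figure quoted elsewhere in this thread.  The kernel code casts
skb->cb directly; the sketch uses memcpy() so it stays well defined as
portable userspace C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CB_SIZE 40  /* control block size, as quoted in the thread */

/* Hypothetical stand-ins; only the layout idea matters. */
struct demo_skb {
    char cb[DEMO_CB_SIZE];
};

struct demo_ip_parm {
    unsigned char opt[16];       /* invented size for the IP parameter area */
};

struct demo_chunk {
    int id;
};

/* The IP area comes first so that protocol-private data never overlaps it. */
struct demo_input_cb {
    struct demo_ip_parm header;  /* must stay first and untouched on input */
    struct demo_chunk *chunk;    /* protocol-private data lives after it */
};

/* Fail the build if the private layout outgrows the control block. */
_Static_assert(sizeof(struct demo_input_cb) <= DEMO_CB_SIZE,
               "private control block does not fit in cb[]");

static void demo_cb_set_chunk(struct demo_skb *skb, struct demo_chunk *chunk)
{
    memcpy(skb->cb + offsetof(struct demo_input_cb, chunk),
           &chunk, sizeof(chunk));
}

static struct demo_chunk *demo_cb_get_chunk(const struct demo_skb *skb)
{
    struct demo_chunk *chunk;

    memcpy(&chunk, skb->cb + offsetof(struct demo_input_cb, chunk),
           sizeof(chunk));
    return chunk;
}
```

The compile-time size check is the useful part: it turns "does my
private data still fit behind the IP area?" into a build failure
instead of silent corruption.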

Also, if you can get this patch working, can you check to see
if it works to change sctp_chunk_free() to go:

BUG_ON(!list_empty(&chunk->list));

Thanks a lot.

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -582,7 +582,6 @@ void sctp_datamsg_track(struct sctp_chun
 void sctp_chunk_fail(struct sctp_chunk *, int error);
 int sctp_chunk_abandoned(struct sctp_chunk *);
 
-
 /* RFC2960 1.4 Key Terms
  *
  * o Chunk: A unit of information within an SCTP packet, consisting of
@@ -592,13 +591,8 @@ int sctp_chunk_abandoned(struct sctp_chu
  * each chunk as well as a few other header pointers...
  */
 struct sctp_chunk {
-   /* These first three elements MUST PRECISELY match the first
-* three elements of struct sk_buff.  This allows us to reuse
-* all the skb_* queue management functions.
-*/
-   struct sctp_chunk *next;
-   struct sctp_chunk *prev;
-   struct sk_buff_head *list;
+   struct list_head list;
+
atomic_t refcnt;
 
/* This is our link to the per-transport transmitted list.  */
@@ -717,7 +711,7 @@ struct sctp_packet {
__u32 vtag;
 
/* This contains the payload chunks.  */
-   struct sk_buff_head chunks;
+   struct list_head chunk_list;
 
/* This is the overhead of the sctp and ip headers. */
size_t overhead;
@@ -974,7 +968,7 @@ struct sctp_inq {
/* This is actually a queue of sctp_chunk each
 * containing a partially decoded packet.
 */
-   struct sk_buff_head in;
+   struct list_head in_chunk_list;
/* This is the packet which is currently off the in queue and is
 * being worked on through the inbound chunk processing.
 */
@@ -1017,7 +1011,7 @@ struct sctp_outq {
struct sctp_association *asoc;
 
/* Data pending that has never been transmitted.  */
-   struct sk_buff_head out;
+   struct list_head out_chunk_list;
 
unsigned out_qlen;  /* Total length of queued data chunks. */
 
@@ -1025,7 +1019,7 @@ struct sctp_outq {
unsigned error;
 
/* These are control chunks we want to send.  */
-   struct sk_buff_head control;
+   struct list_head control_chunk_list;
 
/* These are chunks that have been sacked but are above the
 * CTSN, or cumulative tsn ack point.
@@ -1672,7 +1666,7 @@ struct sctp_association {
 *  which already resides in sctp_outq.  Please move this
 *  queue and its supporting logic down there.  --piggy]
 */
-   struct sk_buff_head addip_chunks;
+   struct list_head addip_chunk_list;
 
/* ADDIP Section 4.1 ASCONF Chunk Procedures
 *
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -203,7 +203,7 @@ static struct sctp_association *sctp_ass
 */
asoc->addip_serial = asoc->c.initial_tsn;
 
-   skb_queue_head_init(&asoc->addip_chunks);
+   INIT_LIST_HEAD(&asoc->addip_chunk_list);
 
/* Make an empty list of remote transport addresses.  */
INIT_LIST_HEAD(&asoc->peer.transport_addr_list);
diff --git a/net/sctp/input.c b/net/sctp/input.c
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -115,6 +115,17 @@ static void sctp_rcv_set_owner_r(struct 
atomic_add(sizeof(struct sctp_chunk),&sk->sk_rmem_alloc);
 }
 
+struct sctp_input_cb {
+   union {
+   struct inet_skb_parm   h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+   struct inet6_skb_parm   h6;
+#endif
+   } header;
+   struct sctp_chunk *chunk;
+};
+#define SCTP_INPUT_CB(__skb)   ((struct sctp_input_cb *)&((__skb)->cb[0]))
+
 /*
  * This is the routine which IP calls when receiving an SCTP packet.
  */
@@ -243,6 +254,7 @@ int sctp_rcv(struct sk_buff *skb)
ret = -ENOMEM;
goto discard_release;
}
+   SCTP_INPUT_CB(skb)->chunk = chunk;
 
sctp_rcv_set_owner_r(skb,sk);

Re: [PATCH 1/2] (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & errno fix

2005-07-08 Thread David S. Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Fri, 8 Jul 2005 14:56:34 -0600

> This patch:
> 1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state 
> multicast
> source filter APIs (IPv4 and IPv6)
> 2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be
> EADDRNOTAVAIL)

Applied, thanks David.


Re: [PATCH 2/2] fix IPv4 leave-group group matching

2005-07-08 Thread David S. Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Fri, 8 Jul 2005 14:59:30 -0600

> This patch fixes the multicast group matching for 
> IP_DROP_MEMBERSHIP,
> similar to the IP_ADD_MEMBERSHIP fix in a prior patch. Groups are 
> identified
> by  and including the interface address in the 
> match
> will fail if a leave-group is done by address when the join was done by 
> index,
> or if different addresses on the same interface are used in the join and 
> leave.

Also applied, thanks a lot.


Re: [PATCH 1/3] multicast API "join" issues

2005-07-08 Thread David S. Miller

Patch applied, thanks David.


Re: [PATCH 2/3] multicast API "join" issues

2005-07-08 Thread David S. Miller

Patch applied, thanks.


Re: [PATCH 3/3] multicast API "join" issues

2005-07-08 Thread David S. Miller

Also applied, thanks a lot.


Re: [PATCH]: Make SCTP use list_head for all chunk lists

2005-07-08 Thread David S. Miller
From: Sridhar Samudrala <[EMAIL PROTECTED]>
Date: Fri, 08 Jul 2005 18:40:18 -0700

> On Fri, 2005-07-08 at 17:22 -0700, David S. Miller wrote:
> > From: "David S. Miller" <[EMAIL PROTECTED]>
> > Date: Fri, 08 Jul 2005 16:27:56 -0700 (PDT)
> > 
> > > I'll see if I can figure out a way to deal with this cleanly.
> > 
> > I figured out a way.  Sridhar can you give this patch below
> > a test?
> 
> I did a quick run of the regression tests with the patch and i
> didn't see any problems.

Thank you very much Sridhar.

> > BTW, the rest of the SCTP input path should be audited to make sure
> > any other use of the SKB control block on input does not spam the
> > ipv4/ipv6 parameter area (struct inet_skb_parm and struct
> > inet6_skb_parm).  That must be preserved on input (unless you
> > unshare the SKB of course).  That's why TCP's skb control block
> > (in net/tcp.h) uses this header as well.
> 
> We do a skb_clone() before using the SKB control block to store
> the ulpevent structure, so i guess it should be OK.

Aha!  Now if you add the proper header to the front of
the ulpevent, you will not need to clone SKBs at all.

> > Also, if you can get this patch working, can you check to see
> > if it works to change sctp_chunk_free() to go:
> > 
> > BUG_ON(!list_empty(&chunk->list));
> 
> Even this works fine, so we can replace the list_empty() check
> with BUG_ON.

Thanks a lot for checking that out for me.


SKB tutorial updated

2005-07-10 Thread David S. Miller

I've updated the SKB tutorial a little bit today, in particular
I added coverage of non-linear data areas to the SKB data
handling tutorial page at:

 http://vger.kernel.org/~davem/skb_data.html

I'll probably start working on the TCP packet output engine
next.


TCP output engine tutorial

2005-07-10 Thread David S. Miller

Ok, I went a little bananas with the diagrams, but here
goes nothing :-)

http://vger.kernel.org/~davem/tcp_output.html

it's linked from my top-level page as well.

Enjoy.


Re: hard_header_len

2005-07-10 Thread David S. Miller
From: Feyd <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 07:43:10 +0200

> can I assume that hard_start_xmit will always get skbs with hard_header_len
> reserved? I need two more bytes at the start of the packet and I'm getting
> spurious panics in skb_push.

Typically, no.  ->hard_start_xmit() has a fully built packet, hardware
headers and all.

Protocols push the hardware header and copy it into the packet
long before you get called.  For example, look at
net/ipv4/ip_output.c:ip_finish_output2(), it takes the cached
ARP response hardware header and copies it into the packet like
so:

read_lock_bh(&hh->hh_lock);
hh_alen = HH_DATA_ALIGN(hh->hh_len);
memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
read_unlock_bh(&hh->hh_lock);
skb_push(skb, hh->hh_len);
return hh->hh_output(skb);

Your driver's ->hard_start_xmit() gets the packet after this
has occurred.
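
Since the headroom may already be used up by then, a driver that needs
extra bytes in front of the packet has to check before pushing.  Below
is a userspace model of that check, with hypothetical demo_* names; in
the kernel the corresponding helpers would be along the lines of
skb_headroom() and skb_realloc_headroom(), but this sketch does not
call them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer: 'head' is the allocation, 'data' the packet. */
struct demo_buf {
    unsigned char *head;
    unsigned char *data;
    size_t len;
};

/* Bytes available in front of the packet. */
static size_t demo_headroom(const struct demo_buf *b)
{
    return (size_t)(b->data - b->head);
}

/*
 * Make sure at least 'need' bytes of headroom exist, copying the packet
 * into a new allocation if they do not.  Returns 0 on success, -1 if the
 * allocation fails (the buffer is left unchanged in that case).
 */
static int demo_ensure_headroom(struct demo_buf *b, size_t need)
{
    unsigned char *nhead;

    if (demo_headroom(b) >= need)
        return 0;
    nhead = malloc(need + b->len);
    if (!nhead)
        return -1;
    memcpy(nhead + need, b->data, b->len);
    free(b->head);
    b->head = nhead;
    b->data = nhead + need;
    return 0;
}

/* Prepend 'n' bytes; refuses (returns NULL) instead of underflowing. */
static unsigned char *demo_push(struct demo_buf *b, size_t n)
{
    if (demo_headroom(b) < n)
        return NULL;
    b->data -= n;
    b->len += n;
    return b->data;
}
```

The model returns NULL where an unchecked push would run off the front
of the buffer, which is the situation behind the reported panics.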


Re: SKB tutorial updated

2005-07-11 Thread David S. Miller
From: "Catalin(ux aka Dino) BOIE" <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 13:47:20 +0300 (EEST)

> Any chance to include all this documentation in 
> Documentation/networking/skb/?

No, no intention of doing that.  For the same reason we
don't put all of lartc.org into the kernel tree :)


Re: Seekable Sockets

2005-07-11 Thread David S. Miller
From: Chase Douglas <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 12:33:46 -0500

> I'm sorry, I made a careless mistake in choice of context. What this would
> be useful for is applications where we want to seek ahead in one stream from
> one connection. This is not meant for seeking somehow between multiple
> connections, but for one single connection between only two computers.

What do you do if the data you want is beyond the size of
the socket's receive buffer?  You can't seek past that without
allowing the socket to go over its receive buffer limits.
And if you limit the seek to the receive buffer limit, that's
a real grotty limitation and it's not real seek() support on
the stream.

Really, this idea has more holes than swiss cheese.


Re: Seekable Sockets

2005-07-11 Thread David S. Miller
From: Chase Douglas <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 15:14:44 -0500

> This may not mean much for normal use, but there are academic instances,
> especially in cluster computing, where saving many extra user-user
> copies can really add up.

So, basically, the useful situation is that the sender
sends data the receiver doesn't want.

Which still sounds like it's the applications that need
to be fixed.

If the receiver does want all the data, it simply needs
to provision enough buffer area so that it can receive
at least enough data so that it can seek first to the
data it's interested in (inside of its application buffer)
and then go back.

It's still one copy in this case.

I still see no real use for this feature.  Either the data is
stored inside of kernel buffers, or user application buffers.
And if the data sent is not useful to the receiver, fix the
sender to not send the unwanted data.
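
The buffering approach described above can be sketched in userspace C.
This is an illustration only: recv_exact() and msg_at() are hypothetical
helper names, not an existing API, and EINTR handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Receive exactly len bytes into buf, looping over short reads.
 * Returns 0 on success, -1 on error or if the peer closed early.
 */
static int recv_exact(int fd, void *buf, size_t len)
{
	char *p = buf;

	assert(buf != NULL);
	while (len > 0) {
		ssize_t n = recv(fd, p, len, 0);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * "Seek" inside the application buffer: once the whole message is in
 * user memory, any offset is plain pointer arithmetic, in any order.
 * Returns NULL if off is outside the message.
 */
static const char *msg_at(const char *msg, size_t msg_len, size_t off)
{
	return off < msg_len ? msg + off : NULL;
}
```

The data crosses the kernel/user boundary once, and the application can
then visit the interesting offset first and go back to the rest later.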


Re: [PATCH] loop unrolling in net/sched/sch_generic.c

2005-07-11 Thread David S. Miller
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Fri, 8 Jul 2005 08:03:27 -0300

> Of course only the skbs created after the skb_alloc_extension() call would 
> be valid for the subsystem
> that alloced the extension, would this be a problem?

It might be.  It is entirely possible, for example, for an old skb to
pop up and appear in netfilter.

I have no idea how we'd take care of that kind of issue.

Perhaps we could explore some mechanism by which to indicate
an extension was present when an SKB was allocated.  If we
ask for a pointer to an extension which was not there at
SKB allocation time, we do a data area realloc with the new
space size, and copy the old stuff over.

This means that extensions have to be ordered in the extension
area precisely as they were allocated.  It also means you really
cannot un-allocate.
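
A rough userspace model of that idea follows.  Everything in it is an
assumption made for illustration: struct buf stands in for an SKB,
ext_register() for the skb_alloc_extension() call from the quoted mail,
and alignment of the individual extensions is ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EXT 8

/* Registry: extensions get offsets in registration order and keep them. */
static size_t ext_off[MAX_EXT];
static size_t ext_size[MAX_EXT];
static int ext_count;
static size_t ext_total;	/* bytes needed by all registered extensions */

/* Register a new extension and return its id. */
static int ext_register(size_t size)
{
	assert(ext_count < MAX_EXT);
	ext_off[ext_count] = ext_total;
	ext_size[ext_count] = size;
	ext_total += size;
	return ext_count++;
}

struct buf {
	size_t ext_len;	/* extension bytes present at the last (re)alloc */
	char *ext;	/* extension area, laid out in registration order */
};

static struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->ext_len = ext_total;
	b->ext = calloc(1, ext_total ? ext_total : 1);
	if (!b->ext) {
		free(b);
		return NULL;
	}
	return b;
}

/*
 * Return a pointer to extension id inside b.  A buffer that predates the
 * extension is grown to the current total; realloc() keeps the old bytes
 * and the new tail is zeroed.
 */
static void *buf_ext(struct buf *b, int id)
{
	assert(id >= 0 && id < ext_count);
	if (ext_off[id] + ext_size[id] > b->ext_len) {
		char *n = realloc(b->ext, ext_total);

		if (!n)
			return NULL;
		memset(n + b->ext_len, 0, ext_total - b->ext_len);
		b->ext = n;
		b->ext_len = ext_total;
	}
	return b->ext + ext_off[id];
}

static void buf_free(struct buf *b)
{
	free(b->ext);
	free(b);
}
```

Because an old buffer is recognised purely by its extension length, the
offsets can never be reshuffled, which is why un-allocating an extension
does not fit this scheme.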


Re: [PATCH]: Make SCTP use list_head for all chunk lists

2005-07-11 Thread David S. Miller
From: Sridhar Samudrala <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 09:56:10 -0700

> An incoming skb can contain more than 1 user message(chunk) and
> we do a clone for each message and store the per-message information
> in the ulpevent structure.
> Moreover, the ulpevent structure is already 34 bytes which makes it
> impossible to share the 40-byte control block with ip specific info.

If the sctp_chunk structure is per-user-message, and so is
the ulpevent object, it makes no sense to store the ulpevent
information separately from sctp_chunk.

Look at how all of the ulpevent members tend to be initialized:

	event->stream = ntohs(chunk->subh.data_hdr->stream);
	event->ssn = ntohs(chunk->subh.data_hdr->ssn);
	event->ppid = chunk->subh.data_hdr->ppid;
	if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) {
		event->flags |= MSG_UNORDERED;
		event->cumtsn = sctp_tsnmap_get_ctsn(&asoc->peer.tsn_map);
	}
	event->tsn = ntohl(chunk->subh.data_hdr->tsn);
	event->msg_flags |= chunk->chunk_hdr->flags;
	event->iif = sctp_chunk_iif(chunk);

It's just transferring chunk information over to the ulpevent
structure with only minor modifications such as endianness
swapping of packet header fields.

So I think we can store all of this stuff in the sctp_chunk
and then just make sure the chunk is available.  In fact, we
can replace all the event->{stream,ssn,ppid,cumtsn,tsn,iif}
members with just a backpointer to the sctp_chunk.

This also means you won't need to clone so much anymore either.
You'll only need to clone at the chunking level.

I'll see if I can get a spare moment to try and implement this.


Re: [PATCH]: Make SCTP use list_head for all chunk lists

2005-07-11 Thread David S. Miller
From: Sridhar Samudrala <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 17:00:56 -0700

> For incoming packets, the same sctp_chunk structure is used for all
> chunks in the packet whereas ulpevent is per-chunk. An sctp_chunk is
> allocated for a packet when we do a sctp_chunkify() in sctp_rcv(). We
> walk through the chunks in a packet and reuse the same chunk structure
> as we move to the next chunk(sctp_inq_pop).
> So we cannot keep per-message ulpevent info in sctp_chunk.

I see, let me think about this some more.


Re: TCP output engine tutorial

2005-07-11 Thread David S. Miller
From: Lennert Buytenhek <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 11:31:45 +0200

> Interesting material.  What happens if there is an skb with paged
> data (say, a page cache page) on the sk_write_queue, and it ends up
> having to be retransmitted -- is it possible that the retransmit is
> sent with different data if the page is modified in the meanwhile?

Yep.  We grab references to the pages, but we do not lock
the _contents_.  This was a deliberate design decision.

This is why we only support scatter-gather (and thus paged
SKB transmission by a device) when checksum offloading is
being done.

This is why, for example, SAMBA will only use sendfile() when
the SMB client has an oplock held on the file (and thus the
file contents are guaranteed to not change).  It turns out that
Microsoft's Windows client SMB grabs the oplocks when you open
a file by default, so this is rarely a performance problem.


Re: [PATCH] loop unrolling in net/sched/sch_generic.c

2005-07-11 Thread David S. Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: 12 Jul 2005 04:25:49 +0200

> What other plans do have? I think a lot of stuff could be moved
> into ->cb, in particular tc_* and the HIPPI field.

See:

http://vger.kernel.org/~davem/net_todo.html

there is an entry entitled "SKBs are too large", it lists
our exact plans.


[PATCH _ALMOST_]: Kill skb->list

2005-07-11 Thread David S. Miller

Ok, this almost fully kills off skb->list.  The thing that is missing
is that a couple ugly ATM drivers need to have their skb_unlink()
calls fixed to pass in the list head pointer as the second arg.  I
gave it a quick shot, but I was unsuccessful.  I can't even compile
one of them on my workstation (nicstar.c) because it casts pointers to
"u32" and stuff like that :-/

Sridhar, I resolved the remaining SCTP issues by taking advantage of
the fact that when we are collecting onto a list, the SKB of the
"event" we use is the first member of that on-stack temp list we
are using.  Thus, "(struct sk_buff_head *) sctp_event2skb(event)->prev"
gets us the list head pointer we need, and if the 'prev' pointer is
NULL then it wasn't on a list.
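
The 'prev' trick can be illustrated with a simplified ring list.  This is
a sketch, not the kernel's layout: struct links, struct node and struct
node_head are made-up stand-ins, and the shared first member is used here
so that the casts stay well-defined in portable C.

```c
#include <assert.h>
#include <stddef.h>

struct links {
	struct links *next;
	struct links *prev;
};

struct node {
	struct links l;		/* must be first */
	int id;
};

struct node_head {
	struct links l;		/* must be first: the head sits in the ring */
	unsigned int qlen;
};

static void head_init(struct node_head *h)
{
	h->l.next = h->l.prev = &h->l;
	h->qlen = 0;
}

static void node_init(struct node *n, int id)
{
	n->l.next = n->l.prev = NULL;
	n->id = id;
}

static void queue_tail(struct node_head *h, struct node *n)
{
	n->l.next = &h->l;
	n->l.prev = h->l.prev;
	h->l.prev->next = &n->l;
	h->l.prev = &n->l;
	h->qlen++;
}

/* With no per-node list pointer, the caller has to name the list. */
static void unlink_node(struct node *n, struct node_head *h)
{
	assert(h->qlen > 0);
	n->l.prev->next = n->l.next;
	n->l.next->prev = n->l.prev;
	n->l.next = n->l.prev = NULL;
	h->qlen--;
}

/*
 * Only meaningful for the FIRST node on a list, whose prev link points
 * at the head.  NULL means the node is not on any list.
 */
static struct node_head *head_of_first(struct node *first)
{
	return (struct node_head *)first->l.prev;
}
```

For any node other than the first, the same cast would yield a pointer to
the previous node rather than the head, which is why the trick depends on
the event's SKB being the first member of the temporary list.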

It is not the prettiest solution in the world, but for now it works.
We should really do serious cleanups in this area in the future to
flesh all of this out more nicely.

But anyways, let me know if SCTP still passes your tests with this
change installed.  Thanks.

Please, if anyone has the stomach to try and fix the ATM drivers
(I think the two that need fixing are nicstar.c and zatm.c, grep
for skb_unlink() calls using only one argument), I would _seriously_
appreciate it, thanks.

diff --git a/drivers/bluetooth/bfusb.c b/drivers/bluetooth/bfusb.c
--- a/drivers/bluetooth/bfusb.c
+++ b/drivers/bluetooth/bfusb.c
@@ -158,7 +158,7 @@ static int bfusb_send_bulk(struct bfusb 
if (err) {
BT_ERR("%s bulk tx submit failed urb %p err %d", 
bfusb->hdev->name, urb, err);
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
usb_free_urb(urb);
} else
atomic_inc(&bfusb->pending_tx);
@@ -212,7 +212,7 @@ static void bfusb_tx_complete(struct urb
 
read_lock(&bfusb->lock);
 
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
skb_queue_tail(&bfusb->completed_q, skb);
 
bfusb_tx_wakeup(bfusb);
@@ -253,7 +253,7 @@ static int bfusb_rx_submit(struct bfusb 
if (err) {
BT_ERR("%s bulk rx submit failed urb %p err %d",
bfusb->hdev->name, urb, err);
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
kfree_skb(skb);
usb_free_urb(urb);
}
@@ -398,7 +398,7 @@ static void bfusb_rx_complete(struct urb
buf   += len;
}
 
-   skb_unlink(skb);
+   skb_unlink(skb, &bfusb->pending_q);
kfree_skb(skb);
 
bfusb_rx_submit(bfusb, urb);
diff --git a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c
--- a/drivers/ieee1394/ieee1394_core.c
+++ b/drivers/ieee1394/ieee1394_core.c
@@ -681,7 +681,7 @@ static void handle_packet_response(struc
 return;
 }
 
-   __skb_unlink(skb, skb->list);
+   __skb_unlink(skb, &host->pending_packet_queue);
 
if (packet->state == hpsb_queued) {
packet->sendtime = jiffies;
@@ -989,7 +989,7 @@ void abort_timedouts(unsigned long __opa
packet = (struct hpsb_packet *)skb->data;
 
if (time_before(packet->sendtime + expire, jiffies)) {
-   __skb_unlink(skb, skb->list);
+   __skb_unlink(skb, &host->pending_packet_queue);
packet->state = hpsb_complete;
packet->ack_code = ACKX_TIMEOUT;
queue_packet_complete(packet);
diff --git a/drivers/isdn/act2000/capi.c b/drivers/isdn/act2000/capi.c
--- a/drivers/isdn/act2000/capi.c
+++ b/drivers/isdn/act2000/capi.c
@@ -606,7 +606,7 @@ handle_ack(act2000_card *card, act2000_c
 if ((((m->msg.data_b3_req.fakencci >> 8) & 0xff) ==
chan->ncci) &&
(m->msg.data_b3_req.blocknr == blocknr)) {
/* found corresponding DATA_B3_REQ */
-skb_unlink(tmp);
+skb_unlink(tmp, &card->ackq);
chan->queued -= m->msg.data_b3_req.datalen;
if (m->msg.data_b3_req.flags)
ret = m->msg.data_b3_req.datalen;
diff --git a/drivers/net/shaper.c b/drivers/net/shaper.c
--- a/drivers/net/shaper.c
+++ b/drivers/net/shaper.c
@@ -156,52 +156,6 @@ static int shaper_start_xmit(struct sk_b
 
SHAPERCB(skb)->shapelen= shaper_clocks(shaper,skb);

-#ifdef SHAPER_COMPLEX /* and broken.. */
-
-   while(ptr && ptr!=(struct sk_buff *)&shaper->sendq)
-   {
-   if(ptr->pri<skb->pri 
-   && jiffies - SHAPERCB(ptr)->shapeclock < SHAPER_MAXSLIP)
-   {
-   struct sk_buff *tmp=ptr->prev;
-
-   /*
-*  It goes before us therefore we slip the length
-*  of the new frame.
-*/
-
- 

Re: [PATCH] SCTP: __nocast annotations

2005-07-11 Thread David S. Miller
From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 04:39:25 +0400

> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Applied, thanks Alexey.


Re: [PATCH] Prevent oops when printing martian source

2005-07-11 Thread David S. Miller
From: Olaf Kirch <[EMAIL PROTECTED]>
Date: Mon, 11 Jul 2005 12:58:59 +0200

> In some cases, we may be generating packets with a source address that
> qualifies as martian. This can happen when we're in the middle of setting
> up or tearing down the network, and netfilter decides to reject a packet
> with an RST.  The routing code would detect the martian, and try to
> print a warning.  This would oops, because locally generated packets do
> not have a valid skb->mac.raw pointer at this point.
> 
> I didn't actually investigate why netfilter was generating an RST with
> invalid IP source, but from what it looked like, the system was in the
> middle of some interface setup/teardown.

It is better to be safe than sorry, so I've applied your patch,
thanks Olaf.

Please provide a proper "Signed-off-by: " line for yourself
in future patches, thanks a lot.


Re: [PATCH] loop unrolling in net/sched/sch_generic.c

2005-07-11 Thread David S. Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 06:32:33 +0200

> On Mon, Jul 11, 2005 at 07:44:47PM -0700, David S. Miller wrote:
> > From: Andi Kleen <[EMAIL PROTECTED]>
> > Date: 12 Jul 2005 04:25:49 +0200
> > 
> > > What other plans do have? I think a lot of stuff could be moved
> > > into ->cb, in particular tc_* and the HIPPI field.
> > 
> > See:
> > 
> > http://vger.kernel.org/~davem/net_todo.html
> > 
> > there is an entry entitled "SKBs are too large", it lists
> > our exact plans.
> 
> You could add my proposal if you agree. 

It could only be legal to move things into ->cb[] if they are
only referenced before the protocols get their hands on it.
It certainly looks to be the case for tc_*, although HIPPI
I am less sure of.

I also have the idea to move ->real_dev into ->cb[] so that
the bonding driver, the only user of skb->real_dev, can
maintain the pointer there.  So we'd need to be careful
that there is agreement on how the pre-protocol ->cb[]
area is laid out to avoid conflict.
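
One way to pin down such an agreement is a single shared structure
overlaid on the control block, with a compile-time size check.  The sketch
below is hypothetical: struct pkt stands in for struct sk_buff, the field
names and types in struct pre_proto_cb are illustrative, and the 40-byte
size is the figure quoted elsewhere in this archive.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CB_SIZE 40

/* Simplified stand-in for struct sk_buff: only cb[] is modelled. */
struct pkt {
	_Alignas(void *) char cb[CB_SIZE];
};

/*
 * One agreed layout for everything kept in cb[] before the protocols
 * take over the area.  Each pre-protocol user gets a fixed slot.
 */
struct pre_proto_cb {
	void *real_dev;		/* bonding: device the frame arrived on */
	uint32_t tc_index;	/* traffic control state */
	uint32_t tc_verd;
};

/* Fail the build, not the packet, if the layout outgrows cb[]. */
_Static_assert(sizeof(struct pre_proto_cb) <= CB_SIZE,
	       "pre-protocol cb layout does not fit in cb[]");

static struct pre_proto_cb *pre_proto_cb(struct pkt *p)
{
	return (struct pre_proto_cb *)p->cb;
}

/* Reset the area, e.g. before handing the packet to a protocol. */
static void pkt_clear_cb(struct pkt *p)
{
	memset(p->cb, 0, sizeof(p->cb));
}
```

With one structure owning the whole pre-protocol area, bonding and
traffic control cannot pick overlapping offsets by accident.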


Re: more comments on skb shrinking

2005-07-11 Thread David S. Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 06:48:02 +0200

> >>
> #  skb->real_dev, it is used in one place, in bonding device. Suggestion
> is to remove it and replace with an ugly per-cpu static variable (but
> still less ugly than keeping a useless pointer in struct sk_buff) We
> can add per-cpu variable real_dev, save real_dev there and bonding can
> fetch it from there. The trick relies on the fact that real_dev can be
> forgotten after we leave softirq handler.
> <<
> 
> You can just put it into ->cb while bond is active.

Yes, but we must have a consistent layout with other things
potentially stuffed in there in the pre-protocol input path.
See my other email.

> >>
> skb->h is really useless and can be eliminated immediately. The only place 
> where it is really used is checksumming offload on output. skb->h is used 
> there to mark the beginning of area to checksum, the idea was to support 
> offload for protocols other than TCP and UDP. Given that this generality is 
> not used, it can be replaced with direct parsing of IP header.
> <<
> 
> I would rather add an u16 header_offset field instead of adding
> header parsing code in all drivers. With some other fields
> being u16 there should be enough padding for that.

The idea is, rather, that skb->data is not pushed by the drivers,
it is left at the MAC header when the SKB is given to netif_receive_skb().
I think this is a much cleaner thing than what happens now.

Then we just pass skb->data to the ptype handlers.

So it's not really "all drivers", it's things like eth_header_type()
and friends that get changed.


Re: Seekable Sockets

2005-07-12 Thread David S. Miller
From: Harald Welte <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 13:31:23 +0200

> On Mon, Jul 11, 2005 at 01:21:03PM -0700, David S. Miller wrote:
> > I still see no real use for this feature.  Either the data is
> > stored inside of kernel buffers, or user application buffers.
> > And if the data sent is not useful to the receiver, fix the
> > sender to not send the unwanted data.
> 
> Well, this assumes that you have full control over both ends.  In
> reality you are often confronted with misdesigned "standard" protocols
> (I'm not even talking about proprietary senders/servers, or boxes
>  outside of your control).
> 
> So I have to disagree in that I think the feature is useful.  Whether or
> not it is feasible without complicating the codebase unnecessarily, I
> don't know.

But even if you can't control the sender, the seeking buys you no
memory savings at all.  You don't consume less memory, because even
the data you're not interested in sits in kernel buffers while you
wait for the stuff you're interested in to arrive.

I do see how you can avoid some copies, but that seems easier to
implement with a "MSG_NOCOPY" or similar recvmsg() flag rather than
seeking.  Ie, to skip the crap you're not interested in you go:

  int len = recv(sock_fd, NULL, ignore_len, MSG_NOCOPY);

Then you do normal reads into a buffer for the parts you really
do want.

That is infinitely cleaner than this seek() idea, really.
Sockets were not meant to be seek()'d upon, so don't go there.
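
MSG_NOCOPY is only a proposed flag here, so the one-liner above will not
build against existing headers.  A portable approximation, assuming a
hypothetical helper named skip_bytes(), drains the unwanted bytes through
a small scratch buffer.  It still copies, but it shows the calling
pattern: skip first, then read the part you want.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Discard up to len bytes from a stream socket.  The data lands in a
 * small reused scratch buffer, so the caller never needs room for the
 * unwanted bytes.  Returns the number of bytes skipped (less than len
 * if the peer closed early), or -1 on error.
 */
static ssize_t skip_bytes(int fd, size_t len)
{
	char scratch[256];
	size_t done = 0;

	assert(fd >= 0);
	while (done < len) {
		size_t want = len - done;
		ssize_t n;

		if (want > sizeof(scratch))
			want = sizeof(scratch);
		n = recv(fd, scratch, want, 0);
		if (n < 0)
			return -1;
		if (n == 0)
			break;	/* peer closed before len bytes arrived */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A caller would do skip_bytes(sock_fd, ignore_len) and then issue normal
recv() calls for the data it actually wants.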


Re: [PATCH] __be'ify *_type_trans()

2005-07-12 Thread David S. Miller
From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 23:13:44 +0400

> tr_type_trans(), hippi_type_trans() left as-is.
> 
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Applied, thanks Alexey.


Re: [Bugme-new] [Bug 4810] New: Early vlan adding leads to not functional device

2005-07-12 Thread David S. Miller

I've applied Tommy's patch to fix this bug for now.

Tommy, please provide a proper "Signed-off-by:" for patches
you post in the future, thanks a lot.


Re: SKB tutorial, Blog, and NET TODO

2005-07-12 Thread David S. Miller
From: Zhu Yi <[EMAIL PROTECTED]>
Date: Thu, 30 Jun 2005 10:07:03 +0800

> Agreed. The ipw2200 card provides 4 hardware queues for QoS. But current
> network stack only supports one Tx queue.

This is actually difficult to implement support for.
Well, not difficult, but rather I mean that it is costly.

We would need to change the 'qdisc' member of struct net_device
into an array of qdiscs.  How to do that efficiently so that
other devices do not eat the space cost associated with
this is unclear.


Re: SKB tutorial, Blog, and NET TODO

2005-07-12 Thread David S. Miller
From: "Leonid Grossman" <[EMAIL PROTECTED]>
Date: Wed, 29 Jun 2005 14:11:13 -0700

> - TSO support for IPv6
> - USO (UDP TSO) support
> - support for multiple hardware queues/channels and TCP traffic steering;
> there are number of benefits in the ability to associate TCP flows with a
> particular hw queue/cpu/MSI (MSI-X), one of them is improving receive
> bottleneck for high-speed networks at 1500mtu 
> - support for Large Receive Offload, mainly to the same end of reducing cpu
> utilization and solving 1500 mtu receive bottleneck

I've added entries for this stuff, thanks for the suggestions.

I've labelled the TCP flow association and LRO stuff as
"Investigate .." because it is still unclear how exactly
we should proceed here.



Re: [PATCH 1/2] net: add a top-level Networking menu to *config

2005-07-12 Thread David S. Miller
From: Sam Ravnborg <[EMAIL PROTECTED]>
Date: Fri, 8 Jul 2005 00:38:32 +

> Create a new top-level menu named "Networking" thus moving
> net related options and protocol selection away from the drivers
> menu and up on the top-level where they belong.
> 
> To implement this all architectures has to source "net/Kconfig" before
> drivers/*/Kconfig in their Kconfig file. This change has been
> implemented for all architectures.
> 
> Device drivers for ordinary NIC's are still to be found
> in the Device Drivers section, but Bluetooth, IrDA and ax25
> are located with their corresponding menu entries under the new
> networking menu item.
> 
> Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>

Patch applied, thanks.


Re: [PATCH 1/2] net: add a top-level Networking menu to *config

2005-07-12 Thread David S. Miller
From: randy_dunlap <[EMAIL PROTECTED]>
Date: Fri, 8 Jul 2005 10:02:11 -0700

> Can the NETPOLL options that are under
>   Networking support  + Networking options
> (that depend on NETCONSOLE) and the NETCONSOLE option that is under
>   Device Drivers + Network device support
> be moved to the same area?  I don't really care which area,
> but it's a hassle to have to move between them to enable/disable
> them.
> 
> I also think that there is some room for more consistency in
> the presentation of similar Network device categories, but
> maybe that (other) menuconfig patch will address this after
> your patch is merged, since that isn't what this patch was
> addressing.

Please provide a follow-on patch to make these changes,
as I put Sam's patches into my tree last night.

Thanks a lot.


Re: [PATCH 2/2] net: move config options out to individual protocols

2005-07-12 Thread David S. Miller
From: Sam Ravnborg <[EMAIL PROTECTED]>
Date: Fri, 8 Jul 2005 00:43:18 +

> Move the protocol specific config options out to the specific protocols.
> With this change net/Kconfig now starts to become readable and serve as a
> good basis for further re-structuring.
> 
> The menu structure is left almost intact, except that indention is
> fixed in most cases. Most visible are the INET changes where several
> "depends on INET" are replaced with a single ifdef INET / endif pair.
> 
> Several new files were created to accomplish this change - they are
> small but serve the purpose that config options are now distributed
> out where they belongs.
> 
> Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>

Patch applied, thanks.


Re: Seekable Sockets

2005-07-12 Thread David S. Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 14:16:30 -0700

> In its RFC incantation, it allows for out-of-order delivery of an
> arbitrary (but limited) amount of data. The BSD implementation
> made it largely unusable by widely distributing something that
> didn't compute the offset correctly and only supported 1 byte of
> urgent data, but its original form seems pretty close to what you
> want, without the receiver having to know where the special
> data is in advance. And the BSD-compatible form can be used
> in a similar way, with the app doing the buffering instead of the
> kernel.
> Or am I missing something??

URG is generally unusable.  First, its disuse results in nearly all
TCP stacks going to the slow path when it is used.  TSO (on output)
and the TCP input fast path (on input) are both not used when URG is
active.

Secondly, if you know where the data is, my MSG_NOCOPY idea takes care
of things quite nicely.

URG also has the nasty side effect of using signals.


Re: SKB tutorial, Blog, and NET TODO

2005-07-12 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Wed, 13 Jul 2005 01:38:37 +0200

> If its about outgoing traffic, shouldn't a prio-qdisc as root qdisc do
> just fine? skb->priority can be used to select a queue. Incoming traffic
> with pre-classification by the NIC would require multiple input queues
> though ..

I forgot what the real problem was, sorry.

Yes, the issue is on outgoing traffic, and it has to do with
netif_stop_queue().  We need one piece of queue plugging state for
every queue the hardware supports.

So if queue 0 fills up, packets can still be queued for
queue 1, 2, ...  This can't be cleanly done with a single
binary queue-stopped state like we have now.
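
As a sketch of what per-queue plugging state could look like, here is a
userspace model with one stopped bit per hardware queue.  struct txdev and
the tx_*() helpers are hypothetical names invented for this example, not
an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TX_QUEUES 32

/* Hypothetical device state: one "stopped" bit per hardware TX queue. */
struct txdev {
	unsigned int num_queues;
	unsigned long stopped;	/* bit q set => queue q is plugged */
};

static void tx_init(struct txdev *d, unsigned int num_queues)
{
	assert(num_queues > 0 && num_queues <= MAX_TX_QUEUES);
	d->num_queues = num_queues;
	d->stopped = 0;
}

static void tx_stop_queue(struct txdev *d, unsigned int q)
{
	assert(q < d->num_queues);
	d->stopped |= 1UL << q;
}

static void tx_wake_queue(struct txdev *d, unsigned int q)
{
	assert(q < d->num_queues);
	d->stopped &= ~(1UL << q);
}

static bool tx_queue_stopped(const struct txdev *d, unsigned int q)
{
	assert(q < d->num_queues);
	return (d->stopped & (1UL << q)) != 0;
}
```

With four queues, stopping queue 0 leaves queues 1 to 3 open for
transmission, which a single binary flag cannot express.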


Re: ALIGN at crypt/cipher.c

2005-07-15 Thread David S. Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 15 Jul 2005 12:27:56 +1000

> On Thu, Jul 14, 2005 at 02:36:16PM +, Ken-ichirou MATSUZAWA wrote:
> > 
> > No, I think I can understand. align should be unsigned long too.
> > After changing align to unsigned long from int, it works fine.
> 
> Thanks for pin-pointing the problem Matsuzawa-san.  The following
> patch implements your suggestion to fix the bug where the alignment
> mask is incorrectly zero-extended on 64-bit architectures.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks Herbert.
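
The zero-extension problem described in the quoted text can be reproduced
in userspace.  This sketch is not the crypto code itself: ALIGN_UP() has
the same shape as the usual round-up macro, and the two helper functions
are hypothetical, using fixed-width types so the effect shows on any
platform.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/*
 * Buggy variant: ~((a) - 1) is evaluated in 32 bits and then
 * zero-extended to 64 bits, so the AND wipes the upper half of addr.
 */
static uint64_t align_narrow(uint64_t addr, uint32_t align)
{
	assert(align && !(align & (align - 1)));
	return ALIGN_UP(addr, align);
}

/* Fixed variant: the alignment is as wide as the address. */
static uint64_t align_wide(uint64_t addr, uint64_t align)
{
	assert(align && !(align & (align - 1)));
	return ALIGN_UP(addr, align);
}
```

For addresses below 4GB the two variants agree, which is why a bug of this
kind only shows up on 64-bit machines with high addresses.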


Re: [patch 1/1] drivers/net/pcmcia/smc91c92_cs.c : Use of time_after macro

2005-07-15 Thread David S. Miller
From: [EMAIL PROTECTED]
Date: Thu, 14 Jul 2005 23:41:51 +0200

> Use of the time_after() macro, defined at linux/jiffies.h, which deal
> with wrapping correctly and are nicer to read.
> 
> Signed-off-by: Marcelo Feitoza Parisi <[EMAIL PROTECTED]>
> Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

Patch applied, thanks.
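
The wrapping problem that time_after() addresses can be shown with a small
userspace model.  The macro below is a simplified stand-in written for
this example (the definition in linux/jiffies.h may differ in detail), and
expired_naive()/expired_safe() are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>

/*
 * True if a is after b, provided the two values are less than half the
 * counter range apart.  The unsigned subtraction wraps, and the signed
 * view of the difference gives the direction.
 */
#define time_after(a, b)	((long)((b) - (a)) < 0)

/* Open-coded comparison of the kind such patches replace. */
static int expired_naive(unsigned long now, unsigned long deadline)
{
	return now > deadline;
}

/* Wrap-safe version. */
static int expired_safe(unsigned long now, unsigned long deadline)
{
	return time_after(now, deadline);
}
```

If the counter is 5 ticks short of wrapping and the deadline is 10 ticks
away, the deadline wraps to a small number.  The naive test then reports
the timeout as already expired, while the wrap-safe test does not.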


Re: [patch 1/1] drivers/net/wan/: use of time_after macro

2005-07-15 Thread David S. Miller
From: [EMAIL PROTECTED]
Date: Thu, 14 Jul 2005 23:41:43 +0200

> Use of the time_after() macro, defined at linux/jiffies.h, which deal
> with wrapping correctly and are nicer to read.
> 
> Signed-off-by: Marcelo Feitoza Parisi <[EMAIL PROTECTED]>
> Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

Patch applied, thanks.


Re: [PATCH 2.6.13-rc3] tg3: Move tg3 firmware into separate file

2005-07-17 Thread David S. Miller
From: "Nathanael Nerode" <[EMAIL PROTECTED]>
Date: Sun, 17 Jul 2005 07:55:45 -0400

> This is partly for the purpose of doing firmware loading in the future,
> but it's also a matter of tidiness.

So make the change when we do the loading like that in the
future.

The fact that you are forcing the issue right now makes
me suspicious of your real reason for desiring this change.



Re: [PATCH] [PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeue

2005-07-18 Thread David S. Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Mon, 18 Jul 2005 15:36:36 +0200

> The current call to __qdisc_dequeue_head leads to a branch
> misprediction for every loop iteration, the fact that the
> most common priority is 2 makes this even worse.  This issue
> has been brought up by Eric Dumazet <[EMAIL PROTECTED]>
> but unlike his solution which was to manually unroll the loop,
> this approach preserves the possibility to increase the number
> of bands at compile time. 
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Also applied, thanks Thomas.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [PKT_SCHED]: Remove debugging leftover from textsearch ematch

2005-07-18 Thread David S. Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Mon, 18 Jul 2005 15:35:02 +0200

> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks Thomas.


Re: [patch 3/3] net/sctp/objcnt: Audit return code of create_proc_*

2005-07-18 Thread David S. Miller
From: [EMAIL PROTECTED]
Date: Thu, 14 Jul 2005 23:42:00 +0200

> From: Christophe Lucas <[EMAIL PROTECTED]>
> 
> Audit return of create_proc_* functions.
> 
> Signed-off-by: Christophe Lucas <[EMAIL PROTECTED]>
> Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

Applied, thanks.


Re: [patch 1/3] netlink: Fix "nocast type" warnings

2005-07-18 Thread David S. Miller
From: [EMAIL PROTECTED]
Date: Thu, 14 Jul 2005 23:41:58 +0200

> From: Victor Fusco <[EMAIL PROTECTED]>
> 
> Fix the sparse warning "implicit cast to nocast type"
> 
> File/Subsystem:net/netlink/af_netlink.c
> 
> Signed-off-by: Victor Fusco <[EMAIL PROTECTED]>
> Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

Applied, thanks.


Re: [Patch 2.6 3/3]ioctl: Add support for getting a permanent hardware address

2005-07-18 Thread David S. Miller
From: Jon Wetzel <[EMAIL PROTECTED]>
Subject: [Patch 2.6 3/3]ioctl: Add support for getting a permanent hardware 
address
Date: Thu, 14 Jul 2005 16:43:50 -0500

> This patch is the third of three, designed to allow access to the 
> permanent hardware address of a network device.  This patch adds a new 
> ioctl to get the field, "perm_addr," which was added to net_device by
> the first patch.  
> 
> Signed-off-by: Jon Wetzel <[EMAIL PROTECTED]>

No new BSD style device ioctls, _please_!

If we want to create new facilities, they should be done via
netlink and possibly the ethtool interfaces.


Re: [PATCH] net configs: NETCONSOLE and NETPOLL together

2005-07-18 Thread David S. Miller
From: randy_dunlap <[EMAIL PROTECTED]>
Date: Tue, 12 Jul 2005 21:27:28 -0700

> Put NETCONSOLE and NETPOLL options together since they are related.
> This cuts down on the hassle of flipping back and forth between
> the Networking menu and the Network drivers menu to change their
> config settings.
> 
> Tested with menuconfig, gconfig, and xconfig.
> gconfig has a small problem with this.  I think that it's
> a bug in gconfig and I will take it up with Romain Lievin.
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Applied, thanks Randy.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/3] skbuff.h: Fix "nocast type" warnings

2005-07-18 Thread David S. Miller
From: [EMAIL PROTECTED]
Date: Thu, 14 Jul 2005 23:41:59 +0200

> From: Victor Fusco <[EMAIL PROTECTED]>
> 
> Fix the sparse warning "implicit cast to nocast type"
> 
> File/Subsystem:include/linux/skbuff.h
> 
> Signed-off-by: Victor Fusco <[EMAIL PROTECTED]>
> Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

Applied, thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET]: Kill skb->tc_classid

2005-07-18 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Mon, 18 Jul 2005 06:39:11 +0200

> OK, here's the patch to remove it. Dave, please apply together with the
> previous patch.

Patch applied, thanks.


Re: [PATCH] convert nfmark and conntrack mark to 32bit

2005-07-18 Thread David S. Miller
From: Harald Welte <[EMAIL PROTECTED]>
Date: Sun, 17 Jul 2005 23:42:23 +0200

> As discussed at netconf'05, we convert nfmark and conntrack-mark to be
> 32bits even on 64bit architectures. 
> 
> Signed-off-by: Harald Welte <[EMAIL PROTECTED]>

Applied, thanks Harald.


Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-18 Thread David S. Miller
From: Harald Welte <[EMAIL PROTECTED]>
Date: Mon, 18 Jul 2005 00:04:51 +0200

> The only real in-tree user of nfcache was IPVS, who only needs a single
> bit.  Unfortunately I couldn't find some other free bit in sk_buff to
> stuff that bit into, so I introduced a separate field for them.  Maybe
> the IPVS guys can resolve that to further save space.

I think we must resolve this one before 2.6.14 goes out, which
gives us a lot of time, but for now I'll eat that one-bit member.

> Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
> alike are only 6 values defined), but unfortunately the bluetooth code
> overloads pkt_type :(

This also must be cured somehow, that really isn't a clean nor nice
usage of this field.

> - remove all never-implemented 'nfcache' code
> - don't have ipvs code abuse 'nfcache' field. currently gets their own
>   compile-conditional skb->ipvs_property field.  IPVS maintainers can
>   decide to move this bit elsewhere, but nfcache needs to die.
> - remove skb->nfcache field to save 4 bytes
> - move skb->nfctinfo into three unused bits to save further 4 bytes
> 
> Signed-off-by: Harald Welte <[EMAIL PROTECTED]>

Applied, thanks Harald.
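
To make the space argument concrete, here is a rough sketch of why six
pkt_type values fit in three bits and how bitfields let several small
members share one storage unit. The struct and helper names are
hypothetical stand-ins, not the real sk_buff layout, and exact sizes are
implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: each small flag gets its own full-width member. */
struct flags_wide {
	unsigned int pkt_type;      /* only 6 values are defined */
	unsigned int nfctinfo;      /* small conntrack state number */
	unsigned int ipvs_property; /* a single boolean */
};

/* Same information packed into bitfields sharing one storage unit. */
struct flags_packed {
	unsigned int pkt_type:3;      /* 3 bits cover values 0..7 */
	unsigned int nfctinfo:3;
	unsigned int ipvs_property:1;
};

/* Smallest number of bits able to represent nvalues distinct values. */
static unsigned int bits_needed(unsigned int nvalues)
{
	unsigned int bits = 0;

	assert(nvalues > 0);
	while ((1u << bits) < nvalues)
		bits++;
	return bits;
}
```

With common compilers the packed variant occupies a single unsigned int,
while the wide variant takes three.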


Re: [Patch 2.6 3/3]ioctl: Add support for getting a permanent hardware address

2005-07-18 Thread David S. Miller
From: Matt Domsch <[EMAIL PROTECTED]>
Date: Mon, 18 Jul 2005 22:30:11 -0500

> Do you want a patch for netlink too then, given the ethtool kernel work is
> already done?

I think what we're going to end up doing is have a netlink
interface for the ethtool stuff, so if you add some ethtool
bits they will show up in the netlink thing we come up with
automatically.


net-2.6.14 tree made

2005-07-18 Thread David S. Miller

I just put up the first batch of changes due for the 2.6.14 networking
at:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.14.git

Anything that isn't a bug fix will end up there, and once 2.6.13
goes out the door I'll push the stuff in that tree to Linus.

Thanks.


Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-19 Thread David S. Miller
From: Jan Engelhardt <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 09:18:38 +0200 (MEST)

> >but for now I'll eat that one-bit member.
> 
> What is more important? Being as small as possible using bitfields, or being 
> as fast as possible? (Usage of bitfields is some CPU overhead for their 
> extraction)

I'm conjecturing that we can store the state elsewhere, for example
in the SKB ->cb[] control block.  But that requires some verification.

Memory access overhead dwarfs whatever the cpu has to do to extract
the bits.
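
For a sense of what "extracting the bits" costs, the sketch below spells
out by hand roughly what a compiler emits for a bitfield read: one shift
and one mask on a value that is already in a register. The bit layout and
helper names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-2 hold pkt_type, bit 3 holds an IPVS flag. */
#define PKT_TYPE_SHIFT 0
#define PKT_TYPE_MASK  0x7u
#define IPVS_SHIFT     3
#define IPVS_MASK      0x1u

/* Read the 3-bit packet type: one shift, one AND. */
static inline unsigned int get_pkt_type(uint8_t flags)
{
	return (flags >> PKT_TYPE_SHIFT) & PKT_TYPE_MASK;
}

/* Read the single IPVS bit. */
static inline unsigned int get_ipvs(uint8_t flags)
{
	return (flags >> IPVS_SHIFT) & IPVS_MASK;
}

/* Replace the packet type while leaving the other bits untouched. */
static inline uint8_t set_pkt_type(uint8_t flags, unsigned int type)
{
	assert(type <= PKT_TYPE_MASK);
	flags &= (uint8_t)~(PKT_TYPE_MASK << PKT_TYPE_SHIFT);
	return (uint8_t)(flags | (type << PKT_TYPE_SHIFT));
}
```

Those few ALU operations are small next to the cost of a cache miss on the
structure itself, which is the point being made above.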


Re: netdev TODO list

2005-07-19 Thread David S. Miller
From: Ben Greear <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 10:58:33 -0700

> That way, any out-of-tree code that uses skb->stamp will no longer
> compile (it is much better to fail at compile time than run time).

Sure.


Re: [NET]: Kill skb->tc_classid

2005-07-19 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 20:29:30 +0200

> Did you also get the patch to kill skb->tc_classid? I can only see
> the patch to remove the define in your 2.6.14 tree.

I just put it into the tree right now, it should show up on
kernel.org in about a half hour.

Thanks.


Re: [PATCH 8/8][ATM]: [speedtch] cure atm_printk() macro gcc-2.95 compile error

2005-07-19 Thread David S. Miller

All 8 patches applied, thanks Chas.

Chas, please update your address book, [EMAIL PROTECTED] is
no longer in service.  The current address for the list is
netdev@vger.kernel.org

Thanks.


Re: [NET]: Only build flow.o if CONFIG_XFRM=y

2005-07-19 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 20:44:02 +0200

> [NET]: Only build flow.o if CONFIG_XFRM=y
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied to net-2.6, thanks Patrick.


Re: [IPV4]: Don't select XFRM for ip_gre

2005-07-19 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 20:47:18 +0200

> [IPV4]: Don't select XFRM for ip_gre
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied to net-2.6, thanks Patrick.


Re: [PATCH] fib_trie whitespace fixes

2005-07-19 Thread David S. Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 08:49:17 -0400

> Fix up lots of little whitespace indentation stuff in fib_trie.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied to net-2.6, thanks Stephen.


Re: [NET]: Make ipip/ip6_tunnel independant of XFRM

2005-07-19 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 21:23:55 +0200

> [NET]: Make ipip/ip6_tunnel independant of XFRM
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Looks good, applied to net-2.6, thanks Patrick.


Re: [2.6 patch] BRIDGE_EBT_ARPREPLY must depend on INET

2005-07-19 Thread David S. Miller
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 15:55:29 +0200

> BRIDGE_EBT_ARPREPLY=y and INET=n results in the following compile error:

Applied, thanks Adrian.


Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-19 Thread David S. Miller
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Tue, 19 Jul 2005 20:29:19 +0200

> NETCONSOLE=y and INET=n results in the following compile error:

Also applied, thanks Adrian.


Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-20 Thread David S. Miller
From: Harald Welte <[EMAIL PROTECTED]>
Date: Wed, 20 Jul 2005 09:23:05 -0400

> On Mon, Jul 18, 2005 at 08:31:45PM -0700, David S. Miller wrote:
> > From: Harald Welte <[EMAIL PROTECTED]>
> > Date: Mon, 18 Jul 2005 00:04:51 +0200
> > 
> > > The only real in-tree user of nfcache was IPVS, who only needs a single
> > > bit.  Unfortunately I couldn't find some other free bit in sk_buff to
> > > stuff that bit into, so I introduced a separate field for them.  Maybe
> > > the IPVS guys can resolve that to further save space.
> > 
> > I think we must resolve this one before 2.6.14 goes out, which
> > gives us a lot of time, but for now I'll eat that one-bit member.
> 
> Well, I hope IPVS people will take care of this.  I don't really know
> that code too well...

Ok, I might take a look at this myself.

> > > Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
> > > alike are only 6 values defined), but unfortunately the bluetooth code
> > > overloads pkt_type :(
> > 
> > This also must be cured somehow, that really isn't a clean nor nice
> > usage of this field.
> 
> I just ran into Marcel Holtmann earlier today.  He thinks moving that
> data into the cb is fine, though he has to double-check that.
> 
> He also said that he really only needs 5 bits, so even if the current
> pkt_type overloading would persist, we could probably shrink it to make
> space for the IPVS bit.

Ok, sounds great.


Re: [PATCH 3/*] re-add NFC_ defines

2005-07-20 Thread David S. Miller

All 3 patches applied, thanks Harald.


Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-21 Thread David S. Miller
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 20:20:35 +0200

> However after a look through the Bluetooth core it should be quite
> easy to move the pkt_type into the control buffer. We already use
> it for a direction bit. The nasty thing is that I have to modify all
> the drivers. So when you finally decide to shrink the pkt_type, I
> think that I can come up with a patch for it quite quickly.

We are trimming SKB madly right now, so if you could work on
the bluetooth patch so we can trim the pkt_type size ASAP
that would be much appreciated.  You can send diffs against
my net-2.6.14 tree at:

   rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.14.git

Thanks.


Re: [PATCH 1/2][REQSK] Move the syn_table destruction from tcp_listen_stop to reqsk_queue_destroy

2005-07-21 Thread David S. Miller
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Wed, 20 Jul 2005 20:22:28 -0300

> + if (lopt->qlen != 0) {
> + struct request_sock *req;
> + int i;
> +
> + for (i = 0; i < lopt->nr_table_entries; i++)
> + while ((req = lopt->syn_table[i]) != NULL) {
> + lopt->syn_table[i] = req->dl_next;
> + lopt->qlen--;
> + reqsk_free(req);
> + }
> + }

Please fix the tabbing of the closing braces.

In fact, put an opening brace after the for() statement,
then add the necessary closing brace at the proper
tabbing level to close the top-level if() basic block.

I'll hold on both patches until you fix this up, thanks.
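
One possible reading of the requested layout is sketched below, with the
for() braced and every closing brace at the tabbing level of its opener.
The structures, the fixed table size, reqsk_free() and the
syn_table_add() helper are simplified stand-ins so the fragment compiles
on its own; this is not the patch as finally merged.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures named in the patch. */
struct request_sock {
	struct request_sock *dl_next;
};

struct listen_sock {
	int qlen;
	int nr_table_entries;
	struct request_sock *syn_table[4]; /* fixed size only for this sketch */
};

/* Stand-in for the kernel's reqsk_free(). */
static void reqsk_free(struct request_sock *req)
{
	free(req);
}

/* Hypothetical helper, used only to populate the table for a demo. */
static void syn_table_add(struct listen_sock *lopt, int slot)
{
	struct request_sock *req = malloc(sizeof(*req));

	assert(req != NULL && slot >= 0 && slot < lopt->nr_table_entries);
	req->dl_next = lopt->syn_table[slot];
	lopt->syn_table[slot] = req;
	lopt->qlen++;
}

/* The loop from the patch, with the for() braced as requested. */
static void syn_table_destroy(struct listen_sock *lopt)
{
	if (lopt->qlen != 0) {
		struct request_sock *req;
		int i;

		for (i = 0; i < lopt->nr_table_entries; i++) {
			while ((req = lopt->syn_table[i]) != NULL) {
				lopt->syn_table[i] = req->dl_next;
				lopt->qlen--;
				reqsk_free(req);
			}
		}
	}
}
```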


Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-21 Thread David S. Miller
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 23:42:11 +0200

> Unfortunately it is not as straightforward as I thought. The attached
> patch which modifies the Bluetooth core and the hci_usb driver is not
> working on my machine.

Hmmm... I'll see if I can spot anything obvious in the patch.



Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-21 Thread David S. Miller
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 23:42:11 +0200

> Unfortunately it is not as straightforward as I thought. The attached
> patch which modifies the Bluetooth core and the hci_usb driver is not
> working on my machine.

This probably has nothing to do with why the patch doesn't
work for you, but the transformation of "incoming" to a "u8"
from an "int" is not fully correct, because hci_sock.c
does this:

put_cmsg(msg, SOL_HCI, HCI_CMSG_DIR, sizeof(int), 
&bt_cb(skb)->incoming);

I haven't found any other problems though...

Maybe the bluetooth code was somehow depending upon the initial
value of skb->pkt_type or something like that?
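
The size mismatch can be illustrated in isolation: a helper that is told
to copy sizeof(int) bytes will read past a one-byte field, so the value
has to be widened into a real int first. In the sketch below, demo_put()
is only a stand-in for a put_cmsg()-style copy and the control-block
layout is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical control-block layout with a one-byte direction flag. */
struct demo_cb {
	uint8_t incoming;
	uint8_t pkt_type;
};

/*
 * Stand-in for a put_cmsg()-style helper: copies len bytes from data
 * into the caller's buffer, trusting len completely.
 */
static void demo_put(void *dst, size_t len, const void *data)
{
	memcpy(dst, data, len);
}

/* Widen the u8 into a real int first, then pass sizeof(int) bytes. */
static int export_direction(const struct demo_cb *cb)
{
	int incoming = cb->incoming;
	int out = -1;

	demo_put(&out, sizeof(incoming), &incoming);
	return out;
}
```

Passing &cb->incoming together with sizeof(int) would instead pull the
neighbouring bytes into the result.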


Re: [NET,RFC]: Introduce SO_{SND,RCV}BUFFORCE socket options

2005-07-21 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 02:36:13 +0200

> ctnetlink needs large socket buffer sizes. To avoid increasing
> the system wide limit we would like to have something that allows
> CAP_NET_ADMIN to override these limits. The first idea was to
> change the SO_{SND,RCV}BUF behaviour, but since a valid way of
> getting the largest possible size is to use ~0 this would possibly
> break existing applications. So this patch introduces two new
> socket options, SO_SNDBUFFORCE and SO_RCVBUFFORCE, that allow to
> set it to any value.

I couldn't come up with a better way to do this, so the patch
is applied to my net-2.6.14 tree.

I thought perhaps we could special case "~0", but actually there
are many programs which use an algorithm like "double socket
buffer size until reading it back does not show an increase".
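
The "double until it stops growing" idiom mentioned here looks roughly
like the userspace sketch below. The function names are made up for
illustration; only the standard getsockopt()/setsockopt() calls with
SO_RCVBUF are assumed.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read back the receive buffer size the kernel actually granted. */
static int get_rcvbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
		return -1;
	return val;
}

/*
 * Keep doubling the requested size until the value read back stops
 * growing, then return the largest size observed (or -1 on error).
 */
static int probe_max_rcvbuf(int fd)
{
	int best = get_rcvbuf(fd);
	int request = best;

	if (best < 0)
		return -1;

	while (request <= (1 << 29)) {
		int now;

		request *= 2;
		if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &request, sizeof(request)) < 0)
			break;
		now = get_rcvbuf(fd);
		if (now <= best)
			break;
		best = now;
	}
	return best;
}
```

A program written this way relies on oversized requests being clamped to
the system limit, which is why changing what a plain SO_RCVBUF request
means risks breaking it.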


Re: [PATCH] reduce netfilte sk_buff enlargement

2005-07-21 Thread David S. Miller
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 01:49:51 +0200

> The pkt_type zero is not a valid one. We only use 1-4 and 0xff. So this
> can't be the problem. I assume that the cb is not copied from the driver
> into the core at some point.

All clones and copies of SKBs copy the ->cb[] for you.
So perhaps something is spamming the ->cb[] between these
two places.
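
As a simplified model of that behaviour: private per-protocol state is
overlaid on the cb[] bytes, and a clone copies those bytes along with the
rest of the header. The struct names, the accessor macro and the cb[] size
below are illustrative assumptions, not the kernel definitions.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for sk_buff; the cb[] size here is illustrative. */
struct demo_skb {
	char cb[48];
	unsigned int len;
};

/* Hypothetical per-protocol private data overlaid on cb[]. */
struct demo_bt_cb {
	unsigned char pkt_type;
	unsigned char incoming;
};

/* Compile-time check that the private data fits inside cb[]. */
typedef char demo_cb_fits[sizeof(struct demo_bt_cb) <= 48 ? 1 : -1];

#define demo_bt_cb(skb) ((struct demo_bt_cb *)((skb)->cb))

/* A clone carries the control block along with the rest of the header. */
static void demo_skb_clone(struct demo_skb *dst, const struct demo_skb *src)
{
	memcpy(dst->cb, src->cb, sizeof(dst->cb));
	dst->len = src->len;
}
```

If the values still go missing, the remaining suspect is some layer in
between that reuses cb[] for its own state.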


[PATCH RFC]: Killing skb->real_dev

2005-07-21 Thread David S. Miller

I studied this and it's merely a matter of parameter passing.
Specifically, at ptype->func() time, it is plainly the skb->dev
before skb_bond() is applied.

So I added a "real_dev" arg to ptype->func() and converted
the tree over to that.

Thomas, this kills the TCF_META_ID_REALDEV stuff, so we should
kill it in 2.6.13-rcX too so that nobody starts using it in
userspace ok?

I'm trying to figure out if it matters for multiple levels of
decapsulation.  As far as I can tell for the bond_3ad() case,
it doesn't, and that's the only user of this thing.

if_vlan.h was setting ->real_dev but that looked totally wrong
and had no usage, so I simply deleted that.

Comments?
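
The parameter-passing idea can be modelled in a few lines: remember
skb->dev before the bonding step rewrites it, then hand that saved pointer
to the handler as a fourth argument. Everything below is a cut-down sketch
with stand-in structures; deliver() and demo_rcv() are hypothetical names,
not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins; the real kernel structures carry far more state. */
struct net_device {
	const char *name;
	struct net_device *master; /* bonding master, or NULL */
};

struct sk_buff {
	struct net_device *dev;
};

struct packet_type {
	int (*func)(struct sk_buff *skb, struct net_device *dev,
		    struct packet_type *pt, struct net_device *real_dev);
};

/* Stand-in for skb_bond(): retarget the skb at the bonding master. */
static void skb_bond(struct sk_buff *skb)
{
	if (skb->dev->master)
		skb->dev = skb->dev->master;
}

/* Remember the receiving device before skb_bond() rewrites skb->dev. */
static int deliver(struct sk_buff *skb, struct packet_type *pt)
{
	struct net_device *real_dev = skb->dev;

	skb_bond(skb);
	return pt->func(skb, skb->dev, pt, real_dev);
}

/* Example handler that records which device it was told about. */
static struct net_device *last_real_dev;

static int demo_rcv(struct sk_buff *skb, struct net_device *dev,
		    struct packet_type *pt, struct net_device *real_dev)
{
	(void)skb;
	(void)dev;
	(void)pt;
	last_real_dev = real_dev;
	return 0;
}
```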

diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
--- a/drivers/block/aoe/aoenet.c
+++ b/drivers/block/aoe/aoenet.c
@@ -120,7 +120,7 @@ aoenet_xmit(struct sk_buff *sl)
  * (1) len doesn't include the header by default.  I want this. 
  */
 static int
-aoenet_rcv(struct sk_buff *skb, struct net_device *ifp, struct packet_type *pt)
+aoenet_rcv(struct sk_buff *skb, struct net_device *ifp, struct packet_type *pt, struct net_device *real_dev)
 {
struct aoe_hdr *h;
u32 n;
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2419,22 +2419,19 @@ out:
return 0;
 }
 
-int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type* ptype)
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type* ptype, struct net_device *real_dev)
 {
struct bonding *bond = dev->priv;
struct slave *slave = NULL;
int ret = NET_RX_DROP;
 
-   if (!(dev->flags & IFF_MASTER)) {
+   if (!(dev->flags & IFF_MASTER))
goto out;
-   }
 
read_lock(&bond->lock);
-   slave = bond_get_slave_by_dev((struct bonding *)dev->priv,
- skb->real_dev);
-   if (slave == NULL) {
+   slave = bond_get_slave_by_dev(bond, real_dev);
+   if (!slave)
goto out_unlock;
-   }
 
bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
 
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -295,6 +295,6 @@ void bond_3ad_adapter_duplex_changed(str
 void bond_3ad_handle_link_change(struct slave *slave, char link);
 int  bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info);
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
-int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type* ptype);
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type* ptype, struct net_device *real_dev);
 #endif //__BOND_3AD_H__
 
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -354,15 +354,14 @@ static void rlb_update_entry_from_arp(st
_unlock_rx_hashtbl(bond);
 }
 
-static int rlb_arp_recv(struct sk_buff *skb, struct net_device *bond_dev, struct packet_type *ptype)
+static int rlb_arp_recv(struct sk_buff *skb, struct net_device *bond_dev, struct packet_type *ptype, struct net_device *real_dev)
 {
struct bonding *bond = bond_dev->priv;
struct arp_pkt *arp = (struct arp_pkt *)skb->data;
int res = NET_RX_DROP;
 
-   if (!(bond_dev->flags & IFF_MASTER)) {
+   if (!(bond_dev->flags & IFF_MASTER))
goto out;
-   }
 
if (!arp) {
dprintk("Packet has no ARP data\n");
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -98,7 +98,7 @@ static char bcast_addr[6]={0xFF,0xFF,0xF
 
 static char bpq_eth_addr[6];
 
-static int bpq_rcv(struct sk_buff *, struct net_device *, struct packet_type *);
+static int bpq_rcv(struct sk_buff *, struct net_device *, struct packet_type *, struct net_device *);
 static int bpq_device_event(struct notifier_block *, unsigned long, void *);
 static const char *bpq_print_ethaddr(const unsigned char *);
 
@@ -165,7 +165,7 @@ static inline int dev_is_ethdev(struct n
 /*
  * Receive an AX.25 frame via an ethernet interface.
  */
-static int bpq_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *ptype)
+static int bpq_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *ptype, struct net_device *real_dev)
 {
int len;
char * ptr;
diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -377,7 +377,8 @@ abort_kfree:
  ***/
 static int pppoe_rcv(struct sk_buff *skb,
 struct net_

Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: Ben Greear <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 17:41:55 -0700

> Er, now I feel like an idiot.  I am using the real_dev that
> is saved in the vlan device logic, not the thing in the
> skb.

Don't feel bad, that usage confused me as well while working
on this patch :-)


Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: Jay Vosburgh <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 17:35:24 -0700

>   FWIW, there have been a couple of proposals floating around
> bonding-devel for a while from people looking to get the skb->real_dev
> in user space (for network manager applications and user-level link
> state monitor type things).  There was a patch posted to bonding-devel a
> couple of months ago proposing a sockopt to pass the real_dev up to user
> space.  I'm not sure where things stand with them now.

I don't think we really want that.  People could ask for that
for any similar relationship of encapsulation, and real_dev only
works for one level so doesn't cover all cases anyways.

Going the other way is simpler of course, because of dev->master


Re: [PATCH] [IPV4] fib_trie cleanups

2005-07-22 Thread David S. Miller

A lot of the spacing and tabbing has been cleaned up by
Stephen Hemminger, so you might want to patch against
the copy in my 2.6.14 networking git tree at:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.14.git

Thanks.


Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 17:04:00 +0200

> Sure. I was just thinking that maybe we should delay
> the iproute2 release with the ematch bits until we
> finished to shrink the skb. Stephen?

Hopefully we can weed out the unusable ematch bits before 2.6.13 is
released.  Therefore, once 2.6.13 goes out the iproute2 update should
be OK.

I'm hoping that since we're doing the SKB shrinking in parallel in the
net-2.6.14 tree with the ongoing 2.6.13 bug fixing, we should be
able to catch all such cases.



Re: [PATCH] inlining failing in ip_conntrack

2005-07-22 Thread David S. Miller
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 03:00:13 -0300

>   CC [M]  net/ipv4/netfilter/ip_conntrack_core.o
> net/ipv4/netfilter/ip_conntrack_core.c: In function
> `ip_conntrack_event_cache_init':
> include/linux/netfilter_ipv4/ip_conntrack.h:296: sorry, unimplemented:
> inlining failed in call to 'ip_conntrack_put': function body not
> available
> net/ipv4/netfilter/ip_conntrack_core.c:139: sorry, unimplemented:
> called from here
> make[3]: ** [net/ipv4/netfilter/ip_conntrack_core.o] Erro 1
> make[2]: ** [net/ipv4/netfilter] Erro 2
> make[1]: ** [net/ipv4] Erro 2
> 
> This is on the net-2.6.14 tree, using gcc 3.4.3

It's marked inline in the header file yet not in the implementation.
I think we should work out that discrepancy :-)

Since it might conflict, I'm going to apply Harald's ctnetlink stuff,
and then remove the inline tag from ip_conntrack.h's extern declaration
of ip_conntrack_put().  If we really want it to be inline, a follow-on
patch can do that.
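
The two consistent choices can be sketched as follows: either the body
lives in the header as a static inline definition, so every caller can see
it, or the header carries a plain prototype with no inline tag. The
demo_conntrack names are hypothetical and the refcount is reduced to a
bare integer.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted conntrack entry. */
struct demo_conntrack {
	int use;
};

/*
 * If a helper is meant to be inline, its body has to be visible wherever
 * it is called, so it would live in the header as a static inline
 * definition like this one.
 */
static inline void demo_conntrack_put(struct demo_conntrack *ct)
{
	assert(ct->use > 0);
	ct->use--;
}

/*
 * Otherwise the header carries only a plain prototype, with no inline tag,
 * and the single definition stays in the .c file:
 *
 *	extern void demo_conntrack_put(struct demo_conntrack *ct);
 */
```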


Re: [PATCH 2/2][NET] cleanup INET_REFCNT_DEBUG code

2005-07-22 Thread David S. Miller
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Thu, 21 Jul 2005 23:02:03 -0300

> The second one again, also at:
> 
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

How does this properly handle the case where sk_prot changes?

Do you remember we had that problem with socket SLAB caches,
because of how IPV6 and IPV4 sockets can change into the other
type?  That's why we store the socket SLAB cache in there, as
well as the sk_prot.

Also, it would be nice to have some "do { } while (0)" for the NOP
version of the debug macros just in case :-)

The first patch doing the reqsk stuff looks fine, so I'll apply
that and push it into the net-2.6.14 tree.
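
For anyone unfamiliar with the idiom, a small sketch: a no-op macro
written as do { } while (0) still behaves like exactly one statement, so
it is safe inside an unbraced if/else. A bare empty expansion leaves a
lone semicolon that compilers may warn about, and a braces-only expansion
followed by a semicolon breaks the else. The DEMO_* macro names are
invented for this example.

```c
#include <assert.h>

/*
 * Hypothetical debug hooks. With debugging compiled out they expand to an
 * empty statement that still requires, and consumes, a trailing semicolon.
 */
#ifdef DEMO_REFCNT_DEBUG
static int demo_refcnt;
#define DEMO_REFCNT_INC() do { demo_refcnt++; } while (0)
#define DEMO_REFCNT_DEC() do { demo_refcnt--; } while (0)
#else
#define DEMO_REFCNT_INC() do { } while (0)
#define DEMO_REFCNT_DEC() do { } while (0)
#endif

/* The unbraced if/else below parses the same way in both builds. */
static int demo_account(int take)
{
	if (take)
		DEMO_REFCNT_INC();
	else
		DEMO_REFCNT_DEC();

	return take ? 1 : 0;
}
```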


Re: [PATCH] ctnetlink

2005-07-22 Thread David S. Miller
From: Harald Welte <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 09:11:30 -0400

> This is a patch for your net-2.6.14 tree (incremental to the
> expect-double-free fix).  It adds the ctnetlink code, and all the
> required core conntrack/nat changes that it needs.

Applied and pushed to net-2.6.14


Re: slow tcp acks on loopback device

2005-07-22 Thread David S. Miller
From: Steve French <[EMAIL PROTECTED]>
Date: 22 Jul 2005 14:56:59 -0500

> Noticing that the loopback device (at least on RHEL4) has an unfortunate
> mtu size 16384 (which is about 50 bytes too small for SMB read
> responses), I did try increasing the MTU slightly.  Changing that to
> 18000 did avoid the fragmentation and the 40ms delay - but what puzzled
> me was why setting TCP_NODELAY after the socket was created did not
> eliminate the delay on the ack and if there is a way to avoid the huge
> tcp ack delay by either doing something else to force client acking
> immediately or to do something on the client side of the stack to get
> the server to send the whole 16K+ frame - it looks like the tcp windows
> is 32K if the value in the tcp acks in the network trace is to be
> trusted.

TCP_NODELAY does not control ACK generation, instead it modifies
the Nagle algorithm behavior when sending data packets.

Please take networking discussions to netdev@vger.kernel.org which
is where the networking developers are.
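
As a reminder of what the option does touch, here is a minimal userspace
sketch that sets TCP_NODELAY and reads it back. The helper names are made
up; only the standard setsockopt()/getsockopt() calls are assumed.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Disable Nagle: small segments are sent immediately rather than held
 * while earlier data is still unacknowledged. This affects the sender's
 * data path only, not when the peer generates ACKs.
 */
static int disable_nagle(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Report whether TCP_NODELAY is set (1), clear (0), or -1 on error. */
static int nagle_disabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

Delayed ACKs are a separate receiver-side mechanism. On Linux, tcp(7)
documents a TCP_QUICKACK socket option for that side; it is not discussed
in this thread, so treat its relevance to the reported delay as an
assumption.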


Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 13:48:42 -0700

> I don't see how the ematch iproute2 stuff depends on SKB shrinking.
> The CVS repository has the latest ematch stuff, just testing and
> checking before the next drop.

We're killing SKB members that the ematch stuff supports
keying on.  Thus we're deleting the enumeration constants
that supported that stuff, which changes the value of the
rest of the enumeration constants.


Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 22:51:05 +0200

> Yes, currently we have TCF_META_ID_SECURITY still in there
> with a "/* obsolete */" comment so we can remove that
> immediately. Other candidates for removal are indev, realdev,
> and tcverdict so it's not a big problem, we can just remove
> all of them before the release and in the unlikely case that
> we continue to use one of them, readd it. Reasonable?

Sounds fine.

It might have been less painful if these constants were really
constant defines and not an enumeration where removing one
value changes all the others after it.
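
The renumbering effect is easy to see in miniature. The sketch below uses
made-up DEMO_* identifiers in the style of the TCF_META_ID_* list:
deleting one implicit enum entry shifts every later value, while explicit
defines leave a hole and keep the survivors stable.

```c
#include <assert.h>

/* Hypothetical IDs in the style of the TCF_META_ID_* list, before removal. */
enum demo_ids_before {
	DEMO_BEFORE_DEV,      /* 0 */
	DEMO_BEFORE_INDEV,    /* 1 */
	DEMO_BEFORE_PRIORITY, /* 2 */
	DEMO_BEFORE_PROTOCOL  /* 3 */
};

/* Same list after deleting INDEV: every later value slides down by one. */
enum demo_ids_after {
	DEMO_AFTER_DEV,       /* 0 */
	DEMO_AFTER_PRIORITY,  /* 1, was 2 */
	DEMO_AFTER_PROTOCOL   /* 2, was 3 */
};

/* Explicit values leave a hole at 1 and keep the survivors stable. */
#define DEMO_FIXED_DEV      0
#define DEMO_FIXED_PRIORITY 2
#define DEMO_FIXED_PROTOCOL 3

/* Nonzero if a binary built against "before" still agrees on PRIORITY. */
static int priority_still_matches(int new_value)
{
	return new_value == DEMO_BEFORE_PRIORITY;
}
```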


Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 17:06:38 -0400 (EDT)

> No, please, please do not break binaries, whenever it is possible.
> It is definitely much better to have many dead entries in enums.

That is why we are trying to kill the constants before 2.6.13
gets released.  These new interfaces do not exist in 2.6.12,
and 2.6.13 is not released yet.


Re: [PATCH RFC]: Killing skb->real_dev

2005-07-22 Thread David S. Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 22:51:05 +0200

> Yes, currently we have TCF_META_ID_SECURITY still in there
> with a "/* obsolete */" comment so we can remove that
> immediately. Other candidates for removal are indev, realdev,
> and tcverdict so it's not a big problem, we can just remove
> all of them before the release and in the unlikely case that
> we continue to use one of them, readd it. Reasonable?

Done and pushed to net-2.6 like so.  I think due to the mounting
number of net-2.6 --> net-2.6.14 tree conflicts, I'll start to
work on rebuilding the net-2.6.14 GIT tree using net-2.6 as a
base.

diff-tree 261688d01ec07d3a265b8ace6ec68310fbd96a96 (from d3984a6b6abac6203868f0e9095c0ed9e33ece03)
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Fri Jul 22 14:43:52 2005 -0700

[PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}

More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Acked-by: Thomas Graf <[EMAIL PROTECTED]>

diff --git a/include/linux/tc_ematch/tc_em_meta.h b/include/linux/tc_ematch/tc_em_meta.h
--- a/include/linux/tc_ematch/tc_em_meta.h
+++ b/include/linux/tc_ematch/tc_em_meta.h
@@ -41,17 +41,14 @@ enum
TCF_META_ID_LOADAVG_1,
TCF_META_ID_LOADAVG_2,
TCF_META_ID_DEV,
-   TCF_META_ID_INDEV,
TCF_META_ID_PRIORITY,
TCF_META_ID_PROTOCOL,
-   TCF_META_ID_SECURITY, /* obsolete */
TCF_META_ID_PKTTYPE,
TCF_META_ID_PKTLEN,
TCF_META_ID_DATALEN,
TCF_META_ID_MACLEN,
TCF_META_ID_NFMARK,
TCF_META_ID_TCINDEX,
-   TCF_META_ID_TCVERDICT,
TCF_META_ID_RTCLASSID,
TCF_META_ID_RTIIF,
TCF_META_ID_SK_FAMILY,
diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c
--- a/net/sched/em_meta.c
+++ b/net/sched/em_meta.c
@@ -27,17 +27,17 @@
  *  lvalue   rvalue
  *   +---+   +---+
  *   | type: INT |   | type: INT |
- *  def  | id: INDEV |   | id: VALUE |
+ *  def  | id: DEV   |   | id: VALUE |
  *   | data: |   | data: 3   |
  *   +---+   +---+
  * |   |
- * ---> meta_ops[INT][INDEV](...)  |
+ * ---> meta_ops[INT][DEV](...)|
  *   | |
  * --- |
  * V   V
  *   +---+   +---+
  *   | type: INT |   | type: INT |
- *  obj  | id: INDEV |   | id: VALUE |
+ *  obj  | id: DEV | | id: VALUE |
  *   | data: 2   |<--data got filled out | data: 3   |
  *   +---+   +---+
  * | |
@@ -170,16 +170,6 @@ META_COLLECTOR(var_dev)
*err = var_dev(skb->dev, dst);
 }
 
-META_COLLECTOR(int_indev)
-{
-   *err = int_dev(skb->input_dev, dst);
-}
-
-META_COLLECTOR(var_indev)
-{
-   *err = var_dev(skb->input_dev, dst);
-}
-
 /**
  * skb attributes
  **/
@@ -235,13 +225,6 @@ META_COLLECTOR(int_tcindex)
dst->value = skb->tc_index;
 }
 
-#ifdef CONFIG_NET_CLS_ACT
-META_COLLECTOR(int_tcverd)
-{
-   dst->value = skb->tc_verd;
-}
-#endif
-
 /**
  * Routing
  **/
@@ -490,7 +473,6 @@ struct meta_ops
 static struct meta_ops __meta_ops[TCF_META_TYPE_MAX+1][TCF_META_ID_MAX+1] = {
[TCF_META_TYPE_VAR] = {
[META_ID(DEV)]  = META_FUNC(var_dev),
-   [META_ID(INDEV)]= META_FUNC(var_indev),
[META_ID(SK_BOUND_IF)]  = META_FUNC(var_sk_bound_if),
},
[TCF_META_TYPE_INT] = {
@@ -499,7 +481,6 @@ static struct meta_ops __meta_ops[TCF_ME
[META_ID(LOADAVG_1)]= META_FUNC(int_loadavg_1),
[META_ID(LOADAVG_2)]= META_FUNC(int_loadavg_2),
[META_ID(DEV)]  = META_FUNC(int_dev),
-   [META_ID(INDEV)]= META_FUNC(int_indev),
[META_ID(PRIORITY)] = META_FUNC(int_priority),

Re: SKB tutorial, Blog, and NET TODO

2005-07-22 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 04:59:30 +0200

> We have multiple queue states, one for each hardware TX queue.
> But instead of multiple qdiscs per device we add a "prio"-argument
> to the dequeue-function. The top-level qdisc is dequeued with the
> highest active priority and hands out a packet of this priority, or,
> if it doesn't support priorities, any packet. The priority of the
> dequeued packet is either passed as argument to hard_start_xmit or
> stored in skb->priority. This approach has a great advantage over
> multiple top-level qdiscs, we can use all the existing classification
> stuff, including SO_PRIORITY etc., and non-work-conserving qdiscs
> like HTB, HFSC, ... can still be used to enforce bandwidth limits.
> It should be possible to implement it in a way that causes only minimal
> overhead for devices not supporting multiple TX queues.
> 
> If everyone can agree to this approach I'll hack something up.

Sounds OK.  What happens if the top-level queue pulls out
a packet with a certain priority, and that priority's queue
in the device is stopped?  Will it look for lower-priority
packets and try to send those?  All of this kind of logic could
result in some ugly loops :)
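
One way to picture the question above is a single bounded pass over
the priorities that skips any whose device ring is stopped.  This is
only a user-space sketch: struct txq_state, NUM_PRIO and
pick_sendable_prio() are hypothetical names, not existing kernel
interfaces, and priority 0 is assumed to be the highest.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIO 4  /* hypothetical number of hardware TX queues */

/* Hypothetical per-device state: a backlog counter and a "stopped"
 * flag per priority; index 0 is assumed to be the highest priority. */
struct txq_state {
    unsigned int backlog[NUM_PRIO];
    unsigned char stopped[NUM_PRIO];
};

/* Return the highest priority that has a packet queued and whose
 * hardware ring is not stopped, or -1 if nothing can be sent now. */
static int pick_sendable_prio(const struct txq_state *st)
{
    int prio;

    assert(st != NULL);
    for (prio = 0; prio < NUM_PRIO; prio++) {
        if (st->backlog[prio] > 0 && !st->stopped[prio])
            return prio;
    }
    return -1;
}
```

Because the scan visits each priority at most once per dequeue, it
cannot spin, whatever combination of rings is stopped.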


Re: [PATCH] reduce netfilter sk_buff enlargement

2005-07-22 Thread David S. Miller
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Fri, 22 Jul 2005 02:26:34 +0200

> I found the problem. The hci_usb is using the cb[] by itself and so
> overwriting the pkt_type value. The attached patch works for me with the
> hci_usb driver. However I haven't converted all other drivers and
> checked them. This won't happen until I am back home, because I don't
> have any of these devices with me around. However it looks like this
> seems to work without any problems.

Great.  I'll wait until you get back and code up an updated
patch that takes care of all of the drivers.

Thanks.


net-2.6.14 GIT rebased

2005-07-22 Thread David S. Miller

Ok, I rebased the net-2.6.14 GIT tree based upon the
current net-2.6 tree.  It may take a few hours for this
rebasing to hit the kernel.org mirror system, so please
be patient :)

You'll have to be careful when resyncing to this thing
since all the changesets were redone.  I would recommend
pulling out local changes into patches, rsync'ing the
new net-2.6.14 tree, running git-prune-script, then readding
your local changes.

Thanks.


Re: net-2.6.14 GIT rebased

2005-07-24 Thread David S. Miller
From: Harald Welte <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 01:11:08 -0400

> so there now is 
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.14.git/
> and
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.14.git/new-net-2.6.14.git
> 
> I don't think this is intentional?  Which one to use?

I'm quite the bozo, and it figures I'd do something like
that right before going away for a day and a half, sorry.

The latter is the correct one, which I've moved over to the
correct net-2.6.14.git location.

Sorry again.


Re: Patch: reduce skb input dev on 64 bit machines

2005-07-24 Thread David S. Miller
From: Jamal Hadi Salim <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 09:32:07 -0400

> This is part of mission skb diet. 
> Against  git/davem/2.6.14 that was on vger 30 minutes back: It changes
> input_dev to be an ifindex so we don't bother holding devices. 
> Would only cut a 4 byte fat on 64 bit machines.
> 
> Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

I have a better change in the wings that totally eliminates
real_dev _AND_ input_dev completely, and passes them as
parameters into pt->func() and ->enqueue() as it should have
been from the beginning.

input_dev is "skb->dev at the time netif_receive_skb() was called",
and this is pretty much what the "real_dev" code wants too:
it wants the device before skb_bond() was invoked.

So both cases want the same exact device pointer, and we can pass
them around as parameters instead of all of the current bogus
stuff.  And the mere act of passing this "orig_dev" in as a parameter
makes it easier to verify that references to it will not escape
from the softirq input packet processing context.
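
A rough user-space sketch of the idea, not the actual patch: the
receive path captures the original device once and hands it to the
handler as an argument.  The struct definitions are minimal stand-ins
for the kernel types, and pt_func_t, deliver() and
report_orig_ifindex() are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures, illustration only. */
struct net_device { int ifindex; };
struct sk_buff { struct net_device *dev; };

/* Hypothetical handler type: orig_dev is an explicit parameter
 * instead of a field stored in the skb. */
typedef int (*pt_func_t)(struct sk_buff *skb, struct net_device *dev,
                         struct net_device *orig_dev);

/* Hypothetical receive path: remember skb->dev as it was on entry,
 * then let bonding (modelled here by bond_master) rewrite skb->dev. */
static int deliver(struct sk_buff *skb, struct net_device *bond_master,
                   pt_func_t func)
{
    struct net_device *orig_dev;

    assert(skb != NULL && func != NULL);
    orig_dev = skb->dev;          /* device at receive time */
    if (bond_master != NULL)
        skb->dev = bond_master;   /* stand-in for the bonding rewrite */
    return func(skb, skb->dev, orig_dev);
}

/* Example handler: reports which device the packet really came in on. */
static int report_orig_ifindex(struct sk_buff *skb, struct net_device *dev,
                               struct net_device *orig_dev)
{
    (void)skb;
    (void)dev;
    return orig_dev->ifindex;
}
```

Since orig_dev only lives on the stack of the receive path, it is
easy to see that nothing holds on to it once the handler returns.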


Re: SKB tutorial, Blog, and NET TODO

2005-07-24 Thread David S. Miller
From: Jamal Hadi Salim <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 10:14:52 -0400

> Setting the skb->prio to be used by the driver sounds reasonable.
> Another alternative would be what was already mentioned to change the
> call to hardware_start_transmit() to take a prio option.
> The driver should take care of what that all means given that we have
> views that differ depending on the h/ware.

But this simply doesn't work by itself, that's why we need the
per-queue "stopped" states.

We need something that properly synchronizes the queue "full"
state transitions, so that the queue does not deadlock and when
one priority queue fills up, we do the right thing.

All of the packet scheduler is keyed off of being able to atomically
"send to queue X while not stopped", and that transition from
stopped to not-stopped is interlocked properly with the asynchronous
sending path.

Alexey explained this to both you and me about 3 years ago, when
we were talking about the prioritized queues provided by a few
gigabit NICs.


Re: SKB tutorial, Blog, and NET TODO

2005-07-24 Thread David S. Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 18:14:58 +0200

> The simplest case is if the hardware does strict prio and does
> the queueing itself based on skb->priority or similar. We don't
> need to change anything in this case except for adding the
> interface to transfer the classification result to the driver.

The key is what should happen when the ring for prio X fills
up?  netif_stop_queue() in its current form is the wrong
thing to do, because it prevents lower priority packets from
being queued which is exactly what we want to do if those
queues have space.  The higher-prio packets will still go out
first, of course, but queueing to lower prio rings should
still be possible.

So we need some kind of netif_stop_queue_prio(dev, prio_nr)
or similar.
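
As a sketch of what such per-priority state might look like, one
"stopped" bit per ring is enough to stop a full ring without blocking
the others.  Everything here is hypothetical (struct prio_dev,
stop_queue_prio(), wake_queue_prio(), queue_prio_stopped()), and it
ignores locking: a real version would need atomic bit operations to
interlock with the TX completion path.

```c
#include <assert.h>

#define MAX_TX_PRIO 8  /* hypothetical upper bound on hardware rings */

/* Hypothetical device state: bit N set means ring N is full. */
struct prio_dev {
    unsigned long stopped_mask;
};

/* Mark only one priority ring as stopped. */
static void stop_queue_prio(struct prio_dev *dev, unsigned int prio)
{
    assert(prio < MAX_TX_PRIO);
    dev->stopped_mask |= 1UL << prio;
}

/* Clear the stopped state once the ring has room again. */
static void wake_queue_prio(struct prio_dev *dev, unsigned int prio)
{
    assert(prio < MAX_TX_PRIO);
    dev->stopped_mask &= ~(1UL << prio);
}

/* Return 1 if the given ring is stopped, 0 otherwise. */
static int queue_prio_stopped(const struct prio_dev *dev, unsigned int prio)
{
    assert(prio < MAX_TX_PRIO);
    return (int)((dev->stopped_mask >> prio) & 1UL);
}
```

Stopping ring 0 leaves queue_prio_stopped(dev, 1) at 0, so lower
priority packets can still be queued to rings that have space.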

The next issue is how to demultiplex from the number of prios
we want, to what the hardware actually supports if the latter
is smaller.
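
One possible answer, purely as an illustration, is proportional
scaling: fold the software priorities onto the available rings so
that ordering is preserved.  prio_to_hw_ring() and its parameters are
made up for this sketch, and other policies (a lookup table, for
instance) would work just as well.

```c
#include <assert.h>

/* Hypothetical mapping from a software priority (0 = highest) to a
 * hardware ring index when the device has fewer rings than we have
 * priorities.  Identity mapping when the hardware has enough rings. */
static unsigned int prio_to_hw_ring(unsigned int prio,
                                    unsigned int sw_prios,
                                    unsigned int hw_rings)
{
    assert(sw_prios > 0 && hw_rings > 0 && prio < sw_prios);
    if (hw_rings >= sw_prios)
        return prio;
    /* Integer scaling keeps the mapping monotonic, so a higher
     * priority never lands on a lower priority ring. */
    return prio * hw_rings / sw_prios;
}
```

With 8 software priorities and 3 rings, priorities 0-2 share ring 0,
3-5 share ring 1, and 6-7 share ring 2.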


Re: Patch: reduce skb input dev on 64 bit machines

2005-07-24 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 18:54:13 +0200

> Let me propose again to just set it here for all cases. So far there
> hasn't been a single exception where indev is not the input device
> as seen by netif_rx(), and I don't expect any to come up. In any case
> you should guard this printk by net_ratelimit() to avoid spamming
> people's logs.

I totally agree, that madness has existed for far too long.

See my other email, and hold on for a bit so I can try and
finish up my real_dev/input_dev killing patch.


Re: [PATCH] net/ipv6/ip6_tunnel.c: implicit declaration of function `xfrm6_tunnel_unregister'

2005-07-24 Thread David S. Miller
From: Cal Peake <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 20:50:48 -0400 (EDT)

> This patch seems correct:
> 
> Signed-off-by: Cal Peake <[EMAIL PROTECTED]>

Applied, thanks Cal.

