date:20070927

Re: e100 problems in .23rc8 ?

2007-09-27 Thread Herbert Xu

Kok, Auke [EMAIL PROTECTED] wrote:
 Dave Jones wrote:
 Last night, I hit this bug during boot up..
 http://www.codemonkey.org.uk/junk/e100-2.jpg
 
 This morning, I got a mail from a Fedora user of the same
 .23-rc8 based kernel that has seen a different trace
 also implicating e100..
 
 http://www.codemonkey.org.uk/junk/e100.jpg
 
 It may be that the two problems are unrelated, and it's
 just coincidence that both reports happen to be on an e100,
 but the timing is odd.  Have there been other reports
 of similar problems recently ?
 
 there hasn't been a change to e100 in two months now - perhaps something
 slipped into the stack that broke it? If this reproduces, could you bisect?

Well this looks exactly like the e1000 race that we fixed around
the time of the last kernel release.  That fix never made it into
e100 so it's no surprise that we get a similar crash here.

The problem is that if a spurious interrupt comes in between
request_irq and netif_poll_enable then you'll get a crash at
the next netif_rx_complete.

It'd be good if this were reproducible as it would allow us
to identify the source of the spurious interrupt, which may
well be caused by an unrelated bug somewhere else.

In any case, e100 should be prepared to deal with spurious
interrupts as e1000 has been fixed to do.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Jeff Garzik


FUJITA Tomonori wrote:

Yeah, we could nicely handle lld's restrictions (especially with
stacking devices). But iommu code needs only max_segment_size and
seg_boundary_mask, right? If so, the first simple approach to add two
values to device structure is not so bad, I think.


(replying to slightly older email in the thread)
(added benh, since we've discussed this issue in the past)

dumb question, what happened to seg_boundary_mask?

If you look at drivers/ata/libata-core.c:ata_fill_sg(), you will note 
that we split s/g segments after DMA-mapping.  Looking at libata LLDD's, 
you will also note judicious use of ATA_DMA_BOUNDARY (0x).


It was drilled into my head by James and benh that I cannot rely on the 
DMA boundary + block/scsi + dma_map_sg() to ensure that my S/G segments 
never cross a 64K boundary, a legacy IDE requirement.  Thus the 
additional code in ata_fill_sg() to split S/G segments straddling 64K, 
in addition to setting dma boundary to 0x.
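As an illustration of the kind of post-mapping split described above, here is a minimal user-space sketch in C. It is not the ata_fill_sg() code: the names `struct seg` and `split_at_64k` are hypothetical, addresses are modelled as plain 32-bit integers, and address wrap-around at 4G is ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_BOUNDARY 0x10000u	/* 64K, the legacy IDE limit discussed above */

/* Hypothetical stand-in for one mapped S/G entry. */
struct seg {
	uint32_t addr;
	uint32_t len;
};

/*
 * Split one mapped segment so that no piece crosses a 64K boundary.
 * Returns the number of pieces written to out[], or 0 if out[] is
 * too small (or len is 0).
 */
static size_t split_at_64k(uint32_t addr, uint32_t len,
			   struct seg *out, size_t max_out)
{
	size_t n = 0;

	while (len) {
		/* Bytes left before the next 64K boundary. */
		uint32_t room = SEG_BOUNDARY - (addr & (SEG_BOUNDARY - 1));
		uint32_t chunk = len < room ? len : room;

		if (n == max_out)
			return 0;

		out[n].addr = addr;
		out[n].len = chunk;
		n++;

		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

A segment that starts 16 bytes below a boundary and is 32 bytes long comes out as two 16-byte pieces; a segment that is already 64K-aligned and 64K long is left as a single piece.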


A key problem I was hoping would be solved with your work here was the 
elimination of that post dma_map_sg() split.


If I understood James and Ben correctly, one of the key problems was 
always in communicating libata's segment boundary needs to the IOMMU layers?


Jeff



[PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Herbert Xu

Hi:

[PKT_SCHED]: Add stateless NAT

Stateless NAT is useful in controlled environments where restrictions are
placed on through traffic such that we don't need connection tracking to
correctly NAT protocol-specific data.

In particular, this is of interest when the number of flows or the number
of addresses being NATed is large, or if connection tracking information
has to be replicated and where it is not practical to do so.

Previously we had stateless NAT functionality which was integrated into
the IPv4 routing subsystem.  This was a great solution as long as the NAT
worked on a subnet to subnet basis such that the number of NAT rules was
relatively small.  The reason is that for SNAT the routing based system
had to perform a linear scan through the rules.

If the number of rules is large then major renovations would have to take
place in the routing subsystem to make this practical.

For the time being, the least intrusive way of achieving this is to use
the u32 classifier written by Alexey Kuznetsov along with the actions
infrastructure implemented by Jamal Hadi Salim.

The following patch is an attempt at this problem by creating a new nat
action that can be invoked from u32 hash tables, which would allow a large
number of stateless NAT rules that can be used/updated in constant time.
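The per-packet work of such a rule is a prefix match followed by a rewrite of the network bits. The sketch below models that arithmetic in user-space C, following the match and rewrite expressions in the tcf_nat() code quoted later in this thread. The helper names `nat_match` and `nat_rewrite` are hypothetical, and addresses are treated as host-order 32-bit integers for readability, whereas the patch works on __be32 values.

```c
#include <stdint.h>

/* Non-zero when addr lies inside the old_addr/mask prefix. */
static int nat_match(uint32_t addr, uint32_t old_addr, uint32_t mask)
{
	return ((old_addr ^ addr) & mask) == 0;
}

/*
 * Take the network bits from new_addr and keep the host bits of the
 * original address, so a whole subnet maps with a single rule.
 */
static uint32_t nat_rewrite(uint32_t addr, uint32_t new_addr, uint32_t mask)
{
	return (new_addr & mask) | (addr & ~mask);
}
```

With a /24 mask, 192.168.1.23 matched against 192.168.1.0 and rewritten towards 10.0.0.0 keeps its host part and becomes 10.0.0.23. Because no per-flow state is consulted, the cost per packet does not depend on the number of flows.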

The actual NAT code is mostly based on the previous stateless NAT code
written by Alexey.  In future we might be able to utilise the protocol
NAT code from netfilter to improve support for other protocols.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
--
diff --git a/include/linux/tc_act/tc_nat.h b/include/linux/tc_act/tc_nat.h
new file mode 100644
index 000..9280c6f
--- /dev/null
+++ b/include/linux/tc_act/tc_nat.h
@@ -0,0 +1,29 @@
+#ifndef __LINUX_TC_NAT_H
+#define __LINUX_TC_NAT_H
+
+#include <linux/pkt_cls.h>
+#include <linux/types.h>
+
+#define TCA_ACT_NAT 9
+
+enum
+{
+   TCA_NAT_UNSPEC,
+   TCA_NAT_PARMS,
+   TCA_NAT_TM,
+   __TCA_NAT_MAX
+};
+#define TCA_NAT_MAX (__TCA_NAT_MAX - 1)
+
+#define TCA_NAT_FLAG_EGRESS 1
+   
+struct tc_nat
+{
+   tc_gen;
+   __be32 old_addr;
+   __be32 new_addr;
+   __be32 mask;
+   __u32 flags;
+};
+
+#endif
diff --git a/include/net/tc_act/tc_nat.h b/include/net/tc_act/tc_nat.h
new file mode 100644
index 000..4a691f3
--- /dev/null
+++ b/include/net/tc_act/tc_nat.h
@@ -0,0 +1,21 @@
+#ifndef __NET_TC_NAT_H
+#define __NET_TC_NAT_H
+
+#include <linux/types.h>
+#include <net/act_api.h>
+
+struct tcf_nat {
+   struct tcf_common common;
+
+   __be32 old_addr;
+   __be32 new_addr;
+   __be32 mask;
+   u32 flags;
+};
+
+static inline struct tcf_nat *to_tcf_nat(struct tcf_common *pc)
+{
+   return container_of(pc, struct tcf_nat, common);
+}
+
+#endif /* __NET_TC_NAT_H */
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8a74cac..22b34f2 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -447,6 +447,17 @@ config NET_ACT_IPT
  To compile this code as a module, choose M here: the
  module will be called ipt.
 
+config NET_ACT_NAT
+tristate Stateless NAT
+depends on NET_CLS_ACT
+select NETFILTER
+---help---
+ Say Y here to do stateless NAT on IPv4 packets.  You should use
+ netfilter for NAT unless you know what you are doing.
+
+ To compile this code as a module, choose M here: the
+ module will be called ipt.
+
 config NET_ACT_PEDIT
 tristate Packet Editing
 depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index b67c36f..81ecbe8 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_NET_ACT_POLICE)  += act_police.o
 obj-$(CONFIG_NET_ACT_GACT) += act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)   += act_mirred.o
 obj-$(CONFIG_NET_ACT_IPT)  += act_ipt.o
+obj-$(CONFIG_NET_ACT_NAT)  += act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP) += act_simple.o
 obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
new file mode 100644
index 000..efd6d7d
--- /dev/null
+++ b/net/sched/act_nat.c
@@ -0,0 +1,313 @@
+/*
+ * Stateless NAT actions
+ *
+ * Copyright (c) 2007 Herbert Xu [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/rtnetlink.h>
+#include

Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Benjamin Herrenschmidt


On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:
 A key problem I was hoping would be solved with your work here was the
 elimination of that post dma_map_sg() split.
 
 If I understood James and Ben correctly, one of the key problems was 
 always in communicating libata's segment boundary needs to the IOMMU
 layers?

Yup. If we can put some constraint in struct device that the dma mapping
code can then look at ... we also need to ensure that what's passed in
for DMA'ing already matches those constraints as well since no-iommu
platforms will basically just keep the dma table as-is.

Ben.



Re: [PATCH 1/4] net: Dynamically allocate the per cpu counters for the loopback device.

2007-09-27 Thread Eric W. Biederman

David Miller [EMAIL PROTECTED] writes:

 From: [EMAIL PROTECTED] (Eric W. Biederman)
 Date: Wed, 26 Sep 2007 17:53:40 -0600

 This patch adds support for dynamically allocating the statistics counters
 for the loopback device and adds appropriate device methods for allocating
 and freeing the loopback device.

 This completes support for creating multiple instances of the loopback
 device,  in preparation for creating per network namespace instances.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

 Applied to net-2.6.24, thanks.

 @@ -155,7 +154,8 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
  dev->last_rx = jiffies;

  /* it's OK to use __get_cpu_var() because BHs are off */
 -lb_stats = &__get_cpu_var(pcpu_lstats);
 +pcpu_lstats = netdev_priv(dev);
 +lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
  lb_stats->bytes += skb->len;
  lb_stats->packets++;

 I'm going to add a followon change that gets rid of that
 comment about __get_cpu_var() since it is no longer
 relevant.

Good point.

I'm not doing get_cpu/put_cpu so does the comment make sense
in relation to per_cpu_ptr?

Eric


Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Jeff Garzik


Benjamin Herrenschmidt wrote:

On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:

A key problem I was hoping would be solved with your work here was the
elimination of that post dma_map_sg() split.

If I understood James and Ben correctly, one of the key problems was
always in communicating libata's segment boundary needs to the IOMMU
layers?


Yup. If we can put some constraint in struct device that the dma mapping
code can then look at ... we also need to ensure that what's passed in
for DMA'ing already matches those constraints as well since no-iommu
platforms will basically just keep the dma table as-is.


That's a good point...  no-iommu platforms would need to be updated to 
do the split for me.  I suppose we can steal that code from swiotlb or 
somewhere.


Jeff




Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread FUJITA Tomonori

CC'ed Jens, James, and linux-scsi.

On Thu, 27 Sep 2007 03:31:55 -0400
Jeff Garzik [EMAIL PROTECTED] wrote:

 FUJITA Tomonori wrote:
  Yeah, we could nicely handle lld's restrictions (especially with
  stacking devices). But iommu code needs only max_segment_size and
  seg_boundary_mask, right? If so, the first simple approach to add two
  values to device structure is not so bad, I think.
 
 (replying to slightly older email in the thread)
 (added benh, since we've discussed this issue in the past)
 
 dumb question, what happened to seg_boundary_mask?

I'll work on it too after finishing max_seg_size.


 If you look at drivers/ata/libata-core.c:ata_fill_sg(), you will note 
 that we split s/g segments after DMA-mapping.  Looking at libata LLDD's, 
 you will also note judicious use of ATA_DMA_BOUNDARY (0x).

I know the workaround since I fixed libata's sg chaining patch.


 It was drilled into my head by James and benh that I cannot rely on the 
 DMA boundary + block/scsi + dma_map_sg() to ensure that my S/G segments 
 never cross a 64K boundary, a legacy IDE requirement.  Thus the 
 additional code in ata_fill_sg() to split S/G segments straddling 64K, 
 in addition to setting dma boundary to 0x.

I think that the block layer can handle both max_segment_size and
seg_boundary_mask properly (and SCSI-ml just uses the block layer). So
if we fix iommu, then we can remove a workaround to fix sg lists in
llds.


 A key problem I was hoping would be solved with your work here was the 
 elimination of that post dma_map_sg() split.

Yeah, that's my goal too.


 If I understood James and Ben correctly, one of the key problems was 
 always in communicating libata's segment boundary needs to the IOMMU layers?
 
   Jeff
 

Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Benjamin Herrenschmidt


On Thu, 2007-09-27 at 03:49 -0400, Jeff Garzik wrote:
 Benjamin Herrenschmidt wrote:
  On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:
  A key problem I was hoping would be solved with your work here was the
  elimination of that post dma_map_sg() split.
 
  If I understood James and Ben correctly, one of the key problems was 
  always in communicating libata's segment boundary needs to the IOMMU
  layers?
  
  Yup. If we can put some constraint in struct device that the dma mapping
  code can then look at ... we also need to ensure that what's passed in
  for DMA'ing already matches those constraints as well since no-iommu
  platforms will basically just keep the dma table as-is.
 
 That's a good point...  no-iommu platforms would need to be updated to 
 do the split for me.  I suppose we can steal that code from swiotlb or 
 somewhere.

Doing the split means being able to grow the sglist... which the dma_*
calls can't do at least not in their current form.

Ben.



Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Jeff Garzik


Benjamin Herrenschmidt wrote:

On Thu, 2007-09-27 at 03:49 -0400, Jeff Garzik wrote:

Benjamin Herrenschmidt wrote:

On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:

A key problem I was hoping would be solved with your work here was the
elimination of that post dma_map_sg() split.

If I understood James and Ben correctly, one of the key problems was
always in communicating libata's segment boundary needs to the IOMMU
layers?

Yup. If we can put some constraint in struct device that the dma mapping
code can then look at ... we also need to ensure that what's passed in
for DMA'ing already matches those constraints as well since no-iommu
platforms will basically just keep the dma table as-is.

That's a good point...  no-iommu platforms would need to be updated to
do the split for me.  I suppose we can steal that code from swiotlb or
somewhere.


Doing the split means being able to grow the sglist... which the dma_*
calls can't do at least not in their current form.


IMO one straightforward approach is for the struct scatterlist owner to 
provide a table large enough to accommodate the possible splits (perhaps 
along with communicating that table's max size to the IOMMU/dma layers).
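To give a feel for how large such a table would need to be, the following C sketch counts how many pieces a segment breaks into when it is split at 64K boundaries, and bounds that count for a segment whose alignment is not known in advance. Both helper names are hypothetical and the 64K boundary is hard-coded as an assumption taken from the legacy IDE case above; nothing here is an existing kernel interface.

```c
#include <stdint.h>

/* Pieces produced when [addr, addr + len) is split at 64K boundaries. */
static uint32_t pieces_after_split(uint64_t addr, uint32_t len)
{
	if (len == 0)
		return 0;
	return (uint32_t)(((addr + len - 1) >> 16) - (addr >> 16)) + 1;
}

/*
 * Upper bound on the pieces for a segment of len bytes at unknown
 * alignment.  The worst case starts on the last byte before a boundary:
 * one 1-byte piece, then the remaining len - 1 bytes in 64K pieces.
 */
static uint32_t worst_case_pieces(uint32_t len)
{
	if (len == 0)
		return 0;
	return 1 + (uint32_t)(((uint64_t)len - 1 + 0xffff) >> 16);
}
```

Summing worst_case_pieces() over the entries the owner intends to map gives a table size that can absorb any split, at the price of over-allocating when segments happen to be well aligned.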


Jeff



Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Jeff Garzik


FUJITA Tomonori wrote:

CC'ed Jens, James, and linux-scsi.

On Thu, 27 Sep 2007 03:31:55 -0400
Jeff Garzik [EMAIL PROTECTED] wrote:


FUJITA Tomonori wrote:

Yeah, we could nicely handle lld's restrictions (especially with
stacking devices). But iommu code needs only max_segment_size and
seg_boundary_mask, right? If so, the first simple approach to add two
values to device structure is not so bad, I think.

(replying to slightly older email in the thread)
(added benh, since we've discussed this issue in the past)

dumb question, what happened to seg_boundary_mask?


I'll work on it too after finishing max_seg_size.


If you look at drivers/ata/libata-core.c:ata_fill_sg(), you will note 
that we split s/g segments after DMA-mapping.  Looking at libata LLDD's, 
you will also note judicious use of ATA_DMA_BOUNDARY (0x).


I know the workaround since I fixed libata's sg chaining patch.


It was drilled into my head by James and benh that I cannot rely on the 
DMA boundary + block/scsi + dma_map_sg() to ensure that my S/G segments 
never cross a 64K boundary, a legacy IDE requirement.  Thus the 
additional code in ata_fill_sg() to split S/G segments straddling 64K, 
in addition to setting dma boundary to 0x.


I think that the block layer can handle both max_segment_size and
seg_boundary_mask properly (and SCSI-ml just uses the block layer). So
if we fix iommu, then we can remove a workaround to fix sg lists in
llds.


A key problem I was hoping would be solved with your work here was the 
elimination of that post dma_map_sg() split.


Yeah, that's my goal too.


Great :)  Well, I'm generally happy with your max-seg-size stuff (sans 
the minor nits I pointed out in another email).


Thanks for pursuing this,

Jeff




Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread FUJITA Tomonori

CC'ed Jens, James, and linux-scsi again.

On Thu, 27 Sep 2007 04:22:15 -0400
Jeff Garzik [EMAIL PROTECTED] wrote:

 Benjamin Herrenschmidt wrote:
  On Thu, 2007-09-27 at 03:49 -0400, Jeff Garzik wrote:
  Benjamin Herrenschmidt wrote:
  On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:
  A key problem I was hoping would be solved with your work here was the
  elimination of that post dma_map_sg() split.
 
  If I understood James and Ben correctly, one of the key problems was 
  always in communicating libata's segment boundary needs to the IOMMU
  layers?
  Yup. If we can put some constraint in struct device that the dma mapping
  code can then look at ... we also need to ensure that what's passed in
  for DMA'ing already matches those constraints as well since no-iommu
  platforms will basically just keep the dma table as-is.
  That's a good point...  no-iommu platforms would need to be updated to 
  do the split for me.  I suppose we can steal that code from swiotlb or 
  somewhere.
  
  Doing the split means being able to grow the sglist... which the dma_*
  calls can't do at least not in their current form.
 
 IMO one straightforward approach is for the struct scatterlist owner to 
 provide a table large enough to accommodate the possible splits (perhaps 
 along with communicating that table's max size to the IOMMU/dma layers).

As I said in another mail, the block layer and scsi-ml work properly,
I think. So there is no need to split sg lists for no-iommu platforms.

For the segment size restriction, we only need to fix the iommu code
that merges sglists (already done), but for the segment boundary
restriction we need to fix all iommu code and swiotlb. Splitting the sg
list might be useful when the iommu can't find a memory area that
satisfies the boundary. But I think that rarely happens (and few llds
have the boundary restriction).

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread Cedric Le Goater

Eric W. Biederman wrote:
 This patch allows you to create a new network namespace
 using sys_clone, or sys_unshare.
 
 As the network namespace is still experimental and under development
 clone and unshare support is only made available when CONFIG_NET_NS is
 selected at compile time.
 
 As this patch introduces network namespace support into code paths
 that exist when the CONFIG_NET is not selected there are a few
 additions made to net_namespace.h to allow a few more functions
 to be used when the networking stack is not compiled in.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
 ---
  include/linux/sched.h   |1 +
  include/net/net_namespace.h |   18 ++
  kernel/fork.c   |3 ++-
  kernel/nsproxy.c|   15 +--
  net/Kconfig |8 
  net/core/net_namespace.c|   43 
 +--
  6 files changed, 83 insertions(+), 5 deletions(-)
 
 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index a01ac6d..e10a0a8 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -27,6 +27,7 @@
  #define CLONE_NEWUTS 0x0400  /* New utsname group? */
  #define CLONE_NEWIPC 0x0800  /* New ipcs */
  #define CLONE_NEWUSER0x1000  /* New user namespace */
 +#define CLONE_NEWNET 0x2000  /* New network namespace */

This new flag is going to conflict with the pid namespace flag 
CLONE_NEWPID in -mm. It might be worth changing it to:

#define CLONE_NEWNET0x4000

The changes in nsproxy.c and fork.c will also conflict but I don't 
think we can do much about it for now.

C. 

  /*
   * Scheduling policies
 diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
 index ac8f830..3ea4194 100644
 --- a/include/net/net_namespace.h
 +++ b/include/net/net_namespace.h
 @@ -38,11 +38,23 @@ extern struct net init_net;
 
  extern struct list_head net_namespace_list;
 
 +#ifdef CONFIG_NET
 +extern struct net *copy_net_ns(unsigned long flags, struct net *net_ns);
 +#else
 +static inline struct net *copy_net_ns(unsigned long flags, struct net 
 *net_ns)
 +{
 + /* There is nothing to copy so this is a noop */
 + return net_ns;
 +}
 +#endif
 +
  extern void __put_net(struct net *net);
 
  static inline struct net *get_net(struct net *net)
  {
 +#ifdef CONFIG_NET
  	atomic_inc(&net->count);
 +#endif
   return net;
  }
 
 @@ -60,19 +72,25 @@ static inline struct net *maybe_get_net(struct net *net)
 
  static inline void put_net(struct net *net)
  {
 +#ifdef CONFIG_NET
  	if (atomic_dec_and_test(&net->count))
  		__put_net(net);
 +#endif
  }
 
  static inline struct net *hold_net(struct net *net)
  {
 +#ifdef CONFIG_NET
  	atomic_inc(&net->use_count);
 +#endif
   return net;
  }
 
  static inline void release_net(struct net *net)
  {
 +#ifdef CONFIG_NET
  	atomic_dec(&net->use_count);
 +#endif
  }
 
  extern void net_lock(void);
 diff --git a/kernel/fork.c b/kernel/fork.c
 index 33f12f4..5e67f90 100644
 --- a/kernel/fork.c
 +++ b/kernel/fork.c
 @@ -1608,7 +1608,8 @@ asmlinkage long sys_unshare(unsigned long unshare_flags)
   err = -EINVAL;
   if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
   CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
 - CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER))
 + CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|
 + CLONE_NEWNET))
   goto bad_unshare_out;
 
   if ((err = unshare_thread(unshare_flags)))
 diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
 index a4fb7d4..f1decd2 100644
 --- a/kernel/nsproxy.c
 +++ b/kernel/nsproxy.c
 @@ -20,6 +20,7 @@
  #include <linux/mnt_namespace.h>
  #include <linux/utsname.h>
  #include <linux/pid_namespace.h>
 +#include <net/net_namespace.h>
 
  static struct kmem_cache *nsproxy_cachep;
 
 @@ -98,8 +99,17 @@ static struct nsproxy *create_new_namespaces(unsigned long 
 flags,
   goto out_user;
   }
 
 +	new_nsp->net_ns = copy_net_ns(flags, tsk->nsproxy->net_ns);
 +	if (IS_ERR(new_nsp->net_ns)) {
 +		err = PTR_ERR(new_nsp->net_ns);
 +		goto out_net;
 +	}
 +
   return new_nsp;
 
 +out_net:
 +	if (new_nsp->user_ns)
 +		put_user_ns(new_nsp->user_ns);
  out_user:
  	if (new_nsp->pid_ns)
  		put_pid_ns(new_nsp->pid_ns);
 @@ -132,7 +142,7 @@ int copy_namespaces(unsigned long flags, struct 
 task_struct *tsk)
 
   get_nsproxy(old_ns);
 
 -	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWUSER)))
 +	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWUSER | CLONE_NEWNET)))
   return 0;
 
   if (!capable(CAP_SYS_ADMIN)) {
 @@ -164,6 +174,7 @@ void free_nsproxy(struct nsproxy *ns)
  		put_pid_ns(ns->pid_ns);
  	if (ns->user_ns)

Re: [PATCH] sky2: sky2 FE+ receive status workaround

2007-09-27 Thread Jochen Voß


Hi Stephen,

On 27 Sep 2007, at 01:58, Stephen Hemminger wrote:

+   /* This chip has hardware problems that generates bogus status.
+* So do only marginal checking and expect higher level protocols
+* to handle crap frames.
+*/
+   if (sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
+       sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0 &&
+       length != count)
+   goto okay;


Shouldn't the condition be length == count?

I hope this helps,
Jochen
--
http://seehuhn.de/



Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

Hi Herbert.

On Thu, Sep 27, 2007 at 03:34:47PM +0800, Herbert Xu ([EMAIL PROTECTED]) wrote:
 Hi:
 
 [PKT_SCHED]: Add stateless NAT
 
 Stateless NAT is useful in controlled environments where restrictions are
 placed on through traffic such that we don't need connection tracking to
 correctly NAT protocol-specific data.

Couple of comments below.
 --- a/net/sched/Kconfig
 +++ b/net/sched/Kconfig
 @@ -447,6 +447,17 @@ config NET_ACT_IPT
 To compile this code as a module, choose M here: the
 module will be called ipt.

 +config NET_ACT_NAT
 +tristate Stateless NAT
 +depends on NET_CLS_ACT
 +select NETFILTER

Argh... People usually do not understand such jokes :)
What about not using netfilter helpers and just move them to the
accessible header so that no additional slow path would ever be enabled?

 +---help---
 +   Say Y here to do stateless NAT on IPv4 packets.  You should use
 +   netfilter for NAT unless you know what you are doing.
 +
 +   To compile this code as a module, choose M here: the
 +   module will be called ipt.
 +

Module will be called 'nat' I believe.

 +++ b/net/sched/act_nat.c
...
 +#define NAT_TAB_MASK 15

This really wants to be configurable at least via module parameter.

 +static struct tcf_common *tcf_nat_ht[NAT_TAB_MASK + 1];
 +static u32 nat_idx_gen;
 +static DEFINE_RWLOCK(nat_lock);

 +static struct tcf_hashinfo nat_hash_info = {
 + .htab   =   tcf_nat_ht,
 + .hmask  =   NAT_TAB_MASK,
 + .lock   =   nat_lock,
 +};

When I read this I swear I heard 'I want to be RCU'.
But that is another task.

 +static int tcf_nat(struct sk_buff *skb, struct tc_action *a,
 +		   struct tcf_result *res)
 +{
 +	struct tcf_nat *p = a->priv;
 +	struct iphdr *iph;
 +	__be32 old_addr;
 +	__be32 new_addr;
 +	__be32 mask;
 +	__be32 addr;
 +	int egress;
 +	int action;
 +	int ihl;
 +
 +	spin_lock(&p->tcf_lock);
 +
 +	p->tcf_tm.lastuse = jiffies;
 +	old_addr = p->old_addr;
 +	new_addr = p->new_addr;
 +	mask = p->mask;
 +	egress = p->flags & TCA_NAT_FLAG_EGRESS;
 +	action = p->tcf_action;
 +
 +	p->tcf_bstats.bytes += skb->len;
 +	p->tcf_bstats.packets++;
 +
 +	spin_unlock(&p->tcf_lock);
 +
 +	if (!pskb_may_pull(skb, sizeof(*iph)))
 +		return TC_ACT_SHOT;
 +
 +	iph = ip_hdr(skb);
 +
 +	if (egress)
 +		addr = iph->saddr;
 +	else
 +		addr = iph->daddr;
 +
 +	if (!((old_addr ^ addr) & mask)) {
 +		if (skb_cloned(skb) &&
 +		    !skb_clone_writable(skb, sizeof(*iph)) &&
 +		    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
 +			return TC_ACT_SHOT;
 +
 +		new_addr &= mask;
 +		new_addr |= addr & ~mask;
 +
 +		/* Rewrite IP header */
 +		iph = ip_hdr(skb);
 +		if (egress)
 +			iph->saddr = new_addr;
 +		else
 +			iph->daddr = new_addr;
 +
 +		nf_csum_replace4(&iph->check, addr, new_addr);
 +	}
 +
 +	ihl = iph->ihl * 4;
 +
 +	/* It would be nice to share code with stateful NAT. */
 +	switch (iph->frag_off & htons(IP_OFFSET) ? 0 : iph->protocol) {
 +	case IPPROTO_TCP:
 +	{
 +		struct tcphdr *tcph;
 +
 +		if (!pskb_may_pull(skb, ihl + sizeof(*tcph)) ||
 +		    (skb_cloned(skb) &&
 +		     !skb_clone_writable(skb, ihl + sizeof(*tcph)) &&
 +		     pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
 +			return TC_ACT_SHOT;
 +
 +		tcph = (void *)(skb_network_header(skb) + ihl);

Were you too lazy to write struct tcphdr here and in other places? :)


-- 
Evgeniy Polyakov

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Herbert Xu

On Thu, Sep 27, 2007 at 01:25:12PM +0400, Evgeniy Polyakov wrote:

 Couple of comments below.

Thanks Evgeniy :)

  --- a/net/sched/Kconfig
  +++ b/net/sched/Kconfig
  @@ -447,6 +447,17 @@ config NET_ACT_IPT
To compile this code as a module, choose M here: the
module will be called ipt.
 
  +config NET_ACT_NAT
  +tristate Stateless NAT
  +depends on NET_CLS_ACT
  +select NETFILTER
 
 Argh... People usually do not understand such jokes :)
 What about not using netfilter helpers and just move them to the
 accessible header so that no additional slow path would ever be enabled?

Sure.  However, as it is it's just including the netfilter core
which does not mean the inclusion of connection tracking.  It's
only connection tracking that *may* (so don't flame me for this :)
pose a scalability problem.

  +---help---
  + Say Y here to do stateless NAT on IPv4 packets.  You should use
  + netfilter for NAT unless you know what you are doing.
  +
  + To compile this code as a module, choose M here: the
  + module will be called ipt.
  +
 
 Module will be called 'nat' I believe.

Good catch, now you know where I copied it from :)

  +++ b/net/sched/act_nat.c
 ...
  +#define NAT_TAB_MASK   15
 
 This really wants to be configurable at least via module parameter.
 
  +static struct tcf_common *tcf_nat_ht[NAT_TAB_MASK + 1];
  +static u32 nat_idx_gen;
  +static DEFINE_RWLOCK(nat_lock);
 
  +static struct tcf_hashinfo nat_hash_info = {
  +   .htab   =   tcf_nat_ht,
  +   .hmask  =   NAT_TAB_MASK,
  +   .lock   =   nat_lock,
  +};
 
 When I read this I swear I heard 'I want to be RCU'.
 But that is another task.

Yes there are a lot of clean-up's that can be done for all
actions.  You're most welcome to send patches in this area.

  +   tcph = (void *)(skb_network_header(skb) + ihl);
 
 Were you too lazy to write struct tcphdr here and in other places? :)

Unfortunately it doesn't work.  For prerouting, we've not
entered the IP stack yet so the transport header isn't set.

Cheers,

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

On Thu, Sep 27, 2007 at 05:33:58PM +0800, Herbert Xu ([EMAIL PROTECTED]) wrote:
   +config NET_ACT_NAT
   +tristate Stateless NAT
   +depends on NET_CLS_ACT
   +select NETFILTER
  
  Argh... People usually do not understand such jokes :)
  What about not using netfilter helpers and just move them to the
  accessible header so that no additional slow path would ever be enabled?
 
 Sure.  However, as it is it's just including the netfilter core
 which does not mean the inclusion of connection tracking.  It's
 only connection tracking that *may* (so don't flame me for this :)
 pose a scalability problem.

It forces all input/pre/post/forward hooks to be enabled not as a direct
function call, but as additional lookups. It also makes it impossible to
remove netfilter from the config. And all just because of a couple of
checksum helpers...

   +++ b/net/sched/act_nat.c
  ...
   +#define NAT_TAB_MASK 15
  
  This really wants to be configurable at least via module parameter.
  
   +static struct tcf_common *tcf_nat_ht[NAT_TAB_MASK + 1];
   +static u32 nat_idx_gen;
   +static DEFINE_RWLOCK(nat_lock);
  
   +static struct tcf_hashinfo nat_hash_info = {
   + .htab   =   tcf_nat_ht,
   + .hmask  =   NAT_TAB_MASK,
   + .lock   =   &nat_lock,
   +};
  
  When I read this I swear I heard 'I want to be RCU'.
  But that is another task.
 
 Yes, there are a lot of clean-ups that can be done for all
 actions.  You're most welcome to send patches in this area.
 
   + tcph = (void *)(skb_network_header(skb) + ihl);
  
  Were you too lazy to write struct tcphdr here and in other places? :)
 
 Unfortunately it doesn't work.  For prerouting, we've not
 entered the IP stack yet so the transport header isn't set.

I meant that instead of casting to void * it should be cast to struct tcphdr *.
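
To illustrate the two points in this exchange, here is a minimal userspace sketch of locating the TCP header from the network header plus the IPv4 header length, returning a typed pointer rather than void *. It assumes a raw byte buffer instead of an skb. The type sketch_tcphdr and the helpers sketch_tcp_hdr and sketch_port are hypothetical stand-ins, not the kernel's struct tcphdr or any kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the start of a TCP header. Byte arrays are
 * used so the struct has no alignment requirement and can be overlaid
 * on any offset in a packet buffer.
 */
struct sketch_tcphdr {
	uint8_t source[2];	/* source port, network byte order */
	uint8_t dest[2];	/* destination port, network byte order */
};

/* Read a 16-bit big-endian (network order) field. */
static unsigned int sketch_port(const uint8_t be[2])
{
	return ((unsigned int)be[0] << 8) | be[1];
}

/*
 * Return a typed pointer to the TCP header that follows the IPv4 header
 * starting at nh, or NULL if the header length is implausible or the
 * buffer is too short.
 */
static const struct sketch_tcphdr *sketch_tcp_hdr(const uint8_t *nh, size_t len)
{
	size_t ihl;

	if (len < 20)
		return NULL;

	/* The low nibble of byte 0 is the header length in 32-bit words. */
	ihl = (size_t)(nh[0] & 0x0f) * 4;
	if (ihl < 20 || len < ihl + sizeof(struct sketch_tcphdr))
		return NULL;

	return (const struct sketch_tcphdr *)(nh + ihl);
}
```

The offset is computed from the IP header itself, which is why this works even when no transport header offset has been recorded yet.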

-- 
Evgeniy Polyakov

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Herbert Xu

On Thu, Sep 27, 2007 at 02:07:53PM +0400, Evgeniy Polyakov wrote:

 It forces all input/pre/post/forward hooks to be enabled not as a direct
 function call, but as additional lookups. It also makes it impossible to
 remove netfilter from the config. And all just because of a couple of
 checksum helpers...

I'm certainly not against patches moving that code out of
netfilter.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 3/4] net ipv4: When possible test for IFF_LOOPBACK and not dev == loopback_dev

2007-09-27 Thread Daniel Lezcano


Eric W. Biederman wrote:

Now that multiple loopback devices are becoming possible it makes
the code a little cleaner and more maintainable to test if a device
is a loopback device by testing dev->flags & IFF_LOOPBACK instead
of dev == loopback_dev.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]



Urs Thuermann posted the patch:

[PATCH 5/7] CAN: Add virtual CAN netdevice driver

This network driver sets the IFF_LOOPBACK flag for testing.
Could this collide with your patch?
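
The concern can be made concrete with a small userspace sketch comparing the two tests. It assumes a simplified device structure; sketch_net_device and the three example devices are hypothetical, and only the flag value 0x8 matches the real IFF_LOOPBACK constant.

```c
#include <stdbool.h>

#define SKETCH_IFF_LOOPBACK 0x8	/* same value as IFF_LOOPBACK */

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct sketch_net_device {
	const char *name;
	unsigned int flags;
};

static struct sketch_net_device sketch_lo = { "lo", SKETCH_IFF_LOOPBACK };
static struct sketch_net_device sketch_vcan0 = { "vcan0", SKETCH_IFF_LOOPBACK };
static struct sketch_net_device sketch_eth0 = { "eth0", 0 };

/* Old style: only the one well-known device matches. */
static bool is_loopback_by_identity(const struct sketch_net_device *dev)
{
	return dev == &sketch_lo;
}

/* New style: every device that sets the flag matches. */
static bool is_loopback_by_flag(const struct sketch_net_device *dev)
{
	return (dev->flags & SKETCH_IFF_LOOPBACK) != 0;
}
```

Under the flag test a device such as vcan0 above is treated as a loopback device, while under the pointer comparison it is not. Whether that difference matters depends on what each converted call site does with the answer.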


[PATCH] fixed broken bootp compilation

2007-09-27 Thread Denis V. Lunev

Compilation fix. Extra bracket removed.
Broken by [NET]: Wrap netdevice hardware header creation from
Stephen Hemminger [EMAIL PROTECTED]

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

--- ./net/ipv4/ipconfig.c.compile   2007-09-27 13:32:35.0 +0400
+++ ./net/ipv4/ipconfig.c   2007-09-27 14:36:19.0 +0400
@@ -758,7 +758,7 @@ static void __init ic_bootp_send_if(stru
        skb->dev = dev;
        skb->protocol = htons(ETH_P_IP);
        if (dev_hard_header(skb, dev, ntohs(skb->protocol),
-                           dev->broadcast, dev->dev_addr, skb->len) < 0) ||
+                           dev->broadcast, dev->dev_addr, skb->len) < 0 ||
            dev_queue_xmit(skb) < 0)
                printk("E");
 }

Re: Linux networking implementation and packet capture

2007-09-27 Thread Alan Menegotto


Gaurav Aggarwal wrote:

Hi,

I am trying to understand the networking (IPv4) implementation in Linux 2.4
and Linux 2.6. Can anyone give me some pointers to good resources or
whitepapers that would help? I am also interested in any document that
describes the changes in the networking implementation between 2.4 and 2.6.


I am also trying to write a simple program (preferably a userspace
application) which captures all the incoming and outgoing packets of a
particular machine (preferably at the PREROUTING stage), then, according to
the SRC/DST addresses, changes the IP address of some of the packets
and reinjects them into the local IP stack. I am able to do
that on a 2.4 kernel using libipq and ip_tables, but that program does not
run on a 2.6 kernel (it hits ip_route_BUG). Any idea or code
snippet will be really appreciated.


--
Regards,
Gaurav Aggarwal

You may also want to look at the linux-net wiki:
http://linux-net.osdl.org/index.php/Main_Page

Also, read some threads on the netdev list. There is a lot of information
there too.

--
Best Regards

Alan Menegotto


ax88796: add 93cx6 eeprom support

2007-09-27 Thread Magnus Damm

ax88796: add 93cx6 eeprom support

This patch hooks up the 93cx6 eeprom code to the ax88796 driver and modifies
the ax88796 driver to read out the mac address from the eeprom. We need
this for the ax88796 on certain SuperH boards. The pin configuration used
to connect the eeprom to the ax88796 on these boards is the same as pointed
out by the ax88796 datasheet, so we can probably reuse this code for multiple
platforms in the future.
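
As an illustration of the MAC address read described above, the following userspace sketch unpacks three 16-bit EEPROM words into six address bytes. It assumes each word is stored little-endian, low byte first, which is what the patch's cast of the byte buffer to a little-endian 16-bit pointer suggests. The function name sketch_words_to_mac is hypothetical and the EEPROM access itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack three 16-bit words into a 6-byte MAC address.
 * Assumption: the low byte of each word is the earlier address byte.
 */
static void sketch_words_to_mac(const uint16_t words[3], uint8_t mac[6])
{
	size_t i;

	for (i = 0; i < 3; i++) {
		mac[2 * i] = (uint8_t)(words[i] & 0xff);
		mac[2 * i + 1] = (uint8_t)(words[i] >> 8);
	}
}
```

Spelling out the byte order this way keeps the result independent of host endianness.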

Signed-off-by: Magnus Damm [EMAIL PROTECTED]
---

 This is a broken out version of the larger patch recently posted to netdev:
 http://www.mail-archive.com/netdev@vger.kernel.org/msg47278.html

 drivers/net/Kconfig  |7 ++
 drivers/net/ax88796.c|   49 ++
 include/linux/eeprom_93cx6.h |3 +-
 include/net/ax88796.h|1 
 4 files changed, 59 insertions(+), 1 deletion(-)

--- 0001/drivers/net/Kconfig
+++ work/drivers/net/Kconfig    2007-09-27 19:32:10.0 +0900
@@ -225,6 +225,13 @@ config AX88796
      AX88796 driver, using platform bus to provide
      chip detection and resources
 
+config AX88796_93CX6
+   bool "ASIX AX88796 external 93CX6 eeprom support"
+   depends on AX88796
+   select EEPROM_93CX6
+   help
+     Select this if your platform comes with an external 93CX6 eeprom.
+
 config MACE
    tristate "MACE (Power Mac ethernet) support"
    depends on PPC_PMAC && PPC32
--- 0001/drivers/net/ax88796.c
+++ work/drivers/net/ax88796.c  2007-09-27 19:17:44.0 +0900
@@ -24,6 +24,7 @@
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 #include <linux/mii.h>
+#include <linux/eeprom_93cx6.h>
 
 #include <net/ax88796.h>
 
@@ -582,6 +583,37 @@ static const struct ethtool_ops ax_ethto
    .get_link   = ax_get_link,
 };
 
+#ifdef CONFIG_AX88796_93CX6
+static void ax_eeprom_register_read(struct eeprom_93cx6 *eeprom)
+{
+   struct ei_device *ei_local = eeprom->data;
+   u8 reg = ei_inb(ei_local->mem + AX_MEMR);
+
+   eeprom->reg_data_in = reg & AX_MEMR_EEI;
+   eeprom->reg_data_out = reg & AX_MEMR_EEO; /* Input pin */
+   eeprom->reg_data_clock = reg & AX_MEMR_EECLK;
+   eeprom->reg_chip_select = reg & AX_MEMR_EECS;
+}
+
+static void ax_eeprom_register_write(struct eeprom_93cx6 *eeprom)
+{
+   struct ei_device *ei_local = eeprom->data;
+   u8 reg = ei_inb(ei_local->mem + AX_MEMR);
+
+   reg &= ~(AX_MEMR_EEI | AX_MEMR_EECLK | AX_MEMR_EECS);
+
+   if (eeprom->reg_data_in)
+       reg |= AX_MEMR_EEI;
+   if (eeprom->reg_data_clock)
+       reg |= AX_MEMR_EECLK;
+   if (eeprom->reg_chip_select)
+       reg |= AX_MEMR_EECS;
+
+   ei_outb(reg, ei_local->mem + AX_MEMR);
+   udelay(10);
+}
+#endif
+
 /* setup code */
 
 static void ax_initial_setup(struct net_device *dev, struct ei_device *ei_local)
@@ -640,6 +672,23 @@ static int ax_init_dev(struct net_device
        memcpy(dev->dev_addr,  SA_prom, 6);
    }
 
+#ifdef CONFIG_AX88796_93CX6
+   if (first_init && ax->plat->flags & AXFLG_HAS_93CX6) {
+       unsigned char mac_addr[6];
+       struct eeprom_93cx6 eeprom;
+
+       eeprom.data = ei_local;
+       eeprom.register_read = ax_eeprom_register_read;
+       eeprom.register_write = ax_eeprom_register_write;
+       eeprom.width = PCI_EEPROM_WIDTH_93C56;
+
+       eeprom_93cx6_multiread(&eeprom, 0,
+                              (__le16 __force *)mac_addr,
+                              sizeof(mac_addr) >> 1);
+
+       memcpy(dev->dev_addr,  mac_addr, 6);
+   }
+#endif
    if (ax->plat->wordlength == 2) {
        /* We must set the 8390 for word mode. */
        ei_outb(ax->plat->dcr_val, ei_local->mem + EN0_DCFG);
--- 0001/include/linux/eeprom_93cx6.h
+++ work/include/linux/eeprom_93cx6.h   2007-09-27 19:17:44.0 +0900
@@ -21,13 +21,14 @@
 /*
Module: eeprom_93cx6
Abstract: EEPROM reader datastructures for 93cx6 chipsets.
-   Supported chipsets: 93c46 & 93c66.
+   Supported chipsets: 93c46, 93c56 and 93c66.
  */
 
 /*
  * EEPROM operation defines.
  */
 #define PCI_EEPROM_WIDTH_93C46 6
+#define PCI_EEPROM_WIDTH_93C56 8
 #define PCI_EEPROM_WIDTH_93C66 8
 #define PCI_EEPROM_WIDTH_OPCODE 3
 #define PCI_EEPROM_WRITE_OPCODE 0x05
--- 0001/include/net/ax88796.h
+++ work/include/net/ax88796.h  2007-09-27 19:17:44.0 +0900
@@ -14,6 +14,7 @@
 
 #define AXFLG_HAS_EEPROM   (1<<0)
 #define AXFLG_MAC_FROMDEV  (1<<1)  /* device already has MAC */
+#define AXFLG_HAS_93CX6    (1<<2)  /* use eeprom_93cx6 driver */
 
 struct ax_plat_data {
unsigned int flags;

Re: 2.6.23-rc8-mm2 - drivers/net/ibm_newemac/mal - broken

2007-09-27 Thread Kamalesh Babulal

Andrew Morton wrote:
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc8/2.6.23-rc8-mm2/

Hi Andrew,

drivers/net/ibm_newemac/mal seems to be broken in 2.6.23-rc8-mm2 as well; it
was already reported against 2.6.23-rc8-mm1 (http://lkml.org/lkml/2007/9/25/173).


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Re: [Devel] [PATCH 4/4] net: Make the loopback device per network namespace

2007-09-27 Thread Denis V. Lunev

Eric W. Biederman wrote:
 This patch makes loopback_dev per network namespace.  Adding
 code to create a different loopback device for each network
 namespace and adding the code to free a loopback device
 when a network namespace exits.
 
 This patch modifies all users of the loopback_dev so they
 access it as init_net.loopback_dev, keeping all of the
 code compiling and working.  A later pass will be needed to
 update the users to use something other than the initial network
 namespace.

It is a pity that an important bit of explanation is missing. The
initialization of loopback_dev is moved from the chain of device
initializations (init_module) to subsystem initialization in order to keep
the proper order; we must be sure that the initialization order is correct.

Regards,
Den

[PATCH] proper comment for loopback initialization order

2007-09-27 Thread Denis V. Lunev

The loopback device is special. It should be initialized at the very
beginning.  The initialization order has been changed by
Eric W. Biederman [EMAIL PROTECTED], and this change is non-obvious
and important enough to deserve a proper comment.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

--- ./drivers/net/loopback.c.loopcomment    2007-08-26 19:30:38.0 +0400
+++ ./drivers/net/loopback.c    2007-09-27 16:08:06.0 +0400
@@ -293,4 +293,6 @@ static int __init loopback_init(void)
    return register_pernet_device(&loopback_net_ops);
 }
 
+/* Loopback is special. It should be initialized before any other network
+   device and network subsystem */
 fs_initcall(loopback_init);

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread jamal

On Thu, 2007-27-09 at 16:41 +0400, Evgeniy Polyakov wrote:

 I've attached simple patch which moves checksum helpers out of
 CONFIG_NETFILTER option but still in the same linux/netfilter.h header.
 This should be enough for removing 'select NETFILTER' in your patch.

Is there any point in keeping the code inside netfilter or keeping
the nf_ prefix? something in net/utilities/ or net/core maybe?
the nf_* can still exist in netfilter as aliases to wherever this is
moved to.

cheers,
jamal




Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

On Thu, Sep 27, 2007 at 08:52:03AM -0400, jamal ([EMAIL PROTECTED]) wrote:
 On Thu, 2007-27-09 at 16:41 +0400, Evgeniy Polyakov wrote:
 
  I've attached simple patch which moves checksum helpers out of
  CONFIG_NETFILTER option but still in the same linux/netfilter.h header.
  This should be enough for removing 'select NETFILTER' in your patch.
 
 Is there any point in keeping the code inside netfilter or keeping
 the nf_ prefix? something in net/utilities/ or net/core maybe?
 the nf_* can still exist in netfilter as aliases to wherever this is
 moved to.

No, there is no point in keeping that, I just wanted the smallest
possible change :)

-- 
Evgeniy Polyakov

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Herbert Xu

On Thu, Sep 27, 2007 at 08:39:45AM -0400, jamal wrote:

 Do you have plans to do the iproute bits? If you do it will be nice to
 also update the doc/examples with some simple example(s).

Oh yes, I didn't test this by poking bits in the kernel
you know :)

Here are the iproute bits.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/utils.h b/include/utils.h
index a3fd335..b0dc03e 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -36,7 +36,7 @@ extern char * _SL_;
 
 extern void incomplete_command(void) __attribute__((noreturn));
 
-#define NEXT_ARG() do { argv++; if (--argc <= 0) incomplete_command(); } while(0)
+#define NEXT_ARG() do { argv++; if (--argc < 0) incomplete_command(); } while(0)
 #define NEXT_ARG_OK() (argc - 1 > 0)
 #define PREV_ARG() do { argv--; argc++; } while(0)
 
diff --git a/tc/Makefile b/tc/Makefile
index 22cd437..cd5a69e 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -26,6 +26,7 @@ TCMODULES += q_htb.o
 TCMODULES += m_gact.o
 TCMODULES += m_mirred.o
 TCMODULES += m_ipt.o
+TCMODULES += m_nat.o
 TCMODULES += m_pedit.o
 TCMODULES += p_ip.o
 TCMODULES += p_icmp.o
diff --git a/tc/m_nat.c b/tc/m_nat.c
new file mode 100644
index 000..9a6c7da
--- /dev/null
+++ b/tc/m_nat.c
@@ -0,0 +1,208 @@
+/*
+ * m_nat.c NAT module
+ *
+ * This program is free software; you can distribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Herbert Xu [EMAIL PROTECTED]
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <dlfcn.h>
+#include "utils.h"
+#include "tc_util.h"
+#include <linux/tc_act/tc_nat.h>
+
+static void
+explain(void)
+{
+   fprintf(stderr, "Usage: ... nat NAT\n"
+           "NAT := DIRECTION OLD NEW\n"
+           "DIRECTION := { ingress | egress }\n"
+           "OLD := PREFIX\n"
+           "NEW := ADDRESS\n");
+}
+
+static void
+usage(void)
+{
+   explain();
+   exit(-1);
+}
+
+static int
+parse_nat_args(int *argc_p, char ***argv_p,struct tc_nat *sel)
+{
+   int argc = *argc_p;
+   char **argv = *argv_p;
+   inet_prefix addr;
+
+   if (argc <= 0)
+       return -1;
+
+   if (matches(*argv, "egress") == 0)
+       sel->flags |= TCA_NAT_FLAG_EGRESS;
+   else if (matches(*argv, "ingress") != 0)
+       goto bad_val;
+
+   NEXT_ARG();
+
+   if (get_prefix_1(&addr, *argv, AF_INET))
+       goto bad_val;
+
+   sel->old_addr = addr.data[0];
+   sel->mask = htonl(~0u << (32 - addr.bitlen));
+
+   NEXT_ARG();
+
+   if (get_prefix_1(&addr, *argv, AF_INET))
+       goto bad_val;
+
+   sel->new_addr = addr.data[0];
+
+   NEXT_ARG();
+
+   *argc_p = argc;
+   *argv_p = argv;
+   return 0;
+
+bad_val:
+   return -1;
+}
+
+static int
+parse_nat(struct action_util *a, int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+{
+   struct tc_nat sel;
+
+   int argc = *argc_p;
+   char **argv = *argv_p;
+   int ok = 0, iok = 0;
+   struct rtattr *tail;
+
+   memset(&sel, 0, sizeof(sel));
+
+   while (argc > 0) {
+       if (matches(*argv, "nat") == 0) {
+           NEXT_ARG();
+           if (parse_nat_args(&argc, &argv, &sel)) {
+               fprintf(stderr, "Illegal nat construct (%s) \n",
+                   *argv);
+               explain();
+               return -1;
+           }
+           ok++;
+           continue;
+       } else if (matches(*argv, "help") == 0) {
+           usage();
+       } else {
+           break;
+       }
+
+   }
+
+   if (!ok) {
+       explain();
+       return -1;
+   }
+
+   if (argc) {
+       if (matches(*argv, "reclassify") == 0) {
+           sel.action = TC_ACT_RECLASSIFY;
+           NEXT_ARG();
+       } else if (matches(*argv, "pipe") == 0) {
+           sel.action = TC_ACT_PIPE;
+           NEXT_ARG();
+       } else if (matches(*argv, "drop") == 0 ||
+           matches(*argv, "shot") == 0) {
+           sel.action = TC_ACT_SHOT;
+           NEXT_ARG();
+       } else if (matches(*argv, "continue") == 0) {
+   sel.action = TC_ACT_UNSPEC;
+   NEXT_ARG();
+   } else if
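
A side note on the prefix handling in parse_nat_args above: a mask built as ~0u shifted left by (32 - bitlen) is only well defined for prefix lengths from 1 to 32, because shifting a 32-bit value by 32 is undefined in C. Below is a minimal userspace sketch of a conversion that also handles a /0 prefix. The helper name sketch_prefix_to_mask is hypothetical and is not part of iproute2.

```c
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Convert a prefix length (0..32) into an IPv4 netmask in network byte
 * order. A length of 0 is handled separately, since it would otherwise
 * require a shift by the full width of the type.
 */
static uint32_t sketch_prefix_to_mask(unsigned int bitlen)
{
	if (bitlen == 0)
		return 0;
	if (bitlen > 32)
		bitlen = 32;

	return htonl(~UINT32_C(0) << (32 - bitlen));
}
```

For example, a prefix length of 24 gives the network-order form of 255.255.255.0, and a bare address with length 32 gives an all-ones mask.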

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Herbert Xu

On Thu, Sep 27, 2007 at 08:39:45AM -0400, jamal wrote:

 You also need to p->tcf_qstats.drops++ for all packets that get shot.

I was rather hoping that my packets wouldn't get shot :)
But yeah let's increment the drops counter for consistency.

[PKT_SCHED]: Add stateless NAT

Stateless NAT is useful in controlled environments where restrictions are
placed on through traffic such that we don't need connection tracking to
correctly NAT protocol-specific data.

In particular, this is of interest when the number of flows or the number
of addresses being NATed is large, or if connection tracking information
has to be replicated and where it is not practical to do so.

Previously we had stateless NAT functionality which was integrated into
the IPv4 routing subsystem.  This was a great solution as long as the NAT
worked on a subnet to subnet basis such that the number of NAT rules was
relatively small.  The reason is that for SNAT the routing based system
had to perform a linear scan through the rules.

If the number of rules is large then major renovations would have to take
place in the routing subsystem to make this practical.

For the time being, the least intrusive way of achieving this is to use
the u32 classifier written by Alexey Kuznetsov along with the actions
infrastructure implemented by Jamal Hadi Salim.

The following patch is an attempt at this problem by creating a new nat
action that can be invoked from u32 hash tables which would allow a large
number of stateless NAT rules that can be used/updated in constant time.

The actual NAT code is mostly based on the previous stateless NAT code
written by Alexey.  In future we might be able to utilise the protocol
NAT code from netfilter to improve support for other protocols.
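
As an illustration of what a single stateless rule does, here is a minimal userspace sketch of a prefix-to-prefix address rewrite. It is not taken from the patch, whose act_nat.c is cut off in this archive, and it leaves out the checksum fix-ups a real implementation needs. The names sketch_nat_rule and sketch_nat_apply are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One stateless rule: rewrite addresses in old_addr/mask to new_addr/mask. */
struct sketch_nat_rule {
	uint32_t old_addr;
	uint32_t new_addr;
	uint32_t mask;
};

/*
 * Apply the rule to *addr. Returns true and rewrites the address when it
 * falls inside the old prefix; the host bits outside the mask are kept.
 * No per-flow state is consulted or stored.
 */
static bool sketch_nat_apply(const struct sketch_nat_rule *r, uint32_t *addr)
{
	if ((*addr ^ r->old_addr) & r->mask)
		return false;

	*addr = (r->new_addr & r->mask) | (*addr & ~r->mask);
	return true;
}
```

Because the operations are purely bitwise, the same code works whether all three fields are held in host or in network byte order, as long as they agree with each other.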

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/tc_act/tc_nat.h b/include/linux/tc_act/tc_nat.h
new file mode 100644
index 000..9280c6f
--- /dev/null
+++ b/include/linux/tc_act/tc_nat.h
@@ -0,0 +1,29 @@
+#ifndef __LINUX_TC_NAT_H
+#define __LINUX_TC_NAT_H
+
+#include <linux/pkt_cls.h>
+#include <linux/types.h>
+
+#define TCA_ACT_NAT 9
+
+enum
+{
+   TCA_NAT_UNSPEC,
+   TCA_NAT_PARMS,
+   TCA_NAT_TM,
+   __TCA_NAT_MAX
+};
+#define TCA_NAT_MAX (__TCA_NAT_MAX - 1)
+
+#define TCA_NAT_FLAG_EGRESS 1
+   
+struct tc_nat
+{
+   tc_gen;
+   __be32 old_addr;
+   __be32 new_addr;
+   __be32 mask;
+   __u32 flags;
+};
+
+#endif
diff --git a/include/net/tc_act/tc_nat.h b/include/net/tc_act/tc_nat.h
new file mode 100644
index 000..4a691f3
--- /dev/null
+++ b/include/net/tc_act/tc_nat.h
@@ -0,0 +1,21 @@
+#ifndef __NET_TC_NAT_H
+#define __NET_TC_NAT_H
+
+#include <linux/types.h>
+#include <net/act_api.h>
+
+struct tcf_nat {
+   struct tcf_common common;
+
+   __be32 old_addr;
+   __be32 new_addr;
+   __be32 mask;
+   u32 flags;
+};
+
+static inline struct tcf_nat *to_tcf_nat(struct tcf_common *pc)
+{
+   return container_of(pc, struct tcf_nat, common);
+}
+
+#endif /* __NET_TC_NAT_H */
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8a74cac..92435a8 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -447,6 +447,17 @@ config NET_ACT_IPT
  To compile this code as a module, choose M here: the
  module will be called ipt.
 
+config NET_ACT_NAT
+tristate "Stateless NAT"
+depends on NET_CLS_ACT
+select NETFILTER
+---help---
+ Say Y here to do stateless NAT on IPv4 packets.  You should use
+ netfilter for NAT unless you know what you are doing.
+
+ To compile this code as a module, choose M here: the
+ module will be called nat.
+
 config NET_ACT_PEDIT
 tristate "Packet Editing"
 depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index b67c36f..81ecbe8 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_NET_ACT_POLICE)  += act_police.o
 obj-$(CONFIG_NET_ACT_GACT) += act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)   += act_mirred.o
 obj-$(CONFIG_NET_ACT_IPT)  += act_ipt.o
+obj-$(CONFIG_NET_ACT_NAT)  += act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP) += act_simple.o
 obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
new file mode 100644
index 000..1bce750
--- /dev/null
+++ b/net/sched/act_nat.c
@@ -0,0 +1,322 @@
+/*
+ * Stateless NAT actions
+ *
+ * Copyright (c) 2007 Herbert Xu [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

On Thu, Sep 27, 2007 at 08:45:15PM +0800, Herbert Xu ([EMAIL PROTECTED]) wrote:
 On Thu, Sep 27, 2007 at 04:41:21PM +0400, Evgeniy Polyakov wrote:
 
  I've attached simple patch which moves checksum helpers out of
  CONFIG_NETFILTER option but still in the same linux/netfilter.h header.
  This should be enough for removing 'select NETFILTER' in your patch.
 
 Close but no cigar :)

:) take 2.

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 1dd075e..5313739 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -40,6 +40,41 @@
 #endif
 
 #ifdef __KERNEL__
+
+static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
+{
+   __be32 diff[] = { ~from, to };
+
+   *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
+}
+
+static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
+{
+   nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
+}
+
+static inline void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
+   __be32 from, __be32 to, int pseudohdr)
+{
+   __be32 diff[] = { ~from, to };
+   if (skb->ip_summed != CHECKSUM_PARTIAL) {
+       *sum = csum_fold(csum_partial(diff, sizeof(diff),
+               ~csum_unfold(*sum)));
+       if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+           skb->csum = ~csum_partial(diff, sizeof(diff),
+                       ~skb->csum);
+   } else if (pseudohdr)
+   *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
+   csum_unfold(*sum)));
+}
+
+static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
+ __be16 from, __be16 to, int pseudohdr)
+{
+   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
+   (__force __be32)to, pseudohdr);
+}
+
 #ifdef CONFIG_NETFILTER
 
 extern void netfilter_init(void);
@@ -289,28 +324,6 @@ extern void nf_invalidate_cache(int pf);
Returns true or false. */
 extern int skb_make_writable(struct sk_buff **pskb, unsigned int writable_len);
 
-static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
-{
-   __be32 diff[] = { ~from, to };
-
-   *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
-}
-
-static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
-{
-   nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
-}
-
-extern void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
- __be32 from, __be32 to, int pseudohdr);
-
-static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
- __be16 from, __be16 to, int pseudohdr)
-{
-   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
-   (__force __be32)to, pseudohdr);
-}
-
 struct nf_afinfo {
unsigned short  family;
__sum16 (*checksum)(struct sk_buff *skb, unsigned int hook,
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 381a77c..9ffbbe2 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -226,22 +226,6 @@ copy_skb:
 }
 EXPORT_SYMBOL(skb_make_writable);
 
-void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
-   __be32 from, __be32 to, int pseudohdr)
-{
-   __be32 diff[] = { ~from, to };
-   if (skb->ip_summed != CHECKSUM_PARTIAL) {
-       *sum = csum_fold(csum_partial(diff, sizeof(diff),
-               ~csum_unfold(*sum)));
-       if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
-           skb->csum = ~csum_partial(diff, sizeof(diff),
-                       ~skb->csum);
-   } else if (pseudohdr)
-   *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
-   csum_unfold(*sum)));
-}
-EXPORT_SYMBOL(nf_proto_csum_replace4);
-
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 /* This does not belong here, but locally generated errors need it if connection
    tracking in use: without this, connection may not be in hash table, and hence

-- 
Evgeniy Polyakov

RTL8111 PCI Express Gigabit driver r8169 produces slow file transfers

2007-09-27 Thread Achim Frase

Dear Linux r8169 crew,

I got your e-mail address from the modinfo output of the r8169 module.

I am not sure if this is the right way to contact you, but I hope you
could help me.

The current driver in kernel 2.6.22 gives very poor network throughput.
I only get 100 kb/s.

Maybe you could take a look at this bug-report at launchpad.net.

https://bugs.launchpad.net/ubuntu/+source/linux-ubuntu-modules-2.6.22/+bug/114171

The latest driver from realtek is working very well.
ftp://210.51.181.211/cn/nic/r8168-8.003.00.tar.bz2

What I would like to know is whether the latest Realtek driver will make it
into the kernel, or whether the problems with the r8169 module are already
solved.

If there are any questions feel free to contact me.

Thanks in advance,

Achim Frase


Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread jamal

On Thu, 2007-27-09 at 21:01 +0800, Herbert Xu wrote:
 On Thu, Sep 27, 2007 at 08:39:45AM -0400, jamal wrote:
 
  Do you have plans to do the iproute bits? If you do it will be nice to
  also update the doc/examples with some simple example(s).
 
 Oh yes, I didn't test this by poking bits in the kernel
 you know :)

Trust me - it has been done before ;->

Thanks Herbert, looks good to me. 

cheers,
jamal



Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Patrick McHardy

Evgeniy Polyakov wrote:
 +static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
 +{
 + __be32 diff[] = { ~from, to };
 +
 + *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
 +}
 +
 +static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
 +{
 + nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
 +}
 +
 +static inline void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 + __be32 from, __be32 to, int pseudohdr)
 +{
 + __be32 diff[] = { ~from, to };
 + if (skb->ip_summed != CHECKSUM_PARTIAL) {
 +     *sum = csum_fold(csum_partial(diff, sizeof(diff),
 +             ~csum_unfold(*sum)));
 +     if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
 +         skb->csum = ~csum_partial(diff, sizeof(diff),
 +                     ~skb->csum);
 + } else if (pseudohdr)
 + *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
 + csum_unfold(*sum)));
 +}
 +
 +static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
 +   __be16 from, __be16 to, int pseudohdr)
 +{
 + nf_proto_csum_replace4(sum, skb, (__force __be32)from,
 + (__force __be32)to, pseudohdr);
 +}


These are way too large to get inlined, please move somewhere below
net/core.
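
For readers following the helpers being moved around in this thread: they patch an existing Internet checksum when a header field changes, instead of recomputing it over the whole packet. Below is a minimal userspace sketch of that incremental update, following the formula HC' = ~(~HC + ~m + m') from RFC 1624. It works on host-order 16-bit words and ignores the skb and pseudo-header handling of the kernel helpers. All sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over n 16-bit words (n is assumed small, so no overflow). */
static uint16_t sketch_csum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~sketch_fold(sum);
}

/*
 * Incremental update: given the old checksum and a 32-bit field that
 * changes from 'from' to 'to', return the new checksum without looking
 * at the rest of the data.
 */
static uint16_t sketch_csum_replace4(uint16_t check, uint32_t from, uint32_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~(from >> 16);
	sum += (uint16_t)~(from & 0xffff);
	sum += to >> 16;
	sum += to & 0xffff;
	return (uint16_t)~sketch_fold(sum);
}
```

The update costs a handful of additions regardless of packet size, which is why a NAT action that only rewrites one address can avoid touching the payload.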

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Herbert Xu

On Thu, Sep 27, 2007 at 05:10:08PM +0400, Evgeniy Polyakov wrote:

 +static inline void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 + __be32 from, __be32 to, int pseudohdr)
 +{
 + __be32 diff[] = { ~from, to };
 + if (skb->ip_summed != CHECKSUM_PARTIAL) {
 +     *sum = csum_fold(csum_partial(diff, sizeof(diff),
 +             ~csum_unfold(*sum)));
 +     if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
 +         skb->csum = ~csum_partial(diff, sizeof(diff),
 +                     ~skb->csum);
 + } else if (pseudohdr)
 + *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
 + csum_unfold(*sum)));
 +}

The embedded people are going to hate you for this :)

How about putting it in net/core/utils.c?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

On Thu, Sep 27, 2007 at 03:16:48PM +0200, Patrick McHardy ([EMAIL PROTECTED]) wrote:
 Evgeniy Polyakov wrote:
  +static inline void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
  +   __be32 from, __be32 to, int pseudohdr)
  +{
  +   __be32 diff[] = { ~from, to };
  +   if (skb->ip_summed != CHECKSUM_PARTIAL) {
  +       *sum = csum_fold(csum_partial(diff, sizeof(diff),
  +               ~csum_unfold(*sum)));
  +       if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
  +           skb->csum = ~csum_partial(diff, sizeof(diff),
  +                       ~skb->csum);
  +   } else if (pseudohdr)
  +   *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
  +   csum_unfold(*sum)));
  +}
  +
  +static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff 
  *skb,
  + __be16 from, __be16 to, int pseudohdr)
  +{
  +   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
  +   (__force __be32)to, pseudohdr);
  +}
 
 
 These are way too large to get inlined, please move somewhere below
 net/core.

I knew that... :)
I'm pretty sure a new file called net/core/helpers.c to host that
helper is not a good solution either?

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 1dd075e..624d78b 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -40,6 +40,29 @@
 #endif
 
 #ifdef __KERNEL__
+
+static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
+{
+   __be32 diff[] = { ~from, to };
+
+   *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
+}
+
+static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
+{
+   nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
+}
+
+extern void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
+   __be32 from, __be32 to, int pseudohdr);
+
+static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
+ __be16 from, __be16 to, int pseudohdr)
+{
+   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
+   (__force __be32)to, pseudohdr);
+}
+
 #ifdef CONFIG_NETFILTER
 
 extern void netfilter_init(void);
@@ -289,28 +312,6 @@ extern void nf_invalidate_cache(int pf);
Returns true or false. */
 extern int skb_make_writable(struct sk_buff **pskb, unsigned int writable_len);
 
-static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
-{
-   __be32 diff[] = { ~from, to };
-
-   *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
-}
-
-static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
-{
-   nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
-}
-
-extern void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
- __be32 from, __be32 to, int pseudohdr);
-
-static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
- __be16 from, __be16 to, int pseudohdr)
-{
-   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
-   (__force __be32)to, pseudohdr);
-}
-
 struct nf_afinfo {
unsigned short  family;
__sum16 (*checksum)(struct sk_buff *skb, unsigned int hook,
diff --git a/net/core/Makefile b/net/core/Makefile
index 4751613..5757323 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -3,7 +3,7 @@
 #
 
 obj-y := sock.o request_sock.o skbuff.o iovec.o datagram.o stream.o scm.o \
-gen_stats.o gen_estimator.o
+gen_stats.o gen_estimator.o helpers.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
diff --git a/net/core/helpers.c b/net/core/helpers.c
new file mode 100644
index 000..d3c8d97
--- /dev/null
+++ b/net/core/helpers.c
@@ -0,0 +1,23 @@
+/*
+ * Generic helper functions.
+ */
+
+#include <linux/types.h>
+#include <linux/skbuff.h>
+
+#include <net/checksum.h>
+
+void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
+   __be32 from, __be32 to, int pseudohdr)
+{
+   __be32 diff[] = { ~from, to };
+   if (skb->ip_summed != CHECKSUM_PARTIAL) {
+   *sum = csum_fold(csum_partial(diff, sizeof(diff),
+   ~csum_unfold(*sum)));
+   if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+   skb->csum = ~csum_partial(diff, sizeof(diff),
+   ~skb->csum);
+   } else if (pseudohdr)
+   *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
+   csum_unfold(*sum)));
+}
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 381a77c..9ffbbe2 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -226,22 +226,6 @@ copy_skb:
 }

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

On Thu, Sep 27, 2007 at 09:20:37PM +0800, Herbert Xu ([EMAIL PROTECTED]) wrote:
 How about putting it in net/core/utils.c?

I knew it was a bad idea to try to fix the netfilter dependency :)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 1dd075e..51b5a22 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -40,6 +40,35 @@
 #endif
 
 #ifdef __KERNEL__
+
+static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
+{
+   __be32 diff[] = { ~from, to };
+
+   *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
+}
+
+static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
+{
+   nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
+}
+
+extern void proto_csum_replace(__sum16 *sum, struct sk_buff *skb,
+   __be32 from, __be32 to, int pseudohdr);
+
+static inline void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
+   __be32 from, __be32 to, int pseudohdr)
+{
+   proto_csum_replace(sum, skb, from, to, pseudohdr);
+}
+
+static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
+ __be16 from, __be16 to, int pseudohdr)
+{
+   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
+   (__force __be32)to, pseudohdr);
+}
+
 #ifdef CONFIG_NETFILTER
 
 extern void netfilter_init(void);
@@ -289,28 +318,6 @@ extern void nf_invalidate_cache(int pf);
Returns true or false. */
 extern int skb_make_writable(struct sk_buff **pskb, unsigned int writable_len);
 
-static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
-{
-   __be32 diff[] = { ~from, to };
-
-   *sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
-}
-
-static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
-{
-   nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
-}
-
-extern void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
- __be32 from, __be32 to, int pseudohdr);
-
-static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
- __be16 from, __be16 to, int pseudohdr)
-{
-   nf_proto_csum_replace4(sum, skb, (__force __be32)from,
-   (__force __be32)to, pseudohdr);
-}
-
 struct nf_afinfo {
unsigned short  family;
__sum16 (*checksum)(struct sk_buff *skb, unsigned int hook,
diff --git a/net/core/utils.c b/net/core/utils.c
index 0bf17da..2f6d4d2 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -293,3 +293,20 @@ out:
 }
 
 EXPORT_SYMBOL(in6_pton);
+
+void proto_csum_replace(__sum16 *sum, struct sk_buff *skb,
+   __be32 from, __be32 to, int pseudohdr)
+{
+   __be32 diff[] = { ~from, to };
+   if (skb->ip_summed != CHECKSUM_PARTIAL) {
+   *sum = csum_fold(csum_partial(diff, sizeof(diff),
+   ~csum_unfold(*sum)));
+   if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+   skb->csum = ~csum_partial(diff, sizeof(diff),
+   ~skb->csum);
+   } else if (pseudohdr)
+   *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
+   csum_unfold(*sum)));
+}
+
+EXPORT_SYMBOL(proto_csum_replace);
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 381a77c..9ffbbe2 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -226,22 +226,6 @@ copy_skb:
 }
 EXPORT_SYMBOL(skb_make_writable);
 
-void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
-   __be32 from, __be32 to, int pseudohdr)
-{
-   __be32 diff[] = { ~from, to };
-   if (skb->ip_summed != CHECKSUM_PARTIAL) {
-   *sum = csum_fold(csum_partial(diff, sizeof(diff),
-   ~csum_unfold(*sum)));
-   if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
-   skb->csum = ~csum_partial(diff, sizeof(diff),
-   ~skb->csum);
-   } else if (pseudohdr)
-   *sum = ~csum_fold(csum_partial(diff, sizeof(diff),
-   csum_unfold(*sum)));
-}
-EXPORT_SYMBOL(nf_proto_csum_replace4);
-
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 /* This does not belong here, but locally generated errors need it if connection
    tracking in use: without this, connection may not be in hash table, and hence

-- 
Evgeniy Polyakov

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Patrick McHardy

Evgeniy Polyakov wrote:
 On Thu, Sep 27, 2007 at 03:16:48PM +0200, Patrick McHardy ([EMAIL PROTECTED]) 
 wrote:
 

These are way too large to get inlined, please move somewhere below
net/core.
 
 
 I knew that... :)
 I'm pretty sure new files called net/core/helpers.c which will host that
 helper is not a good solution too?


I like Herbert's suggestion of net/core/utils.c better (and without the
nf_ prefix please).

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Evgeniy Polyakov

On Thu, Sep 27, 2007 at 03:30:12PM +0200, Patrick McHardy ([EMAIL PROTECTED]) 
wrote:
 Evgeniy Polyakov wrote:
  On Thu, Sep 27, 2007 at 03:16:48PM +0200, Patrick McHardy ([EMAIL 
  PROTECTED]) wrote:
  
 
 These are way too large to get inlined, please move somewhere below
 net/core.
  
  
  I knew that... :)
  I'm pretty sure new files called net/core/helpers.c which will host that
  helper is not a good solution too?
 
 
 I like Herbert's suggestion of net/core/utils.c better (and without the
 nf_ prefix please).

I've put it there without the nf_ prefix and updated the netfilter header to
create a new inline function with that prefix for private netfilter usage.
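
For readers following the thread, the arithmetic behind these helpers can be
sketched in plain userspace C. This is only an illustration of the incremental
one's-complement update (RFC 1624, eqn. 3) that the patch's diff[] = { ~from, to }
trick relies on; ocsum(), full_csum() and csum_replace4_sketch() are made-up
names, not kernel functions, and the words are handled as host-order integers
for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of n 16-bit words, starting from acc, folded to 16 bits. */
static uint16_t ocsum(const uint16_t *w, size_t n, uint32_t acc)
{
	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		acc += w[i];
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Checksum computed from scratch: the inverted one's-complement sum. */
static uint16_t full_csum(const uint16_t *w, size_t n)
{
	return (uint16_t)~ocsum(w, n, 0);
}

/*
 * Incremental update when one 32-bit field changes from "from" to "to":
 *   new = ~(~old + ~from + to)
 * Only the old checksum and the two field values are needed.
 */
static uint16_t csum_replace4_sketch(uint16_t check, uint32_t from, uint32_t to)
{
	uint16_t diff[4] = {
		(uint16_t)~(from & 0xffff), (uint16_t)~(from >> 16),
		(uint16_t)(to & 0xffff),    (uint16_t)(to >> 16),
	};

	return (uint16_t)~ocsum(diff, 4, (uint16_t)~check);
}
```

The point of the incremental form is that a NAT-style rewrite of one address
does not have to touch the rest of the packet; the result should match a full
recomputation over the modified words.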

-- 
Evgeniy Polyakov

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread jamal

On Thu, 2007-27-09 at 15:30 +0200, Patrick McHardy wrote:

 
 I like Herbert's suggestion of net/core/utils.c better (and without the
 nf_ prefix please).

me too. Evgeniy, you are the man if you finish the whole cow as some
wise Africans would say;-

cheers,
jamal 


Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread Patrick McHardy

Evgeniy Polyakov wrote:
 On Thu, Sep 27, 2007 at 09:20:37PM +0800, Herbert Xu ([EMAIL PROTECTED]) 
 wrote:
 
How about putting it in net/core/utils.c?
 
 
 I knew, that was a bad idea to try to fix netfilter dependency :)
 
 diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h


This looks good to me.

[PATCH] various dst_ifdown routines to catch refcounting bugs

2007-09-27 Thread Denis V. Lunev

Moving dst entries into init_net.loopback_dev is not a good thing.
This hides obvious and non-obvious ref-counting bugs.

This patch uses net_ns loopback instead of init_net loopback. 
This allows us to catch various bugs, like the recent one in IPv6 DAD handling.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

--- ./net/core/dst.c.loop   2007-08-26 19:30:38.0 +0400
+++ ./net/core/dst.c2007-08-26 19:30:38.0 +0400
@@ -279,11 +279,11 @@ static inline void dst_ifdown(struct dst
if (!unregister) {
dst->input = dst->output = dst_discard;
} else {
-   dst->dev = init_net.loopback_dev;
+   dst->dev = dst->dev->nd_net->loopback_dev;
dev_hold(dst->dev);
dev_put(dev);
if (dst->neighbour && dst->neighbour->dev == dev) {
-   dst->neighbour->dev = init_net.loopback_dev;
+   dst->neighbour->dev = dst->dev;
dev_put(dev);
dev_hold(dst->neighbour->dev);
}
--- ./net/ipv4/route.c.loop 2007-08-26 19:30:38.0 +0400
+++ ./net/ipv4/route.c  2007-08-26 19:30:38.0 +0400
@@ -1402,8 +1402,9 @@ static void ipv4_dst_ifdown(struct dst_e
 {
struct rtable *rt = (struct rtable *) dst;
struct in_device *idev = rt->idev;
-   if (dev != init_net.loopback_dev && idev && idev->dev == dev) {
-   struct in_device *loopback_idev = in_dev_get(init_net.loopback_dev);
+   if (dev != dev->nd_net->loopback_dev && idev && idev->dev == dev) {
+   struct in_device *loopback_idev =
+   in_dev_get(dev->nd_net->loopback_dev);
if (loopback_idev) {
rt->idev = loopback_idev;
in_dev_put(idev);
--- ./net/ipv4/xfrm4_policy.c.loop  2007-08-26 19:30:38.0 +0400
+++ ./net/ipv4/xfrm4_policy.c   2007-08-26 19:30:38.0 +0400
@@ -306,7 +306,8 @@ static void xfrm4_dst_ifdown(struct dst_
 
xdst = (struct xfrm_dst *)dst;
if (xdst->u.rt.idev->dev == dev) {
-   struct in_device *loopback_idev = in_dev_get(init_net.loopback_dev);
+   struct in_device *loopback_idev =
+   in_dev_get(dev->nd_net->loopback_dev);
BUG_ON(!loopback_idev);
 
do {
--- ./net/ipv6/route.c.loop 2007-08-26 19:30:38.0 +0400
+++ ./net/ipv6/route.c  2007-08-26 19:30:38.0 +0400
@@ -220,9 +220,12 @@ static void ip6_dst_ifdown(struct dst_en
 {
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;
+   struct net_device *loopback_dev =
+   dev->nd_net->loopback_dev;
 
-   if (dev != init_net.loopback_dev && idev != NULL && idev->dev == dev) {
-   struct inet6_dev *loopback_idev = in6_dev_get(init_net.loopback_dev);
+   if (dev != loopback_dev && idev != NULL && idev->dev == dev) {
+   struct inet6_dev *loopback_idev =
+   in6_dev_get(loopback_dev);
if (loopback_idev != NULL) {
rt->rt6i_idev = loopback_idev;
in6_dev_put(idev);
@@ -1185,12 +1188,12 @@ int ip6_route_add(struct fib6_config *cf
if ((cfg->fc_flags & RTF_REJECT) ||
(dev && (dev->flags&IFF_LOOPBACK) && !(addr_type&IPV6_ADDR_LOOPBACK))) {
/* hold loopback dev/idev if we haven't done so. */
-   if (dev != init_net.loopback_dev) {
+   if (dev != dev->nd_net->loopback_dev) {
if (dev) {
dev_put(dev);
in6_dev_put(idev);
}
-   dev = init_net.loopback_dev;
+   dev = dev->nd_net->loopback_dev;
dev_hold(dev);
idev = in6_dev_get(dev);
if (!idev) {
@@ -1894,13 +1897,13 @@ struct rt6_info *addrconf_dst_alloc(stru
if (rt == NULL)
return ERR_PTR(-ENOMEM);
 
-   dev_hold(init_net.loopback_dev);
+   dev_hold(idev->dev->nd_net->loopback_dev);
in6_dev_hold(idev);
 
rt->u.dst.flags = DST_HOST;
rt->u.dst.input = ip6_input;
rt->u.dst.output = ip6_output;
-   rt->rt6i_dev = init_net.loopback_dev;
+   rt->rt6i_dev = idev->dev->nd_net->loopback_dev;
rt->rt6i_idev = idev;
rt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(rt->rt6i_dev);
rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_mtu(&rt->u.dst));
--- ./net/ipv6/xfrm6_policy.c.loop  2007-08-26 19:30:38.0 +0400
+++ ./net/ipv6/xfrm6_policy.c   2007-08-26 19:30:38.0 +0400
@@ -375,7 +375,8 @@ static void xfrm6_dst_ifdown(struct dst_
 
xdst = (struct xfrm_dst *)dst;
if (xdst->u.rt6.rt6i_idev->dev == dev) {
-   struct inet6_dev *loopback_idev = in6_dev_get(init_net.loopback_dev);
+   struct inet6_dev

Re: [PATCH] sky2: sky2 FE+ receive status workaround

2007-09-27 Thread Stephen Hemminger

On Thu, 27 Sep 2007 09:14:11 +0100
Jochen Voß [EMAIL PROTECTED] wrote:

 Hi Stephen,
 
 On 27 Sep 2007, at 01:58, Stephen Hemminger wrote:
  +   /* This chip has hardware problems that generates bogus status.
  +* So do only marginal checking and expect higher level protocols
  +* to handle crap frames.
  +*/
  +   if (sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
  +   sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0 &&
  +   length != count)
  +   goto okay;
 
 Shouldn't the condition be length == count?
 

No, the code is correct as is.  Basically if length == count, then
the status field is correct, and the driver can go ahead and use it.
If length != count, then the status is bogus but the data is okay.
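
The decision described above can be written out as a small standalone sketch.
The enum, the function name and the boolean "affected chip" flag are
hypothetical and only model the reasoning in this mail; the real driver tests
chip_id and chip_rev directly and has more receive checks than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum rx_verdict {
	RX_USE_STATUS,         /* status word is trustworthy, check it normally */
	RX_SKIP_STATUS_CHECKS, /* status word is bogus, accept the data as is */
};

/*
 * is_affected_chip: true only for the chip and revision with the hardware bug.
 * length: frame length as reported by the hardware for this receive.
 * count:  frame length as decoded from the status word.
 */
static enum rx_verdict rx_status_policy(bool is_affected_chip,
					uint16_t length, uint16_t count)
{
	/* A mismatch on the buggy chip means "status is garbage, data is fine". */
	if (is_affected_chip && length != count)
		return RX_SKIP_STATUS_CHECKS;

	/* Lengths agree, or the chip is not affected: use the normal checks. */
	return RX_USE_STATUS;
}
```

So the "goto okay" in the patch corresponds to RX_SKIP_STATUS_CHECKS: it is
taken precisely when the status cannot be believed, which is why the test is
length != count rather than length == count.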


-- 
Stephen Hemminger [EMAIL PROTECTED]

IPSec on Linux Kernel

2007-09-27 Thread Fabio Souto


Hi,

I'm currently doing some research work and I thought that maybe you guys
could help me out on this.
I'm trying to find where I can learn more about the IPSec
implementation in the current Linux kernel (2.6.22). I need to find where
the AH calls are made so I can reroute those function calls to an external
module, for safer AH generation.
It would be helpful to find the source code files where I can study the
IPSec stack in detail and reroute the function calls.
Any hints on these topics?

Thanks in advance

Fabio Souto
Portugal

Re: TCP Spike

2007-09-27 Thread Stephen Hemminger

On Thu, 27 Sep 2007 11:58:01 +0800
Majumder, Rajib [EMAIL PROTECTED] wrote:

 Hi,
 
 We have observed 40ms latency spikes in TCP connections in burst type of 
 traffic. This affects regular TCP sockets. We observed this issue in kernels 
 of 2.4.21 and kernel 2.6.5.

Unfortunately, 2.6.5 is out of my short term memory at this point. I do
remember that 2.6.5 used BIC for congestion control, and there were some
math errors in the congestion control logic that caused it to be way too
aggressive.
  
 
 Aparently, this seems to be fixed in 2.6.19.  
 
 Can someone throw some light on this? 

My guess is that the addition of the SACK hinting might be the major win.
The code takes 3 passes over the SACK list, so with large outstanding data
that was a major bottleneck; not sure if it was 4ms worth though.

 
 Is this a congestion control/avoidance issue? What congestion control 
 algorithm is used before 2.6.8?   

Default congestion control in early 2.6 was BIC; then, after CUBIC stabilized,
it was made the default in 2.6.19.

Another thing that may cause changes in latency is Appropriate Byte Counting
(ABC). It was added in 2.6.14, but then turned off by default in 2.6.18. The
problem is that ABC caused performance problems with some applications that
sent messages as many small writes.
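
To make the ABC remark concrete, here is a minimal sketch of the window update
rule from RFC 3465, written as standalone C. It is not the kernel's
implementation; struct cc_state and abc_on_ack() are names invented for this
illustration, and the window is tracked in bytes for simplicity.

```c
#include <assert.h>
#include <stdint.h>

struct cc_state {
	uint32_t cwnd;        /* congestion window, in bytes */
	uint32_t ssthresh;    /* slow start threshold, in bytes */
	uint32_t bytes_acked; /* ABC accumulator used in congestion avoidance */
};

/*
 * acked:   bytes newly acknowledged by this ACK
 * smss:    sender maximum segment size, in bytes
 * limit_l: the "L" limit from RFC 3465 (the RFC suggests at most 2)
 */
static void abc_on_ack(struct cc_state *s, uint32_t acked,
		       uint32_t smss, uint32_t limit_l)
{
	assert(s != 0);

	if (s->cwnd < s->ssthresh) {
		/* Slow start: grow by the bytes acked, capped at L * SMSS. */
		uint32_t cap = limit_l * smss;

		s->cwnd += acked < cap ? acked : cap;
		return;
	}

	/* Congestion avoidance: one SMSS per cwnd worth of acked bytes. */
	s->bytes_acked += acked;
	if (s->bytes_acked >= s->cwnd) {
		s->bytes_acked -= s->cwnd;
		s->cwnd += smss;
	}
}
```

With byte counting, an ACK that covers a 100-byte write grows the window by
100 bytes, where counting ACKs would have credited a full segment. That is one
plausible reading of why applications doing many small writes saw the window
open more slowly with ABC enabled.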


-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [RFC] af_packet: allow disabling timestamps

2007-09-27 Thread Unai Uribarri

This small modification to Stephen's patch timestamps the skb when
needed, so the timestamp can be reused by other af_packet sockets.

Signed-off-by: Unai Uribarri [EMAIL PROTECTED]

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -259,7 +259,8 @@ static void sock_disable_timestamp(struct sock *sk)
 {
if (sock_flag(sk, SOCK_TIMESTAMP)) {
sock_reset_flag(sk, SOCK_TIMESTAMP);
-   net_disable_timestamp();
+   if (sk->sk_family != PF_PACKET)
+   net_disable_timestamp();
}
 }
 
@@ -1655,7 +1656,8 @@ void sock_enable_timestamp(struct sock *sk)
 {
if (!sock_flag(sk, SOCK_TIMESTAMP)) {
sock_set_flag(sk, SOCK_TIMESTAMP);
-   net_enable_timestamp();
+   if (sk->sk_family != PF_PACKET)
+   net_enable_timestamp();
}
 }
 EXPORT_SYMBOL(sock_enable_timestamp);
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -570,7 +570,6 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
unsigned long status = TP_STATUS_LOSING|TP_STATUS_USER;
unsigned short macoff, netoff;
struct sk_buff *copy_skb = NULL;
-   struct timeval tv;
 
if (dev->nd_net != &init_net)
goto drop;
@@ -648,12 +647,18 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
h->tp_snaplen = snaplen;
h->tp_mac = macoff;
h->tp_net = netoff;
-   if (skb->tstamp.tv64)
+
+   if (sock_flag(sk, SOCK_TIMESTAMP)) {
+   struct timeval tv;
+   if (skb->tstamp.tv64 == 0)
+   __net_timestamp(skb);
tv = ktime_to_timeval(skb->tstamp);
-   else
-   do_gettimeofday(&tv);
-   h->tp_sec = tv.tv_sec;
-   h->tp_usec = tv.tv_usec;
+   h->tp_sec = tv.tv_sec;
+   h->tp_usec = tv.tv_usec;
+   } else {
+   h->tp_sec = 0;
+   h->tp_usec = 0;
+   }
 
sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
@@ -1004,6 +1009,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol)
sock->ops = &packet_ops_spkt;
 
sock_init_data(sock, sk);
+   sock_set_flag(sk, SOCK_TIMESTAMP);
 
po = pkt_sk(sk);
sk->sk_family = PF_PACKET;



On jue, 2007-09-13 at 12:42 +0200, Stephen Hemminger wrote:
 Currently, af_packet does not allow disabling timestamps. This patch changes
 that but doesn't force global timestamps on.
 
 This shows up in bugzilla as:
   http://bugzilla.kernel.org/show_bug.cgi?id=4809
 
 Patch against net-2.6.24 tree.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 
 --- a/net/core/sock.c 2007-09-12 15:08:43.0 +0200
 +++ b/net/core/sock.c 2007-09-13 12:10:19.0 +0200
 @@ -259,7 +259,8 @@ static void sock_disable_timestamp(struc
  {
   if (sock_flag(sk, SOCK_TIMESTAMP)) {
   sock_reset_flag(sk, SOCK_TIMESTAMP);
 - net_disable_timestamp();
 + if (sk->sk_family != PF_PACKET)
 + net_disable_timestamp();
   }
  }
  
 @@ -1645,7 +1646,8 @@ void sock_enable_timestamp(struct sock *
  {
   if (!sock_flag(sk, SOCK_TIMESTAMP)) {
   sock_set_flag(sk, SOCK_TIMESTAMP);
 - net_enable_timestamp();
 + if (sk->sk_family != PF_PACKET)
 + net_enable_timestamp();
   }
  }
  EXPORT_SYMBOL(sock_enable_timestamp);
 --- a/net/packet/af_packet.c  2007-09-12 17:07:00.0 +0200
 +++ b/net/packet/af_packet.c  2007-09-13 12:09:10.0 +0200
 @@ -572,7 +572,6 @@ static int tpacket_rcv(struct sk_buff *s
   unsigned long status = TP_STATUS_LOSING|TP_STATUS_USER;
   unsigned short macoff, netoff;
   struct sk_buff *copy_skb = NULL;
 - struct timeval tv;
  
   if (dev->nd_net != &init_net)
   goto drop;
 @@ -650,12 +649,19 @@ static int tpacket_rcv(struct sk_buff *s
   h->tp_snaplen = snaplen;
   h->tp_mac = macoff;
   h->tp_net = netoff;
 - if (skb->tstamp.tv64)
 - tv = ktime_to_timeval(skb->tstamp);
 - else
 - do_gettimeofday(&tv);
 - h->tp_sec = tv.tv_sec;
 - h->tp_usec = tv.tv_usec;
 +
 + if (sock_flag(sk, SOCK_TIMESTAMP)) {
 + struct timeval tv;
 + if (skb->tstamp.tv64)
 + tv = ktime_to_timeval(skb->tstamp);
 + else
 + do_gettimeofday(&tv);
 + h->tp_sec = tv.tv_sec;
 + h->tp_usec = tv.tv_usec;
 + } else {
 + h->tp_sec = 0;
 + h->tp_usec = 0;
 + }
  
   sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
   sll->sll_halen = 0;
 @@ -1014,6 +1020,7 @@ static int packet_create(struct net *net
   sock->ops = &packet_ops_spkt;
  
   sock_init_data(sock, sk);
 +

Re: [RFC] af_packet: allow disabling timestamps

2007-09-27 Thread Unai Uribarri

On vie, 2007-09-14 at 12:26 +0200, Stephen Hemminger wrote:
 On Thu, 13 Sep 2007 14:24:06 +0200
 Eric Dumazet [EMAIL PROTECTED] wrote:
 
  On Thu, 13 Sep 2007 12:42:53 +0200
  Stephen Hemminger [EMAIL PROTECTED] wrote:
  
   Currently, af_packet does not allow disabling timestamps. This patch 
   changes
   that but doesn't force global timestamps on.
   
   This shows up in bugzilla as:
 http://bugzilla.kernel.org/show_bug.cgi?id=4809
   
   Patch against net-2.6.24 tree.
   
  
  I am not sure I understood this patch.
  
  This means that tcpdump/ethereal wont get precise timestamps 
  (gathered when packet is received), but imprecise ones (gathered when the 
  sniffer reads the packet)
  
  I added some time ago ktime infrastructure to eventually get nanosecond 
  precision in libpcap, so I would prefer a step in the right direction :)
  
  Should'nt we use something like :
  
  [PATCH] af_packet : allow disabling timestamps, or requesting nanosecond 
  precision.
  
  Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
  
  diff --git a/net/core/sock.c b/net/core/sock.c
  index 5a16e38..1c10b9d 100644
  --- a/net/core/sock.c
  +++ b/net/core/sock.c
  @@ -563,6 +563,7 @@ set_rcvbuf:
  } else {
  sock_reset_flag(sk, SOCK_RCVTSTAMP);
  sock_reset_flag(sk, SOCK_RCVTSTAMPNS);
  +   sock_disable_timestamp(sk);
  }
  break;
   
  diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
  index 745e2cb..409de44 100644
  --- a/net/packet/af_packet.c
  +++ b/net/packet/af_packet.c
  @@ -650,12 +650,27 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
  h->tp_snaplen = snaplen;
  h->tp_mac = macoff;
  h->tp_net = netoff;
  -   if (skb->tstamp.tv64)
  -   tv = ktime_to_timeval(skb->tstamp);
  -   else
  -   do_gettimeofday(&tv);
  -   h->tp_sec = tv.tv_sec;
  -   h->tp_usec = tv.tv_usec;
  +   h->tp_sec = 0;
  +   h->tp_usec = 0;
  +   if ((sock_flag(sk, SOCK_TIMESTAMP))) {
  +   if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
  +   struct timespec ts;
  +   if (skb->tstamp.tv64)
  +   ts = ktime_to_timespec(skb->tstamp);
  +   else
  +   getnstimeofday(&ts);
  +   h->tp_sec = ts.tv_sec;
  +   h->tp_usec = ts.tv_nsec; /* cheat a little bit */
  +   }
  +   else {
  +   if (skb->tstamp.tv64)
  +   tv = ktime_to_timeval(skb->tstamp);
  +   else
  +   do_gettimeofday(&tv);
  +   h->tp_sec = tv.tv_sec;
  +   h->tp_usec = tv.tv_usec;
  +   }
  +   }
   
  sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
  sll->sll_halen = 0;
  @@ -1014,6 +1029,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol)
  sock->ops = &packet_ops_spkt;
   
  sock_init_data(sock, sk);
  +   sock_enable_timestamp(sk);
   
  po = pkt_sk(sk);
  sk->sk_family = PF_PACKET;
 
 No, then we end up timestamping all the packets, even if they get dropped by
 packet filter. The change in 2.6.24 allows dhclient (and rstp) to only call
 hires clock source for packets they want, not all packets.
 
 Perhaps the timestamping needs to change into a tristate flag?
 

Eric's patch has a feature your previous patch doesn't: a way to disable
timestamping from userspace (the changes in net/core/sock.c). But it
changes the userspace API.

I really think that any developer who sets SO_TIMESTAMP to 0 and still
expects to receive a valid timestamp is terminally insane and doesn't
deserve any mercy. But we should take pity on those poor souls who use
(suffer) closed software and find another way that doesn't change the
API.
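
For reference, the userspace knob under discussion is the SO_TIMESTAMP socket
option. The sketch below only shows how an application sets and reads it; the
helper names are made up, and a UDP socket is assumed in the usage because
opening a packet socket needs extra privileges. How PF_PACKET sockets should
react to the option is exactly what this thread is still deciding, so nothing
here says anything about that.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn receive timestamps on (enable != 0) or off (enable == 0). */
static int set_rx_timestamps(int fd, int enable)
{
	assert(fd >= 0);
	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP,
			  &enable, sizeof(enable));
}

/* Returns 1 if SO_TIMESTAMP is set, 0 if not, -1 on error. */
static int get_rx_timestamps(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

A caller would open a socket with socket(AF_INET, SOCK_DGRAM, 0), call
set_rx_timestamps(fd, 0) to opt out, and close(fd) when done.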

Bye.


Re: [PATCH] sky2: sky2 FE+ receive status workaround

2007-09-27 Thread Jochen Voss

Hi,

On Thu, Sep 27, 2007 at 06:58:07AM -0700, Stephen Hemminger wrote:
 On Thu, 27 Sep 2007 09:14:11 +0100 Jochen Voß [EMAIL PROTECTED] wrote:
  On 27 Sep 2007, at 01:58, Stephen Hemminger wrote:
   + /* This chip has hardware problems that generates bogus status.
   +  * So do only marginal checking and expect higher level protocols
   +  * to handle crap frames.
   +  */
   + if (sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
   + sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0 &&
   + length != count)
   + goto okay;
  
  Shouldn't the condition be length == count?
 
 No, the code is correct as is.  Basically if length == count, then
 the status field is correct, and the driver can go ahead and use it.
 If length != count, then the status is bogus but the data is okay.

Oh, I see.  Thanks for the explanation.

All the best,
Jochen
-- 
http://seehuhn.de/



Re: IPSec on Linux Kernel

2007-09-27 Thread Rami Rosen

Hi Fabio,

  - Assuming that you intend to deal with IPv4, I suggest that you
start by looking at the ah4.ko module sources, which are in net/ipv4/ah.c,
especially at the ah_output() and ah_input() methods.

(For IPv6, see ah6.c in net/ipv6.)

- May I ask: are you aware that the Authentication Header protocol deals
only with authentication and not with encryption, and that as a result the ESP
protocol, which supports authentication and also, when needed, encryption, is
much more widely used?
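
As background for reading those functions, the fixed part of the AH header
defined in RFC 4302 can be sketched as below. The struct and helper names are
illustrative only and are not the kernel's definitions; the length rule shown
is the IPv4 one (IPv6 additionally wants a multiple of 8 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed portion of the Authentication Header; the ICV follows it. */
struct ah_fixed {
	uint8_t  next_header; /* protocol of the payload after AH */
	uint8_t  payload_len; /* AH length in 32-bit words, minus 2 */
	uint16_t reserved;    /* must be zero */
	uint32_t spi;         /* security parameters index */
	uint32_t seq;         /* sequence number */
};

/*
 * Value for the payload_len field given the ICV size in bytes,
 * or -1 if the total header is not a multiple of 32 bits.
 */
static int ah_payload_len(size_t icv_bytes)
{
	size_t total = sizeof(struct ah_fixed) + icv_bytes;

	assert(sizeof(struct ah_fixed) == 12);
	if (total % 4 != 0)
		return -1;
	return (int)(total / 4) - 2;
}
```

For the common 96-bit ICV this gives a length field of 4, the value RFC 4302
itself uses as its example. An external module that generates the ICV would
still have to produce a header of this shape.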

Regards,
  Rosen Rami

Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-09-27 Thread Eric W. Biederman

Urs Thuermann [EMAIL PROTECTED] writes:

 This patch adds the virtual CAN bus (vcan) network driver.
 The vcan device is just a loopback device for CAN frames, no
 real CAN hardware is involved.

I'm trying to wrap my head around the CAN use of IFF_LOOPBACK.

  6.2 loopback

  As described in chapter 3.2 the CAN network device driver should
  support a local loopback functionality. In this case the driver flag
  IFF_LOOPBACK has to be set to cause the PF_CAN core to not perform the
  loopback as fallback solution:

dev->flags = (IFF_NOARP | IFF_LOOPBACK);


Currently IFF_LOOPBACK set in dev->flags means we are dealing
with drivers/net/loopback.c.

In other networking layers loopback functionality (i.e. for broadcast)
is never expected to be provided by the drivers and is instead
always provided by the networking layer, which keeps the drivers
simpler.  Further, you already have this functionality in the
generic CAN layer for doing loopback without driver support.

So at first glance the CAN usage of IFF_LOOPBACK looks completely
broken, and likely to confuse other networking layers if they see
a CAN device, say if someone attempts to run IP over CAN or
something like that.

Do you think you can remove this incompatible usage of IFF_LOOPBACK
from the can code?

If I have read your documentation properly the only reason you are
doing this is so that the timing of frames to cansniffer more
accurately reflects when the frame hits the wire.  If CAN runs over a
very slow medium I guess I can see where that can be a concern.  But
the usage of IFF_LOOPBACK to do this still feels fairly hackish
to me. 

Eric

Re: [PATCH 3/4] net ipv4: When possible test for IFF_LOOPBACK and not dev == loopback_dev

2007-09-27 Thread Eric W. Biederman

Daniel Lezcano [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
 Now that multiple loopback devices are becoming possible it makes
 the code a little cleaner and more maintainable to test if a device
 is a loopback device by testing dev->flags & IFF_LOOPBACK instead
 of dev == loopback_dev.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]


 Urs Thuermann posted the patch:

   [PATCH 5/7] CAN: Add virtual CAN netdevice driver

 This network driver set its flag to IFF_LOOPBACK for testing.
 Is it possible this can be a collision with your patch ?

I have brought it up on that thread.  As best as I can tell, the CAN usage
of IFF_LOOPBACK will be a problem even without my patch, assuming
something other than the CAN layer will see the CAN devices.

The CAN documentation says IFF_LOOPBACK should be set on all CAN devices.

It seems that the people who want high performance predictable CAN
don't want this and the people who want something they can trace
easily want this.

It sounds to me like CAN routers don't exist.

Anyway hopefully that usage can be resolved as that code is reviewed,
and made ready to merge.

Eric

Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-09-27 Thread Eric W. Biederman


I guess in particular IFF_LOOPBACK means that all packets from
a device will come right back to the current machine, and go
nowhere else.

That usage sounds completely different from the CAN usage, which
appears to mean: broadcast packets will be returned to this machine
as well as being sent out onto the wire.

Eric

Re: [PATCH] various dst_ifdown routines to catch refcounting bugs

2007-09-27 Thread Eric W. Biederman

Denis V. Lunev [EMAIL PROTECTED] writes:

 Moving dst entries into init_net.loopback_dev is not a good thing.
 This hides obvious and non-obvious ref-counting bugs.

Acked-by: Eric W. Biederman [EMAIL PROTECTED]

To be clear using init_net.loopback is currently safe because we don't
have any destination cache entries for anything except the initial
network namespace.

I have not yet made this change simply because I haven't gotten around
to this part in my patches.

I do have a question I would like to bring up, because I like avoiding
explicit references to loopback_dev when I can.

/* Dirty hack. We did it in 2.2 (in __dst_free),
 * we have _very_ good reasons not to repeat
 * this mistake in 2.3, but we have no choice
 * now. _It_ _is_ _explicit_ _deliberate_
 * _race_ _condition_.
 *
 * Commented and originally written by Alexey.
 */

What is the race that is talked about in that comment?  Can we just
assign NULL instead of the loopback device when we bring a route down?
My gut feeling is that something like:
dst->input = dst->output = dst_discard;
may be enough.  But I don't know where the deliberate race is.

I haven't traced this all of the way through but from the obvious
parts I just get this nagging feeling that something isn't quite
right.
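
To make the idea concrete, here is a small sketch of pointing both handlers
at a discard routine and dropping the device reference, using stand-in
types. The sketch_ names are hypothetical and this is not the kernel's
dst_entry. Whether doing this is actually safe with respect to the race
mentioned in the comment is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet and route-cache records, for illustration only. */
struct sketch_pkt {
	int len;
};

struct sketch_dst {
	void *dev;                              /* owning device, NULL once detached */
	int (*input)(struct sketch_pkt *pkt);
	int (*output)(struct sketch_pkt *pkt);
};

static int sketch_discarded;                    /* counts dropped packets */

/* Drop the packet; stands in for a dst_discard style handler. */
static int sketch_discard(struct sketch_pkt *pkt)
{
	(void)pkt;
	sketch_discarded++;
	return 0;
}

/* On device down: route everything to the discard handler, hold no device. */
static void sketch_dst_ifdown(struct sketch_dst *dst)
{
	assert(dst != NULL);
	dst->input = dst->output = sketch_discard;
	dst->dev = NULL;
}
```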

Eric

Re: [Devel] [PATCH 4/4] net: Make the loopback device per network namespace

2007-09-27 Thread Eric W. Biederman

Denis V. Lunev [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
 This patch makes loopback_dev per network namespace.  Adding
 code to create a different loopback device for each network
 namespace and adding the code to free a loopback device
 when a network namespace exits.
 
 This patch modifies all users of loopback_dev so they
 access it as init_net.loopback_dev, keeping all of the
 code compiling and working.  A later pass will be needed to
 update the users to use something other than the initial network
 namespace.

 A pity that an important bit of explanation is missed. The
 initialization of loopback_dev is moved from a chain of devices
 (init_module) to a subsystem initialization to keep proper order, i.e.
 we must be sure that the initialization order is correct.

That didn't happen in the patch you mentioned.  That happened
when we started dynamically allocating the loopback device.
That was the patch Daniel sent out a bit ago.

There are certainly some ordering issues and it may have helped
to talk about them.  But they exist because things assume the
loopback device is present.  Various bits of code, such as the
dst_ifdown case, assume that if another network device is present
the loopback device is present.  To fulfill that assumption I guess
that means we have both an initialization order dependency and a
destruction order dependency.

The fact we were using module_init before actually appears to
me to have been racy, but we got away with it because the actual
data structure was statically allocated.

Since it appears that for a dynamically allocated loopback,
registering it first and unregistering it last is necessary
for routing, it is likely worth looking at this a little
more closely and making a guarantee.  That would make it easier
for networking layers like ipv6 that want to memorize which
device is the loopback device.

Eric

Re: RTL8111 PCI Express Gigabit driver r8169 produces slow file transfers

2007-09-27 Thread Francois Romieu

Achim Frase [EMAIL PROTECTED] :
 [...]
 but I hope you could help me.

Yes. Please try any of:
- current 2.6.23-git
- 2.6.23-rc8 + patch below

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b85ab4a..c921ec3 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1228,7 +1228,10 @@ static void rtl8169_hw_phy_config(struct net_device *dev)
return;
}
 
-   /* phy config for RTL8169s mac_version C chip */
+   if ((tp->mac_version != RTL_GIGA_MAC_VER_02) &&
+       (tp->mac_version != RTL_GIGA_MAC_VER_03))
+   return;
+
mdio_write(ioaddr, 31, 0x0001); //w 31 2 0 1
mdio_write(ioaddr, 21, 0x1000); //w 21 15 0 1000
mdio_write(ioaddr, 24, 0x65c7); //w 24 15 0 65c7
@@ -2567,6 +2570,15 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
(TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
netif_wake_queue(dev);
}
+   /*
+* 8168 hack: TxPoll requests are lost when the Tx packets are
+* too close. Let's kick an extra TxPoll request when a burst
+* of start_xmit activity is detected (if it is not detected,
+* it is slow enough). -- FR
+*/
+   smp_rmb();
+   if (tp->cur_tx != dirty_tx)
+   RTL_W8(TxPoll, NPQ);
}
 }
 
-- 
Ueimor

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread Eric W. Biederman

Cedric Le Goater [EMAIL PROTECTED] writes:

 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index a01ac6d..e10a0a8 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -27,6 +27,7 @@
  #define CLONE_NEWUTS  0x0400  /* New utsname group? */
  #define CLONE_NEWIPC  0x0800  /* New ipcs */
  #define CLONE_NEWUSER 0x1000  /* New user namespace */
 +#define CLONE_NEWNET  0x2000  /* New network namespace */

 This new flag is going to conflict with the pid namespace flag 
 CLONE_NEWPID in -mm. It might be worth changing it to:

 #define CLONE_NEWNET  0x4000

Interesting, it would have been nice if someone had caught this
detail earlier.  Oh well.

Thanks for pointing this out, it's on my todo list to look into,
and ensure we resolve.

I'm confused because my notes have 0x8000 for the pid namespace,
and 0x4000 for the time namespace.
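
A small sketch of how such a clash could be checked mechanically. The
helper name sketch_flags_collide is hypothetical, and the values used with
it are the ones as printed in this thread, assuming CLONE_NEWPID in -mm
uses the same bit as the proposed CLONE_NEWNET, which is what the reported
conflict implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if any two entries in flags[] share a bit, i.e. if two
 * clone flags could not be told apart when OR-ed into one mask.
 */
static bool sketch_flags_collide(const unsigned long *flags, size_t n)
{
	unsigned long seen = 0;
	size_t i;

	assert(flags != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (seen & flags[i])
			return true;    /* bit already claimed by an earlier flag */
		seen |= flags[i];
	}
	return false;
}
```

Feeding it the four flags from the quoted hunk plus a second flag at 0x2000
reports a collision; moving the new flag to 0x4000, as suggested, does not.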

 The changes in nxproxy.c and fork.c will also conflict but I don't 
 think we can do much about it for now.

They should also be fairly easy conflicts to resolve.

I guess we are likely to hit this conflict in the next -mm or the
merge window, whichever comes first.

Eric

Re: [RFC/PATCH 0/3] UDP memory usage accounting

2007-09-27 Thread Hideo AOKI


Hello,

Apologies for late response.

Evgeniy Polyakov wrote:

Hi.

On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA ([EMAIL PROTECTED]) 
wrote:

This patch set tries to introduce memory usage accounting for
UDP (currently ipv4 only).

Currently, memory usage of UDP can be observed as the sum of
usage of tx_queue and rx_queue. But I believe that system-wide
accounting is useful under heavily loaded conditions.

In the next step, I would like to add memory usage quota
for UDP to avoid unlimited memory consumption problem
under DDOS attack.


Could you please describe such an attack in more detail?
Each UDP socket has its queue length which can not be exceeded
(roughly), no new sockets are created when remote side sends a packet
(like after special steps in TCP), so where is possibility to eat all
the mem?


I think Satoshi will answer this question soon.


This patch set is for 2.6.23-rc7.


I seriously doubt you want to put udp specific hacks and zillions of
atomic ops all around the code just to know exact number of bytes eaten
for UDP.


I'll revise the patch to reduce the number of atomic operations.


Please use udp specific code (like udp_sendmsg()) for proper accounting
if you need that, but not hacks in generic ip code.


As far as I know, Satoshi is improving this part right now. Please wait for his
response.

Many thanks for your comments.

Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

Re: [PATCH 1/4] net: Dynamically allocate the per cpu counters for the loopback device.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 01:48:00 -0600

 I'm not doing get_cpu/put_cpu so does the comment make sense
 in relationship to per_cpu_ptr?

It is possible.  But someone would need to go check for
sure.

Re: [RFC/PATCH 2/3] UDP memory usage accounting: accounting unit and variable

2007-09-27 Thread Hideo AOKI


Hello,

I apologize for not replying sooner.

Andi Kleen wrote:

Satoshi OSHIMA [EMAIL PROTECTED] writes:


This patch introduces global variable for UDP memory accounting.
The unit is page.


The global variable doesn't seem to be very MP scalable, especially
if you change it for each packet. This will be a very hot cache line,
in the worst case bouncing around a large machine.

Possible alternatives:
- Per CPU variables
- You only change the global on socket creation time (by pre allocating a large
amount) or when the system comes under memory pressure.
- Batching of the global updates for multiple packets [that's a variant
of the previous one, might be still too costly though]

Also for such variables it's usually good to cache line pad them on SMP
to avoid false sharing with something else.

-Andi


Thank you so much for your suggestions.

The implementation of the patch basically followed the implementation of
tcp_memory_allocated. However, I agree that the patch introduces too many
atomic operations, so I'll try batching to reduce their number.
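
A rough user-space sketch of the batching idea, using C11 atomics instead
of kernel primitives. The names, the batch size of 16 pages, and the
per-context pending counter are all assumptions for illustration, not the
code from the patch set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_BATCH_PAGES 16            /* hypothetical batch threshold */

static atomic_long sketch_udp_pages;     /* shared, contended global counter */

/* Per-CPU or per-socket state: pages charged but not yet published. */
struct sketch_local {
	long pending;
};

/*
 * Charge (positive) or uncharge (negative) pages. The global counter is
 * only touched once the local backlog reaches the batch threshold, so most
 * packets cost no atomic operation at all.
 */
static void sketch_charge(struct sketch_local *l, long pages)
{
	assert(l != NULL);
	l->pending += pages;
	if (l->pending >= SKETCH_BATCH_PAGES ||
	    l->pending <= -SKETCH_BATCH_PAGES) {
		atomic_fetch_add(&sketch_udp_pages, l->pending);
		l->pending = 0;
	}
}

/* Publish whatever is still pending, e.g. under memory pressure. */
static void sketch_flush(struct sketch_local *l)
{
	assert(l != NULL);
	if (l->pending) {
		atomic_fetch_add(&sketch_udp_pages, l->pending);
		l->pending = 0;
	}
}
```

The trade-off is that the global value can lag behind the true usage by up
to one batch per local context, so any limit check has to tolerate that
slack.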

Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

RE: [ofa-general] [PATCH v3] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts.

2007-09-27 Thread Kanevsky, Arkady

Sean,
What is the model for how a client connects, say for iSCSI,
when client and server both support iWARP and 10GbE or 1GbE,
and would like to set up the most performant connection for the ULP?
Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 27, 2007 2:39 PM
 To: Steve Wise
 Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [ofa-general] [PATCH v3] iw_cxgb3: Support 
 iwarp-only interfacesto avoid 4-tuple conflicts.
 
  The sysadmin creates for iwarp use only alias interfaces 
 of the form 
  devname:iw* where devname is the native interface name 
 (eg eth0) for 
  the iwarp netdev device.  The alias label can be anything 
 starting with iw.
  The iw immediately after the ':' is the key used by the 
 iw_cxgb3 driver.
 
 I'm still not sure about this, but haven't come up with 
 anything better myself.  And if there's a good chance of 
 other rnic's needing the same support, I'd rather see the 
 common code separated out, even if just encapsulated within 
 this module for easy re-use.
 
 As for the code, I have a couple of questions about whether 
 deadlock and a race condition are possible, plus a few minor comments.
 
  +static void insert_ifa(struct iwch_dev *rnicp, struct 
 in_ifaddr *ifa) 
  +{
  +   struct iwch_addrlist *addr;
  +
  +   addr = kmalloc(sizeof *addr, GFP_KERNEL);
  +   if (!addr) {
  +   printk(KERN_ERR MOD %s - failed to alloc memory!\n,
  +  __FUNCTION__);
  +   return;
  +   }
  +   addr-ifa = ifa;
  +   mutex_lock(rnicp-mutex);
  +   list_add_tail(addr-entry, rnicp-addrlist);
  +   mutex_unlock(rnicp-mutex);
  +}
 
 Should this return success/failure?
 
  +static int nb_callback(struct notifier_block *self, 
 unsigned long event,
  +  void *ctx)
  +{
  +   struct in_ifaddr *ifa = ctx;
  +   struct iwch_dev *rnicp = container_of(self, struct 
 iwch_dev, nb);
  +
  +   PDBG(%s rnicp %p event %lx\n, __FUNCTION__, rnicp, event);
  +
  +   switch (event) {
  +   case NETDEV_UP:
  +   if (netdev_is_ours(rnicp, ifa-ifa_dev-dev) 
  +   is_iwarp_label(ifa-ifa_label)) {
  +   PDBG(%s label %s addr 0x%x added\n,
  +   __FUNCTION__, ifa-ifa_label, 
 ifa-ifa_address);
  +   insert_ifa(rnicp, ifa);
  +   iwch_listeners_add_addr(rnicp, 
 ifa-ifa_address);
 
 If insert_ifa() fails, what will iwch_listeners_add_addr() 
 do?  (I'm not easily seeing the relationship between the 
 address list and the listen list at this point.)
 
  +   }
  +   break;
  +   case NETDEV_DOWN:
  +   if (netdev_is_ours(rnicp, ifa-ifa_dev-dev) 
  +   is_iwarp_label(ifa-ifa_label)) {
  +   PDBG(%s label %s addr 0x%x deleted\n,
  +   __FUNCTION__, ifa-ifa_label, 
 ifa-ifa_address);
  +   iwch_listeners_del_addr(rnicp, 
 ifa-ifa_address);
  +   remove_ifa(rnicp, ifa);
  +   }
  +   break;
  +   default:
  +   break;
  +   }
  +   return 0;
  +}
  +
  +static void delete_addrlist(struct iwch_dev *rnicp) {
  +   struct iwch_addrlist *addr, *tmp;
  +
  +   mutex_lock(rnicp-mutex);
  +   list_for_each_entry_safe(addr, tmp, rnicp-addrlist, entry) {
  +   list_del(addr-entry);
  +   kfree(addr);
  +   }
  +   mutex_unlock(rnicp-mutex);
  +}
  +
  +static void populate_addrlist(struct iwch_dev *rnicp) {
  +   int i;
  +   struct in_device *indev;
  +
  +   for (i = 0; i  rnicp-rdev.port_info.nports; i++) {
  +   indev = in_dev_get(rnicp-rdev.port_info.lldevs[i]);
  +   if (!indev)
  +   continue;
  +   for_ifa(indev)
  +   if (is_iwarp_label(ifa-ifa_label)) {
  +   PDBG(%s label %s addr 0x%x added\n,
  +__FUNCTION__, ifa-ifa_label,
  +ifa-ifa_address);
  +   insert_ifa(rnicp, ifa);
  +   }
  +   endfor_ifa(indev);
  +   }
  +}
  +
   static void rnic_init(struct iwch_dev *rnicp)  {
  PDBG(%s iwch_dev %p\n, __FUNCTION__,  rnicp); @@ 
 -70,6 +187,12 @@ 
  static void rnic_init(struct iwch_dev *r
  idr_init(rnicp-qpidr);
  idr_init(rnicp-mmidr);
  spin_lock_init(rnicp-lock);
  +   INIT_LIST_HEAD(rnicp-addrlist);
  +   INIT_LIST_HEAD(rnicp-listen_eps);
  +   mutex_init(rnicp-mutex);
  +   rnicp-nb.notifier_call = nb_callback;
  +   populate_addrlist(rnicp);
  +   register_inetaddr_notifier(rnicp-nb);
   
  rnicp-attr.vendor_id = 0x168;
  rnicp-attr.vendor_part_id = 7;
  @@ -148,6 +271,8 @@ static void

Re: [ofa-general] [PATCH v3] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts.

2007-09-27 Thread Sean Hefty


The sysadmin creates for iwarp use only alias interfaces of the form
devname:iw* where devname is the native interface name (eg eth0) for the
iwarp netdev device.  The alias label can be anything starting with iw.
The iw immediately after the ':' is the key used by the iw_cxgb3 driver.
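
The body of is_iwarp_label() is not shown in this excerpt, so the following
is only a guess at how the labeling rule described above could be checked:
find the ':' in the alias label and test whether the text right after it
starts with "iw". The function name sketch_is_iwarp_label is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if label has the form devname:iw*, e.g. "eth0:iw1". A plain device
 * name, or an alias whose suffix does not begin with "iw", is rejected.
 */
static bool sketch_is_iwarp_label(const char *label)
{
	const char *colon;

	assert(label != NULL);
	colon = strchr(label, ':');
	if (!colon)
		return false;           /* not an alias label at all */
	return strncmp(colon + 1, "iw", 2) == 0;
}
```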


I'm still not sure about this, but haven't come up with anything better 
myself.  And if there's a good chance of other rnic's needing the same 
support, I'd rather see the common code separated out, even if just 
encapsulated within this module for easy re-use.


As for the code, I have a couple of questions about whether deadlock and 
a race condition are possible, plus a few minor comments.



+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+	struct iwch_addrlist *addr;
+
+	addr = kmalloc(sizeof *addr, GFP_KERNEL);
+	if (!addr) {
+		printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+		       __FUNCTION__);
+		return;
+	}
+	addr->ifa = ifa;
+	mutex_lock(&rnicp->mutex);
+	list_add_tail(&addr->entry, &rnicp->addrlist);
+	mutex_unlock(&rnicp->mutex);
+}


Should this return success/failure?


+static int nb_callback(struct notifier_block *self, unsigned long event,
+		       void *ctx)
+{
+	struct in_ifaddr *ifa = ctx;
+	struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb);
+
+	PDBG("%s rnicp %p event %lx\n", __FUNCTION__, rnicp, event);
+
+	switch (event) {
+	case NETDEV_UP:
+		if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+		    is_iwarp_label(ifa->ifa_label)) {
+			PDBG("%s label %s addr 0x%x added\n",
+				__FUNCTION__, ifa->ifa_label, ifa->ifa_address);
+			insert_ifa(rnicp, ifa);
+			iwch_listeners_add_addr(rnicp, ifa->ifa_address);


If insert_ifa() fails, what will iwch_listeners_add_addr() do?  (I'm not 
easily seeing the relationship between the address list and the listen 
list at this point.)



+		}
+		break;
+	case NETDEV_DOWN:
+		if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+		    is_iwarp_label(ifa->ifa_label)) {
+			PDBG("%s label %s addr 0x%x deleted\n",
+				__FUNCTION__, ifa->ifa_label, ifa->ifa_address);
+			iwch_listeners_del_addr(rnicp, ifa->ifa_address);
+			remove_ifa(rnicp, ifa);
+		}
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static void delete_addrlist(struct iwch_dev *rnicp)
+{
+	struct iwch_addrlist *addr, *tmp;
+
+	mutex_lock(&rnicp->mutex);
+	list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+		list_del(&addr->entry);
+		kfree(addr);
+	}
+	mutex_unlock(&rnicp->mutex);
+}
+
+static void populate_addrlist(struct iwch_dev *rnicp)
+{
+	int i;
+	struct in_device *indev;
+
+	for (i = 0; i < rnicp->rdev.port_info.nports; i++) {
+		indev = in_dev_get(rnicp->rdev.port_info.lldevs[i]);
+		if (!indev)
+			continue;
+		for_ifa(indev)
+			if (is_iwarp_label(ifa->ifa_label)) {
+				PDBG("%s label %s addr 0x%x added\n",
+				     __FUNCTION__, ifa->ifa_label,
+				     ifa->ifa_address);
+				insert_ifa(rnicp, ifa);
+			}
+		endfor_ifa(indev);
+	}
+}
+
 static void rnic_init(struct iwch_dev *rnicp)
 {
 	PDBG("%s iwch_dev %p\n", __FUNCTION__,  rnicp);
@@ -70,6 +187,12 @@ static void rnic_init(struct iwch_dev *r
 	idr_init(&rnicp->qpidr);
 	idr_init(&rnicp->mmidr);
 	spin_lock_init(&rnicp->lock);
+	INIT_LIST_HEAD(&rnicp->addrlist);
+	INIT_LIST_HEAD(&rnicp->listen_eps);
+	mutex_init(&rnicp->mutex);
+	rnicp->nb.notifier_call = nb_callback;
+	populate_addrlist(rnicp);
+	register_inetaddr_notifier(&rnicp->nb);
 
 	rnicp->attr.vendor_id = 0x168;
 	rnicp->attr.vendor_part_id = 7;
@@ -148,6 +271,8 @@ static void close_rnic_dev(struct t3cdev
 	mutex_lock(&dev_mutex);
 	list_for_each_entry_safe(dev, tmp, &dev_list, entry) {
 		if (dev->rdev.t3cdev_p == tdev) {
+			unregister_inetaddr_notifier(&dev->nb);
+			delete_addrlist(dev);
 			list_del(&dev->entry);
 			iwch_unregister_device(dev);
 			cxio_rdev_close(&dev->rdev);
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index caf4e60..7fa0a47 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -36,6 +36,8 @@ #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
Re: [PATCH] fixed broken bootp compilation

2007-09-27 Thread David Miller

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 27 Sep 2007 14:46:22 +0400

 Compilation fix. Extra bracket removed.
 Broken by [NET]: Wrap netdevice hardware header creation from
 Stephen Hemminger [EMAIL PROTECTED]

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied, thanks Denis.

Re: [PATCH] proper comment for loopback initialization order

2007-09-27 Thread David Miller

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 27 Sep 2007 16:25:27 +0400

 Subject: [PATCH] proper comment for loopback initialization order
 From: Denis V. Lunev [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], netdev@vger.kernel.org,
   [EMAIL PROTECTED]
 Date: Thu, 27 Sep 2007 16:25:27 +0400
 Sender:  [EMAIL PROTECTED]
 User-Agent: Mutt/1.5.16 (2007-06-09)

 Loopback device is special. It should be initialized at the very
 beginning.  Initialization order has been changed by
 Eric W. Biederman [EMAIL PROTECTED] and this change is non-obvious
 and important enough to add proper comment.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied, but I had to fix the coding style of your comment,
please do it like this in the future:

/* Loopback is special. It should be initialized before any other network
 * device and network subsystem.
 */

Thanks.

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Thu, 27 Sep 2007 08:39:45 -0400

 nice work. I like the egress flag idea;-
 and who would have thunk stateless nat could be written in such a few
 lines ;- I would have put the checksum as a separate action but it is
 fine the way you did it since it simplifies config.
 more comments below.

 On Thu, 2007-27-09 at 15:34 +0800, Herbert Xu wrote:

  +config NET_ACT_NAT
  +tristate "Stateless NAT"
  +depends on NET_CLS_ACT
  +select NETFILTER

 I am gonna have to agree with Evgeniy on this Herbert;-
 The rewards are it will improve performance for people who dont need
 netfilter.
 Ok, who is gonna move the csum utility functions out? /me looks at
 Evgeniy;-
 I could do it realsoonnow if noone raises their hands. 
 In any case, it would be real nice to have but i dont see it as a show
 stopper for inclusion.

I agree that we should move the functions out.   However...

You have to realize that this logic is a complete crock
of shit for %99. of Linux users out there
who get and only use distribution compiled kernels which are
going to enable everything anyways.

So we better make sure there are zero performance implications at some
point just for compiling netfilter into the tree.  I know that isn't
the case currently, but that means that we aren't helping out the
majority of Linux users and are thus only adding these optimizations
for such a small sliver of users and that is totally pointless and
sucks.

Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 10:16:37 -0600

 I guess in particular IFF_LOOPBACK means that all packets from
 a device will come right back to the current machine, and go
 nowhere else.

 That usage sounds completely different than the CAN usage, which
 appears to mean that broadcast packets will be returned to this machine
 as well as being sent out onto the wire.

It's bogus and it should be removed from the CAN code, they
can add some other attribute to achieve their goals.

Re: [ofa-general] [PATCH v3] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts.

2007-09-27 Thread Steve Wise




Sean Hefty wrote:

The sysadmin creates for iwarp use only alias interfaces of the form
devname:iw* where devname is the native interface name (eg eth0) for the
iwarp netdev device.  The alias label can be anything starting with iw.
The iw immediately after the ':' is the key used by the iw_cxgb3 driver.


I'm still not sure about this, but haven't come up with anything better 
myself.  And if there's a good chance of other rnic's needing the same 
support, I'd rather see the common code separated out, even if just 
encapsulated within this module for easy re-use.


As for the code, I have a couple of questions about whether deadlock and 
a race condition are possible, plus a few minor comments.




Thanks for reviewing!  Responses are in-line below.



+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+	struct iwch_addrlist *addr;
+
+	addr = kmalloc(sizeof *addr, GFP_KERNEL);
+	if (!addr) {
+		printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+		       __FUNCTION__);
+		return;
+	}
+	addr->ifa = ifa;
+	mutex_lock(&rnicp->mutex);
+	list_add_tail(&addr->entry, &rnicp->addrlist);
+	mutex_unlock(&rnicp->mutex);
+}


Should this return success/failure?



I think so.  See below...


+static int nb_callback(struct notifier_block *self, unsigned long event,
+		       void *ctx)
+{
+	struct in_ifaddr *ifa = ctx;
+	struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb);
+
+	PDBG("%s rnicp %p event %lx\n", __FUNCTION__, rnicp, event);
+
+	switch (event) {
+	case NETDEV_UP:
+		if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+		    is_iwarp_label(ifa->ifa_label)) {
+			PDBG("%s label %s addr 0x%x added\n",
+				__FUNCTION__, ifa->ifa_label, ifa->ifa_address);
+			insert_ifa(rnicp, ifa);
+			iwch_listeners_add_addr(rnicp, ifa->ifa_address);


If insert_ifa() fails, what will iwch_listeners_add_addr() do?  (I'm not 
easily seeing the relationship between the address list and the listen 
list at this point.)




I guess insert_ifa() needs to return success/failure.  Then if we failed 
to add the ifa to the list we won't update the listeners.


The relationship is this:

- when a listen is done on addr 0.0.0.0, the code walks the list of 
addresses to do specific listens on each address.


- when an address is added or deleted, then the list of current 
listeners is walked and updated accordingly.
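
The change agreed to above could look roughly like the following
user-space sketch: the insert routine reports failure, and the caller only
updates the listeners when the address really made it onto the list. The
types and names are stand-ins, locking is omitted, a simple head insert
replaces list_add_tail(), and the fail_alloc field is a hypothetical hook
for forcing the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for struct iwch_addrlist: one tracked address. */
struct sketch_addr {
	unsigned int address;
	struct sketch_addr *next;
};

/* Stand-in for the per-rnic state. */
struct sketch_rnic {
	struct sketch_addr *addrlist;
	int listener_updates;   /* how often the listen list was updated */
	int fail_alloc;         /* hypothetical hook: force allocation failure */
};

/* Returns 0 on success or -ENOMEM, instead of void as in the posted patch. */
static int sketch_insert_ifa(struct sketch_rnic *rnic, unsigned int address)
{
	struct sketch_addr *addr;

	assert(rnic != NULL);
	addr = rnic->fail_alloc ? NULL : malloc(sizeof *addr);
	if (!addr)
		return -ENOMEM;
	addr->address = address;
	addr->next = rnic->addrlist;
	rnic->addrlist = addr;
	return 0;
}

/* Address-up handling: skip the listener update if the insert failed. */
static int sketch_addr_up(struct sketch_rnic *rnic, unsigned int address)
{
	int err = sketch_insert_ifa(rnic, address);

	if (err)
		return err;
	rnic->listener_updates++;   /* stands in for iwch_listeners_add_addr() */
	return 0;
}
```

This keeps the address list and the listen list consistent: an address
that could not be recorded never gets listeners attached to it.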



+		}
+		break;
+	case NETDEV_DOWN:
+		if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+		    is_iwarp_label(ifa->ifa_label)) {
+			PDBG("%s label %s addr 0x%x deleted\n",
+				__FUNCTION__, ifa->ifa_label, ifa->ifa_address);
+			iwch_listeners_del_addr(rnicp, ifa->ifa_address);
+			remove_ifa(rnicp, ifa);
+		}
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static void delete_addrlist(struct iwch_dev *rnicp)
+{
+	struct iwch_addrlist *addr, *tmp;
+
+	mutex_lock(&rnicp->mutex);
+	list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+		list_del(&addr->entry);
+		kfree(addr);
+	}
+	mutex_unlock(&rnicp->mutex);
+}
+
+static void populate_addrlist(struct iwch_dev *rnicp)
+{
+	int i;
+	struct in_device *indev;
+
+	for (i = 0; i < rnicp->rdev.port_info.nports; i++) {
+		indev = in_dev_get(rnicp->rdev.port_info.lldevs[i]);
+		if (!indev)
+			continue;
+		for_ifa(indev)
+			if (is_iwarp_label(ifa->ifa_label)) {
+				PDBG("%s label %s addr 0x%x added\n",
+				     __FUNCTION__, ifa->ifa_label,
+				     ifa->ifa_address);
+				insert_ifa(rnicp, ifa);
+			}
+		endfor_ifa(indev);
+	}
+}
+
 static void rnic_init(struct iwch_dev *rnicp)
 {
 	PDBG("%s iwch_dev %p\n", __FUNCTION__,  rnicp);
@@ -70,6 +187,12 @@ static void rnic_init(struct iwch_dev *r
 	idr_init(&rnicp->qpidr);
 	idr_init(&rnicp->mmidr);
 	spin_lock_init(&rnicp->lock);
+	INIT_LIST_HEAD(&rnicp->addrlist);
+	INIT_LIST_HEAD(&rnicp->listen_eps);
+	mutex_init(&rnicp->mutex);
+	rnicp->nb.notifier_call = nb_callback;
+	populate_addrlist(rnicp);
+	register_inetaddr_notifier(&rnicp->nb);
 
 	rnicp->attr.vendor_id = 0x168;
 	rnicp->attr.vendor_part_id = 7;
@@ -148,6 +271,8 @@ static void close_rnic_dev(struct t3cdev
 	mutex_lock(&dev_mutex);
 	list_for_each_entry_safe(dev, tmp, &dev_list, entry) {
 		if (dev->rdev.t3cdev_p == tdev) {
+			unregister_inetaddr_notifier(&dev->nb);
+			delete_addrlist(dev);
 			list_del(&dev->entry);
 			iwch_unregister_device(dev);
 			cxio_rdev_close(&dev->rdev);
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index caf4e60..7fa0a47 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -36,6 +36,8 @@ #include <linux/mutex.h>
 #include <linux/list.h>
 #include

[PATCH] Added Ethernet PHY support for the Realtek 821x

2007-09-27 Thread ljd015

From: Joe D'Abbraccio [EMAIL PROTECTED]

The MPC837xERDB platform uses the RTL8211B Ethernet PHY on the
WAN port (on eth0).  Also added the kernel configuration options for
selecting the PHY.

Signed-off-by: Johnson Leung [EMAIL PROTECTED]
Signed-off-by: Kevin Lam [EMAIL PROTECTED]
Signed-off-by: Joe D'Abbraccio [EMAIL PROTECTED]
---
 arch/powerpc/configs/mpc837x_rdb_defconfig |1 +
 drivers/net/phy/Kconfig|5 ++
 drivers/net/phy/Makefile   |1 +
 drivers/net/phy/realtek.c  |   84 
 4 files changed, 91 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/phy/realtek.c

diff --git a/arch/powerpc/configs/mpc837x_rdb_defconfig 
b/arch/powerpc/configs/mpc837x_rdb_defconfig
index e398e9f..9837493 100644
--- a/arch/powerpc/configs/mpc837x_rdb_defconfig
+++ b/arch/powerpc/configs/mpc837x_rdb_defconfig
@@ -411,6 +411,7 @@ CONFIG_MARVELL_PHY=y
 # CONFIG_SMSC_PHY is not set
 # CONFIG_BROADCOM_PHY is not set
 # CONFIG_ICPLUS_PHY is not set
+CONFIG_REALTEK_PHY=y
 # CONFIG_FIXED_PHY is not set
 CONFIG_NET_ETHERNET=y
 CONFIG_MII=y
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index dd09011..9ce95c9 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -60,6 +60,11 @@ config ICPLUS_PHY
---help---
  Currently supports the IP175C PHY.
 
+config REALTEK_PHY
+   tristate "Drivers for Realtek PHYs"
+   ---help---
+ Supports the Realtek 821x PHY.
+
 config FIXED_PHY
tristate Drivers for PHY emulation on fixed speed/link
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 8885650..d7bfa4e 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -12,4 +12,5 @@ obj-$(CONFIG_SMSC_PHY)+= smsc.o
 obj-$(CONFIG_VITESSE_PHY)  += vitesse.o
 obj-$(CONFIG_BROADCOM_PHY) += broadcom.o
 obj-$(CONFIG_ICPLUS_PHY)   += icplus.o
+obj-$(CONFIG_REALTEK_PHY)  += realtek.o
 obj-$(CONFIG_FIXED_PHY)+= fixed.o
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
new file mode 100644
index 000..546c25f
--- /dev/null
+++ b/drivers/net/phy/realtek.c
@@ -0,0 +1,84 @@
+/*
+ * drivers/net/phy/realtek.c
+ *
+ * Driver for Realtek PHYs
+ *
+ * Author: Johnson Leung [EMAIL PROTECTED]
+ *
+ * Copyright (c) 2004 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ */
+#include <linux/phy.h>
+
+#define RTL821x_PHYSR  0x11
+#define RTL821x_PHYSR_DUPLEX   0x2000
+#define RTL821x_PHYSR_SPEED0xc000
+#define RTL821x_INER   0x12
+#define RTL821x_INER_INIT  0x6400
+#define RTL821x_INSR   0x13
+
+MODULE_DESCRIPTION("Realtek PHY driver");
+MODULE_AUTHOR("Johnson Leung");
+MODULE_LICENSE("GPL");
+
+static int rtl821x_config_init(struct phy_device *phydev)
+{
+   return 0;
+}
+
+static int rtl821x_ack_interrupt(struct phy_device *phydev)
+{
+   int err;
+
+   err = phy_read(phydev, RTL821x_INSR);
+   return (err < 0) ? err : 0;
+}
+
+static int rtl821x_config_intr(struct phy_device *phydev)
+{
+   int err;
+
+   if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+   err = phy_write(phydev, RTL821x_INER,
+   RTL821x_INER_INIT);
+   else
+   err = phy_write(phydev, RTL821x_INER, 0);
+
+   return err;
+}
+
+/* RTL8211B */
+static struct phy_driver rtl821x_driver = {
+   .phy_id = 0x0001cc912,
+   .name   = "RTL821x Gigabit Ethernet",
+   .phy_id_mask= 0x001f,
+   .features   = PHY_GBIT_FEATURES,
+   .flags  = PHY_HAS_INTERRUPT,
+   .config_init= rtl821x_config_init,
+   .config_aneg= genphy_config_aneg,
+   .read_status= genphy_read_status,
+   .ack_interrupt  = rtl821x_ack_interrupt,
+   .config_intr= rtl821x_config_intr,
+   .driver = { .owner = THIS_MODULE,},
+};
+
+static int __init realtek_init(void)
+{
+   int ret;
+
+   ret = phy_driver_register(&rtl821x_driver);
+   return ret;
+}
+
+static void __exit realtek_exit(void)
+{
+   phy_driver_unregister(&rtl821x_driver);
+}
+
+module_init(realtek_init);
+module_exit(realtek_exit);
-- 
1.5.2.2


RE: [ofa-general] [PATCH v3] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts.

2007-09-27 Thread Sean Hefty

What is the model on how client connects, say for iSCSI,
when client and server both support, iWARP and 10GbE or 1GbE,
and would like to setup most performant connection for ULP?

For the most performance connection, the ULP would use IB, and all these
problems go away.  :)

This proposal is for each iwarp interface to have its own IP address.  Clients
would need an iwarp usable address of the server and would connect using
rdma_connect().  If that call (or rdma_resolve_addr/route) fails, the client
could try connecting using sockets, aoi, or some other interface.  I don't see
that Steve's proposal changes anything from the client's perspective.
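
The try-RDMA-first, then fall back ordering described above could be sketched roughly as follows. This is only an illustration: `struct transport`, `connect_with_fallback()` and the two `fake_*` callbacks are hypothetical names invented here, and nothing below calls into librdmacm or the sockets API.

```c
#include <stddef.h>

/* Hypothetical connect callback: returns 0 on success, negative on failure. */
typedef int (*connect_fn)(const char *addr);

/* Hypothetical transport descriptor, listed in order of preference. */
struct transport {
    const char *name;
    connect_fn  connect;
};

/*
 * Try each transport in order and return the index of the first one
 * that connects, or -1 if every attempt fails.
 */
static int connect_with_fallback(const struct transport *t, size_t n,
                                 const char *addr)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].connect && t[i].connect(addr) == 0)
            return (int)i;
    }
    return -1;
}

/* Stand-ins for illustration only: the "RDMA" attempt always fails. */
static int fake_rdma_connect(const char *addr)   { (void)addr; return -1; }
static int fake_socket_connect(const char *addr) { (void)addr; return 0; }
```

With the table ordered { iwarp, tcp }, a failed first attempt simply moves on to the next entry, so the client-side logic stays the same whichever address the server advertises.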

- Sean

[PATCH 1/2] sky2: FE+ vlan workaround

2007-09-27 Thread Stephen Hemminger

The FE+ workaround means the driver can no longer trust the status register
to indicate VLAN tagged frames.  The fix for this is to just disable VLAN
acceleration for that chip version. Tested and works fine.

This patch applies to 2.6.23-rc8 after yesterday's patch:
  sky2 FE+ receive status workaround

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-09-27 08:45:13.0 -0700
+++ b/drivers/net/sky2.c2007-09-27 09:43:15.0 -0700
@@ -3970,8 +3970,12 @@ static __devinit struct net_device *sky2
    dev->features |= NETIF_F_HIGHDMA;
 
 #ifdef SKY2_VLAN_TAG_USED
-   dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
-   dev->vlan_rx_register = sky2_vlan_rx_register;
+   /* The workaround for FE+ status conflicts with VLAN tag detection. */
+   if (!(sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
+         sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0)) {
+       dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+       dev->vlan_rx_register = sky2_vlan_rx_register;
+   }
 #endif
 
/* read the mac address */

Re: TCP Spike

2007-09-27 Thread Ilpo Järvinen

On Thu, 27 Sep 2007, Majumder, Rajib wrote:

 We have observed 40ms latency spikes in TCP connections in burst type of 
 traffic.
 This affects regular TCP sockets.

Are segments being sent full-sized, or is there perhaps some Nagle 
component in it as well? I.e., are the applications using TCP_NODELAY?
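
For reference, a minimal sketch of how an application would switch Nagle off and check the setting, assuming a Linux/POSIX sockets environment. The helper names `set_nodelay()` and `get_nodelay()` are made up for this example; only `setsockopt()`/`getsockopt()` with `TCP_NODELAY` are the real interfaces.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle on an existing TCP socket; returns 0 on success, -1 on error. */
static int set_nodelay(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
static int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Calling `get_nodelay()` on the sockets in question is a cheap way to rule Nagle in or out before looking at delayed ACKs.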

 We observed this issue in kernels of 2.4.21 and kernel 2.6.5.  
 
 Apparently, this seems to be fixed in 2.6.19.  

 Can someone throw some light on this? 

I think somebody, probably Alexey, enabled sending of ACK on every 2nd 
segment. Previously small segment senders playing with Nagle were 
complaining every now and then about performance because two small 
segments did not generate ACKs but one had to accumulate, IIRC, half MSS 
worth of data before ACK was sent. Could this be related to your case?

...In case you're having too much time, you can always try bisecting it
which finds out the causing commit... :-)

 Is this a congestion control/avoidance issue?

Congestion control is basically ACK clocked math for cwnd, ssthresh, etc.
state, which then results in permission to send new segments out etc. 
(except for RTO part of course, which I'll ignore in the next statement). 
Any gap between receiving the ACK that triggered the state-changing math 
and sending the resulting packet isn't there due to congestion control but 
due to other factors! 40ms is much below MIN_RTO (200ms), so it shouldn't be 
due to RTO either... Note that delayed ACKs are also an exception to the 
general rule.

Congestion control is clocked much like your CPU is. In your CPU there's 
a clock of some GHz which determines when the state-changing events 
take place; state changes don't just happen arbitrarily but are _clocked_ 
(ACK _clocked_ in the case of congestion control). Of course there will be 
some propagation delay after the clock edge before all the resulting state 
changes take effect (this delay is analogous to processing delay in the 
context of congestion control).
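
As a rough illustration of "ACK clocked", here is a toy model in which the window state changes only when an ACK arrives. It follows the textbook slow start / congestion avoidance rules rather than the kernel's actual code, and `struct cc_state` and `on_ack()` are names invented for this sketch.

```c
/* Toy congestion control state, all values in segments (hypothetical). */
struct cc_state {
    unsigned int cwnd;      /* congestion window */
    unsigned int ssthresh;  /* slow-start threshold */
    unsigned int cwnd_cnt;  /* ACKs counted towards the next increase */
};

/* Called once per arriving ACK: state only changes on this "clock edge". */
static void on_ack(struct cc_state *cc)
{
    if (cc->cwnd < cc->ssthresh) {
        cc->cwnd++;                 /* slow start: +1 segment per ACK */
    } else if (++cc->cwnd_cnt >= cc->cwnd) {
        cc->cwnd++;                 /* congestion avoidance: +1 per window */
        cc->cwnd_cnt = 0;
    }
}
```

Nothing in this model can add a fixed 40ms pause by itself; a delay of that size has to come from whatever decides when ACKs (or the data that triggers them) are sent.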


-- 
 i.

[PATCH 2/2] sky2: fix transmit state on resume

2007-09-27 Thread Stephen Hemminger

This should fix http://bugzilla.kernel.org/show_bug.cgi?id=8667

After resume, driver has reset the chip so the current state
of transmit checksum offload state machine and DMA state machine
will be undefined.

The fix is to set the state so that first Tx will set MSS and offset
values.
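
The idea can be sketched in isolation like this. It is a simplified userspace model, not the sky2 structures: `struct tx_state`, `tx_state_reset()` and `tx_needs_mss()` are hypothetical names, and the real driver also re-arms the 64-bit address state as the patch below shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of what the chip was last told; not the sky2 layout. */
struct tx_state {
    uint16_t last_mss;     /* MSS most recently programmed */
    uint32_t last_tcpsum;  /* checksum start/offset most recently programmed */
};

/* After a chip reset the cached values no longer describe the hardware. */
static void tx_state_reset(struct tx_state *s)
{
    s->last_mss = 0;
    s->last_tcpsum = 0;
}

/* True if a new MSS must be sent to the chip first; updates the cache. */
static bool tx_needs_mss(struct tx_state *s, uint16_t mss)
{
    if (mss == s->last_mss)
        return false;
    s->last_mss = mss;
    return true;
}
```

Without the reset, a driver that still remembers the pre-suspend MSS would skip reprogramming it, leaving the freshly reset chip with an undefined value.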

Patch is against 2.6.23-rc8 after last patch:
 sky2: FE+ vlan workaround

(Should also work on older releases with minor fuzz).

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-09-27 08:45:13.0 -0700
+++ b/drivers/net/sky2.c2007-09-27 09:39:49.0 -0700
@@ -910,6 +910,20 @@ static inline struct sky2_tx_le *get_tx_
return le;
 }
 
+static void tx_init(struct sky2_port *sky2)
+{
+   struct sky2_tx_le *le;
+
+   sky2->tx_prod = sky2->tx_cons = 0;
+   sky2->tx_tcpsum = 0;
+   sky2->tx_last_mss = 0;
+
+   le = get_tx_le(sky2);
+   le->addr = 0;
+   le->opcode = OP_ADDR64 | HW_OWNER;
+   sky2->tx_addr64 = 0;
+}
+
 static inline struct tx_ring_info *tx_le_re(struct sky2_port *sky2,
struct sky2_tx_le *le)
 {
@@ -1320,7 +1334,8 @@ static int sky2_up(struct net_device *de
                GFP_KERNEL);
    if (!sky2->tx_ring)
        goto err_out;
-   sky2->tx_prod = sky2->tx_cons = 0;
+
+   tx_init(sky2);
 
    sky2->rx_le = pci_alloc_consistent(hw->pdev, RX_LE_BYTES,
                       &sky2->rx_le_map);

Re: [PATCH] various dst_ifdown routines to catch refcounting bugs

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 10:27:43 -0600

 Denis V. Lunev [EMAIL PROTECTED] writes:

  Moving dst entries into init_net.loopback_dev is not a good thing.
  This hides obvious and non-obvious ref-counting bugs.

 Acked-by: Eric W. Biederman [EMAIL PROTECTED]

Patch applied.

 I do have a question I would like to bring up, because I like avoiding
 explicit references to loopback_dev when I can.

 /* Dirty hack. We did it in 2.2 (in __dst_free),
  * we have _very_ good reasons not to repeat
  * this mistake in 2.3, but we have no choice
  * now. _It_ _is_ _explicit_ _deliberate_
  * _race_ _condition_.
  *
  * Commented and originally written by Alexey.
  */

 What is the race that is talked about in that comment?  Can we just
 assign NULL instead of the loopback device when we bring a route down?
 My gut feeling is that something like:
   dst->input = dst->output = dst_discard;
 may be enough.  But I don't know where the deliberate race is.

The packet output path accesses the cached route device
asynchronously, and we are resetting the device to be loopback without
any synchronization whatsoever.  None is in fact possible, and we
don't want to add it because that would be way too expensive.

So another thread on the system can either see the original device or
the loopback one.

It all works out because as the device goes down we'll purge any
packets queued into the transmit queue and packet scheduler for that
device.

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 11:14:33 -0600

 Thanks for pointing this out, it's on my todo list to look into,
 and ensure we resolve.

 I'm confused because my notes have 0x8000 for the pid namespace,
 and 0x4000 for the time namespace.

Eric, pick an appropriate new non-conflicting number NOW.

This adds unnecessary extra work for Andrew Morton, which he has
enough of already.
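
A conflict of this kind is easy to check mechanically. The sketch below assumes a Linux system whose <linux/sched.h> already defines all five namespace flags; `flags_are_disjoint()` and `ns_flags` are names made up for the example.

```c
#include <linux/sched.h>
#include <stddef.h>

/* Returns 1 if no two flags in the array share a bit, 0 otherwise. */
static int flags_are_disjoint(const unsigned long *flags, size_t n)
{
    unsigned long seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (flags[i] & seen)
            return 0;
        seen |= flags[i];
    }
    return 1;
}

/* The namespace clone flags that must not collide with one another. */
static const unsigned long ns_flags[] = {
    CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWUSER, CLONE_NEWPID, CLONE_NEWNET,
};
```

Two flags defined with the same value, as happened here between the trees, would make the check return 0.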

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 27 Sep 2007 20:58:01 +0800

 On Thu, Sep 27, 2007 at 08:39:45AM -0400, jamal wrote:

  You also need to p->tcf_qstats.drops++ for all packets that get shot.

 I was rather hoping that my packets wouldn't get shot :)
 But yeah let's increment the drops counter for consistency.

 [PKT_SCHED]: Add stateless NAT

Applied to net-2.6.24, thanks Herbert!

Re: [PKT_SCHED]: Add stateless NAT

2007-09-27 Thread David Miller

From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 27 Sep 2007 15:39:34 +0200

 Evgeniy Polyakov wrote:
  On Thu, Sep 27, 2007 at 09:20:37PM +0800, Herbert Xu ([EMAIL PROTECTED]) 
  wrote:

 How about putting it in net/core/utils.c?

  I knew, that was a bad idea to try to fix netfilter dependency :)

  diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h

 This looks good to me.

I still think the nf_*() prefixes should all go and the extern
prototypes should go into an independent header file.

These are not netfilter routines, they are INET helpers.

And we should give similar treatment to all of the ipv6
packet parser helper functions that ipv6 netfilter needs.

RE: [ofa-general] [PATCH v3] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts.

2007-09-27 Thread Kanevsky, Arkady

Sean,
IB aside, it looks like a ULP which is capable of being both RDMA aware and
RDMA unaware, like iSER and iSCSI, NFS-RDMA and NFS, SDP and sockets,
will be treated as two separate ULPs.
Each has its own IP address, since there is a different IP address for the
iWARP port and the regular Ethernet port. So it falls on the users of ULPs
to handle it via DNS or some other service.
Is this acceptable to users? I doubt it.

Recall that ULPs are going in the opposite direction by having a different
port number for the RDMA aware and RDMA unaware versions of the ULP.
This way, the ULP connection manager handles RDMA-ness under the covers,
while users plug in an IP address for a server to connect to.
Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 27, 2007 3:12 PM
 To: Kanevsky, Arkady; Sean Hefty; Steve Wise
 Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: RE: [ofa-general] [PATCH v3] iw_cxgb3: 
 Support iwarp-only interfaces to avoid 4-tuple conflicts.
 
 What is the model on how client connects, say for iSCSI, when client 
 and server both support, iWARP and 10GbE or 1GbE, and would like to 
 setup most performant connection for ULP?
 
 For the most performance connection, the ULP would use IB, 
 and all these problems go away.  :)
 
 This proposal is for each iwarp interface to have its own IP 
 address.  Clients would need an iwarp usable address of the 
 server and would connect using rdma_connect().  If that call 
 (or rdma_resolve_addr/route) fails, the client could try 
 connecting using sockets, aoi, or some other interface.  I 
 don't see that Steve's proposal changes anything from the 
 client's perspective.
 
 - Sean

RE: [ofa-general] [PATCH v3] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts.

2007-09-27 Thread Sean Hefty

It is ok to block while holding a mutex, yes?

It's okay, I just didn't try to trace through the code to see if it ever tries
to acquire the same mutex in the thread that needs to signal the event.

- Sean

Re: [PATCH 1/4] net: Dynamically allocate the per cpu counters for the loopback device.

2007-09-27 Thread Eric W. Biederman

David Miller [EMAIL PROTECTED] writes:

 From: [EMAIL PROTECTED] (Eric W. Biederman)
 Date: Thu, 27 Sep 2007 01:48:00 -0600

 I'm not doing get_cpu/put_cpu so does the comment make sense
 in relationship to per_cpu_ptr?

 It is possible.  But someone would need to go check for
 sure.

Verified.

hard_start_xmit is called inside of a
rcu_read_lock_bh(),rcu_read_unlock_bh() pair.  Which means
the code will only run on one cpu.

Therefore we do not need get_cpu/put_cpu.

In addition per_cpu_ptr is valid.  As it is just a lookup
into a NR_CPUS sized array by smp_processor_id() to return
the address of the specific cpu.

The only difference between per_cpu_ptr and __get_cpu_var()
are the implementation details between statically allocated
and dynamically allocated per cpu state.

So the comment is still valid, and still interesting it just
should say per_cpu_ptr instead of __get_cpu_var.
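
As a userspace analogy only (this is not kernel code, and `NR_SLOTS`, `slot_ptr()`, `count_xmit()` and `total()` are invented names), the lookup described above amounts to indexing a per-CPU array, with readers summing over every slot:

```c
#include <stddef.h>

#define NR_SLOTS 4   /* hypothetical stand-in for the number of CPUs */

struct lstats {
    unsigned long packets;
    unsigned long bytes;
};

/* One slot per CPU; each CPU only ever writes to its own slot. */
static struct lstats slots[NR_SLOTS];

/* Analogue of per_cpu_ptr(): a plain index into the per-CPU array. */
static struct lstats *slot_ptr(struct lstats *base, unsigned int cpu)
{
    return &base[cpu];
}

/*
 * Writer side. The caller must not be able to move to another CPU
 * between the lookup and the update; in the kernel that is what
 * having BHs disabled provides.
 */
static void count_xmit(unsigned int cpu, unsigned long len)
{
    struct lstats *s = slot_ptr(slots, cpu);

    s->packets++;
    s->bytes += len;
}

/* Reader side: the totals are the sum over every slot. */
static struct lstats total(void)
{
    struct lstats t = { 0, 0 };

    for (size_t i = 0; i < NR_SLOTS; i++) {
        t.packets += slots[i].packets;
        t.bytes += slots[i].bytes;
    }
    return t;
}
```

Because no two CPUs share a slot, the writer needs no lock or atomic operation, which is the point of keeping the counters per CPU.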

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 0f9d8c6..756e267 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -154,7 +154,7 @@ static int loopback_xmit(struct sk_buff *skb, struct 
net_device *dev)
 #endif
    dev->last_rx = jiffies;
 
-   /* it's OK to use __get_cpu_var() because BHs are off */
+   /* it's OK to use per_cpu_ptr() because BHs are off */
    pcpu_lstats = netdev_priv(dev);
    lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
    lb_stats->bytes += skb->len;


Re: [PATCH 1/4] net: Dynamically allocate the per cpu counters for the loopback device.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 14:44:37 -0600

 David Miller [EMAIL PROTECTED] writes:

  From: [EMAIL PROTECTED] (Eric W. Biederman)
  Date: Thu, 27 Sep 2007 01:48:00 -0600

  I'm not doing get_cpu/put_cpu so does the comment make sense
  in relationship to per_cpu_ptr?

  It is possible.  But someone would need to go check for
  sure.

 Verified.

 hard_start_xmit is called inside of a
 rcu_read_lock_bh(),rcu_read_unlock_bh() pair.  Which means
 the code will only run on one cpu.

 Therefore we do not need get_cpu/put_cpu.

 In addition per_cpu_ptr is valid.  As it is just a lookup
 into a NR_CPUS sized array by smp_processor_id() to return
 the address of the specific cpu.

 The only difference between per_cpu_ptr and __get_cpu_var()
 are the implementation details between statically allocated
 and dynamically allocated per cpu state.

 So the comment is still valid, and still interesting it just
 should say per_cpu_ptr instead of __get_cpu_var.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

I've already removed the comment, so you'll have to give
me a patch that adds it back with the new content :-)

[PATCH] rfkill: Move rfkill_switch_all out of global header

2007-09-27 Thread Ivo van Doorn

rfkill_switch_all shouldn't be called by drivers directly,
instead they should send a signal over the input device.

To prevent confusion for driver developers, move the
function into a rfkill private header.

Signed-off-by: Ivo van Doorn [EMAIL PROTECTED]

---
diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h
index f9a50da..67096b5 100644
--- a/include/linux/rfkill.h
+++ b/include/linux/rfkill.h
@@ -2,7 +2,7 @@
 #define __RFKILL_H
 
 /*
- * Copyright (C) 2006 Ivo van Doorn
+ * Copyright (C) 2006 - 2007 Ivo van Doorn
  * Copyright (C) 2007 Dmitry Torokhov
  *
  * This program is free software; you can redistribute it and/or modify
@@ -84,6 +84,4 @@ void rfkill_free(struct rfkill *rfkill);
 int rfkill_register(struct rfkill *rfkill);
 void rfkill_unregister(struct rfkill *rfkill);
 
-void rfkill_switch_all(enum rfkill_type type, enum rfkill_state state);
-
 #endif /* RFKILL_H */
diff --git a/net/rfkill/rfkill-input.c b/net/rfkill/rfkill-input.c
index 8e4516a..eaabf08 100644
--- a/net/rfkill/rfkill-input.c
+++ b/net/rfkill/rfkill-input.c
@@ -17,6 +17,8 @@
 #include <linux/init.h>
 #include <linux/rfkill.h>
 
+#include "rfkill-input.h"
+
 MODULE_AUTHOR("Dmitry Torokhov [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("Input layer to RF switch connector");
 MODULE_LICENSE("GPL");
diff --git a/net/rfkill/rfkill-input.h b/net/rfkill/rfkill-input.h
new file mode 100644
index 000..4dae500
--- /dev/null
+++ b/net/rfkill/rfkill-input.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (C) 2007 Ivo van Doorn
+ */
+
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef __RFKILL_INPUT_H
+#define __RFKILL_INPUT_H
+
+void rfkill_switch_all(enum rfkill_type type, enum rfkill_state state);
+
+#endif /* __RFKILL_INPUT_H */
diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c
index 03ed7fd..00ee534 100644
--- a/net/rfkill/rfkill.c
+++ b/net/rfkill/rfkill.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2006 Ivo van Doorn
+ * Copyright (C) 2006 - 2007 Ivo van Doorn
  * Copyright (C) 2007 Dmitry Torokhov
  *
  * This program is free software; you can redistribute it and/or modify

Re: [PATCH] rfkill: Move rfkill_switch_all out of global header

2007-09-27 Thread David Miller

From: Ivo van Doorn [EMAIL PROTECTED]
Date: Fri, 28 Sep 2007 00:07:41 +0200

 rfkill_switch_all shouldn't be called by drivers directly,
 instead they should send a signal over the input device.

 To prevent confusion for driver developers, move the
 function into a rfkill private header.

 Signed-off-by: Ivo van Doorn [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Re: ax88796: add 93cx6 eeprom support

2007-09-27 Thread Andrew Morton

On Thu, 27 Sep 2007 19:51:19 +0900
Magnus Damm [EMAIL PROTECTED] wrote:

 ax88796: add 93cx6 eeprom support
 
 This patch hooks up the 93cx6 eeprom code to the ax88796 driver and modifies
 the ax88796 driver to read out the mac address from the eeprom. We need
 this for the ax88796 on certain SuperH boards. The pin configuration used
 to connect the eeprom to the ax88796 on these boards is the same as pointed
 out by the ax88796 datasheet, so we can probably reuse this code for multiple
 platforms in the future.

I'm showing a minor reject between this and Francois's git-r8169.patch.

***
*** 21,33 
  /*
Module: eeprom_93cx6
Abstract: EEPROM reader datastructures for 93cx6 chipsets.
-   Supported chipsets: 93c46 & 93c66.
   */
  
  /*
   * EEPROM operation defines.
   */
  #define PCI_EEPROM_WIDTH_93C46    6
  #define PCI_EEPROM_WIDTH_93C66    8
  #define PCI_EEPROM_WIDTH_OPCODE   3
  #define PCI_EEPROM_WRITE_OPCODE   0x05
--- 21,34 
  /*
Module: eeprom_93cx6
Abstract: EEPROM reader datastructures for 93cx6 chipsets.
+   Supported chipsets: 93c46/93c56/93c66.
   */
  
  /*
   * EEPROM operation defines.
   */
  #define PCI_EEPROM_WIDTH_93C46    6
+ #define PCI_EEPROM_WIDTH_93C56    8
  #define PCI_EEPROM_WIDTH_93C66    8
  #define PCI_EEPROM_WIDTH_OPCODE   3
  #define PCI_EEPROM_WRITE_OPCODE   0x05


You both made the same change to eeprom_93cx6.h.  That all sounds good but
it would be comforting if you could review each other's work, please...


Re: Please pull 'upstream-davem' branch of wireless-2.6 (2007-09-27)

2007-09-27 Thread Jeff Garzik


John W. Linville wrote:

Dave & Jeff,

Here are some more wireless stack and driver updates for 2.6.24.  Please
pull at your earliest convenience.


ACK (I presume davem will pull)

it looks like this includes my adm feedback, thanks!



[RFC] Make TCP prequeue configurable

2007-09-27 Thread Eric Dumazet


Hi all

I am sure some of you are going to tell me that prequeue is not
all black :)

Thank you

[RFC] Make TCP prequeue configurable

The TCP prequeue thing is based on old facts, and has drawbacks.

1) It adds 48 bytes per 'struct tcp_sock'
2) It adds some ugly code in hot paths
3) It has a small hit ratio on typical servers using many sockets
4) It may have a high hit ratio on UP machines running one process,
   where the prequeue adds little gain. (In fact, letting the user
   do the copy after being woken up is better for cache reuse)
5) Doing a copy to user in softirq handler is not good, because of
   potential page faults :(
6) Maybe the NET_DMA thing is the only thing that might need prequeue.

This patch introduces a CONFIG_TCP_PREQUEUE, automatically selected if 
CONFIG_NET_DMA is on.


Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 8f670da..14e3f01 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -16,6 +16,7 @@ comment "DMA Clients"
 config NET_DMA
    bool "Network: TCP receive copy offload"
    depends on DMA_ENGINE && NET
+   select TCP_PREQUEUE
default y
---help---
  This enables the use of DMA engines in the network stack to
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c6b9f92..844a05e 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -268,11 +268,13 @@ struct tcp_sock {
 
/* Data for direct copy to user */
struct {
+#ifdef CONFIG_TCP_PREQUEUE
struct sk_buff_head prequeue;
struct task_struct  *task;
struct iovec*iov;
int memory;
int len;
+#endif
 #ifdef CONFIG_NET_DMA
/* members for async copy */
struct dma_chan *dma_chan;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 185c7ec..3430d8e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -835,10 +835,12 @@ static inline int tcp_checksum_complete(struct sk_buff 
*skb)
 
 static inline void tcp_prequeue_init(struct tcp_sock *tp)
 {
+#ifdef CONFIG_TCP_PREQUEUE
    tp->ucopy.task = NULL;
    tp->ucopy.len = 0;
    tp->ucopy.memory = 0;
    skb_queue_head_init(&tp->ucopy.prequeue);
+#endif
 #ifdef CONFIG_NET_DMA
    tp->ucopy.dma_chan = NULL;
    tp->ucopy.wakeup = 0;
@@ -857,6 +859,7 @@ static inline void tcp_prequeue_init(struct tcp_sock *tp)
  */
 static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 {
+#ifdef CONFIG_TCP_PREQUEUE
struct tcp_sock *tp = tcp_sk(sk);
 
    if (!sysctl_tcp_low_latency && tp->ucopy.task) {
@@ -882,6 +885,7 @@ static inline int tcp_prequeue(struct sock *sk, struct 
sk_buff *skb)
}
return 1;
}
+#endif
return 0;
 }
 
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index fb79097..b770829 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -616,5 +616,20 @@ config TCP_MD5SIG
 
  If unsure, say N.
 
+config TCP_PREQUEUE
+   bool "Enable TCP prequeue"
+   default n
+   ---help---
+ TCP PREQUEUE is an 'optimization' loosely based on the famous
+ 30 instruction TCP receive Van Jacobson mail.
+ Van's trick is to deposit buffers into socket queue
+ on a device interrupt, to call tcp_recv function
+ on the receive process context and checksum and copy
+ the buffer to user space. smart...
+
+ Some people believe this 'optimization' is not really needed
+ but for some benchmarks. Also, taking potential pagefaults in 
+ softirq handler seems a high price to pay.
+
 source "net/ipv4/ipvs/Kconfig"
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7e74011..8659533 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -994,6 +994,7 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
tcp_send_ack(sk);
 }
 
+#ifdef CONFIG_TCP_PREQUEUE
 static void tcp_prequeue_process(struct sock *sk)
 {
struct sk_buff *skb;
@@ -1011,6 +1012,7 @@ static void tcp_prequeue_process(struct sock *sk)
/* Clear memory counter. */
    tp->ucopy.memory = 0;
 }
+#endif
 
 static inline struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 {
@@ -1251,6 +1253,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, 
struct msghdr *msg,
 
tcp_cleanup_rbuf(sk, copied);
 
+#ifdef CONFIG_TCP_PREQUEUE
    if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
        /* Install new reader */
        if (!user_recv && !(flags & (MSG_TRUNC | MSG_PEEK))) {
@@ -1295,7 +1298,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, 
struct msghdr *msg,
 
/* __ Set realtime policy in scheduler __ */
}
-
+#endif
    if (copied >= target) {
/* Do not sleep, just

[PATCH] netns: CLONE_NEWNET don't use the same clone flag as the pid namespace.

2007-09-27 Thread Eric W. Biederman


Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/sched.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e10a0a8..d82c1f7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -27,7 +27,7 @@
 #define CLONE_NEWUTS   0x0400  /* New utsname group? */
 #define CLONE_NEWIPC   0x0800  /* New ipcs */
 #define CLONE_NEWUSER  0x1000  /* New user namespace */
-#define CLONE_NEWNET   0x2000  /* New network namespace */
+#define CLONE_NEWNET   0x4000  /* New network namespace */
 
 /*
  * Scheduling policies
-- 
1.5.3.rc6.17.g1911


[PATCH] Bring comments in loopback.c uptodate.

2007-09-27 Thread Eric W. Biederman


A hint as to why it is safe to use per cpu variables,
and note that we actually can have multiple instances
of the loopback device now.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 drivers/net/loopback.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2617320..cba5c76 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -154,6 +154,7 @@ static int loopback_xmit(struct sk_buff *skb, struct 
net_device *dev)
 #endif
    dev->last_rx = jiffies;
 
+   /* it's OK to use per_cpu_ptr() because BHs are off */
    pcpu_lstats = netdev_priv(dev);
    lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
    lb_stats->bytes += skb->len;
@@ -221,7 +222,8 @@ static void loopback_dev_free(struct net_device *dev)
 }
 
 /*
- * The loopback device is special. There is only one instance.
+ * The loopback device is special. There is only one instance 
+ * per network namespace.
  */
 static void loopback_setup(struct net_device *dev)
 {
-- 
1.5.3.rc6.17.g1911


Re: [RFC] Zero-length write() does not generate a datagram on connected socket

2007-09-27 Thread Stephen Hemminger

On Thu, 27 Sep 2007 13:53:34 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Mon, 24 Sep 2007 15:34:35 -0700

  The bug http://bugzilla.kernel.org/show_bug.cgi?id=5731
  describes an issue where write() can't be used to generate a zero-length
  datagram (but send, and sendto do work).

  I think the following is needed:

  --- a/net/socket.c  2007-08-20 09:54:28.0 -0700
  +++ b/net/socket.c  2007-09-24 15:31:25.0 -0700
  @@ -777,8 +777,11 @@ static ssize_t sock_aio_write(struct kio
  if (pos != 0)
  return -ESPIPE;

  -   if (iocb->ki_left == 0) /* Match SYS5 behaviour */
  -       return 0;
  +   if (unlikely(iocb->ki_left == 0)) {
  +       struct socket *sock = iocb->ki_filp->private_data;
  +       if (sock->type == SOCK_STREAM)
  +           return 0;
  +   return 0;
  +   }

  x = alloc_sock_iocb(iocb, siocb);
  if (!x)

 We should simply remove the check completely.

 There is no need to add special code for different types of protocols
 and sockets.

 As is hinted in the bugzilla, the exact same thing can happen with a
 suitably constructed sendto() or sendmsg() call.  write() on a socket
 is a sendmsg() with a NULL msg_control and a single entry iovec, plain
 and simple.

 It's how BSD and many other systems behave, and I double checked
 Stevens' Volume 2 just to make sure.

 So I'm going to check in the following to fix this bugzilla.  There is
 a similarly ugly test for len==0 in sys_read() on sockets.  If someone
 would do some research on the validity of that thing I'd really
 appreciate it :-)

Read of zero length should be a no-op for SOCK_STREAM, but
for SOCK_DGRAM or SOCK_SEQPACKET it might be useful as a 
remote wait for event.
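
To make the datagram case concrete, here is a small sketch of a zero-length datagram acting as a pure event. It assumes Linux and uses an AF_UNIX datagram socketpair instead of UDP so that it needs no network setup; `zero_length_roundtrip()` is a name invented for the example.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one zero-length datagram across a connected AF_UNIX datagram
 * pair and report what the receiver sees. Returns the recv() result
 * (0 for an empty datagram), or -1 on any error, including the case
 * where nothing was queued.
 */
static int zero_length_roundtrip(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], "", 0, 0) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* MSG_DONTWAIT: fail with EAGAIN instead of blocking if nothing arrived. */
    n = recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

The receiver distinguishes "an empty datagram arrived" (return value 0) from "nothing there yet" (-1 with EAGAIN), which is what makes a zero-length message usable as a wakeup.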

-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [RFC] Make TCP prequeue configurable

2007-09-27 Thread Stephen Hemminger

On Fri, 28 Sep 2007 00:08:33 +0200
Eric Dumazet [EMAIL PROTECTED] wrote:

 Hi all
 
 I am sure some of you are going to tell me that prequeue is not
 all black :)
 
 Thank you
 
 [RFC] Make TCP prequeue configurable
 
 The TCP prequeue thing is based on old facts, and has drawbacks.
 
 1) It adds 48 bytes per 'struct tcp_sock'
 2) It adds some ugly code in hot paths
 3) It has a small hit ratio on typical servers using many sockets
 4) It may have a high hit ratio on UP machines running one process,
 where the prequeue adds little gain. (In fact, letting the user
 do the copy after being woken up is better for cache reuse)
 5) Doing a copy to user in softirq handler is not good, because of
 potential page faults :(
 6) Maybe the NET_DMA thing is the only thing that might need prequeue.
 
 This patch introduces a CONFIG_TCP_PREQUEUE, automatically selected if 
 CONFIG_NET_DMA is on.
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
 

Rather than having two more compile cases and test cases to deal
with, if you can prove it is useless, make a case for killing
it completely.

-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread Eric W. Biederman

David Miller [EMAIL PROTECTED] writes:

 Eric, pick an appropriate new non-conflicting number NOW.

Done.  My apologies for the confusion.  I thought that, given the
way Cedric and the IBM guys were testing, someone would have
shouted at me long before now.

 This adds unnecessary extra work for Andrew Morton, which he has
 enough of already.

Cedric made a good point that we will have conflicts of code
being added to the same place in nsproxy.c and the like.  So
I copied Andrew to give him a heads up.

I will gladly do what I can to help.  Working against 3 development
trees at the moment is a bit of a challenge.

Eric

Re: [RFC] Make TCP prequeue configurable

2007-09-27 Thread David Miller

From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 28 Sep 2007 00:08:33 +0200

 1) It adds 48 bytes per 'struct tcp_sock'
 2) It adds some ugly code in hot paths
 3) It has a small hit ratio on typical servers using many sockets
 4) It may have a high hit ratio on UP machines running one process,
 where the prequeue adds little gain. (In fact, letting the user
 do the copy after being woken up is better for cache reuse)
 5) Doing a copy to user in softirq handler is not good, because of
 potential page faults :(
 6) Maybe the NET_DMA thing is the only thing that might need prequeue.

If you want to make changes at least get your facts straight in your
changelog message :-)

The prequeue doesn't do copies in softirqs, it acquires the user side
socket lock and runs the packet input path directly from there,
copying into userspace along the way.

You are making claims about performance based upon your understanding
of the code and your understanding of typical workloads, rather than
from actual measurements.  In scientific communities this would make
you a quack at best :-)

Re: [RFC] Make TCP prequeue configurable

2007-09-27 Thread John Heffner


Stephen Hemminger wrote:

On Fri, 28 Sep 2007 00:08:33 +0200
Eric Dumazet [EMAIL PROTECTED] wrote:


Hi all

I am sure some of you are going to tell me that prequeue is not
all black :)

Thank you

[RFC] Make TCP prequeue configurable

The TCP prequeue thing is based on old facts, and has drawbacks.

1) It adds 48 bytes per 'struct tcp_sock'
2) It adds some ugly code in hot paths
3) It has a small hit ratio on typical servers using many sockets
4) It may have a high hit ratio on UP machines running one process,
where the prequeue adds little gain. (In fact, letting the user
do the copy after being woken up is better for cache reuse)
5) Doing a copy to user in softirq handler is not good, because of
potential page faults :(
6) Maybe the NET_DMA thing is the only thing that might need prequeue.

This patch introduces a CONFIG_TCP_PREQUEUE, automatically selected if 
CONFIG_NET_DMA is on.


Signed-off-by: Eric Dumazet [EMAIL PROTECTED]



Rather than adding two more compile cases and test cases to deal
with, if you can prove it is useless, make a case for killing
it completely.



I think it really does help in case (4) with old NICs that don't do rx 
checksumming.  I'm not sure how many people really care about this 
anymore, but probably some...?


OTOH, it would be nice to get rid of sysctl_tcp_low_latency.

  -John

Re: [PATCH] netns: CLONE_NEWNET don't use the same clone flag as the pid namespace.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 16:40:31 -0600

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied.

Re: Please pull 'upstream-davem' branch of wireless-2.6 (2007-09-27)

2007-09-27 Thread David Miller

From: Jeff Garzik [EMAIL PROTECTED]
Date: Thu, 27 Sep 2007 18:21:54 -0400

 John W. Linville wrote:
  Dave & Jeff,

  Here are some more wireless stack and driver updates for 2.6.24.  Please
  pull at your earliest convenience.

 ACK (I presume davem will pull)

 it looks like this includes my adm feedback, thanks!

Pulled into net-2.6.24 and pushed back out, thanks!

Re: [PATCH] Bring comments in loopback.c uptodate.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 16:39:53 -0600

 A hint as to why it is safe to use per cpu variables,
 and note that we actually can have multiple instances
 of the loopback device now.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied.

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 17:00:23 -0600

 I will gladly do what I can to help.  Working against 3 development
 trees at the moment is a bit of a challenge.

Andrew has to work against 30 or so, so multiply your pain
by 10 to understand what he has to deal with :-)

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread Andrew Morton

On Thu, 27 Sep 2007 17:10:53 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

  I will gladly do what I can to help.  Working against 3 development
  trees at the moment is a bit of a challenge.
 
 Andrew has to work against 30 or so

I wish!  A remerge presently involves pulling and merging 73 git trees, 9
quilt trees and maybe 1,500 -mm patches.


2.6.23-rc8 network problem. Mem leak? ip1000a?

2007-09-27 Thread linux

Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM,
2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver.
(patch from http://marc.info/?l=linux-netdev&m=118980588419882)

After a few hours of operation, ntp loses the ability to send packets.
sendto() returns -EAGAIN to everything, including the 24-byte UDP packet
that is a response to ntpq.

-EAGAIN on a sendto() makes me think of memory problems, so here's
meminfo at the time:

### FAILED state ###
# cat /proc/meminfo 
MemTotal:  2059384 kB
MemFree: 15332 kB
Buffers:665608 kB
Cached:  18212 kB
SwapCached:  0 kB
Active: 380384 kB
Inactive:   355020 kB
SwapTotal: 5855208 kB
SwapFree:  5854552 kB
Dirty:   28504 kB
Writeback:   0 kB
AnonPages:   51608 kB
Mapped:  11852 kB
Slab:  1285348 kB
SReclaimable:   152968 kB
SUnreclaim:1132380 kB
PageTables:   3888 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   6884900 kB
Committed_AS:   590528 kB
VmallocTotal: 34359738367 kB
VmallocUsed:265628 kB
VmallocChunk: 34359472059 kB
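
As a rough way to quantify the dump above, the following C sketch pulls two fields out of /proc/meminfo-style text and reports unreclaimable slab as a percentage of total memory. The function names (meminfo_kb, sunreclaim_percent) are illustrative assumptions, not part of any kernel or libc API. For the FAILED-state figures above (SUnreclaim 1132380 kB out of MemTotal 2059384 kB) it works out to about 54%, which fits the suspicion of a kernel-side leak rather than ordinary userspace memory pressure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value in kB of `key` (for example "SUnreclaim") from
 * /proc/meminfo-style text, or -1 if the key is absent or malformed.
 * Keys are matched only at the start of a line and must be followed
 * by ':' so that "Slab" does not match "SlabFoo". */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    assert(klen > 0);
    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;

            /* %ld skips any padding between ':' and the number. */
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}

/* Unreclaimable slab as a percentage of MemTotal, rounded down.
 * Returns -1 if either field is missing. */
long sunreclaim_percent(const char *text)
{
    long total = meminfo_kb(text, "MemTotal");
    long unrecl = meminfo_kb(text, "SUnreclaim");

    if (total <= 0 || unrecl < 0)
        return -1;
    return unrecl * 100 / total;
}
```

To use it, read /proc/meminfo into a NUL-terminated buffer and pass it to sunreclaim_percent(); logging that value periodically would show whether the unreclaimable slab keeps growing.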


Killing and restarting ntpd gets it running again for a few hours.
Here's after about two hours of successful operation.  (I'll try to
remember to run slabinfo before killing ntpd next time.)

### WORKING state ###
# cat /proc/meminfo
MemTotal:  2059384 kB
MemFree: 20252 kB
Buffers:242688 kB
Cached:  41556 kB
SwapCached:200 kB
Active: 285012 kB
Inactive:   147348 kB
SwapTotal: 5855208 kB
SwapFree:  5854212 kB
Dirty:  36 kB
Writeback:   0 kB
AnonPages:  148052 kB
Mapped:  12756 kB
Slab:  1582512 kB
SReclaimable:   134348 kB
SUnreclaim:1448164 kB
PageTables:   4500 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   6884900 kB
Committed_AS:   689956 kB
VmallocTotal: 34359738367 kB
VmallocUsed:265628 kB
VmallocChunk: 34359472059 kB
# /usr/src/linux/Documentation/vm/slabinfo
Name               Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:016                  1478      16    24.5K          6/3/1  256 0  50  96 *
:024                   170      24     4.0K          1/0/1  170 0   0  99 *
:032                  1339      32    45.0K         11/2/1  128 0  18  95 *
:040                   102      40     4.0K          1/0/1  102 0   0  99 *
:064                  5937      64   413.6K       101/15/1   64 0  14  91 *
:072                    56      72     4.0K          1/0/1   56 0   0  98 *
:088                  6946      88   618.4K        151/0/1   46 0   0  98 *
:096                 23851      96     2.5M      616/144/1   42 0  23  90 *
:128                   730     128   114.6K         28/6/1   32 0  21  81 *
:136                   232     136    36.8K          9/6/1   30 0  66  85 *
:192                   474     192    98.3K         24/4/1   21 0  16  92 *
:256               1385376     256   354.6M      86587/0/1   16 0   0  99 *
:320                    12     304     4.0K          1/0/1   12 0   0  89 *A
:384                   359     384   180.2K        44/23/1   10 0  52  76 *A
:512               1384316     512   708.7M     173040/1/1    8 0   0  99 *
:640                    72     616    53.2K         13/5/1    6 0  38  83 *A
:704                  1870     696     1.3M        170/0/1   11 1   0  93 *A
:0001024               427    1024   454.6K        111/9/1    4 0   8  96 *
:0001472               150    1472   245.7K         30/0/1    5 1   0  89 *
:0002048            158991    2048   325.7M     39759/25/1    4 1   0  99 *
:0004096                51    4096   245.7K         30/9/1    2 1  30  85 *
Acpi-State              51      80     4.0K          1/0/1   51 0   0  99
anon_vma              1032      16    28.6K          7/5/1  170 0  71  57
bdev_cache              43     720    36.8K          9/1/1    5 0  11  83 Aa
blkdev_requests         42     288    12.2K          3/0/1   14 0   0  98
buffer_head          59173     104    11.1M    2734/1690/1   39 0  61  54 a
cfq_io_context         223     152    40.9K         10/6/1   26 0  60  82
dentry               98641     192    19.7M     4813/274/1   21 0   5  96 a
ext3_inode_cache    115690     688    86.3M     10545/77/1   11 1   0  92 a
file_lock_cache         23     168     4.0K          1/0/1   23 0   0  94
idr_layer_cache        118     528    69.6K         17/1/1    7 0   5  89
inode_cache           1365     528   798.7K        195/0/1    7 0   0  90 a
kmalloc-131072           1  131072   131.0K          1/0/1    1 5   0 100
kmalloc-16384            8   16384   131.0K          8/0/1    1 2   0 100
kmalloc-32768            1   32768    32.7K          1/0/1    1 3   0 100
kmalloc-8             1535       8    12.2K          3/1/1  512 0  33  99
kmalloc-8192            10

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread Eric W. Biederman

Andrew Morton [EMAIL PROTECTED] writes:

 On Thu, 27 Sep 2007 17:10:53 -0700 (PDT)
 David Miller [EMAIL PROTECTED] wrote:

  I will gladly do what I can to help.  Working against 3 development
  trees at the moment is a bit of a challenge.
 
 Andrew has to work against 30 or so

 I wish!  A remerge presently involves pulling and merging 73 git trees, 9
 quilt trees and maybe 1,500 -mm patches.

Yep.  There is a lot of chaos and keeping on top of it all is a pain,
and nobody has it easy.

Andrew probably wins the award for the biggest challenge.

My todo list pales in comparison.   I only have 80+ patches in my
queue that need to be reviewed and then pushed upstream, and 50 sysfs
patches to review and get a handle on so hopefully we can get out of
the sysfs quagmire.

Plus I don't know how many little gotchas that need to be fixed with
a new patch of their own.

It's coming together but it takes time.  

David, Andrew, thanks.  You both really are good upstream maintainers
to work with.

Eric

[git patches] net driver fixes

2007-09-27 Thread Jeff Garzik


And an e1000 id patch.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 drivers/net/e1000/e1000_ethtool.c |1 +
 drivers/net/e1000/e1000_hw.c  |1 +
 drivers/net/e1000/e1000_hw.h  |1 +
 drivers/net/e1000/e1000_main.c|2 +
 drivers/net/sky2.c|   53 +++--
 5 files changed, 44 insertions(+), 14 deletions(-)

Auke Kok (1):
  e1000: Add device IDs of blade version of the 82571 quad port

Stephen Hemminger (3):
  sky2: sky2 FE+ receive status workaround
  sky2: FE+ vlan workaround
  sky2: fix transmit state on resume

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 4c3785c..9ecc3ad 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1726,6 +1726,7 @@ static int e1000_wol_exclusion(struct e1000_adapter *adapter, struct ethtool_wol
    case E1000_DEV_ID_82571EB_QUAD_COPPER:
    case E1000_DEV_ID_82571EB_QUAD_FIBER:
    case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
+   case E1000_DEV_ID_82571PT_QUAD_COPPER:
    case E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3:
        /* quad port adapters only support WoL on port A */
        if (!adapter->quad_port_a) {
diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index ba120f7..8604adb 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -387,6 +387,7 @@ e1000_set_mac_type(struct e1000_hw *hw)
    case E1000_DEV_ID_82571EB_SERDES_DUAL:
    case E1000_DEV_ID_82571EB_SERDES_QUAD:
    case E1000_DEV_ID_82571EB_QUAD_COPPER:
+   case E1000_DEV_ID_82571PT_QUAD_COPPER:
    case E1000_DEV_ID_82571EB_QUAD_FIBER:
    case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
        hw->mac_type = e1000_82571;
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index fe87146..07f0ea7 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -475,6 +475,7 @@ int32_t e1000_check_phy_reset_block(struct e1000_hw *hw);
 #define E1000_DEV_ID_82571EB_FIBER   0x105F
 #define E1000_DEV_ID_82571EB_SERDES  0x1060
 #define E1000_DEV_ID_82571EB_QUAD_COPPER 0x10A4
+#define E1000_DEV_ID_82571PT_QUAD_COPPER 0x10D5
 #define E1000_DEV_ID_82571EB_QUAD_FIBER  0x10A5
 #define E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE  0x10BC
 #define E1000_DEV_ID_82571EB_SERDES_DUAL 0x10D9
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 4a22595..e7c8951 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -108,6 +108,7 @@ static struct pci_device_id e1000_pci_tbl[] = {
    INTEL_E1000_ETHERNET_DEVICE(0x10BC),
    INTEL_E1000_ETHERNET_DEVICE(0x10C4),
    INTEL_E1000_ETHERNET_DEVICE(0x10C5),
+   INTEL_E1000_ETHERNET_DEVICE(0x10D5),
    INTEL_E1000_ETHERNET_DEVICE(0x10D9),
    INTEL_E1000_ETHERNET_DEVICE(0x10DA),
    /* required last entry */
@@ -1101,6 +1102,7 @@ e1000_probe(struct pci_dev *pdev,
    case E1000_DEV_ID_82571EB_QUAD_COPPER:
    case E1000_DEV_ID_82571EB_QUAD_FIBER:
    case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
+   case E1000_DEV_ID_82571PT_QUAD_COPPER:
        /* if quad port adapter, disable WoL on all but port A */
        if (global_quad_port_a != 0)
            adapter->eeprom_wol = 0;
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 0792031..162489b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -910,6 +910,20 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2)
    return le;
 }
 
+static void tx_init(struct sky2_port *sky2)
+{
+   struct sky2_tx_le *le;
+
+   sky2->tx_prod = sky2->tx_cons = 0;
+   sky2->tx_tcpsum = 0;
+   sky2->tx_last_mss = 0;
+
+   le = get_tx_le(sky2);
+   le->addr = 0;
+   le->opcode = OP_ADDR64 | HW_OWNER;
+   sky2->tx_addr64 = 0;
+}
+
 static inline struct tx_ring_info *tx_le_re(struct sky2_port *sky2,
                                             struct sky2_tx_le *le)
 {
@@ -1320,7 +1334,8 @@ static int sky2_up(struct net_device *dev)
                                           GFP_KERNEL);
    if (!sky2->tx_ring)
        goto err_out;
-   sky2->tx_prod = sky2->tx_cons = 0;
+
+   tx_init(sky2);
 
    sky2->rx_le = pci_alloc_consistent(hw->pdev, RX_LE_BYTES,
                                       &sky2->rx_le_map);
@@ -2148,6 +2163,18 @@ static struct sk_buff *sky2_receive(struct net_device *dev,
    sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
    prefetch(sky2->rx_ring + sky2->rx_next);
 
+   if (length < ETH_ZLEN || length > sky2->rx_data_size)
+       goto len_error;
+
+   /* This chip has hardware problems that generates bogus status.
+    * So do only marginal checking and expect

Re: [PATCH] net: Add network namespace clone unshare support.

2007-09-27 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 27 Sep 2007 21:28:45 -0600

 David, Andrew, thanks.  You both really are good upstream
 maintainers to work with.

Just keep the coffee flowing :-)

Re: [PATCH] Clean up redundant PHY write line for ULi526x Ethernet driver

2007-09-27 Thread Grant Grundler

On Thu, Sep 27, 2007 at 11:36:58PM -0400, Jeff Garzik wrote:
 Zang Roy-r61911 wrote:
 From: Roy Zang [EMAIL PROTECTED]
 Clean up redundant PHY write line for ULi526x Ethernet
 Driver.
 Signed-off-by: Roy Zang [EMAIL PROTECTED]
 ---
  drivers/net/tulip/uli526x.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)
 diff --git a/drivers/net/tulip/uli526x.c b/drivers/net/tulip/uli526x.c
 index ca2548e..53a8e65 100644
 --- a/drivers/net/tulip/uli526x.c
 +++ b/drivers/net/tulip/uli526x.c
 @@ -1512,7 +1512,6 @@ static void uli526x_process_mode(struct uli526x_board_info *db)
  case ULI526X_100MFD: phy_reg = 0x2100; break;
  }
  phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
 -phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);

 Kyle and Grant, I'll queue this up, unless ya'll object...

please do, I've already Ack'd it for akpm's tree when he sent out the
initial cc.

thanks,
grant


   Jeff


Re: e1000 tcp checksum incorrect, x86 64b

2007-09-27 Thread Herbert Xu

Jon Smirl [EMAIL PROTECTED] wrote:

 App is writing seven bytes to the socket. Socket write timeout expires
 and the seven bytes are sent. The checksum is not getting inserted
 into the packet. It is set to a constant 0x8389 instead of the right
 value.  App is gmpc 0.15.4.95, Revision: 6794
 
 Attached Wireshark packet trace shows the problem. e1000 is 192.168.1.4
 64bit, Q6600. Dell Dimension 9200

Wireshark is broken.  It needs to know TP_STATUS_CSUMNOTREADY
means that the checksum is partial and will only be completed
when the hardware sends the packet out.

Alternatively disable checksum offload with ethtool.
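
To make the "partial checksum" point concrete, here is a minimal C sketch of the arithmetic, under stated assumptions. As far as I understand the Linux transmit path, with checksum offload enabled the stack leaves only the one's-complement sum of the TCP pseudo-header in the checksum field, and the NIC adds the segment bytes and complements the result on the way out. The pseudo-header sum depends only on the two addresses, the protocol and the TCP length, so a capture taken before the hardware runs would show the same value for every same-sized segment on a connection, whatever the payload. The function names are hypothetical, addresses are passed in host byte order for readability, and the peer address used in the usage note is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement sum of the IPv4 TCP pseudo-header: source address,
 * destination address, protocol number 6 and TCP segment length.
 * This is the "partial" value; it is not complemented. */
uint16_t tcp_pseudo_sum(uint32_t saddr, uint32_t daddr, uint16_t tcp_len)
{
    uint32_t sum = 0;

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += 6;                   /* IPPROTO_TCP */
    sum += tcp_len;
    return csum_fold32(sum);
}

/* Add the big-endian 16-bit words of buf to a running sum.
 * An odd trailing byte is padded with a zero low byte.
 * len must fit in a TCP segment (at most 65535 bytes). */
uint16_t csum_add_bytes(uint16_t partial, const uint8_t *buf, size_t len)
{
    uint32_t sum = partial;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return csum_fold32(sum);
}

/* Completed checksum: complement of the sum over the pseudo-header and
 * the whole segment (TCP header with a zeroed checksum field, then data). */
uint16_t tcp_checksum(uint32_t saddr, uint32_t daddr,
                      const uint8_t *segment, size_t len)
{
    uint16_t partial = tcp_pseudo_sum(saddr, daddr, (uint16_t)len);

    return (uint16_t)~csum_add_bytes(partial, segment, len);
}
```

For example, with source 192.168.1.4, a hypothetical peer 192.168.1.1 and a 27-byte segment (20-byte header plus seven bytes of data), tcp_pseudo_sum() returns the same value no matter what the seven bytes are, while tcp_checksum() changes with the payload.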

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
