Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-03 Thread Jarek Poplawski
On Thu, Aug 02, 2007 at 10:59:23AM -0500, Matt Mackall wrote:
 On Thu, Aug 02, 2007 at 11:00:08AM +0200, Jarek Poplawski wrote:
  On Wed, Aug 01, 2007 at 09:02:19PM -0500, Matt Mackall wrote:
...
   How about cc:ing the netpoll maintainer?
  
  Is there a new one, or are you suggesting the possibility of abusing the
  authority of netpoll's author with such trifles...?!
 
 I'm just subtly suggesting that if you're going to have a discussion
 about netpoll, you ought to cc: me.

Thanks! I'm very honored. I suspected there was some subtlety, but I
wasn't sure about possible new patches to MAINTAINERS, so I tried to be
subtle too...

 
  There are some references to other diagnostic tools in some
  net drivers, e.g. 3c509.c, so there would be a little bit of
  work after changing this if those tools really exist (and even if
  not, maybe it's reasonable to keep that possibility for the
  future?).
 
 I created it for netpoll; only netpoll clients have ever cared.

So you're probably the best person to change this! Alas, it seems that
for some time any changes to netpoll could get a cold reception
here (a pity for Ingo's laptop...).

Regards,
Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] genirq: fix simple and fasteoi irq handlers

2007-08-03 Thread Ingo Molnar

* Jarek Poplawski [EMAIL PROTECTED] wrote:

 I can't guarantee this is all needed to fix this bug, but I think this 
 patch is necessary here.

hmmm ... very interesting! Now _this_ is something we'd like to see 
tested. Could you send a patch to Marcin that also undoes the workaround 
we have in place now, so that he could check whether ne2k-pci works fine 
with your fix alone?

Ingo


Re: strange tcp behavior

2007-08-03 Thread Evgeniy Polyakov
On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) 
wrote:
  On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov ([EMAIL 
  PROTECTED]) wrote:
   So, following patch fixes problem for me.
  
  Or this one. Essentially the same though.
  
  Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]
 
 So, this bug got introduced partly in 2.3.15, which is when
 we SMP threaded the networking stack.
 
 The error check was present in inet_sendmsg() previously, it
 looked like this:
 
 int inet_sendmsg(struct socket *sock, struct msghdr *msg, int size,
 		 struct scm_cookie *scm)
 {
 	struct sock *sk = sock->sk;
 
 	if (sk->shutdown & SEND_SHUTDOWN) {
 		if (!(msg->msg_flags & MSG_NOSIGNAL))
 			send_sig(SIGPIPE, current, 1);
 		return(-EPIPE);
 	}

This one would have caught our problem.

 	if (sk->prot->sendmsg == NULL)
 		return(-EOPNOTSUPP);
 	if(sk->err)
 		return sock_error(sk);

And this one too.

 	/* We may need to bind the socket. */
 	if (inet_autobind(sk) != 0)
 		return -EAGAIN;
 
 	return sk->prot->sendmsg(sk, msg, size);
 }
 
 I believe the idea was to move the sk->err check down into
 tcp_sendmsg().
 
 But this raises a major issue.
 
 What in the world are we doing allowing stream sockets to autobind?
 That is totally bogus.  Even if we autobind, that won't make a connect
 happen.

For an accepted socket it is a perfectly valid assumption: we could autobind
it during the first send, or bind it during accept. It's a matter of
taste, I think. Autobinding during the first send could even end up being
a protection against DoS in some obscure, rare case...

 There is logic down in TCP to handle all of these details properly
 as long as we don't do this bogus autobind stuff.

Yes, the TCP sending functions will catch these problems.

 do_tcp_sendpages() and tcp_sendmsg() both invoke sk_stream_wait_connect()
 if TCP is in a state where data sending is not possible.  Inside of
 sk_stream_wait_connect() it handles socket errors as first priority,
 then if no socket errors are pending it checks if we are trying to
 connect currently and if not returns -EPIPE.  It is exactly what we
 want under these circumstances.
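The check order described above can be modelled in a few lines. This is a minimal userspace sketch, not kernel code: `struct sketch_sock` and `stream_send_check()` are hypothetical names, and the real sk_stream_wait_connect() additionally sleeps with a timeout and handles signals.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified socket state used only for this sketch. */
struct sketch_sock {
	int  pending_err;  /* like a pending socket error: 0 if none */
	bool connecting;   /* true while a connect() is in progress */
	bool can_send;     /* true once the state allows sending data */
};

/*
 * Models the order of checks for a stream socket that cannot send yet:
 * a pending socket error wins, then "not connecting" yields -EPIPE,
 * otherwise the caller would sleep and retry.
 * Returns 0 to send now, 1 to wait, or a negative errno.
 */
static int stream_send_check(const struct sketch_sock *sk)
{
	if (sk->can_send)
		return 0;
	if (sk->pending_err)
		return -sk->pending_err;  /* socket errors take first priority */
	if (!sk->connecting)
		return -EPIPE;            /* nobody is connecting: fail at once */
	return 1;                         /* connect in progress: wait for it */
}
```

Note that no autobind step appears anywhere in this path, which is the point of the proposed fix.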
 
 So the bug is purely that autobind is attempted for TCP sockets at
 all.
 
 TCP's sendpage handles this correctly already, it calls directly down
 into tcp_sendpage(), inet_sendpage() is not used at all.
 
 So the fix is to make tcp_sendmsg() direct as well, that bypasses all
 of this autobind madness.  The error checking and state verification
 in TCP's sendmsg() and sendpage() implementations will do the right
 thing.
 
 Comments?

 Signed-off-by: David S. Miller [EMAIL PROTECTED]
 
 diff --git a/include/net/tcp.h b/include/net/tcp.h
 index c209361..185c7ec 100644
 --- a/include/net/tcp.h
 +++ b/include/net/tcp.h
 @@ -281,7 +281,7 @@ extern int   tcp_v4_remember_stamp(struct sock *sk);
  
  extern int   tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
  
 -extern int   tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
 +extern int   tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
   struct msghdr *msg, size_t size);

Maybe recvmsg should be changed too for symmetry?

-- 
Evgeniy Polyakov


Re: strange tcp behavior

2007-08-03 Thread Evgeniy Polyakov
On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
wrote:
 19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360 <mss 7180>
 19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
 19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
 19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360
 
 According to RFC 793, the RST from .4 means that the connection is CLOSED.

RFC 2525 (Known TCP Implementation Problems) says we should send an RST in
this case, although it does not specify whether we should send it if the
socket is already in the CLOSED state. Well, we do send it :)
Even if tcp_send_active_reset() checked whether the socket is in the CLOSED
state and did not send anything, the problem would still be there; it would
not be easily triggered, but it could still happen.
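A hypothetical userspace model of the close-time decision discussed above (names are illustrative, not from the kernel): unread received data turns the close into an RST, and the `peer_already_reset` flag stands in for the optional CLOSED-state check, which, per the message above, the current code does not perform.

```c
#include <stdbool.h>
#include <stddef.h>

enum close_action { SEND_NOTHING, SEND_FIN, SEND_RST };

/*
 * Hypothetical model of what close() emits.  Closing with unread received
 * data aborts the connection with an RST (the RFC 2525 recommendation);
 * peer_already_reset models the debated "already CLOSED" check.
 */
static enum close_action close_action_for(size_t unread_bytes,
					   bool peer_already_reset)
{
	if (peer_already_reset)
		return SEND_NOTHING;  /* the optional check being discussed */
	if (unread_bytes > 0)
		return SEND_RST;      /* data pending: abort, do not FIN */
	return SEND_FIN;              /* orderly close */
}
```

In the trace above, .8 still held 16 unread bytes when .4's RST arrived; without the optional check it answers its own close() with an RST, as seen in the last line of the dump.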

-- 
Evgeniy Polyakov


Re: [patch 0/5][RFC] Update network drivers to use devres

2007-08-03 Thread Tejun Heo
On Fri, Aug 03, 2007 at 09:58:57AM +0100, Stephen Hemminger wrote:
 On Thu, 2 Aug 2007 15:42:06 -0700
 Brandon Philips [EMAIL PROTECTED] wrote:
 
  This patch set adds support for devres in the net core and converts the
  e100 and e1000 drivers to devres.  Devres is a simple resource manager
  for device drivers, see Documentation/driver-model/devres.txt for more
  information.
  
  The use of devres will remain optional for drivers with this patch set.
  Drivers can be converted when it makes sense.
 
 Just because devres exists is not sufficient motivation to change.
 
 It seems that devres was a band-aid rather than fixing storage drivers
 to have proper DMA lifetimes.

I don't really get what you mean by having proper DMA lifetimes but
please don't write devres off too fast.  devres doesn't solve any
problem that you can't fix without it but it does make the 'solving'
much easier.

IMHO, libata drivers have generally been well maintained and reviewed,
but I could still find quite a few bugs (resource leaks or
occasional double frees) in init failure and removal paths.  Init
failure paths are especially prone to bugs because they don't get
exercised often.  It's just very easy to make a mistake and fail to
notice it, and low-level drivers don't always get a sufficient amount of
review or testing.

Skimming through drivers... via-rhine doesn't disable the PCI device on
the init failure path but does so on removal.  sky2 doesn't free
consistent memory if sky2_init() fails.  acenic calls iounmap() with a
NULL parameter, and I'm not sure whether that's safe.  natsemi
doesn't disable the PCI device on failure or removal.

Devres makes low level drivers simpler, easier to get right and
maintain.  Writing new drivers becomes easier too.  So, why not?
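To make the init-failure argument concrete, here is a minimal userspace sketch of the devres idea, assuming a simplified model: `dev_manage()`, `dev_release_all()` and `struct fake_device` are hypothetical and are not the kernel's devres API (Documentation/driver-model/devres.txt describes that). Each acquired resource is recorded with a release callback, and one call releases everything in reverse order.

```c
#include <stdlib.h>

/* Hypothetical model of devres-style managed resources. */
typedef void (*release_fn)(void *res);

struct managed_res {
	struct managed_res *next;  /* most recently acquired first */
	release_fn release;
	void *res;
};

struct fake_device {
	struct managed_res *res_list;
};

/* Record a resource so that it is released automatically later. */
static int dev_manage(struct fake_device *dev, void *res, release_fn release)
{
	struct managed_res *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->res = res;
	node->release = release;
	node->next = dev->res_list;
	dev->res_list = node;
	return 0;
}

/* Release every managed resource in reverse order of acquisition; the
 * same call serves both a failed probe and a normal removal. */
static void dev_release_all(struct fake_device *dev)
{
	while (dev->res_list) {
		struct managed_res *node = dev->res_list;

		dev->res_list = node->next;
		node->release(node->res);
		free(node);
	}
}
```

Because the same release routine runs whether probe fails halfway or the device is detached, a driver written this way only has to get the acquisition order right; the failure and removal paths follow from it.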

 Network devices seem to work fine thanks, and the resource requirements
 are different. If it ain't broke, don't fix it.

Care to enlighten me on how the resource requirements are different
from ATA drivers?

Thanks.

-- 
tejun


[PATCH] TCP: H-TCP maxRTT estimation at startup

2007-08-03 Thread Stephen Hemminger
Small patch to H-TCP from Douglas Leith. 

Fix estimation of maxRTT.  First, the original code ignores RTT measurements
during slow start (via the check tp->snd_ssthresh < 0x), yet this
is probably a good time to try to estimate max RTT, as delayed acking
is disabled and slow start will only exit on a loss, which presumably
corresponds to a maxRTT measurement.  Second, the original code (via
the check htcp_ccount(ca) > 3) ignores RTT data during what it
estimates to be the first 3 round-trip times.  This seems like an
unnecessary check now that RCV timestamps are no longer used
for RTT estimation.
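A simplified userspace sketch of the patched maxRTT logic, assuming plain integer RTT samples and a boolean in place of the TCP_CA_Open test. The struct and function names are hypothetical; the minRTT update condition is an assumption (the patch hunk does not show it), and because the final maxRTT condition is cut off in the posting, the sketch simply takes the larger value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two H-TCP fields involved. */
struct htcp_rtt {
	uint32_t min_rtt;  /* 0 until the first sample */
	uint32_t max_rtt;
};

/*
 * After the patch, any sample taken while the congestion state is Open
 * may update max_rtt, including samples from slow start and from the
 * first few round trips.
 */
static void htcp_measure_rtt_sketch(struct htcp_rtt *ca, uint32_t srtt,
				    bool ca_state_open)
{
	/* keep track of minimum RTT seen so far, minRTT is zero at first */
	if (ca->min_rtt > srtt || ca->min_rtt == 0)
		ca->min_rtt = srtt;

	if (ca_state_open) {
		if (ca->max_rtt < ca->min_rtt)
			ca->max_rtt = ca->min_rtt;
		if (ca->max_rtt < srtt)  /* assumption: no extra bound */
			ca->max_rtt = srtt;
	}
}
```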

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/tcp_htcp.c   2007-08-03 10:51:51.0 +0100
+++ b/net/ipv4/tcp_htcp.c   2007-08-03 10:51:53.0 +0100
@@ -79,7 +79,6 @@ static u32 htcp_cwnd_undo(struct sock *s
 static inline void measure_rtt(struct sock *sk, u32 srtt)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
-   const struct tcp_sock *tp = tcp_sk(sk);
struct htcp *ca = inet_csk_ca(sk);
 
/* keep track of minimum RTT seen so far, minRTT is zero at first */
@@ -87,8 +86,7 @@ static inline void measure_rtt(struct so
 		ca->minRTT = srtt;
 
/* max RTT */
-	if (icsk->icsk_ca_state == TCP_CA_Open
-	    && tp->snd_ssthresh < 0x && htcp_ccount(ca) > 3) {
+	if (icsk->icsk_ca_state == TCP_CA_Open) {
 		if (ca->maxRTT < ca->minRTT)
 			ca->maxRTT = ca->minRTT;
 		if (ca->maxRTT < srtt


Re: [REGRESSION] tg3 dead after s2ram

2007-08-03 Thread Joachim Deguara
On Thursday 02 August 2007 21:10:29 Michael Chan wrote:
 Alternatively, we can also fix it by calling pci_enable_device() again
 in tg3_open().  But I think it is better to just always save and restore
 in suspend/resume.  bnx2.c will also require the same fix.

 Thanks Joachim for helping to debug this problem.  Please try this
 patch:

Patch works for me.


-Joachim





Re: Distributed storage.

2007-08-03 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 02:26:29PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
  Memory deadlock is a concern of course.  From a cursory glance through, 
  it looks like this code is pretty vm-friendly and you have thought 
  quite a lot about it, however I respectfully invite peterz 
  (obsessive/compulsive memory deadlock hunter) to help give it a good 
  going over with me.

Another major issue is network allocations.

I originally opposed your initial work and the subsequent releases made by
Peter, but now I think the right way is to combine the good points of your
approach with a specialized allocator. Essentially, what I proposed (only
in my blog, though) is to bind an independent reserve to every socket.
Such a reserve can be carved out of the socket buffer itself (each socket
has a limited socket buffer from which packets are allocated, and it
accounts for both data and control (skb) lengths), so when the main
allocation via the common path fails, the socket can take memory from its
own reserve. This allows sending sockets to make progress in case of
deadlock.
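The sending side of this proposal could look roughly like the following. It is a hypothetical sketch, not code from any posted patch: `struct sock_budget`, `sock_send_alloc()` and the `common_ok` flag (standing in for the success or failure of the normal allocation path) are all illustrative, and only byte accounting is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket budget. */
struct sock_budget {
	size_t sndbuf;        /* normal send-buffer limit after carving */
	size_t reserve;       /* bytes set aside as this socket's reserve */
	size_t reserve_used;  /* bytes currently taken from the reserve */
};

/* Carve a reserve out of the socket's own buffer (capped at its size). */
static void sock_budget_init(struct sock_budget *sk, size_t sndbuf,
			     size_t want_reserve)
{
	sk->reserve = want_reserve < sndbuf ? want_reserve : sndbuf;
	sk->sndbuf = sndbuf - sk->reserve;
	sk->reserve_used = 0;
}

enum alloc_src { FROM_COMMON, FROM_RESERVE, ALLOC_FAILED };

/* Try the common path first; if it failed (common_ok == false), fall back
 * to the socket's own reserve so sending can still make progress. */
static enum alloc_src sock_send_alloc(struct sock_budget *sk, size_t size,
				      bool common_ok)
{
	if (common_ok)
		return FROM_COMMON;
	if (sk->reserve - sk->reserve_used >= size) {
		sk->reserve_used += size;
		return FROM_RESERVE;
	}
	return ALLOC_FAILED;
}

/* Return memory to the reserve once the packet has been sent and freed. */
static void sock_reserve_release(struct sock_budget *sk, size_t size)
{
	sk->reserve_used -= size;
}
```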

For receiving, the situation is worse, since the system does not know in
advance which socket a given packet will belong to, so it must allocate from
a global pool (and thus there must be an independent global reserve), and
then exchange part of the socket's reserve for the global one (or just
copy the packet into a new one allocated from the socket's reserve, if one
was set up, or drop it otherwise). An independent global reserve is what I
proposed when I stopped advertising the network allocator, but it seems
that it was not taken into account: in Peter's patches the reserve was
always allocated only when the system is under serious memory pressure,
without any notion of per-socket reservation.
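A hypothetical sketch of the receive-side flow just described, with illustrative names only: the packet is first charged to a global reserve, and after socket demux its cost moves to the socket's own reserve, or the packet is dropped. Real packet memory is not modelled, only byte accounting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserve: a fixed size and the bytes currently charged. */
struct mem_reserve {
	size_t size;
	size_t used;
};

enum rx_verdict { RX_NO_GLOBAL_SPACE, RX_CHARGED_TO_SOCKET, RX_DROPPED };

static bool reserve_take(struct mem_reserve *r, size_t len)
{
	if (r->size - r->used < len)
		return false;
	r->used += len;
	return true;
}

/*
 * The packet must first land in the global reserve because its owner is
 * unknown.  After demux, its cost is moved to the socket's reserve, or the
 * packet is dropped.  sock_res == NULL models a socket without a reserve.
 */
static enum rx_verdict rx_under_pressure(struct mem_reserve *global,
					 struct mem_reserve *sock_res,
					 size_t len)
{
	if (!reserve_take(global, len))
		return RX_NO_GLOBAL_SPACE;

	/* ... protocol processing up to socket demux happens here ... */

	global->used -= len;  /* the global reserve is freed either way */
	if (sock_res && reserve_take(sock_res, len))
		return RX_CHARGED_TO_SOCKET;
	return RX_DROPPED;
}
```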

It allows sockets to be separated and effectively made fair: a system
administrator or programmer can shrink a socket's buffer a bit and request
a reserve for special communication channels, which will then have a
guaranteed ability to make both sending and receiving progress, no matter
how many of them were set up. And it does not require any changes beyond
the network side.

-- 
Evgeniy Polyakov


Re: [patch 0/5][RFC] Update network drivers to use devres

2007-08-03 Thread Stephen Hemminger
On Fri, 3 Aug 2007 19:26:45 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 On Fri, Aug 03, 2007 at 09:58:57AM +0100, Stephen Hemminger wrote:
  On Thu, 2 Aug 2007 15:42:06 -0700
  Brandon Philips [EMAIL PROTECTED] wrote:
  
   This patch set adds support for devres in the net core and converts the
   e100 and e1000 drivers to devres.  Devres is a simple resource manager
   for device drivers, see Documentation/driver-model/devres.txt for more
   information.
   
   The use of devres will remain optional for drivers with this patch set.
   Drivers can be converted when it makes sense.
  
  Just because devres exists is not sufficient motivation to change.
  
  It seems that devres was a band-aid rather than fixing storage drivers
  to have proper DMA lifetimes.
 
 I don't really get what you mean by having proper DMA lifetimes but
 please don't write devres off too fast.  devres doesn't solve any
 problem that you can't fix without it but it does make the 'solving'
 much easier.
 
 IMHO, libata drivers generally have been well maintained and reviewed
 but I could still find quite a few bugs (resource leaks or
 occasionally double free) in init failure and removal paths.  Init
 failure paths are especially prone to bugs because they don't get
 exercised often.  It's just very easy to make a mistake and fail to
 notice and low level drivers don't always get sufficient amount of
 review or testing.
 
 Skimming through drivers... via-rhine doesn't disable PCI device on
 init failure path but does so on removal.  sky2 doesn't free
 consistent memory if sky2_init() fails.  acenic calls iounmap() with
 NULL parameter which I'm not sure whether it's safe or not.  natsemi
 doesn't disable PCI device on failure or removal.

Did you report these to the developers?

 Devres makes low level drivers simpler, easier to get right and
 maintain.  Writing new drivers becomes easier too.  So, why not?
 
  Network devices seem to work fine thanks, and the resource requirements
  are different. If it ain't broke, don't fix it.
 
 Care to enlighten me on how the resource requirements are different
 from ATA drivers?

I was thinking of the hot remove (no mod ref counts) and lingering
/sys open issues.  ATA drivers use ref counts.

My take on devres is that it is similar to talloc() for device drivers.
Not a bad idea in itself, but the real advantage of hierarchical allocation
is that it makes exception handling easier if things are layered deeply.
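For comparison, a minimal sketch of talloc-style hierarchical allocation, assuming a bare node header with no user payload; `h_alloc()` and `h_free()` are hypothetical names, not the real talloc API. Freeing a node frees everything allocated under it, which is what makes deep error unwinding cheap.

```c
#include <stdlib.h>

/* Hypothetical node header; a real allocator would put user data after it. */
struct hnode {
	struct hnode *parent;
	struct hnode *first_child;
	struct hnode *next_sibling;
};

static int h_freed;  /* number of nodes freed, for demonstration only */

/* Allocate a node as a child of parent (or as a root if parent is NULL). */
static struct hnode *h_alloc(struct hnode *parent)
{
	struct hnode *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->parent = parent;
	if (parent) {
		n->next_sibling = parent->first_child;
		parent->first_child = n;
	}
	return n;
}

/* Free a node and, recursively, everything allocated under it. */
static void h_free_subtree(struct hnode *n)
{
	struct hnode *c = n->first_child;

	while (c) {
		struct hnode *next = c->next_sibling;

		h_free_subtree(c);
		c = next;
	}
	free(n);
	h_freed++;
}

/* Public free: unlink n from its parent first, then free its subtree. */
static void h_free(struct hnode *n)
{
	if (n->parent) {
		struct hnode **pp = &n->parent->first_child;

		while (*pp != n)
			pp = &(*pp)->next_sibling;
		*pp = n->next_sibling;
	}
	h_free_subtree(n);
}
```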


Re: strange tcp behavior

2007-08-03 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
wrote:
 Since the connection is considered closed, couldn't another socket re-use it?
 
 Socket A: Recv data (unread)
 Socket A: Recv RST
 Socket B: Reuses connection (same IPs/ports)
 Socket A: Close
 
 Wouldn't that disrupt socket B's use of the connection?

Then it will drop our data, since there was no appropriate handshake.

 -- 
 Simon Arlott

-- 
Evgeniy Polyakov


Re: [patch] genirq: fix simple and fasteoi irq handlers

2007-08-03 Thread Marcin Ślusarz
2007/8/3, Jarek Poplawski [EMAIL PROTECTED]:
 On Fri, Aug 03, 2007 at 10:04:08AM +0200, Ingo Molnar wrote:
 
  * Jarek Poplawski [EMAIL PROTECTED] wrote:
 
   I can't guarantee this is all needed to fix this bug, but I think this
   patch is necessary here.
 
  hmmm ... very interesting! Now _this_ is something we'd like to see
  tested. Could you send a patch to Marcin that also undoes the workaround
  we have in place now, so that he could check whether ne2k-pci works fine
  with your fix alone?

 I'm not sure this is needed... Marcin got this patch, I hope, and I
 don't have any other way to contact him. Since he managed
 the bisection and all the previous patches, I don't think there
 could be any problems, so:

 Marcin! I'd be very glad if you could test this patch alone; this
 should apply without any problems to 2.6.21 (with some offset) and
 later vanilla versions (or try to revert Ingo's last patch with
 patch -p1 -R). Please, contact me on any problems (alas not during
 the weekend...).

I'll test this patch tomorrow (and confirm that the last one from Ingo
works fine) and report results on Monday (sorry, no internet at home
since I moved out of the city :|).

Marcin


Re: [patch] genirq: fix simple and fasteoi irq handlers

2007-08-03 Thread Jarek Poplawski
On Fri, Aug 03, 2007 at 01:57:00PM +0200, Marcin Ślusarz wrote:
...
 I'll test this patch tomorrow (and confirm that the last one from Ingo
 works fine) and report results on Monday (sorry, no internet at home
 since I moved out of the city :|).

So, you are a lucky guy! I only have the "no internet at home" part.
...and time for dreaming about moving out of a city...

Cheers,
Jarek P.


Re: Distributed storage.

2007-08-03 Thread Peter Zijlstra
On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote:

 For receiving situation is worse, since system does not know in advance
 to which socket given packet will belong to, so it must allocate from
 global pool (and thus there must be independent global reserve), and
 then exchange part of the socket's reserve to the global one (or just
 copy packet to the new one, allocated from socket's reserve if it was
 setup, or drop it otherwise). Global independent reserve is what I
 proposed when stopped to advertise network allocator, but it seems that
 it was not taken into account, and reserve was always allocated only
 when system has serious memory pressure in Peter's patches without any
 meaning for per-socket reservation.

This is not true. I have a global reserve which is set-up a priori. You
cannot allocate a reserve when under pressure, that does not make sense.

Let me explain my approach once again.

At swapon(8) time we allocate a global reserve and associate the needed
sockets with it. The size of this global reserve is made up of two
parts:
  - TX
  - RX

The RX pool is the most interesting part. It, in turn, is made up of two
parts:
  - skb
  - auxiliary data

The skb part is scaled such that it can overflow the IP fragment
reassembly; the aux pool, such that it can overflow the route cache (that
was the largest other allocator in the RX path).

All (reserve) RX skb allocations are accounted, so as to never allocate
more than we reserved.

All packets are received (given the limit) and are processed up to
socket demux. At that point all packets not targeted at an associated
socket are dropped and the skb memory freed - ready for another packet.

All packets targeted for associated sockets get processed. This requires
that this packet processing happens in-kernel: since we are swapping,
user space might be waiting for this data, and we'd deadlock.


I'm not quite sure why you need per-socket reservations.
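For contrast, a minimal accounting sketch of the a-priori reserve described above, covering only the RX skb part. The names are hypothetical and the `associated_socket` flag stands in for the demux result; the auxiliary (route cache) pool and the TX side are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical byte accounting for the RX skb part of the reserve. */
struct rx_reserve {
	size_t limit;  /* fixed when the reserve is set up, e.g. at swapon */
	size_t used;
};

enum rx_result { RX_REFUSED, RX_DROPPED_AT_DEMUX, RX_QUEUED };

/*
 * Every reserve RX allocation is accounted so usage never exceeds the
 * limit.  At socket demux, packets for sockets not associated with the
 * reserve are freed immediately; packets for associated sockets keep
 * their memory until in-kernel processing finishes (rx_reserve_done()).
 */
static enum rx_result rx_reserve_receive(struct rx_reserve *r, size_t len,
					 bool associated_socket)
{
	if (r->limit - r->used < len)
		return RX_REFUSED;
	r->used += len;

	if (!associated_socket) {
		r->used -= len;  /* dropped: memory is ready for another packet */
		return RX_DROPPED_AT_DEMUX;
	}
	return RX_QUEUED;
}

/* Called once an associated socket's packet has been processed. */
static void rx_reserve_done(struct rx_reserve *r, size_t len)
{
	r->used -= len;
}
```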



Re: [patch 0/5][RFC] Update network drivers to use devres

2007-08-03 Thread Stephen Hemminger
On Fri, 03 Aug 2007 20:33:04 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 Hello,
 
 Stephen Hemminger wrote:
  Skimming through drivers... via-rhine doesn't disable PCI device on
  init failure path but does so on removal.  sky2 doesn't free
  consistent memory if sky2_init() fails.  acenic calls iounmap() with
  NULL parameter which I'm not sure whether it's safe or not.  natsemi
  doesn't disable PCI device on failure or removal.
  
  Did you report these to the developers?
 
 Just skimmed through.  I'm pretty sure Brandon will pick those up later.
 
  Devres makes low level drivers simpler, easier to get right and
  maintain.  Writing new drivers becomes easier too.  So, why not?
 
  Network devices seem to work fine thanks, and the resource requirements
  are different. If it ain't broke, don't fix it.
  Care to enlighten me on how the resource requirements are different
  from ATA drivers?
  
  I was thinking of the hot remove (no mod ref counts) and lingering
  /sys open issues.  ATA drivers use ref counts.
 
 I guess the hot removing is done by severing netdev from the actual
 device, right?  I don't see how that affects usage of devres on network
 drivers.  Am I missing something?

The issue is that the device may be removed at any time, so you can't rely
on module ref counts to save you. And the net_device structure must still
linger after the module is removed, until the dev ref count goes to zero.
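The lifetime rule above can be sketched as plain reference counting, assuming a single-threaded model (the kernel uses atomic counters via dev_hold()/dev_put()). `struct fake_netdev` and its helpers are hypothetical: hot removal drops the driver's reference, but the structure is freed only when the last holder, for example an open sysfs file, lets go.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; not the kernel's struct net_device. */
struct fake_netdev {
	int refcnt;
	bool hw_present;
};

static int netdevs_freed;  /* counts frees, for demonstration only */

static struct fake_netdev *fake_netdev_create(void)
{
	struct fake_netdev *dev = calloc(1, sizeof(*dev));

	if (dev) {
		dev->refcnt = 1;        /* the driver's own reference */
		dev->hw_present = true;
	}
	return dev;
}

static void fake_netdev_hold(struct fake_netdev *dev)
{
	dev->refcnt++;
}

static void fake_netdev_put(struct fake_netdev *dev)
{
	if (--dev->refcnt == 0) {
		free(dev);
		netdevs_freed++;
	}
}

/* Hot removal: the hardware is gone now, but the structure is only
 * freed when the last outstanding reference is dropped. */
static void fake_netdev_hot_remove(struct fake_netdev *dev)
{
	dev->hw_present = false;
	fake_netdev_put(dev);
}
```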

 On a separate note, can you explain lingering /sys open issue to me a
 bit?  With recent sysfs changes, sysfs nodes are disconnected
 immediately on deletion.  Would that make any difference to netdevs?

Examples are in Documentation/networking/netdevices.txt

  My take on devres is that it is similar to talloc() for device drivers.
  Not a bad idea in itself, but the real advantage of hierarchical allocation
  is that it makes exception handling easier if things are layered deeply.
 
 Yeah, devres made layering easier in libata, especially SFF stuff.
 Dunno how much of that is applicable to netdev but, with or without
 layering, it'll be a nice cleanup and I don't see much negative side.
 Conversion would take some work and bugs might be introduced in the
 process as with any changes but the good thing about devres is that
 you're very likely to get failure/release paths right if you get the
 init path right, and if you get the init path wrong, it will stand out
 like a sore thumb - easy to spot, easy to fix.
 
 So, I think using devres on net drivers is a good idea, well, for that
 matter, for any driver, but me being the devres writer, that isn't
 really surprising, is it?
 
 Thanks.
 
 -- 
 tejun


[PATCH] lro: eHEA example how to use LRO

2007-08-03 Thread Jan-Bernd Themann
This patch shows how the generic LRO interface is used in SKB mode.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/Kconfig             |    1 +
 drivers/net/ehea/ehea.h         |    9 -
 drivers/net/ehea/ehea_ethtool.c |   15 +++
 drivers/net/ehea/ehea_main.c    |   84 +++---
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f8a602c..fec4004 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2399,6 +2399,7 @@ config CHELSIO_T3
 config EHEA
 	tristate "eHEA Ethernet support"
depends on IBMEBUS
+   select INET_LRO
---help---
  This driver supports the IBM pSeries eHEA ethernet adapter.
 
diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..70e33fe 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -33,13 +33,14 @@
 #include <linux/ethtool.h>
 #include <linux/vmalloc.h>
 #include <linux/if_vlan.h>
+#include <linux/inet_lro.h>
 
 #include <asm/ibmebus.h>
 #include <asm/abs_addr.h>
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0073"
+#define DRV_VERSION	"EHEA_0074"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -58,6 +59,7 @@
 
 #define EHEA_SMALL_QUEUES
 #define EHEA_NUM_TX_QP 1
+#define EHEA_LRO_MAX_AGGR 64
 
 #ifdef EHEA_SMALL_QUEUES
 #define EHEA_MAX_CQE_COUNT  1023
@@ -84,6 +86,8 @@
 #define EHEA_RQ2_PKT_SIZE   1522
 #define EHEA_L_PKT_SIZE     256	/* low latency */
 
+#define MAX_LRO_DESCRIPTORS 8
+
 /* Send completion signaling */
 
 /* Protection Domain Identifier */
@@ -376,6 +380,8 @@ struct ehea_port_res {
u64 tx_packets;
u64 rx_packets;
u32 poll_counter;
+   struct net_lro_mgr lro_mgr;
+   struct net_lro_desc lro_desc[MAX_LRO_DESCRIPTORS];
 };
 
 
@@ -427,6 +433,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u32 lro_max_aggr;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c
index decec8c..29ef7a9 100644
--- a/drivers/net/ehea/ehea_ethtool.c
+++ b/drivers/net/ehea/ehea_ethtool.c
@@ -183,6 +183,9 @@ static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
 	{"PR5 free_swqes"},
 	{"PR6 free_swqes"},
 	{"PR7 free_swqes"},
+	{"LRO aggregated"},
+	{"LRO flushed"},
+	{"LRO no_desc"},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -239,6 +242,18 @@ static void ehea_get_ethtool_stats(struct net_device *dev,
 	for (k = 0; k < 8; k++)
 		data[i++] = atomic_read(&port->port_res[k].swqe_avail);
 
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp += port->port_res[k].lro_mgr.stats.aggregated;
+	data[i++] = tmp;
+
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp += port->port_res[k].lro_mgr.stats.flushed;
+	data[i++] = tmp;
+
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp += port->port_res[k].lro_mgr.stats.no_desc;
+	data[i++] = tmp;
+
+
 }
 
 const struct ethtool_ops ehea_ethtool_ops = {
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 9756211..fbaa395 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -52,6 +52,8 @@ static int rq2_entries = EHEA_DEF_ENTRIES_RQ2;
 static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
+static int use_lro = 0;
+static int lro_max_aggr = EHEA_LRO_MAX_AGGR;
 static int num_tx_qps = EHEA_NUM_TX_QP;
 
 module_param(msg_level, int, 0);
@@ -60,6 +62,8 @@ module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
 module_param(use_mcs, int, 0);
+module_param(use_lro, int, 0);
+module_param(lro_max_aggr, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, "Number of TX-QPS");
@@ -77,6 +81,10 @@ MODULE_PARM_DESC(sq_entries, " Number of entries for the Send Queue  "
 		 "[2^x - 1], x = [6..14]. Default = "
 		 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")");
 MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 1 ");
+MODULE_PARM_DESC(lro_max_aggr, " LRO: Max packets to be aggregated. Default = "
+		 __MODULE_STRING(EHEA_LRO_MAX_AGGR));
+MODULE_PARM_DESC(use_lro, " Large Receive Offload, 1: enable, 0: disable, "
+		 " Default = 0");
 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
@@ -389,6 +397,60 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, int rq,
return 0;
 }
 
+static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
+  void **tcph, u64 *hdr_flags, void *priv)
+{
+   struct ehea_cqe *cqe = priv;
+   unsigned int ip_len;
+   struct iphdr *iph;
+
+

[PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-03 Thread Jan-Bernd Themann
This patch provides generic Large Receive Offload (LRO) functionality
for IPv4/TCP traffic.

LRO combines received TCP packets into a single larger TCP packet and
then passes it to the network stack in order to increase performance
(throughput). The interface supports two modes: drivers can pass either
SKBs or fragment lists to the LRO engine.
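A heavily simplified sketch of the core LRO merge decision, assuming a flow is identified by addresses and ports and that only in-order segments are merged. `struct lro_sketch_desc` and `lro_sketch_merge()` are hypothetical; the real engine also checks TCP flags, timestamps and checksums, rewrites the headers, and handles sequence-number wraparound, none of which is shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flow key; the real descriptor keeps the IP/TCP headers. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Heavily simplified stand-in for struct net_lro_desc. */
struct lro_sketch_desc {
	struct flow_key key;
	uint32_t next_seq;   /* sequence number expected next */
	uint32_t total_len;  /* TCP payload bytes merged so far */
	int pkt_cnt;         /* packets merged so far */
	bool active;
};

static bool same_flow(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/*
 * Try to append a TCP segment to the descriptor.  Returns false when the
 * segment cannot be merged (other flow, out of order, or max_aggr reached);
 * the caller would then flush the descriptor to the stack and start over.
 */
static bool lro_sketch_merge(struct lro_sketch_desc *d,
			     const struct flow_key *k, uint32_t seq,
			     uint32_t len, int max_aggr)
{
	if (!d->active) {
		d->key = *k;
		d->next_seq = seq + len;
		d->total_len = len;
		d->pkt_cnt = 1;
		d->active = true;
		return true;
	}
	if (!same_flow(&d->key, k) || seq != d->next_seq ||
	    d->pkt_cnt >= max_aggr)
		return false;
	d->next_seq += len;
	d->total_len += len;
	d->pkt_cnt++;
	return true;
}
```

The `max_aggr` parameter plays the same role as the `max_aggr` field of struct net_lro_mgr: once a descriptor holds that many packets, the next segment forces a flush.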

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 include/linux/inet_lro.h |  177 ++
 net/ipv4/Kconfig         |    8 +
 net/ipv4/Makefile        |    1 +
 net/ipv4/inet_lro.c      |  600 ++
 4 files changed, 786 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 000..e1fc1d1
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,177 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *   Jan-Bernd Themann [EMAIL PROTECTED]
+ *   Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+/*
+ * LRO statistics
+ */
+
+struct net_lro_stats {
+   unsigned long aggregated;
+   unsigned long flushed;
+   unsigned long no_desc;
+};
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+   struct sk_buff *parent;
+   struct sk_buff *last_skb;
+   struct skb_frag_struct *next_frag;
+   struct iphdr *iph;
+   struct tcphdr *tcph;
+   struct vlan_group *vgrp;
+   __wsum  data_csum;
+   u32 tcp_rcv_tsecr;
+   u32 tcp_rcv_tsval;
+   u32 tcp_ack;
+   u32 tcp_next_seq;
+   u32 skb_tot_frags_len;
+   u16 ip_tot_len;
+   u16 tcp_saw_tstamp; /* timestamps enabled */
+   u16 tcp_window;
+   u16 vlan_tag;
+   int pkt_aggr_cnt;   /* counts aggregated packets */
+   int vlan_packet;
+   int mss;
+   int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+   struct net_device *dev;
+   struct net_lro_stats stats;
+
+   /* LRO features */
+   unsigned long features;
+#define LRO_F_NAPI            1  /* Pass packets to stack via NAPI */
+#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
+   from received packets and eth protocol
+   is still ETH_P_8021Q */
+
+   u32 ip_summed;  /* Set in non generated SKBs in page mode */
+   u32 ip_summed_aggr; /* Set in aggregated SKBs: CHECKSUM_UNNECESSARY
+                        * or CHECKSUM_NONE */
+
+   int max_desc; /* Max number of LRO descriptors  */
+   int max_aggr; /* Max number of LRO packets to be aggregated */
+
+   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+   /*
+    * Optimized driver functions
+    *
+    * get_skb_header: returns tcp and ip header for packet in SKB
+    */
+   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+                         void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+   /* hdr_flags: */
+#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
+#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
+
+   /*
+    * get_frag_header: returns mac, tcp and ip header for packet in SKB
+    *
+    * @hdr_flags: Indicate what kind of LRO has to be done
+    *             (IPv4/IPv6/TCP/UDP)
+    */
+   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+                          void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+                          void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *        (for example get_skb_header)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+                     struct sk_buff *skb,
+                     void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct 

[PATCH 0/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-03 Thread Jan-Bernd Themann
Hi,

I think this patch could be the final version for now. It has been tested
on two platforms (power and x86_64) and works very well.

In addition to David Miller and Evgeniy Polyakov, we would especially like to
thank Andrew Gallatin for his great reviews and for helping to make this happen.

After some discussion we decided to post the LRO patch separately from the
driver patches. Our final LRO driver patches will be posted later, together
with some additional fixes, for upstream inclusion in the netdev git tree.
However, I'll also post our LRO patch for the driver today as an example
of how to use this interface.

Thanks a lot,
Jan-Bernd 

[PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

Changes compared to http://www.spinics.net/lists/netdev/msg37084.html

1) Fixed the LRO_MAX_PG_HLEN bug

2) skb->ip_summed can now be set by the driver for aggregated packets

3) Fixed the very slow ramp-up of TCP connections between machines
   with different MTU sizes (1500 vs 9000) by setting skb->gso_size.

4) Fixed a checksum problem on little-endian machines

5) Added the missing vlan_hdr_len offset when determining the position of the TCP header.
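To make the descriptor fields in the header a little more concrete, here is a rough user-space model of the per-flow decision an LRO descriptor makes for each incoming TCP segment. It is a sketch under stated assumptions, not the kernel implementation: `struct seg`, `struct lro_desc_model`, `lro_model_receive()` and the `LRO_*` action names are hypothetical; only a single descriptor is modelled, and the real code also checks TCP flags, timestamps, IP options and checksums before aggregating. The `mss` field only records the largest payload seen; it stands in, as an assumption, for the value an implementation could report via `skb->gso_size`.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified view of one received TCP segment. */
struct seg {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint32_t seq;   /* sequence number of the first payload byte */
	uint32_t len;   /* TCP payload length in bytes */
};

/* Hypothetical descriptor; field names loosely follow struct net_lro_desc. */
struct lro_desc_model {
	int active;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint32_t tcp_next_seq;  /* sequence number expected next */
	uint32_t total_len;     /* payload bytes aggregated so far */
	int pkt_aggr_cnt;       /* segments merged into this descriptor */
	uint32_t mss;           /* largest payload seen so far */
};

enum lro_action { LRO_START, LRO_AGGREGATE, LRO_FLUSH_AND_START };

static int same_flow(const struct lro_desc_model *d, const struct seg *s)
{
	return d->saddr == s->saddr && d->daddr == s->daddr &&
	       d->sport == s->sport && d->dport == s->dport;
}

/*
 * Merge an in-order segment of the same flow while fewer than max_aggr
 * segments are held; otherwise "flush" (not modelled here) and restart
 * the descriptor with this segment.
 */
static enum lro_action lro_model_receive(struct lro_desc_model *d,
					 const struct seg *s, int max_aggr)
{
	enum lro_action act;

	if (d->active && same_flow(d, s) && s->seq == d->tcp_next_seq &&
	    d->pkt_aggr_cnt < max_aggr) {
		d->tcp_next_seq += s->len;
		d->total_len += s->len;
		d->pkt_aggr_cnt++;
		if (s->len > d->mss)
			d->mss = s->len;
		return LRO_AGGREGATE;
	}

	act = d->active ? LRO_FLUSH_AND_START : LRO_START;
	d->active = 1;
	d->saddr = s->saddr;
	d->daddr = s->daddr;
	d->sport = s->sport;
	d->dport = s->dport;
	d->tcp_next_seq = s->seq + s->len;
	d->total_len = s->len;
	d->pkt_aggr_cnt = 1;
	d->mss = s->len;
	return act;
}
```

An out-of-order segment or a full descriptor (max_aggr reached) ends the current aggregate, which mirrors the role of the max_aggr field in struct net_lro_mgr.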


Re: strange tcp behavior

2007-08-03 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 01:03:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
wrote:
 On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
  On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
  wrote:
  Since the connection is considered closed, couldn't another socket re-use 
  it?
 
  Socket A: Recv data (unread)
  Socket A: Recv RST
  Socket B: Reuses connection (same IPs/ports)
  Socket A: Close
 
  Wouldn't that disrupt socket B's use of the connection?
 
  Then it will drop our data, since there was no appropriate handshake.
 
 Couldn't the sequence numbers be close enough to make the RST valid?

It does not matter - if the connection is not in a synchronized state, all
unrelated data is dropped, so the remote side is only allowed to accept a
SYN; anything else must be dropped. If the remote side does not do that,
it violates the RFC.
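Simon's question turns on the RFC 793 rule for validating resets. A minimal sketch of that rule for the synchronized states only (the SYN-SENT case, where RFC 793 validates a reset by its ACK field instead, is left out): a reset is accepted only if its sequence number lies inside the receive window, using the zero-length-segment acceptability test, which requires an exact match on RCV.NXT when the window is zero. `rst_seq_acceptable()` and its parameter names are hypothetical, and Linux's actual checks are not reproduced here.

```c
#include <stdint.h>

/*
 * RFC 793 reset check for synchronized states (ESTABLISHED, FIN-WAIT-1, ...).
 * A RST carries no data, so the zero-length acceptability test applies:
 *   RCV.WND == 0: SEG.SEQ == RCV.NXT
 *   RCV.WND >  0: RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND   (mod 2^32)
 */
static int rst_seq_acceptable(uint32_t seg_seq, uint32_t rcv_nxt,
			      uint32_t rcv_wnd)
{
	if (rcv_wnd == 0)
		return seg_seq == rcv_nxt;
	/* Distance from the left window edge; unsigned math handles wraparound. */
	return (uint32_t)(seg_seq - rcv_nxt) < rcv_wnd;
}
```

So a stray RST from an old incarnation of the connection is only honoured if its sequence number happens to land inside the new connection's window.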

 -- 
 Simon Arlott

-- 
Evgeniy Polyakov


[patch]support for USB autosuspend in the asix driver

2007-08-03 Thread Oliver Neukum
Hi,

this implements support for USB autosuspend in the asix USB ethernet
driver.

Regards
Oliver
Signed-off-by: Oliver Neukum [EMAIL PROTECTED]
---

--- a/drivers/net/usb/asix.c	2007-08-03 13:16:31.0 +0200
+++ b/drivers/net/usb/asix.c	2007-08-03 13:17:05.0 +0200
@@ -1474,6 +1474,7 @@ static struct usb_driver asix_driver = {
.suspend =  usbnet_suspend,
.resume =   usbnet_resume,
.disconnect =   usbnet_disconnect,
+   .supports_autosuspend = 1,
 };
 
 static int __init asix_init(void)
--- a/drivers/net/usb/usbnet.c  2007-08-03 13:16:53.0 +0200
+++ b/drivers/net/usb/usbnet.c  2007-08-03 13:19:31.0 +0200
@@ -588,6 +588,7 @@ static int usbnet_stop (struct net_devic
dev->flags = 0;
del_timer_sync (&dev->delay);
tasklet_kill (&dev->bh);
+   usb_autopm_put_interface(dev->intf);
 
return 0;
 }
@@ -601,9 +602,19 @@ static int usbnet_stop (struct net_devic
 static int usbnet_open (struct net_device *net)
 {
struct usbnet   *dev = netdev_priv(net);
-   int retval = 0;
+   int retval;
struct driver_info  *info = dev->driver_info;
 
+   if ((retval = usb_autopm_get_interface(dev->intf)) < 0) {
+   if (netif_msg_ifup (dev))
+   devinfo (dev,
+   "resumption fail (%d) usbnet usb-%s-%s, %s",
+   retval,
+   dev->udev->bus->bus_name, dev->udev->devpath,
+   info->description);
+   goto done_nopm;
+   }
+
// put into known safe state
if (info->reset && (retval = info->reset (dev)) < 0) {
if (netif_msg_ifup (dev))
@@ -657,7 +668,10 @@ static int usbnet_open (struct net_devic
 
// delay posting reads until we're fully open
tasklet_schedule (&dev->bh);
+   return retval;
 done:
+   usb_autopm_put_interface(dev->intf);
+done_nopm:
return retval;
 }
 
@@ -1141,6 +1155,7 @@ usbnet_probe (struct usb_interface *udev
 
dev = netdev_priv(net);
dev->udev = xdev;
+   dev->intf = udev;
dev->driver_info = info;
dev->driver_name = name;
dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
@@ -1265,12 +1280,18 @@ int usbnet_suspend (struct usb_interface
struct usbnet   *dev = usb_get_intfdata(intf);
 
if (!dev->suspend_count++) {
-   /* accelerate emptying of the rx and queues, to avoid
+   /*
+* accelerate emptying of the rx and queues, to avoid
 * having everything error out.
 */
netif_device_detach (dev->net);
(void) unlink_urbs (dev, &dev->rxq);
(void) unlink_urbs (dev, &dev->txq);
+   /*
+* reattach so runtime management can use and
+* wake the device
+*/
+   netif_device_attach (dev->net);
}
return 0;
 }
@@ -1280,10 +1301,9 @@ int usbnet_resume (struct usb_interface 
 {
struct usbnet   *dev = usb_get_intfdata(intf);
 
-   if (!--dev->suspend_count) {
-   netif_device_attach (dev->net);
+   if (!--dev->suspend_count)
tasklet_schedule (&dev->bh);
-   }
+
return 0;
 }
 EXPORT_SYMBOL_GPL(usbnet_resume);
--- a/drivers/net/usb/usbnet.h  2007-08-03 13:16:44.0 +0200
+++ b/drivers/net/usb/usbnet.h  2007-08-03 13:17:05.0 +0200
@@ -28,6 +28,7 @@
 struct usbnet {
/* housekeeping */
struct usb_device   *udev;
+   struct usb_interface   *intf;
struct driver_info  *driver_info;
const char  *driver_name;
wait_queue_head_t   *wait;
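The reference-counting rule the patch follows can be modelled in user space: `usbnet_open()` takes a PM reference before touching the device, drops it again on any later failure (`done:` falls through to `done_nopm:`), keeps it on success, and `usbnet_stop()` releases it. In the sketch below `struct fake_intf`, `fake_autopm_get()`, `fake_autopm_put()`, `fake_open()` and `fake_stop()` are hypothetical stand-ins, not the USB core API, and the error values are arbitrary.

```c
/* Hypothetical stand-in for a USB interface's PM usage counter. */
struct fake_intf {
	int pm_usage;     /* > 0 means the device must stay resumed */
	int fail_resume;  /* simulate a failed resume */
};

/* Stand-in for usb_autopm_get_interface(). */
static int fake_autopm_get(struct fake_intf *intf)
{
	if (intf->fail_resume)
		return -5;  /* arbitrary error code */
	intf->pm_usage++;
	return 0;
}

/* Stand-in for usb_autopm_put_interface(). */
static void fake_autopm_put(struct fake_intf *intf)
{
	intf->pm_usage--;
}

/* Same unwind shape as the patched usbnet_open(). */
static int fake_open(struct fake_intf *intf, int reset_fails)
{
	int retval;

	if ((retval = fake_autopm_get(intf)) < 0)
		goto done_nopm;          /* nothing taken, nothing to release */

	if (reset_fails) {
		retval = -19;            /* arbitrary error code */
		goto done;               /* undo the PM reference */
	}
	return retval;                   /* success: reference held until stop */
done:
	fake_autopm_put(intf);
done_nopm:
	return retval;
}

/* Same shape as the patched usbnet_stop(): balances the get in open. */
static int fake_stop(struct fake_intf *intf)
{
	fake_autopm_put(intf);
	return 0;
}
```

Whatever path open takes, the counter ends at 1 after a successful open and back at 0 after every failure or a matching stop, which is what lets autosuspend kick in once the interface is down.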


Re: [patch 0/5][RFC] Update network drivers to use devres

2007-08-03 Thread Tejun Heo
Hello,

Stephen Hemminger wrote:
 Skimming through drivers... via-rhine doesn't disable the PCI device on
 the init failure path but does so on removal.  sky2 doesn't free
 consistent memory if sky2_init() fails.  acenic calls iounmap() with a
 NULL parameter, which I'm not sure is safe.  natsemi doesn't disable
 the PCI device on failure or removal.
 
 Did you report these to the developers?

Just skimmed through.  I'm pretty sure Brandon will pick those up later.

 Devres makes low level drivers simpler, easier to get right and
 maintain.  Writing new drivers becomes easier too.  So, why not?

 Network devices seem to work fine thanks, and the resource requirements
 are different. If it ain't broke, don't fix it.
 Care to enlighten me on how the resource requirements are different
 from ATA drivers?
 
 I was thinking of the hot remove (no mod ref counts) and lingering
 /sys open issues.  ATA drivers use ref counts.

I guess hot removal is done by severing the netdev from the actual
device, right?  I don't see how that affects the use of devres in network
drivers.  Am I missing something?

On a separate note, can you explain the lingering /sys open issue to me a
bit?  With recent sysfs changes, sysfs nodes are disconnected
immediately on deletion.  Would that make any difference to netdevs?

 My take on devres is that it is similar to talloc() for device drivers.
 Not a bad idea in itself, but the real advantage of hierarchical allocation
 is that it makes exception handling easier if things are layered deeply.

Yeah, devres made layering easier in libata, especially the SFF stuff.
Dunno how much of that is applicable to netdev, but with or without
layering it'll be a nice cleanup, and I don't see much of a downside.
Conversion would take some work, and bugs might be introduced in the
process as with any change, but the good thing about devres is that
you're very likely to get the failure/release paths right if you get the
init path right, and if you get the init path wrong, it will stand out
like a sore thumb - easy to spot, easy to fix.

So, I think using devres on net drivers is a good idea, well, for that
matter, for any driver, but me being the devres writer, that isn't
really surprising, is it?
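The claim that release paths come almost for free once the init path is right follows from how a devres-style manager works: each resource is recorded together with its release function as it is acquired, and the core releases those records in reverse order if probe fails (or on detach), so the driver's error path shrinks to a plain `return`. The sketch below is a user-space model of that idea only; `struct fake_device`, `fake_devres_add()`, `fake_devres_release_all()`, `probe_body()` and `core_probe()` are hypothetical and do not reproduce the real devres API described in Documentation/driver-model/devres.txt.

```c
#include <stdlib.h>

typedef void (*release_fn)(void *res);

/* One managed resource: what to release and how. */
struct fake_devres {
	struct fake_devres *next;
	release_fn release;
	void *res;
};

struct fake_device {
	struct fake_devres *head;  /* most recently acquired first */
};

static int fake_devres_add(struct fake_device *dev, void *res, release_fn release)
{
	struct fake_devres *dr = malloc(sizeof(*dr));

	if (!dr)
		return -1;
	dr->res = res;
	dr->release = release;
	dr->next = dev->head;
	dev->head = dr;
	return 0;
}

/* Release everything in reverse order of acquisition (LIFO). */
static void fake_devres_release_all(struct fake_device *dev)
{
	while (dev->head) {
		struct fake_devres *dr = dev->head;

		dev->head = dr->next;
		dr->release(dr->res);
		free(dr);
	}
}

/* Demo bookkeeping: records the order in which resources are released. */
static int release_log[8];
static int release_count;

static void record_release(void *res)
{
	if (release_count < 8)
		release_log[release_count++] = *(int *)res;
}

/* "Driver" probe: acquire three resources; on any failure just return. */
static int probe_body(struct fake_device *dev, int fail)
{
	static int res_a = 1, res_b = 2, res_c = 3;

	if (fake_devres_add(dev, &res_a, record_release))
		return -1;
	if (fake_devres_add(dev, &res_b, record_release))
		return -1;
	if (fake_devres_add(dev, &res_c, record_release))
		return -1;
	return fail ? -1 : 0;
}

/* "Core" side: unwinds whatever probe_body() managed to acquire. */
static int core_probe(struct fake_device *dev, int fail)
{
	int err = probe_body(dev, fail);

	if (err)
		fake_devres_release_all(dev);
	return err;
}
```

A failed probe releases resources 3, 2, 1 in that order without a single goto label in `probe_body()`.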

Thanks.

-- 
tejun


Re: Distributed storage.

2007-08-03 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra ([EMAIL PROTECTED]) 
wrote:
 On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote:
 
  For receiving, the situation is worse, since the system does not know in
  advance which socket a given packet will belong to, so it must allocate
  from a global pool (and thus there must be an independent global reserve),
  and then exchange part of the socket's reserve for the global one (or just
  copy the packet into a new one allocated from the socket's reserve if it
  was set up, or drop it otherwise). An independent global reserve is what
  I proposed when I stopped advertising the network allocator, but it seems
  that it was not taken into account, and in Peter's patches the reserve
  was always allocated only when the system is under serious memory
  pressure, with no notion of per-socket reservation.
 
 This is not true. I have a global reserve which is set-up a priori. You
 cannot allocate a reserve when under pressure, that does not make sense.

I probably left out too many details. My main position is to allocate a
per-socket reserve from the socket's queue and copy data there from the
main reserve; both are allocated in advance (the global one at startup,
the per-socket one via a socket option), so there would be no fairness
issues about what to mark as special and what not.

Say we have a page per socket: each socket can assign a reserve for
itself from its own memory, and this accounts for both the tx and rx
sides. Tx is simple and not interesting. Rx has a global reserve (always
allocated at startup, or at some point well before reclaim/OOM) where
data is originally received (including the skb, shared info and whatever
else is needed; a page is just an example). The data is then copied into
the per-socket reserve, and the global buffer is reused for the next
packet. A per-socket reserve allows progress in any situation, not only
in cases where a single action must be received/processed, and it is
completely fair to all users rather than only to special sockets, so,
for example, an admin would still be able to log in, IPsec would work,
and so on...
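A minimal user-space sketch of the receive path described above, assuming one fixed-size buffer per reserve: the packet first lands in the global reserve (because the owning socket is not yet known), is then copied into that socket's own reserve, and the global buffer is free again for the next packet; a socket without a reserve, or whose reserve is full, has the packet dropped. `struct global_reserve`, `struct fake_sock`, `RESERVE_SIZE` and `rx_under_pressure()` are hypothetical names; socket lookup, skb/shared-info layout and the socket option that would set up the reserve are not modelled.

```c
#include <stddef.h>
#include <string.h>

#define RESERVE_SIZE 2048  /* hypothetical size of each reserve buffer */

/* Allocated once at startup, long before reclaim/OOM. */
struct global_reserve {
	unsigned char buf[RESERVE_SIZE];
};

struct fake_sock {
	int has_reserve;                      /* e.g. set up via a socket option */
	unsigned char reserve[RESERVE_SIZE];  /* per-socket reserve */
	size_t used;                          /* bytes queued in the reserve */
};

/*
 * Receive one packet under memory pressure. Returns the number of bytes
 * queued on the socket, or -1 if the packet was dropped. Either way the
 * global buffer can be reused as soon as this returns.
 */
static long rx_under_pressure(struct global_reserve *g, struct fake_sock *sk,
			      const void *pkt, size_t len)
{
	if (len > RESERVE_SIZE)
		return -1;
	memcpy(g->buf, pkt, len);                  /* 1: land in global reserve */

	/* ... socket lookup would happen here ... */

	if (!sk->has_reserve || sk->used + len > RESERVE_SIZE)
		return -1;                          /* no per-socket room: drop */
	memcpy(sk->reserve + sk->used, g->buf, len); /* 2: copy to the socket */
	sk->used += len;
	return (long)len;
}
```

Because every packet leaves the global reserve immediately, one busy socket cannot pin the shared buffer; it can only exhaust its own reserve.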

-- 
Evgeniy Polyakov


Re: Distributed storage.

2007-08-03 Thread Evgeniy Polyakov
Hi Mike.

On Fri, Aug 03, 2007 at 12:09:02AM -0400, Mike Snitzer ([EMAIL PROTECTED]) 
wrote:
  * storage can be formed on top of remote nodes and be exported
  simultaneously (iSCSI is peer-to-peer only, NBD requires device
  mapper and is synchronous)
 
 Having the in-kernel export is a great improvement over NBD's
 userspace nbd-server (extra copy, etc).
 
 But NBD's synchronous nature is actually an asset when coupled with MD
 raid1 as it provides guarantees that the data has _really_ been
 mirrored remotely.

I believe the right answer to this is a barrier rather than synchronous
sending/receiving, which might slow things down noticeably. A barrier must
wait until the remote side has received the data and sent back an
acknowledgement. Until the acknowledgement is received, no one can say
whether the data was mirrored, or even received by the remote node at all.
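One way to read the barrier proposal, sketched in user space under simplifying assumptions (a single remote node, acknowledgements arriving in order): writes are sent without waiting, a barrier remembers how many writes had been issued when it was placed, and it completes only once that many acknowledgements have come back. Writes issued after the barrier keep flowing in the meantime, which is what makes this cheaper than fully synchronous sending. `struct mirror_link` and the `link_*`/`barrier_*` functions are hypothetical, not DST code.

```c
/* Counters for one remote mirror; acks are assumed to arrive in order. */
struct mirror_link {
	unsigned long sent;   /* writes sent to the remote node */
	unsigned long acked;  /* acknowledgements received back */
};

/* Send a write asynchronously; returns its 1-based sequence number. */
static unsigned long link_send(struct mirror_link *l)
{
	return ++l->sent;
}

/* The remote side acknowledged the oldest unacknowledged write. */
static void link_ack(struct mirror_link *l)
{
	if (l->acked < l->sent)
		l->acked++;
}

/* Place a barrier: it covers every write sent so far. */
static unsigned long barrier_place(const struct mirror_link *l)
{
	return l->sent;
}

/* The barrier is complete once every write it covers is acknowledged. */
static int barrier_done(const struct mirror_link *l, unsigned long barrier)
{
	return l->acked >= barrier;
}
```

Only the caller that needs the "really mirrored" guarantee waits on `barrier_done()`; ordinary writes never block on the remote round trip.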

  TODO list currently includes following main items:
  * redundancy algorithm (drop me a request of your own, but it is highly
  unlikely that Reed-Solomon based will ever be used - it is too slow
  for distributed RAID, I consider WEAVER codes)
 
 I'd like to better understand where you see DST heading in the area of
 redundancy.  Based on your blog entries:
 http://tservice.net.ru/~s0mbre/blog/devel/dst/2007_07_24_1.html
 http://tservice.net.ru/~s0mbre/blog/devel/dst/2007_07_31_2.html
 (and your todo above) implementing a mirroring algorithm appears to be
 a near-term goal for you.  Can you comment on how your intended
 implementation would compare, in terms of correctness and efficiency,
 to say MD (raid1) + NBD?  MD raid1 has a write intent bitmap that is
 useful to speed resyncs; what if any mechanisms do you see DST
 embracing to provide similar and/or better reconstruction
 infrastructure?  Do you intend to embrace any existing MD or DM
 infrastructure?

It depends on which algorithm will be preferred. I do not want mirroring:
it is _too_ wasteful in terms of used storage, but it is the simplest.
Of what I have checked so far, I still consider WEAVER codes the fastest
in a distributed environment, but they are quite complex and the spec is
(at least for me) not clear in all aspects yet. I have not even started
a userspace implementation of those codes. (Hint: the spec sucks, kidding :)

For simple mirroring, each node must be split into chunks, each of which
is represented by a bit in the main node's mask; when a chunk is dirty,
the whole chunk is resynced. The chunk size varies depending on the node
size and the amount of memory. Setup is performed during node
initialization. Having a checksum for each chunk is a good step.
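A minimal sketch of the chunk/bitmask scheme just described, assuming a fixed upper bound on the number of chunks and byte-granular offsets: each write marks every chunk it touches as dirty, and a resync copies whole dirty chunks and clears their bits. `struct mirror_node`, `MAX_CHUNKS` and the `mirror_*` functions are hypothetical; per-chunk checksums and choosing the chunk size from node size and available memory are left out.

```c
#include <stdint.h>
#include <string.h>

#define MAX_CHUNKS 4096  /* hypothetical upper bound for this sketch */

struct mirror_node {
	uint64_t size;        /* node size in bytes */
	uint64_t chunk_size;  /* bytes per chunk, fixed at init time */
	uint64_t nr_chunks;
	unsigned char dirty[MAX_CHUNKS / 8];  /* one bit per chunk */
};

static int mirror_init(struct mirror_node *n, uint64_t size, uint64_t chunk_size)
{
	if (chunk_size == 0)
		return -1;
	n->size = size;
	n->chunk_size = chunk_size;
	n->nr_chunks = (size + chunk_size - 1) / chunk_size;
	if (n->nr_chunks > MAX_CHUNKS)
		return -1;
	memset(n->dirty, 0, sizeof(n->dirty));
	return 0;
}

static int mirror_chunk_dirty(const struct mirror_node *n, uint64_t c)
{
	return (n->dirty[c / 8] >> (c % 8)) & 1;
}

/* Mark every chunk touched by a write of len bytes at offset off. */
static void mirror_mark_dirty(struct mirror_node *n, uint64_t off, uint64_t len)
{
	uint64_t c, first, last;

	if (len == 0)
		return;
	first = off / n->chunk_size;
	last = (off + len - 1) / n->chunk_size;
	for (c = first; c <= last && c < n->nr_chunks; c++)
		n->dirty[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Resync whole dirty chunks via copy_chunk (may be NULL), clearing bits. */
static uint64_t mirror_resync(struct mirror_node *n, void (*copy_chunk)(uint64_t c))
{
	uint64_t c, resynced = 0;

	for (c = 0; c < n->nr_chunks; c++) {
		if (!mirror_chunk_dirty(n, c))
			continue;
		if (copy_chunk)
			copy_chunk(c);
		n->dirty[c / 8] &= (unsigned char)~(1u << (c % 8));
		resynced++;
	}
	return resynced;
}
```

A small write that straddles a chunk boundary dirties both chunks, so resync cost depends on the chunk size rather than on the size of the write.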

All the interfaces are already there, although they need cleanup and some
moving around, but I decided to keep the initial release small.

 BTW, you have definitely published some very compelling work and it's
 sad that you're predisposed to think DST won't be received well if you
 pushed for inclusion (for others, as much was said in the 7.31.2007
 blog post I referenced above).  Clearly others need to embrace DST to
 help inclusion become a reality.  To that end, it's great to see that
 Daniel Phillips and the other zumastor folks will be putting DST
 through its paces.

In that blog entry I mixed up Zen and Xen; that's an error. As for the
prognosis, time will tell :)

 regards,
 Mike

-- 
Evgeniy Polyakov


Re: [patch 0/5][RFC] Update network drivers to use devres

2007-08-03 Thread Stephen Hemminger
On Thu, 2 Aug 2007 15:42:06 -0700
Brandon Philips [EMAIL PROTECTED] wrote:

 This patch set adds support for devres in the net core and converts the
 e100 and e1000 drivers to devres.  Devres is a simple resource manager
 for device drivers, see Documentation/driver-model/devres.txt for more
 information.
 
 The use of devres will remain optional for drivers with this patch set.
 Drivers can be converted when it makes sense.

Just because devres exists is not sufficient motivation to change.

It seems that devres was a band-aid rather than fixing storage drivers
to have proper DMA lifetimes. 
Network devices seem to work fine thanks, and the resource requirements
are different. If it ain't broke, don't fix it.


Re: [patch] genirq: fix simple and fasteoi irq handlers

2007-08-03 Thread Jarek Poplawski
On Fri, Aug 03, 2007 at 10:04:08AM +0200, Ingo Molnar wrote:
 
 * Jarek Poplawski [EMAIL PROTECTED] wrote:
 
  I can't guarantee this is all needed to fix this bug, but I think this 
  patch is necessary here.
 
 hmmm ... very interesting! Now _this_ is something we'd like to see 
 tested. Could you send a patch to Marcin that also undoes the workaround 
 we have in place now, so that he could check whether ne2k-pci works fine 
 with your fix alone?

I'm not sure this is needed... Marcin got this patch, I hope, and I
have no other way to contact him. Since he managed the bisection and
all the previous patches, I don't think there could be any problems, so:

Marcin! I'd be very glad if you could test this patch alone; it
should apply without any problems to 2.6.21 (with some offset) and
later vanilla versions (or try reverting Ingo's last patch with
patch -p1 -R). Please contact me about any problems (alas, not during
the weekend...).

Thanks,
Jarek P.

PS: of course, I'm very curious about this testing too, but, on the other
hand, as I've written earlier, I think this patch is needed for logical
reasons alone, and it really doesn't look like it could do any damage
here.


Re: strange tcp behavior

2007-08-03 Thread Simon Arlott
On 03/08/07 13:09, Evgeniy Polyakov wrote:
 On Fri, Aug 03, 2007 at 01:03:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
 wrote:
 On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
  On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
  wrote:
  Since the connection is considered closed, couldn't another socket re-use 
  it?
 
  Socket A: Recv data (unread)
  Socket A: Recv RST
  Socket B: Reuses connection (same IPs/ports)
  Socket A: Close
 
  Wouldn't that disrupt socket B's use of the connection?
 
  Then it will drop our data, since there was no appropriate handshake.
 
 Couldn't the sequence numbers be close enough to make the RST valid?
 
 It does not matter - if the connection is not in a synchronized state, all
 unrelated data is dropped, so the remote side is only allowed to accept a
 SYN; anything else must be dropped. If the remote side does not do that,
 it violates the RFC.

Except the remote side has a connection, because another one can be made 
before the existing connection is closed:

17:37:37.377571 IP 192.168.7.4.50550 > 192.168.7.8.2500: S 134077329:134077329(0) win 1500 (raw)
17:37:37.382352 IP 192.168.7.8.2500 > 192.168.7.4.50550: S 3460060233:3460060233(0) ack 134077330 win 14360 <mss 7180> (accept)
17:37:37.377966 IP 192.168.7.4.50550 > 192.168.7.8.2500: . ack 1 win 1500 (raw)
17:37:37.378128 IP 192.168.7.4.50550 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500 (raw)
17:37:37.378162 IP 192.168.7.8.2500 > 192.168.7.4.50550: . ack 17 win 14360
17:37:37.378131 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 134077346:134077346(0) win 1500 (raw)

17:37:37.412709 IP 192.168.7.4.50550 > 192.168.7.8.2500: SWE 3257207813:3257207813(0) win 14280 <mss 7140,sackOK,timestamp 3601441543 0,nop,wscale 5> (connect)
17:37:37.412785 IP 192.168.7.8.2500 > 192.168.7.4.50550: SE 3495384256:3495384256(0) ack 3257207814 win 14336 <mss 7180,sackOK,timestamp 4294812905 3601441543,nop,wscale 6> (accept)
17:37:37.412960 IP 192.168.7.4.50550 > 192.168.7.8.2500: . ack 1 win 447 <nop,nop,timestamp 3601441543 4294812905>

17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360 (close (previous connection))

17:37:47.417649 IP 192.168.7.8.2500 > 192.168.7.4.50550: F 1:1(0) ack 1 win 224 <nop,nop,timestamp 4294822910 3601441543> (close)
17:37:47.417993 IP 192.168.7.4.50550 > 192.168.7.8.2500: F 1:1(0) ack 2 win 447 <nop,nop,timestamp 3601444045 4294822910> (read returned)
17:37:47.418466 IP 192.168.7.8.2500 > 192.168.7.4.50550: . ack 2 win 224 <nop,nop,timestamp 4294822911 3601444045>


The second connection also changed the RST|ACK that was sent, compared with
the case where there is no second connection:

17:38:03.532703 IP 192.168.7.4.50550 > 192.168.7.8.2500: S 82517575:82517575(0) win 1500 (raw)
17:38:03.532832 IP 192.168.7.8.2500 > 192.168.7.4.50550: S 3495449795:3495449795(0) ack 82517576 win 14360 <mss 7180> (accept)
17:38:03.533388 IP 192.168.7.4.50550 > 192.168.7.8.2500: . ack 1 win 1500 (raw)
17:38:03.533457 IP 192.168.7.4.50550 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500 (raw)
17:38:03.533597 IP 192.168.7.8.2500 > 192.168.7.4.50550: . ack 17 win 14360
17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)

17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360 (close)


17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360
vs
17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
What happened there?


On the server, run tcptest-server.c, which waits for 1s on the first connection
and then 10s on the second connection.

On the client, run:
iptables -I INPUT -i eth0 -p tcp --dport 50550 -j DROP; ./client; iptables -D INPUT -i eth0 -p tcp --dport 50550 -j DROP; ./tcptest-client

(client.c is from John's original email)

-- 
Simon Arlott

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <poll.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PORT 2500

#define xerror(str) do { perror(str); exit(1); } while (0)

int main(void) {
	struct sockaddr_in sa;
	int l, s, tmp;
	int t = 0;

	memset(&sa, 0, sizeof(sa));

	l = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (l < 0)	/* socket() returns -1 on error; 0 is a valid descriptor */
		xerror("socket");

	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_ANY);
	sa.sin_port = htons(PORT);

	tmp = 1;
	setsockopt(l, SOL_SOCKET, SO_REUSEADDR, (char*)&tmp, sizeof(tmp));

	if (bind(l, (struct sockaddr*)&sa, sizeof(sa)) != 0)
		xerror("bind");

	if (listen(l, 0) != 0)
		xerror("listen");

	printf("server %d ready...\n", getpid());

	for (t = 1; t <= 2; t++) {
		s = accept(l, NULL, NULL);
		switch (fork()) {
			case -1:
				xerror("fork");
				break;
			case 0:
				switch (t) {
					case 1:
						printf("server %d accepted connection\n", getpid());

#if 0
						tmp = fcntl(s, F_GETFL, 0);
						if (fcntl(s, F_SETFL, tmp | O_NONBLOCK) != 0)
							xerror("fcntl");

						/* send exactly the 3 bytes of "AAA" */
						if (send(s, "AAA", 3, 0) != 3)
							xerror("send");
#endif

						printf("server %d waiting for 1 

Re: [patch] genirq: fix simple and fasteoi irq handlers

2007-08-03 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 * Jarek Poplawski [EMAIL PROTECTED] wrote:
 
  I can't guarantee this is all needed to fix this bug, but I think 
  this patch is necessary here.
 
 hmmm ... very interesting! Now _this_ is something we'd like to see 
 tested. Could you send a patch to Marcin that also undoes the 
 workaround we have in place now, so that he could check whether 
 ne2k-pci works fine with your fix alone?

or it would be nice if Marcin could test pure 2.6.22 plus your fix 
(without any other patches applied).

Ingo


[GIT PATCH] ucc_geth fixes for 2.6.22-rc1

2007-08-03 Thread Li Yang
Please pull from 'ucc_geth' branch of
master.kernel.org:/pub/scm/linux/kernel/git/leo/fsl-soc.git ucc_geth

to receive the following fixes:

 drivers/net/ucc_geth_ethtool.c |    1 -
 drivers/net/ucc_geth_mii.c     |    3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

Domen Puncer (1):
  ucc_geth: fix section mismatch

Jan Altenberg (1):
  ucc_geth: remove get_perm_addr from ucc_geth_ethtool.c


diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
index a8994c7..64bef7c 100644
--- a/drivers/net/ucc_geth_ethtool.c
+++ b/drivers/net/ucc_geth_ethtool.c
@@ -379,7 +379,6 @@ static const struct ethtool_ops uec_ethtool_ops = {
.get_stats_count= uec_get_stats_count,
.get_strings= uec_get_strings,
.get_ethtool_stats  = uec_get_ethtool_stats,
-   .get_perm_addr  = ethtool_op_get_perm_addr,
 };
 
 void uec_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index 5f8c2d3..6c257b8 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -272,7 +272,8 @@ int __init uec_mdio_init(void)
return of_register_platform_driver(&uec_mdio_driver);
 }
 
-void __exit uec_mdio_exit(void)
+/* called from __init ucc_geth_init, therefore can not be __exit */
+void uec_mdio_exit(void)
 {
of_unregister_platform_driver(&uec_mdio_driver);
 }



Re: [patch 2/5][RFC] Update net core to use devres.

2007-08-03 Thread Tejun Heo
 +static inline void * register_netdev_devres(struct device *gendev,
 + struct net_device *dev)
 +{
 + struct net_device **p;
 +
 + /* 0 size because we don't need it. The net_device is already alloc'd
 +  * in alloc_netdev_mq.  We can't use devm_kzalloc in alloc_netdeev_mq
 +  * because a net_device cannot be free'd directly as it can be a
 +  * kobject.  See free_netdev.
 +  */
 + p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);

s/0/sizeof(*p)/

-- 
tejun


Re: [patch 3/5][RFC] Update e100 driver to use devres.

2007-08-03 Thread Tejun Heo
On Thu, Aug 02, 2007 at 03:45:37PM -0700, Brandon Philips wrote:
   if((err = pci_request_regions(pdev, DRV_NAME))) {
   DPRINTK(PROBE, ERR, "Cannot obtain PCI resources, aborting.\n");
 - goto err_out_disable_pdev;
 + return err;
   }
  
   if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
   DPRINTK(PROBE, ERR, "No usable DMA configuration, aborting.\n");
 - goto err_out_free_res;
 + return err;
   }
  
   SET_MODULE_OWNER(netdev);
 @@ -2613,11 +2606,11 @@ static int __devinit e100_probe(struct p
   if (use_io)
   DPRINTK(PROBE, INFO, "using i/o access mode\n");
  
 - nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
 + nic->csr = pcim_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
   if(!nic->csr) {
   DPRINTK(PROBE, ERR, "Cannot map device registers, aborting.\n");
   err = -ENOMEM;
 - goto err_out_free_res;
 + return err;

Calls to pci_request_regions() and pcim_iomap() can be merged into
pcim_iomap_regions().

Other than that, Acked-by: Tejun Heo [EMAIL PROTECTED]

-- 
tejun


Re: [patch 1/5][RFC] NET: Change pci_enable_device to pci_reenable_device to keep device enable balance

2007-08-03 Thread Tejun Heo
Brandon Philips wrote:
 On a slot_reset event pci_disable_device() is never called so calling
 pci_enable_device() will unbalance the enable count.
 
 Signed-off-by: Brandon Philips [EMAIL PROTECTED]

Acked-by: Tejun Heo [EMAIL PROTECTED]

-- 
tejun


Re: [patch 4/5][RFC] Implement devm_kcalloc

2007-08-03 Thread Tejun Heo
On Thu, Aug 02, 2007 at 03:45:45PM -0700, Brandon Philips wrote:
  /**
 + * devm_kcalloc - resource-managed kcalloc
 + * @dev: Device to allocate memory for
 + * @n: number of elements.
 + * @size: element size.
 + * @flags: the type of memory to allocate.
 + */
 +inline void * devm_kcalloc(struct device * dev, size_t n, size_t size,
 +gfp_t flags)
 +{
 +if (n != 0 && size > ULONG_MAX / n)
 +return NULL;
 +return devm_kzalloc(dev, n * size, flags);
 +}
 +EXPORT_SYMBOL_GPL(devm_kcalloc);

Please drop inline.  It's meaningless.

Other than that, Acked-by: Tejun Heo [EMAIL PROTECTED]
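The guard in the quoted `devm_kcalloc()` is the usual calloc-style overflow check: `n * size` is only computed when `size <= ULONG_MAX / n`, so the multiplication can never wrap. A user-space sketch of the same check follows; it uses `SIZE_MAX` because it works on `size_t` (the kernel code uses `ULONG_MAX`), and `mul_fits()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store n * size in *total and return 1 if the product fits in size_t;
 * return 0 (leaving *total untouched) if it would overflow.
 */
static int mul_fits(size_t n, size_t size, size_t *total)
{
	if (n != 0 && size > SIZE_MAX / n)
		return 0;
	*total = n * size;
	return 1;
}
```

The `n != 0` test comes first so the division never divides by zero; a zero count is always safe and yields a zero-byte request.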

-- 
tejun


Re: strange tcp behavior

2007-08-03 Thread Simon Arlott

On Fri, August 3, 2007 09:25, Evgeniy Polyakov wrote:
 On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
 wrote:
 19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360 <mss 7180>
 19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
 19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
 19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360

 According to RFC 793, the RST from .4 means that the connection
 is CLOSED.

 RFC 2525 (known TCP implementation problems) says we should send an RST
 in this case, although it does not specify whether we should send it if
 the socket is in the CLOSED state or not. Well, we do send it :)
 Even if tcp_send_active_reset() checked whether the socket is in the
 CLOSED state and did not send anything, the problem would still be
 there; it would not be easily triggered, but it could be possible.

Since the connection is considered closed, couldn't another socket re-use it?

Socket A: Recv data (unread)
Socket A: Recv RST
Socket B: Reuses connection (same IPs/ports)
Socket A: Close

Wouldn't that disrupt socket B's use of the connection?

-- 
Simon Arlott


Re: [patch 5/5][RFC] Update e1000 driver to use devres.

2007-08-03 Thread Tejun Heo
On Thu, Aug 02, 2007 at 03:45:52PM -0700, Brandon Philips wrote:
   if ((err = pci_request_regions(pdev, e1000_driver_name)))
 - goto err_pci_reg;
 + goto err_dma;

Why not just return?  Ditto for all goto err_dma's.

   err = -EIO;
 - adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
 + adapter->hw.hw_addr = devm_ioremap(&pdev->dev, mmio_start, mmio_len);

This is a correct conversion, but I have no idea why the original code
did a manual ioremap() instead of using pci_iomap().

 - adapter->hw.flash_address = ioremap(flash_start, flash_len);
 + adapter->hw.flash_address = devm_ioremap(&pdev->dev,
 + flash_start,
 + flash_len);

Ditto.

  err_dma:
   pci_disable_device(pdev);
   return err;

err_dma can be killed.

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/5][RFC] Update net core to use devres.

2007-08-03 Thread Brandon Philips
On 18:13 Fri 03 Aug 2007, Tejun Heo wrote:
  +   p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);
 
 s/0/sizeof(*p)/

Oops!  It should have read like this:

+static void * register_netdev_devres(struct device *gendev,
+   struct net_device *dev)
+{
+   void *p;
+
+   /* 0 size because we don't need it. The net_device is already alloc'd
+* in alloc_netdev_mq.  We can't use devm_kzalloc in alloc_netdev_mq
+* because a net_device cannot be free'd directly as it can be a
+* kobject.  See free_netdev.
+*/
+   p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);
+
+   if (unlikely(!p))
+   return NULL;
+
+   devres_add(gendev, p);
+
+   return dev;
+}

I will send the full correct patch.

Thanks,

Brandon


Re: Distributed storage.

2007-08-03 Thread Evgeniy Polyakov
Hi.

On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([EMAIL PROTECTED]) 
wrote:
 On 7/31/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
  TODO list currently includes following main items:
  * redundancy algorithm (drop me a request of your own, but it is highly
  unlikely that a Reed-Solomon based one will ever be used - it is too slow
  for distributed RAID, I consider WEAVER codes)
 
 
 LDPC codes[1][2] have been replacing Turbo codes[3] in communication
 links, and we have been seeing that transition (maybe helpful; it came
 to mind seeing the mention of Turbo codes). I don't know how WEAVER
 compares to LDPC, though I found some comparisons [4][5]. But looking
 at the fault tolerance figures, I guess WEAVER is much better.
 
 [1] http://www.ldpc-codes.com/
 [2] http://portal.acm.org/citation.cfm?id=1240497
 [3] http://en.wikipedia.org/wiki/Turbo_code
 [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
 [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Many thanks for these links, I will definitely study them.

-- 
Evgeniy Polyakov


Re: Distributed storage.

2007-08-03 Thread Evgeniy Polyakov
On Thu, Aug 02, 2007 at 02:08:24PM -0700, Daniel Phillips ([EMAIL PROTECTED]) 
wrote:
 On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote:
  Hi.
 
  I'm pleased to announce the first release of the distributed storage
  subsystem, which allows forming storage on top of remote and local
  nodes, which in turn can be exported to another storage as a node to
  form tree-like storages.
 
 Excellent!  This is precisely what the doctor ordered for the 
 OCFS2-based distributed storage system I have been mumbling about for 
 some time.  In fact the dd in ddsnap and ddraid stands for distributed 
 data.  The ddsnap/raid devices do not include an actual network 
 transport, that is expected to be provided by a specialized block 
 device, which up till now has been NBD.  But NBD has various 
 deficiencies as you note, in addition to its tendency to deadlock when 
 accessed locally.  Your new code base may be just the thing we always 
 wanted.  We (zumastor et al) will take it for a drive and see if 
 anything breaks.

That would be great.

 Memory deadlock is a concern of course.  From a cursory glance through, 
 it looks like this code is pretty vm-friendly and you have thought 
 quite a lot about it, however I respectfully invite peterz 
 (obsessive/compulsive memory deadlock hunter) to help give it a good 
 going over with me.
 
 I see bits that worry me, e.g.:
 
 + req = mempool_alloc(st->w->req_pool, GFP_NOIO);
 
 which seems to be callable in response to a local request, just the case 
 where NBD deadlocks.  Your mempool strategy can work reliably only if 
 you can prove that the pool allocations of the maximum number of 
 requests you can have in flight do not exceed the size of the pool.  In 
 other words, if you ever take the pool's fallback path to normal 
 allocation, you risk deadlock.

The mempool should be sized to cover the maximum number of in-flight
requests. In my tests I was unable to force the block layer to put more
than 31 pages in sync, and those were in one bio. Each request is
essentially delayed bio processing, so the pool must handle the maximum
number of in-flight bios (if a bio covers multiple nodes, each node
requires its own request). Sync has one bio in flight on my machines
(from tiny VIA nodes to low-end amd64), and the number of normal
requests *usually* does not exceed several dozen (always fewer than a
hundred), but that might only be true of my small systems, so the
request size was made as small as possible and the number of
allocations was reduced to an absolute minimum.

 Anyway, if this is as grand as it seems then I would think we ought to 
 factor out a common transfer core that can be used by all of NBD, 
 iSCSI, ATAoE and your own kernel server, in place of the roll-yer-own 
 code those things have now.
 
 Regards,
 
 Daniel

Thanks.

-- 
Evgeniy Polyakov


Re: [PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-03 Thread Jörn Engel
On Fri, 3 August 2007 14:41:19 +0200, Jan-Bernd Themann wrote:
 
 This patch provides generic Large Receive Offload (LRO) functionality
 for IPv4/TCP traffic.
 
 LRO combines received TCP packets into a single larger TCP packet and
 then passes it to the network stack in order to increase performance
 (throughput). The interface supports two modes: drivers can either pass
 SKBs or fragment lists to the LRO engine.

Maybe this is a stupid question, but why is LRO done at the device
driver level?

If it is a universal performance benefit, I would have expected it to be
done generically, i.e. have all packets passed to the network layer go
through LRO instead.

 +void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
 +struct iphdr *iph, struct tcphdr *tcph);

In particular this bit looks like it should be driven by a timeout,
which would be settable via /proc/sys/net/core/lro_timeout or similar.

Jörn

-- 
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson


[PATCH] lro: myri10ge example how to use LRO

2007-08-03 Thread Andrew Gallatin

To follow up on Jan-Bernd Themann's LRO patch earlier today,
this patch shows how the generic LRO interface can be used for
page based drivers.

Again, many thanks to Jan-Bernd Themann for leading this effort.

Drew

Signed-off-by: Andrew Gallatin [EMAIL PROTECTED]
diff -urNp a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
--- a/drivers/net/myri10ge/myri10ge.c   2007-07-24 15:57:12.0 -0400
+++ b/drivers/net/myri10ge/myri10ge.c   2007-08-03 13:07:48.0 -0400
@@ -48,6 +48,7 @@
 #include <linux/etherdevice.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
+#include <linux/inet_lro.h>
 #include <linux/ip.h>
 #include <linux/inet.h>
 #include <linux/in.h>
@@ -62,6 +63,8 @@
 #include <linux/io.h>
 #include <linux/log2.h>
 #include <net/checksum.h>
+#include <net/ip.h>
+#include <net/tcp.h>
 #include <asm/byteorder.h>
 #include <asm/io.h>
 #include <asm/processor.h>
@@ -89,6 +92,7 @@ MODULE_LICENSE("Dual BSD/GPL");
 
 #define MYRI10GE_EEPROM_STRINGS_SIZE 256
 #define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2)
+#define MYRI10GE_MAX_LRO_DESCRIPTORS 8
 
 #define MYRI10GE_NO_CONFIRM_DATA htonl(0x)
 #define MYRI10GE_NO_RESPONSE_RESULT 0x
@@ -151,6 +155,8 @@ struct myri10ge_rx_done {
dma_addr_t bus;
int cnt;
int idx;
+   struct net_lro_mgr lro_mgr;
+   struct net_lro_desc lro_desc[MYRI10GE_MAX_LRO_DESCRIPTORS];
 };
 
 struct myri10ge_priv {
@@ -276,6 +282,14 @@ static int myri10ge_debug = -1;/* defau
 module_param(myri10ge_debug, int, 0);
 MODULE_PARM_DESC(myri10ge_debug, "Debug level (0=none,...,16=all)");
 
+static int myri10ge_lro = 1;
+module_param(myri10ge_lro, int, S_IRUGO);
+MODULE_PARM_DESC(myri10ge_lro, "Enable large receive offload\n");
+
+static int myri10ge_lro_max_pkts = MYRI10GE_LRO_MAX_PKTS;
+module_param(myri10ge_lro_max_pkts, int, S_IRUGO);
+MODULE_PARM_DESC(myri10ge_lro_max_pkts, "Number of LRO packets to be aggregated\n");
+
 static int myri10ge_fill_thresh = 256;
 module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(myri10ge_fill_thresh, "Number of empty rx slots allowed\n");
@@ -1019,6 +1033,15 @@ myri10ge_rx_done(struct myri10ge_priv *m
remainder -= MYRI10GE_ALLOC_SIZE;
}
 
+   if (mgp->csum_flag && myri10ge_lro) {
+   rx_frags[0].page_offset += MXGEFW_PAD;
+   rx_frags[0].size -= MXGEFW_PAD;
+   len -= MXGEFW_PAD;
+   lro_receive_frags(&mgp->rx_done.lro_mgr, rx_frags,
+ len, len, (void *)(unsigned long)csum, csum);
+   return 1;
+   }
+
hlen = MYRI10GE_HLEN > len ? len : MYRI10GE_HLEN;
 
/* allocate an skb to attach the page(s) to. */
@@ -1137,6 +1160,9 @@ static inline void myri10ge_clean_rx_don
mgp->stats.rx_packets += rx_packets;
mgp->stats.rx_bytes += rx_bytes;
 
+   if (myri10ge_lro)
+   lro_flush_all(&rx_done->lro_mgr);
+
/* restock receive rings if needed */
if (mgp->rx_small.fill_cnt - mgp->rx_small.cnt < myri10ge_fill_thresh)
myri10ge_alloc_rx_pages(mgp, &mgp->rx_small,
@@ -1378,7 +1404,8 @@ static const char myri10ge_gstrings_stat
"dropped_pause", "dropped_bad_phy", "dropped_bad_crc32",
"dropped_unicast_filtered", "dropped_multicast_filtered",
"dropped_runt", "dropped_overrun", "dropped_no_small_buffer",
-   "dropped_no_big_buffer"
+   "dropped_no_big_buffer", "LRO aggregated", "LRO flushed",
+   "LRO avg aggr", "LRO no_desc"
 };
 
 #define MYRI10GE_NET_STATS_LEN  21
@@ -1444,6 +1471,14 @@ myri10ge_get_ethtool_stats(struct net_de
data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_overrun);
data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_no_small_buffer);
data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_no_big_buffer);
+   data[i++] = mgp->rx_done.lro_mgr.stats.aggregated;
+   data[i++] = mgp->rx_done.lro_mgr.stats.flushed;
+   if (mgp->rx_done.lro_mgr.stats.flushed)
+   data[i++] = mgp->rx_done.lro_mgr.stats.aggregated /
+   mgp->rx_done.lro_mgr.stats.flushed;
+   else
+   data[i++] = 0;
+   data[i++] = mgp->rx_done.lro_mgr.stats.no_desc;
 }
 
 static void myri10ge_set_msglevel(struct net_device *netdev, u32 value)
@@ -1717,10 +1752,69 @@ static void myri10ge_free_irq(struct myr
pci_disable_msi(pdev);
 }
 
+static int
+myri10ge_get_frag_header(struct skb_frag_struct *frag, void **mac_hdr,
+void **ip_hdr, void **tcpudp_hdr,
+u64 * hdr_flags, void *priv)
+{
+   struct ethhdr *eh;
+   struct vlan_ethhdr *veh;
+   struct iphdr *iph;
+   u8 *va = page_address(frag->page) + frag->page_offset;
+   unsigned long ll_hlen;
+   __wsum csum = (__wsum) (unsigned long)priv;
+
+   /* find the mac header, aborting if not IPv4 */
+
+   eh = (struct ethhdr *)va;
+   *mac_hdr = eh;
+   ll_hlen = ETH_HLEN;
+   if (eh->h_proto != 

Re: strange tcp behavior

2007-08-03 Thread Simon Arlott
On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
 On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
 wrote:
 Since the connection is considered closed, couldn't another socket re-use it?

 Socket A: Recv data (unread)
 Socket A: Recv RST
 Socket B: Reuses connection (same IPs/ports)
 Socket A: Close

 Wouldn't that disrupt socket B's use of the connection?

 Then it will drop our data, since there was no appropriate handshake.

Couldn't the sequence numbers be close enough to make the RST valid?

-- 
Simon Arlott


Re: Distributed storage.

2007-08-03 Thread Peter Zijlstra
On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote:
 On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra ([EMAIL PROTECTED]) 
 wrote:
  On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote:
  
   For receiving, the situation is worse, since the system does not know
   in advance which socket a given packet will belong to, so it must
   allocate from a global pool (and thus there must be an independent
   global reserve), and then exchange part of the socket's reserve with
   the global one (or just copy the packet into a new one allocated from
   the socket's reserve if it was set up, or drop it otherwise). An
   independent global reserve is what I proposed when I stopped
   advertising the network allocator, but it seems that it was not taken
   into account, and in Peter's patches the reserve was always allocated
   only when the system is under serious memory pressure, without any
   notion of per-socket reservation.
  
  This is not true. I have a global reserve which is set-up a priori. You
  cannot allocate a reserve when under pressure, that does not make sense.
 
 I probably did not cut enough details - my main position is to allocate
 per socket reserve from socket's queue, and copy data there from main
 reserve, all of which are allocated either in advance (global one) or
 per sockoption, so that there would be no fairness issues what to mark 
 as special and what to not.
 
 Say we have a page per socket, each socket can assign a reserve for
 itself from own memory, this accounts both tx and rx side. Tx is not
 interesting, it is simple, rx has global reserve (always allocated on 
 startup or sometime way before reclaim/oom) where data is originally 
 received (including skb, shared info and whatever is needed, page is 
 just an example), then it is copied into per-socket reserve and reused 
 for the next packet. Having per-socket reserve allows to have progress 
 in any situation not only in cases where single action must be 
 received/processed, and allows to be completely fair for all users, but
 not only special sockets, thus admin for example would be allowed to
 login, ipsec would work and so on...


Ah, I think I understand now. Yes this is indeed a good idea!

It would be quite doable to implement this on top of what I already
have. We would need to extend the socket with a sock_opt that would
reserve a specified amount of data for that specific socket. And then on
socket demux check if the socket has a non zero reserve and has not yet
exceeded said reserve. If so, process the packet.

This would also quite neatly work for -rt where we would not want
incoming packet processing to be delayed by memory allocations.



[patch 2.6.23-rc1] add xt_statistic.h to the header list for usermode programs

2007-08-03 Thread Chuck Ebbert
Add xt_statistic.h to the list of headers to install.

Apparently needed to build newer versions of iptables.

Signed-off-by: Chuck Ebbert [EMAIL PROTECTED]
---
 include/linux/netfilter/Kbuild |1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.22.noarch.orig/include/linux/netfilter/Kbuild
+++ linux-2.6.22.noarch/include/linux/netfilter/Kbuild
@@ -28,6 +28,7 @@ header-y += xt_policy.h
 header-y += xt_realm.h
 header-y += xt_sctp.h
 header-y += xt_state.h
+header-y += xt_statistic.h
 header-y += xt_string.h
 header-y += xt_tcpmss.h
 header-y += xt_tcpudp.h


Re: strange tcp behavior

2007-08-03 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
wrote:

 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)
 vs
 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
 What happened there ?

You mean what will happen if the second RST (4259643274) is close enough
to the first (82517592) to reset the connection? If so, that is the
session hijacking attack first (publicly known) implemented by Kevin
Mitnick. Things have moved forward since then, and the sequence number
generation algorithm has changed a lot.
It is the same situation as if you spammed the remote side with RST
packets carrying arbitrary sequence numbers in the hope that one of them
will reset some connection.


-- 
Evgeniy Polyakov


[PATCH 02/13] dev->priv to netdev_priv(dev), for drivers/net/appletalk

2007-08-03 Thread Yoann Padioleau

Replacing accesses to dev->priv with netdev_priv(dev). The replacement
is safe when netdev_priv is used to access a private structure that is
right next to the net_device structure in memory. Cf
http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/de19321bcd94dbb8/0d74a4adcd6177bd
This is the case when the net_device structure was allocated with
a call to alloc_netdev or one of its derivatives.

Here is an excerpt of the semantic patch that performs the transformation

@ rule1 @
type T;
struct net_device *dev;
@@

 dev = 
(
alloc_netdev
| 
alloc_etherdev
|
alloc_trdev
)
   (sizeof(T), ...)

@ rule1bis @
struct net_device *dev;
expression E;
@@
 dev->priv = E

@ rule2 depends on rule1 && !rule1bis @
struct net_device *dev;
type rule1.T;
@@

- (T*) dev->priv
+ netdev_priv(dev)

Signed-off-by: Yoann Padioleau [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]
---

 drivers/net/appletalk/ipddp.c |6 +++---
 drivers/net/appletalk/ltpc.c  |8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/appletalk/ipddp.c b/drivers/net/appletalk/ipddp.c
index f22e46d..61add0e 100644
--- a/drivers/net/appletalk/ipddp.c
+++ b/drivers/net/appletalk/ipddp.c
@@ -109,7 +109,7 @@ static struct net_device * __init ipddp_
  */
 static struct net_device_stats *ipddp_get_stats(struct net_device *dev)
 {
-return dev->priv;
+return netdev_priv(dev);
 }
 
 /*
@@ -171,8 +171,8 @@ static int ipddp_xmit(struct sk_buff *sk
 
 skb->protocol = htons(ETH_P_ATALK); /* Protocol has changed */
 
-   ((struct net_device_stats *) dev->priv)->tx_packets++;
-((struct net_device_stats *) dev->priv)->tx_bytes+=skb->len;
+   ((struct net_device_stats *)netdev_priv(dev))->tx_packets++;
+((struct net_device_stats *)netdev_priv(dev))->tx_bytes+=skb->len;
 
 if(aarp_send_ddp(rt->dev, skb, &rt->at, NULL) < 0)
 dev_kfree_skb(skb);
diff --git a/drivers/net/appletalk/ltpc.c b/drivers/net/appletalk/ltpc.c
index 6a6cbd3..be12c6b 100644
--- a/drivers/net/appletalk/ltpc.c
+++ b/drivers/net/appletalk/ltpc.c
@@ -726,7 +726,7 @@ static int sendup_buffer (struct net_dev
int dnode, snode, llaptype, len; 
int sklen;
struct sk_buff *skb;
-   struct net_device_stats *stats = &((struct ltpc_private *)dev->priv)->stats;
+   struct net_device_stats *stats = &((struct ltpc_private *)netdev_priv(dev))->stats;
struct lt_rcvlap *ltc = (struct lt_rcvlap *) ltdmacbuf;
 
if (ltc->command != LT_RCVLAP) {
@@ -823,7 +823,7 @@ static int ltpc_ioctl(struct net_device 
 {
struct sockaddr_at *sa = (struct sockaddr_at *) &ifr->ifr_addr;
/* we'll keep the localtalk node address in dev->pa_addr */
-   struct atalk_addr *aa = &((struct ltpc_private *)dev->priv)->my_addr;
+   struct atalk_addr *aa = &((struct ltpc_private *)netdev_priv(dev))->my_addr;
struct lt_init c;
int ltflags;
 
@@ -913,7 +913,7 @@ static int ltpc_xmit(struct sk_buff *skb
 * and skb->len is the length of the ddp data + ddp header
 */
 
-   struct net_device_stats *stats = &((struct ltpc_private *)dev->priv)->stats;
+   struct net_device_stats *stats = &((struct ltpc_private *)netdev_priv(dev))->stats;
 
int i;
struct lt_sendlap cbuf;
@@ -952,7 +952,7 @@ static int ltpc_xmit(struct sk_buff *skb
 
 static struct net_device_stats *ltpc_get_stats(struct net_device *dev)
 {
-   struct net_device_stats *stats = &((struct ltpc_private *) dev->priv)->stats;
+   struct net_device_stats *stats = &((struct ltpc_private *)netdev_priv(dev))->stats;
return stats;
 }
 



[PATCH 09/13] dev->priv to netdev_priv(dev), for drivers/net/tokenring

2007-08-03 Thread Yoann Padioleau

Replacing accesses to dev->priv with netdev_priv(dev). The replacement
is safe when netdev_priv is used to access a private structure that is
right next to the net_device structure in memory. Cf
http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/de19321bcd94dbb8/0d74a4adcd6177bd
This is the case when the net_device structure was allocated with
a call to alloc_netdev or one of its derivatives.

Here is an excerpt of the semantic patch that performs the transformation

@ rule1 @
type T;
struct net_device *dev;
@@

 dev = 
(
alloc_netdev
| 
alloc_etherdev
|
alloc_trdev
)
   (sizeof(T), ...)

@ rule1bis @
struct net_device *dev;
expression E;
@@
 dev->priv = E

@ rule2 depends on rule1 && !rule1bis @
struct net_device *dev;
type rule1.T;
@@

- (T*) dev->priv
+ netdev_priv(dev)

Signed-off-by: Yoann Padioleau [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]
---

 drivers/net/tokenring/3c359.c   |   58 ++--
 drivers/net/tokenring/ibmtr.c   |   38 +++
 drivers/net/tokenring/lanstreamer.c |   32 +--
 drivers/net/tokenring/madgemc.c |4 +-
 drivers/net/tokenring/olympic.c |   36 +++---
 drivers/net/tokenring/tmspci.c  |4 +-
 6 files changed, 86 insertions(+), 86 deletions(-)

diff --git a/drivers/net/tokenring/3c359.c b/drivers/net/tokenring/3c359.c
index 9f1b6ab..a8573da 100644
--- a/drivers/net/tokenring/3c359.c
+++ b/drivers/net/tokenring/3c359.c
@@ -156,7 +156,7 @@ static void print_rx_state(struct net_de
 static void print_tx_state(struct net_device *dev)
 {
 
-   struct xl_private *xl_priv = (struct xl_private *)dev->priv ; 
+   struct xl_private *xl_priv = netdev_priv(dev) ; 
struct xl_tx_desc *txd ; 
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
int i ; 
@@ -179,7 +179,7 @@ static void print_tx_state(struct net_de
 static void print_rx_state(struct net_device *dev)
 {
 
-   struct xl_private *xl_priv = (struct xl_private *)dev->priv ; 
+   struct xl_private *xl_priv = netdev_priv(dev) ; 
struct xl_rx_desc *rxd ; 
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
int i ; 
@@ -213,7 +213,7 @@ #endif
 
 static u16 xl_ee_read(struct net_device *dev, int ee_addr)
 { 
-   struct xl_private *xl_priv = (struct xl_private *)dev->priv ;
+   struct xl_private *xl_priv = netdev_priv(dev) ;
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
 
/* Wait for EEProm to not be busy */
@@ -245,7 +245,7 @@ static u16 xl_ee_read(struct net_device 
 
 static void  xl_ee_write(struct net_device *dev, int ee_addr, u16 ee_value) 
 {
-   struct xl_private *xl_priv = (struct xl_private *)dev->priv ;
+   struct xl_private *xl_priv = netdev_priv(dev) ;
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
 
/* Wait for EEProm to not be busy */
@@ -305,11 +305,11 @@ static int __devinit xl_probe(struct pci
pci_release_regions(pdev) ; 
return -ENOMEM ; 
} 
-   xl_priv = dev->priv ; 
+   xl_priv = netdev_priv(dev) ; 
 
 #if XL_DEBUG  
printk("pci_device: %p, dev:%p, dev->priv: %p, ba[0]: %10x, ba[1]:%10x\n", 
-   pdev, dev, dev->priv, (unsigned int)pdev->resource[0].start, (unsigned int)pdev->resource[1].start) ;  
+   pdev, dev, netdev_priv(dev), (unsigned int)pdev->resource[0].start, (unsigned int)pdev->resource[1].start) ;  
 #endif 
 
dev->irq=pdev->irq;
@@ -365,7 +365,7 @@ #endif 
 
 static int __devinit xl_init(struct net_device *dev) 
 {
-   struct xl_private *xl_priv = (struct xl_private *)dev->priv ;
+   struct xl_private *xl_priv = netdev_priv(dev) ;
 
printk(KERN_INFO "%s \n", version);
printk(KERN_INFO "%s: I/O at %hx, MMIO at %p, using irq %d\n",
@@ -385,7 +385,7 @@ static int __devinit xl_init(struct net_
 
 static int xl_hw_reset(struct net_device *dev) 
 { 
-   struct xl_private *xl_priv = (struct xl_private *)dev->priv ;
+   struct xl_private *xl_priv = netdev_priv(dev) ;
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
unsigned long t ; 
u16 i ; 
@@ -568,7 +568,7 @@ #endif
 
 static int xl_open(struct net_device *dev) 
 {
-   struct xl_private *xl_priv=(struct xl_private *)dev->priv;
+   struct xl_private *xl_priv=netdev_priv(dev);
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
u8 i ; 
u16 hwaddr[3] ; /* Should be u8[6] but we get word return values */
@@ -726,7 +726,7 @@ static int xl_open(struct net_device *de
 
 static int xl_open_hw(struct net_device *dev) 
 { 
-   struct xl_private *xl_priv=(struct xl_private *)dev->priv;
+   struct xl_private *xl_priv=netdev_priv(dev);
u8 __iomem *xl_mmio = xl_priv->xl_mmio ; 
u16 vsoff ;
char ver_str[33];  
@@ -875,7 +875,7 @@ static int xl_open_hw(struct net_device 
 
 static void adv_rx_ring(struct net_device *dev) /* Advance 

Re: [patch 0/5][RFC] Update network drivers to use devres

2007-08-03 Thread Brandon Philips
On 14:44 Fri 03 Aug 2007, Stephen Hemminger wrote:
 On Fri, 03 Aug 2007 20:33:04 +0900 Tejun Heo [EMAIL PROTECTED] wrote:
   Devres makes low level drivers simpler, easier to get right and
   maintain.  Writing new drivers becomes easier too.  So, why not?
  
   Network devices seem to work fine thanks, and the resource requirements
   are different. If ain't broke, don't fix it.
   Care to enlighten me on how the resource requirements are different
   from ATA drivers?
   
   I was thinking of the hot remove (no mod ref counts) and lingering
   /sys open issues.  ATA drivers use ref counts.
  
  I guess the hot removing is done by severing netdev from the actual
  device, right?  I don't see how that affects usage of devres on network
  drivers.  Am I missing something?
 
 The issue is that device may be removed at any time. So you can't rely
 on module ref counts to save you. And netdevice structure must still
 linger after module is removed, till dev ref count goes to zero.

These patches allow the net_device to linger.  The code calls
free_netdev on device removal just as before.

This is how the net_device is handled on device removal by these
patches:

+static void devm_free_netdev(struct device *gendev, void *res)
+{
+   struct net_device *dev = dev_get_drvdata(gendev);
+   free_netdev(dev);
+}

  On a separate note, can you explain lingering /sys open issue to me a
  bit?  With recent sysfs changes, sysfs nodes are disconnected
  immediately on deletion.  Would that make any difference to netdevs?
 
 Examples are in Documentation/networking/netdevices.txt

Isn't this the same problem as above?  The net_device structure must
stay around if there are still references to it and it does.  

Or am I missing something?

Thanks,

Brandon


Re: strange tcp behavior

2007-08-03 Thread Simon Arlott
On 03/08/07 18:39, Evgeniy Polyakov wrote:
 On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) 
 wrote:
 
 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)
 vs
 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
 What happened there ?

Erm... you seem to have removed parts of my message in a way that doesn't 
make sense...

On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott wrote:
 17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360
 vs
 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
 What happened there ?

The first one is the RST sent when the connection is close()d without 
reading, and the second one is the same RST but after another connection 
has been made on the same ports using a different socket.

 It is the same situation, which would happen if you will spam remote
 side with RST packets with arbitrary sequence number in hope that it
 will reset some connection.

Isn't it still possible that the connection that got reset is left open 
(possibly for days) until another connection using the same ports is 
using roughly the same sequence numbers?

-- 
Simon Arlott


Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 06:49, Evgeniy Polyakov wrote:
 ...rx has global reserve (always allocated on
 startup or sometime way before reclaim/oom) where data is originally
 received (including skb, shared info and whatever is needed, page is
 just an example), then it is copied into per-socket reserve and
 reused for the next packet. Having per-socket reserve allows to have
 progress in any situation not only in cases where single action must
 be received/processed, and allows to be completely fair for all
 users, but not only special sockets, thus admin for example would be
 allowed to login, ipsec would work and so on...

And when the global reserve is entirely used up your system goes back to 
dropping vm writeout acknowledgements, not so good.  I like your 
approach, and specifically the copying idea cuts out considerable 
complexity.  But I believe the per-socket flag to mark a socket as part 
of the vm writeout path is not optional, and in this case it will be a 
better world if it is a slightly unfair world in favor of vm writeout 
traffic.

Ssh will still work fine even with vm getting priority access to the 
pool.  During memory crunches, non-vm ssh traffic may get bumped till 
after the crunch, but vm writeout is never supposed to hog the whole 
machine.  If vm writeout hogs your machine long enough to delay an ssh 
login then that is a vm bug and should be fixed at that level.

Regards,

Daniel


Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 07:53, Peter Zijlstra wrote:
 On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote:
  On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra wrote:
  ...my main position is to
  allocate per socket reserve from socket's queue, and copy data
  there from main reserve, all of which are allocated either in
  advance (global one) or per sockoption, so that there would be no
  fairness issues what to mark as special and what to not.
 
  Say we have a page per socket, each socket can assign a reserve for
  itself from own memory, this accounts both tx and rx side. Tx is
  not interesting, it is simple, rx has global reserve (always
  allocated on startup or sometime way before reclaim/oom) where data
  is originally received (including skb, shared info and whatever is
  needed, page is just an example), then it is copied into per-socket
  reserve and reused for the next packet. Having per-socket reserve
  allows to have progress in any situation not only in cases where
  single action must be received/processed, and allows to be
  completely fair for all users, but not only special sockets, thus
  admin for example would be allowed to login, ipsec would work and
  so on...

 Ah, I think I understand now. Yes this is indeed a good idea!

 It would be quite doable to implement this on top of what I already
 have. We would need to extend the socket with a sock_opt that would
 reserve a specified amount of data for that specific socket. And then
 on socket demux check if the socket has a non zero reserve and has
 not yet exceeded said reserve. If so, process the packet.

 This would also quite neatly work for -rt where we would not want
 incomming packet processing to be delayed by memory allocations.
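A minimal user-space sketch of the per-socket reserve check described above. It assumes a hypothetical struct sock_reserve whose limit would be set through a socket option that does not exist yet; none of these names are kernel APIs. It models only the accounting: at demux time a packet is admitted if the socket has a non-zero reserve it has not yet exceeded, and the charge is returned once the data has been consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket receive reserve (not a kernel structure). */
struct sock_reserve {
	size_t limit;	/* bytes reserved via a hypothetical sockopt; 0 = none */
	size_t used;	/* bytes currently charged to the reserve */
};

/*
 * Socket demux check under memory pressure: admit the packet only if the
 * socket has a non-zero reserve and charging 'len' bytes would not exceed it.
 */
static bool reserve_admit(struct sock_reserve *r, size_t len)
{
	if (r->limit == 0)
		return false;		/* no reserve requested: drop */
	if (len > r->limit - r->used)
		return false;		/* reserve would be exceeded: drop */
	r->used += len;			/* charge the packet to this socket */
	return true;
}

/* Called once the application has consumed the packet's data. */
static void reserve_release(struct sock_reserve *r, size_t len)
{
	assert(len <= r->used);		/* never release more than was charged */
	r->used -= len;
}
```

With limit set to 4096 (the "page per socket" case), two 1500-byte packets fit, a third is refused, and it fits again once one of the first two is released.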

At this point we need anything that works in mainline as a starting 
point.  By erring on the side of simplicity we can make this 
understandable for folks who haven't spent the last two years wallowing 
in it.  The page per socket approach is about as simple as it gets.  I 
therefore propose we save our premature optimizations for later.

It will also help our cause if we keep any new internal APIs to strictly 
what is needed to make deadlock go away.  Not a whole lot more than 
just the flag to mark a socket as part of the vm writeout path when you 
get right down to essentials.

Regards,

Daniel


Re: strange tcp behavior

2007-08-03 Thread David Miller
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 3 Aug 2007 12:22:42 +0400

 On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) 
 wrote:
  What in the world are we doing allowing stream sockets to autobind?
  That is totally bogus.  Even if we autobind, that won't make a connect
  happen.
 
 For an accepted socket it is a perfectly valid assumption - we could autobind
 it during the first send. Or we may bind it during accept. It's a matter of
 taste, I think. Autobinding during the first send can end up being a
 protection against DoS in some obscure rare case...

An accept()ed socket is, by definition, fully bound and already in the
established state.


Re: [PATCH] lro: eHEA example how to use LRO

2007-08-03 Thread Kok, Auke

Jan-Bernd Themann wrote:

This patch shows how the generic LRO interface is used for SKB mode

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/Kconfig |1 +
 drivers/net/ehea/ehea.h |9 -
 drivers/net/ehea/ehea_ethtool.c |   15 +++
 drivers/net/ehea/ehea_main.c|   84 +++---
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f8a602c..fec4004 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig


snip


+module_param(use_lro, int, 0);


Have you looked at my generic LRO get/set patch that I posted this week? This patch
adds a useless module parameter, while ethtool already has all the structure to
accommodate setting LRO on/off.


Auke


Re: [PATCH] lro: myri10ge example how to use LRO

2007-08-03 Thread Kok, Auke

Andrew Gallatin wrote:

To follow up on Jan-Bernd Themann's LRO patch earlier today,
this patch shows how the generic LRO interface can be used for
page based drivers.

Again, many thanks to Jan-Bernd Themann for leading this effort.

Drew

Signed-off-by: Andrew Gallatin [EMAIL PROTECTED]




please take a look at my lro patch for ethtool and see if it works for you, 
instead of adding another generic module parameter that doesn't need to be there.


Thanks.

Auke


Re: [PATCH] lro: myri10ge example how to use LRO

2007-08-03 Thread Andrew Gallatin

Kok, Auke wrote:

Andrew Gallatin wrote:

To follow up on Jan-Bernd Themann's LRO patch earlier today,
this patch shows how the generic LRO interface can be used for
page based drivers.

Again, many thanks to Jan-Bernd Themann for leading this effort.

Drew

Signed-off-by: Andrew Gallatin [EMAIL PROTECTED]




please take a look at my lro patch for ethtool and see if it works for 
you, instead of adding another generic module parameter that doesn't 
need to be there.


That looks very nice, and will indeed work for me.

Thanks,

Drew


[BNX2]: Fix suspend/resume problem.

2007-08-03 Thread Michael Chan
[BNX2]: Fix suspend/resume problem.

The device would not resume properly if it was shut down before the system
was suspended.  In such a scenario, where the netif_running state is 0,
bnx2_suspend() would not save the PCI state and so the memory enable bit
and bus master enable bit would be lost.

We fix this by always saving and restoring the PCI state in
bnx2_suspend() and bnx2_resume() regardless of netif_running() state.
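To make the ordering issue concrete, here is a small user-space model, not driver code: struct fake_pci_dev, model_suspend(), model_resume() and lose_power() are hypothetical stand-ins for the PCI core and bnx2 paths. The two bit values are the memory-space and bus-master bits of the PCI command register at config offset 0x04, which is presumably the "PCI register 4" named in the patch comment. Because the save now happens before the netif_running() check, a device that was down when suspended still gets both bits back on resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI command register (config offset 0x04) bits, per the PCI spec. */
#define CMD_MEMORY_ENABLE	0x0002	/* same value as PCI_COMMAND_MEMORY */
#define CMD_BUS_MASTER		0x0004	/* same value as PCI_COMMAND_MASTER */

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_pci_dev {
	uint16_t command;	/* live command register */
	uint16_t saved_command;	/* copy taken by save_state() */
	bool state_saved;
	bool netif_running;	/* interface up or down */
};

static void save_state(struct fake_pci_dev *d)
{
	d->saved_command = d->command;
	d->state_saved = true;
}

static void restore_state(struct fake_pci_dev *d)
{
	if (d->state_saved)
		d->command = d->saved_command;
}

/* Patched ordering: save unconditionally, then bail out if the link is down. */
static void model_suspend(struct fake_pci_dev *d)
{
	save_state(d);
	if (!d->netif_running)
		return;
	/* ... reset chip, free buffers, enter low-power state ... */
}

/* Patched ordering: restore unconditionally, then bail out if down. */
static void model_resume(struct fake_pci_dev *d)
{
	restore_state(d);
	if (!d->netif_running)
		return;
	/* ... power up, reattach, re-init the NIC ... */
}

/* Simulates config space being lost across the system suspend. */
static void lose_power(struct fake_pci_dev *d)
{
	d->command = 0;
}
```

Moving save_state() back below the early return reproduces the old bug: the down device would come back with a command register of 0.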

Update version to 1.6.4.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index d53dfc5..24e7f9a 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.6.3"
-#define DRV_MODULE_RELDATE	"July 16, 2007"
+#define DRV_MODULE_VERSION	"1.6.4"
+#define DRV_MODULE_RELDATE	"August 3, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 
@@ -6937,6 +6937,11 @@ bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 	struct bnx2 *bp = netdev_priv(dev);
 	u32 reset_code;
 
+	/* PCI register 4 needs to be saved whether netif_running() or not.
+	 * MSI address and data need to be saved if using MSI and
+	 * netif_running().
+	 */
+	pci_save_state(pdev);
 	if (!netif_running(dev))
 		return 0;
 
@@ -6952,7 +6957,6 @@ bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 		reset_code = BNX2_DRV_MSG_CODE_SUSPEND_NO_WOL;
 	bnx2_reset_chip(bp, reset_code);
 	bnx2_free_skbs(bp);
-	pci_save_state(pdev);
 	bnx2_set_power_state(bp, pci_choose_state(pdev, state));
 	return 0;
 }
@@ -6963,10 +6967,10 @@ bnx2_resume(struct pci_dev *pdev)
 	struct net_device *dev = pci_get_drvdata(pdev);
 	struct bnx2 *bp = netdev_priv(dev);
 
+	pci_restore_state(pdev);
 	if (!netif_running(dev))
 		return 0;
 
-	pci_restore_state(pdev);
 	bnx2_set_power_state(bp, PCI_D0);
 	netif_device_attach(dev);
 	bnx2_init_nic(bp);




Re: strange tcp behavior

2007-08-03 Thread David Miller
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 3 Aug 2007 12:22:42 +0400

 Maybe recvmsg should be changed too for symmetry?

I took a look at this, and it's not 100% trivial.

Let's do this later, and change only sendmsg for now in order to
fix the bug in the stable branches.


Re: netdevice queueing / sendmsg issue?

2007-08-03 Thread Krzysztof Halasa
David Miller [EMAIL PROTECTED] writes:

 Software interrupts might be getting lost, dev_kfree_skb_irq() has to
 queue the kfree_skb() to soft IRQ.

 Therefore, dev_kfree_skb_irq() will only work properly from hardware
 interrupt context, where we will return and thus run the scheduled
 software interrupt.

Problem solved, stupid user mistake.
I was using netif_start_queue() instead of netif_wake_queue().
-- 
Krzysztof Halasa


Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Evgeniy,

Nit alert:

On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote:
 * storage can be formed on top of remote nodes and be exported
   simultaneously (iSCSI is peer-to-peer only, NBD requires device
   mapper and is synchronous)

In fact, NBD has nothing to do with device mapper.  I use it as a 
physical target underneath ddraid (a device mapper plugin) just like I 
would use your DST if it proves out.

Regards,

Daniel


Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Mike,

On Thursday 02 August 2007 21:09, Mike Snitzer wrote:
 But NBD's synchronous nature is actually an asset when coupled with
 MD raid1 as it provides guarantees that the data has _really_ been
 mirrored remotely.

And bio completion doesn't?

Regards,

Daniel


Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 03:26, Evgeniy Polyakov wrote:
 On Thu, Aug 02, 2007 at 02:08:24PM -0700, I wrote:
  I see bits that worry me, e.g.:
 
  +   req = mempool_alloc(st->w->req_pool, GFP_NOIO);
 
  which seems to be callable in response to a local request, just the
  case where NBD deadlocks.  Your mempool strategy can work reliably
  only if you can prove that the pool allocations of the maximum
  number of requests you can have in flight do not exceed the size of
  the pool.  In other words, if you ever take the pool's fallback
  path to normal allocation, you risk deadlock.

 mempool should be allocated to be able to catch up with maximum
 in-flight requests, in my tests I was unable to force block layer to
 put more than 31 pages in sync, but in one bio. Each request is
 essentially delayed bio processing, so this must handle maximum
 number of in-flight bios (if they do not cover multiple nodes, if
 they do, then each node requires its own request).
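The sizing argument in this exchange can be illustrated with a plain fixed-size pool; this is not the kernel's mempool API. All request slots are preallocated and allocation deliberately never falls back to a general-purpose allocator, so running out shows up as NULL instead of as an allocation under memory pressure. POOL_SIZE, struct request and the pool_* functions are hypothetical; in a real driver the size would have to cover the worst-case number of requests in flight, which is exactly the bound that has to be enforced.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE	4	/* hypothetical: must cover max requests in flight */

struct request {
	struct request *next_free;	/* free-list link while unused */
	int id;
};

struct req_pool {
	struct request slots[POOL_SIZE];	/* reserved up front */
	struct request *free_list;
};

static void pool_init(struct req_pool *p)
{
	p->free_list = NULL;
	for (int i = 0; i < POOL_SIZE; i++) {
		p->slots[i].id = i;
		p->slots[i].next_free = p->free_list;
		p->free_list = &p->slots[i];
	}
}

/*
 * Take a request from the reserve.  There is no fallback to a general
 * allocator: NULL means the in-flight bound was violated.
 */
static struct request *pool_alloc(struct req_pool *p)
{
	struct request *r = p->free_list;

	if (r)
		p->free_list = r->next_free;
	return r;
}

/* Return a request to the reserve when its I/O completes. */
static void pool_free(struct req_pool *p, struct request *r)
{
	assert(r >= p->slots && r < p->slots + POOL_SIZE);
	r->next_free = p->free_list;
	p->free_list = r;
}
```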

It depends on the characteristics of the physical and virtual block 
devices involved.  Slow block devices can produce surprising effects.  
Ddsnap still qualifies as slow under certain circumstances (big 
linear write immediately following a new snapshot). Before we added 
throttling we would see as many as 800,000 bios in flight.  Nice to 
know the system can actually survive this... mostly.  But memory 
deadlock is a clear and present danger under those conditions and we 
did hit it (not to mention that read latency sucked beyond belief). 

Anyway, we added a simple counting semaphore to throttle the bio traffic 
to a reasonable number and behavior became much nicer, but most 
importantly, this satisfies one of the primary requirements for 
avoiding block device memory deadlock: a strictly bounded amount of bio 
traffic in flight.  In fact, we allow some bounded number of 
non-memalloc bios *plus* however much traffic the mm wants to throw at 
us in memalloc mode, on the assumption that the mm knows what it is 
doing and imposes its own bound of in flight bios per device.   This 
needs auditing obviously, but the mm either does that or is buggy.  In 
practice, with this throttling in place we never saw more than 2,000 in 
flight no matter how hard we hit it, which is about the number we were 
aiming at.  Since we draw our reserve from the main memalloc pool, we 
can easily handle 2,000 bios in flight, even under extreme conditions.
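A user-space sketch of the counting-semaphore throttle described above, using POSIX sem_t and pthreads in place of the kernel's down()/up() on a semaphore. MAX_IN_FLIGHT, submit_io() and run_throttled_workload() are hypothetical names, sched_yield() merely stands in for device latency, and the exemption for memalloc-mode traffic is not modelled. Each submitter takes a slot before issuing an I/O and returns it on completion, so the number in flight cannot exceed the semaphore's initial count however many threads push traffic.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>

#define MAX_IN_FLIGHT	8	/* hypothetical bound on outstanding I/O */
#define SUBMITTERS	32	/* threads pushing traffic concurrently */
#define IOS_PER_THREAD	200

static sem_t throttle;		/* counts free in-flight slots */
static atomic_int in_flight;	/* I/Os currently outstanding */
static atomic_int max_seen;	/* high-water mark of in_flight */

static void submit_io(void)
{
	sem_wait(&throttle);	/* like down(): blocks once the bound is hit */

	int now = atomic_fetch_add(&in_flight, 1) + 1;
	int prev = atomic_load(&max_seen);
	while (now > prev &&
	       !atomic_compare_exchange_weak(&max_seen, &prev, now))
		;		/* prev is refreshed on failure; retry */

	sched_yield();		/* stand-in for device latency */

	atomic_fetch_sub(&in_flight, 1);	/* the I/O completes */
	sem_post(&throttle);	/* like up() in the completion path */
}

static void *submitter(void *arg)
{
	(void)arg;
	for (int i = 0; i < IOS_PER_THREAD; i++)
		submit_io();
	return NULL;
}

/* Runs the workload; returns the peak number in flight, or -1 on error. */
static int run_throttled_workload(void)
{
	pthread_t threads[SUBMITTERS];

	if (sem_init(&throttle, 0, MAX_IN_FLIGHT) != 0)
		return -1;
	atomic_store(&in_flight, 0);
	atomic_store(&max_seen, 0);

	for (int i = 0; i < SUBMITTERS; i++)
		pthread_create(&threads[i], NULL, submitter, NULL);
	for (int i = 0; i < SUBMITTERS; i++)
		pthread_join(threads[i], NULL);

	assert(atomic_load(&in_flight) == 0);	/* everything completed */
	sem_destroy(&throttle);
	return atomic_load(&max_seen);
}
```

The bound holds because in_flight is only raised by a thread holding a slot and is lowered before that slot is given back.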

See:
http://zumastor.googlecode.com/svn/trunk/ddsnap/kernel/dm-ddsnap.c
down(&info->throttle_sem);

To be sure, I am not very proud of this throttling mechanism for various 
reasons, but the thing is, _any_ throttling mechanism no matter how 
sucky solves the deadlock problem.  Over time I want to move the 
throttling up into bio submission proper, or perhaps incorporate it in 
device mapper's queue function, not quite as high up the food chain.  
Only some stupid little logistical issues stopped me from doing it one 
of those ways right from the start.   I think Peter has also tried some 
things in this area.  Anyway, that part is not pressing because the 
throttling can be done in the virtual device itself as we do it, even 
if it is not very pretty there.  The point is: you have to throttle the 
bio traffic.  The alternative is to die a horrible death under 
conditions that may be rare, but _will_ hit somebody.

Regards,

Daniel


[patch 0/7] CAN: Add new PF_CAN protocol family, try #5

2007-08-03 Thread Urs Thuermann
Hello Dave,

this is the fifth post of the patch series that adds the PF_CAN
protocol family for the Controller Area Network.

Since our last post we have changed the following:

* Remove slab destructor from calls to kmem_cache_alloc().
* Add comments about types defined in can.h.
* Update comment on vcan loopback module parameter.
* Fix typo in documentation.

The changes in try #4 were:

* Change vcan network driver to use the new RTNL API, as suggested by
  Patrick.
* Revert our change to use skb->iif instead of skb->cb.  After
  discussion with Patrick and Jamal it turned out, our first
  implementation was correct.
* Use skb_tail_pointer() instead of skb->tail directly.
* Coding style changes to satisfy linux/scripts/checkpatch.pl.
* Minor changes for 64-bit-cleanliness.
* Minor cleanup of #include's

The changes in try #3 were:

* Use skb->sk and skb->pkt_type instead of skb->cb to pass loopback
  flags and originating socket down to the driver and back to the
  receiving socket.  Thanks to Patrick McHardy for pointing out our
  wrong use of skb->cb.
* Use skb->iif instead of skb->cb to pass receiving interface from
  raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
* Set skb->protocol when sending CAN frames to netdevices.
* Removed struct raw_opt and struct bcm_opt and integrated these
  directly into struct raw_sock and bcm_sock resp., like most other
  proto implementations do.
* We have found and fixed race conditions between raw_bind(),
  raw_{set,get}sockopt() and raw_notifier().  This resulted in
  - complete removal of our own notifier list infrastructure in
af_can.c.  raw.c and bcm.c now use normal netdevice notifiers.
  - removal of ro->lock spinlock.  We use lock_sock(sk) now.
  - changed deletion of dev_rcv_lists, which are now marked for
deletion in the netdevice notifier in af_can.c and are actually
deleted when all entries have been deleted using can_rx_unregister().
* Follow changes in 2.6.22 (e.g. ktime_t timestamps in skb).
* Removed obsolete code from vcan.c, as pointed out by Stephen Hemminger.

The changes in try #2 were:

* reduced RCU callback overhead when deleting receiver lists (thx to
  feedback from Paul E. McKenney).
* eliminated some code duplication in net/can/proc.c.
* renamed slock-29 and sk_lock-29 to slock-AF_CAN and sk_lock-AF_CAN in
  net/core/sock.c
* added entry for can.txt in Documentation/networking/00-INDEX
* added error frame definitions in include/linux/can/error.h, which are to
  be used by CAN network drivers.


This patch series applies against net-2.6 and is derived from Subversion
revision r455 of http://svn.berlios.de/svnroot/repos/socketcan.
It can be found in the directory
http://svn.berlios.de/svnroot/repos/socketcan/trunk/patch-series/version.

This patch doesn't touch anything in the kernel except for the allocation
of a couple of numbers for protocol, arp hw type, and a line discipline.

Please review this patch series for integration into your tree.

Thanks very much for your work!

Best regards,

Urs Thuermann
Oliver Hartkopp

--


[patch 6/7] CAN: Add maintainer entries

2007-08-03 Thread Urs Thuermann
This patch adds entries in the CREDITS and MAINTAINERS file for CAN.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 CREDITS |   16 
 MAINTAINERS |9 +
 2 files changed, 25 insertions(+)

Index: net-2.6/CREDITS
===
--- net-2.6.orig/CREDITS2007-08-03 11:21:31.0 +0200
+++ net-2.6/CREDITS 2007-08-03 11:21:56.0 +0200
@@ -1331,6 +1331,14 @@
 S: 5623 HZ Eindhoven
 S: The Netherlands
 
+N: Oliver Hartkopp
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Andrew Haylett
 E: [EMAIL PROTECTED]
 D: Selection mechanism
@@ -3284,6 +3292,14 @@
 S: F-35042 Rennes Cedex
 S: France
 
+N: Urs Thuermann
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Jon Tombs
 E: [EMAIL PROTECTED]
 W: http://www.esi.us.es/~jon
Index: net-2.6/MAINTAINERS
===
--- net-2.6.orig/MAINTAINERS2007-08-03 11:21:31.0 +0200
+++ net-2.6/MAINTAINERS 2007-08-03 11:21:56.0 +0200
@@ -951,6 +951,15 @@
 L: [EMAIL PROTECTED]
 S: Maintained
 
+CAN NETWORK LAYER
+P: Urs Thuermann
+M: [EMAIL PROTECTED]
+P: Oliver Hartkopp
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://developer.berlios.de/projects/socketcan/
+S: Maintained
+
 CALGARY x86-64 IOMMU
 P: Muli Ben-Yehuda
 M: [EMAIL PROTECTED]

--


[patch 5/7] CAN: Add virtual CAN netdevice driver

2007-08-03 Thread Urs Thuermann
This patch adds the virtual CAN bus (vcan) network driver.
The vcan device is just a loopback device for CAN frames, no
real CAN hardware is involved.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 drivers/net/Makefile |1 
 drivers/net/can/Kconfig  |   25 
 drivers/net/can/Makefile |5 
 drivers/net/can/vcan.c   |  261 +++
 net/can/Kconfig  |3 
 5 files changed, 295 insertions(+)

Index: net-2.6/drivers/net/Makefile
===
--- net-2.6.orig/drivers/net/Makefile   2007-08-03 11:21:31.0 +0200
+++ net-2.6/drivers/net/Makefile2007-08-03 11:21:54.0 +0200
@@ -8,6 +8,7 @@
 obj-$(CONFIG_CHELSIO_T1) += chelsio/
 obj-$(CONFIG_CHELSIO_T3) += cxgb3/
 obj-$(CONFIG_EHEA) += ehea/
+obj-$(CONFIG_CAN) += can/
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_ATL1) += atl1/
 obj-$(CONFIG_GIANFAR) += gianfar_driver.o
Index: net-2.6/drivers/net/can/Kconfig
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/drivers/net/can/Kconfig 2007-08-03 11:21:54.0 +0200
@@ -0,0 +1,25 @@
+menu "CAN Device Drivers"
+   depends on CAN
+
+config CAN_VCAN
+   tristate "Virtual Local CAN Interface (vcan)"
+   depends on CAN
+   default N
+   ---help---
+ Similar to the network loopback devices, vcan offers a
+ virtual local CAN interface.
+
+ This driver can also be built as a module.  If so, the module
+ will be called vcan.
+
+config CAN_DEBUG_DEVICES
+   bool "CAN devices debugging messages"
+   depends on CAN
+   default N
+   ---help---
+ Say Y here if you want the CAN device drivers to produce a bunch of
+ debug messages to the system log.  Select this if you are having
+ a problem with CAN support and want to see more of what is going
+ on.
+
+endmenu
Index: net-2.6/drivers/net/can/Makefile
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/drivers/net/can/Makefile2007-08-03 11:21:54.0 +0200
@@ -0,0 +1,5 @@
+#
+#  Makefile for the Linux Controller Area Network drivers.
+#
+
+obj-$(CONFIG_CAN_VCAN) += vcan.o
Index: net-2.6/drivers/net/can/vcan.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/drivers/net/can/vcan.c  2007-08-03 11:21:54.0 +0200
@@ -0,0 +1,261 @@
+/*
+ * vcan.c - Virtual CAN interface
+ *
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions, the following disclaimer and
+ *the referenced file 'COPYING'.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Volkswagen nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License (GPL) version 2 as distributed in the 'COPYING'
+ * file from the main directory of the linux kernel source.
+ *
+ * The provided data structures and external interfaces from this code
+ * are not restricted to be used by modules with a GPL compatible license.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ * Send feedback to [EMAIL PROTECTED]
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/if_ether.h>
+#include <linux/can.h>
+#include 

[patch 3/7] CAN: Add raw protocol

2007-08-03 Thread Urs Thuermann
This patch adds the CAN raw protocol.
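The raw protocol filters received frames by CAN identifier (the Kconfig help in this patch calls it "ID-Masking"; raw.h exposes it as the CAN_RAW_FILTER option). As a sketch, assuming the mask rule from the Socket CAN documentation, where a frame passes when received_id & mask == filter_id & mask, the matching can be written as below. struct id_filter is a hypothetical mirror of one filter entry, and the sketch treats a socket's filter list as a simple logical OR, ignoring inverted filters, error frames and how often a frame matching several filters is delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one raw-socket filter entry: an ID and a mask. */
struct id_filter {
	uint32_t can_id;
	uint32_t can_mask;
};

/* A frame passes a filter when both IDs agree on every bit set in the mask. */
static bool filter_match(const struct id_filter *f, uint32_t rx_id)
{
	return (rx_id & f->can_mask) == (f->can_id & f->can_mask);
}

/* This sketch treats 0..n filters as a logical OR: no filters, no frames. */
static bool frame_wanted(const struct id_filter *filters, size_t n,
			 uint32_t rx_id)
{
	for (size_t i = 0; i < n; i++)
		if (filter_match(&filters[i], rx_id))
			return true;
	return false;
}
```

For example, a filter of { 0x200, 0x700 } accepts every 11-bit ID from 0x200 to 0x2FF, while { 0x123, 0x7FF } accepts only 0x123.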

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 include/linux/can/raw.h |   31 +
 net/can/Kconfig |   26 +
 net/can/Makefile|3 
 net/can/raw.c   |  757 
 4 files changed, 817 insertions(+)

Index: net-2.6/include/linux/can/raw.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/include/linux/can/raw.h 2007-08-03 11:21:48.0 +0200
@@ -0,0 +1,31 @@
+/*
+ * linux/can/raw.h
+ *
+ * Definitions for raw CAN sockets
+ *
+ * Authors: Oliver Hartkopp [EMAIL PROTECTED]
+ *  Urs Thuermann   [EMAIL PROTECTED]
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Send feedback to [EMAIL PROTECTED]
+ *
+ */
+
+#ifndef CAN_RAW_H
+#define CAN_RAW_H
+
+#include <linux/can.h>
+
+#define SOL_CAN_RAW (SOL_CAN_BASE + CAN_RAW)
+
+/* for socket options affecting the socket (not the global system) */
+
+enum {
+   CAN_RAW_FILTER = 1, /* set 0 .. n can_filter(s)  */
+   CAN_RAW_ERR_FILTER, /* set filter for error frames   */
+   CAN_RAW_LOOPBACK,   /* local loopback (default:on)   */
+   CAN_RAW_RECV_OWN_MSGS   /* receive my own msgs (default:off) */
+};
+
+#endif
Index: net-2.6/net/can/Kconfig
===
--- net-2.6.orig/net/can/Kconfig2007-08-03 11:21:46.0 +0200
+++ net-2.6/net/can/Kconfig 2007-08-03 11:21:48.0 +0200
@@ -16,6 +16,32 @@
  If you want CAN support, you should say Y here and also to the
  specific driver for your controller(s) below.
 
+config CAN_RAW
+   tristate "Raw CAN Protocol (raw access with CAN-ID filtering)"
+   depends on CAN
+   default N
+   ---help---
+ The Raw CAN protocol option offers access to the CAN bus via
+ the BSD socket API. You probably want to use the raw socket in
+ most cases where no higher level protocol is being used. The raw
+ socket has several filter options e.g. ID-Masking / Errorframes.
+ To receive/send raw CAN messages, use AF_CAN with protocol CAN_RAW.
+
+config CAN_RAW_USER
+   bool "Allow non-root users to access Raw CAN Protocol sockets"
+   depends on CAN_RAW
+   default N
+   ---help---
+ The Controller Area Network is a local field bus transmitting only
+ broadcast messages without any routing and security concepts.
+ In the majority of cases the user application has to deal with
+ raw CAN frames. Therefore it might be reasonable NOT to restrict
+ the CAN access only to the user root, as known from other networks.
+ Since CAN_RAW sockets can only send and receive frames to/from CAN
+ interfaces, this does not affect the security of other networks.
+ Say Y here if you want non-root users to be able to access CAN_RAW
+ sockets.
+
 config CAN_DEBUG_CORE
 	bool "CAN Core debugging messages"
depends on CAN
Index: net-2.6/net/can/Makefile
===
--- net-2.6.orig/net/can/Makefile   2007-08-03 11:21:46.0 +0200
+++ net-2.6/net/can/Makefile2007-08-03 11:21:48.0 +0200
@@ -4,3 +4,6 @@
 
 obj-$(CONFIG_CAN)  += can.o
 can-objs   := af_can.o proc.o
+
+obj-$(CONFIG_CAN_RAW)  += can-raw.o
+can-raw-objs   := raw.o
Index: net-2.6/net/can/raw.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/net/can/raw.c   2007-08-03 11:21:48.0 +0200
@@ -0,0 +1,757 @@
+/*
+ * raw.c - Raw sockets for protocol family CAN
+ *
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions, the following disclaimer and
+ *the referenced file 'COPYING'.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Volkswagen nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License (GPL) version 2 as distributed in the 'COPYING'
+ * file from the main directory of the linux kernel 

[patch 1/7] CAN: Allocate protocol numbers for PF_CAN

2007-08-03 Thread Urs Thuermann
This patch adds a protocol/address family number, ARP hardware type,
ethernet packet type, and a line discipline number for the SocketCAN
implementation.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 include/linux/if_arp.h   |1 +
 include/linux/if_ether.h |1 +
 include/linux/socket.h   |2 ++
 include/linux/tty.h  |3 ++-
 net/core/sock.c  |4 ++--
 5 files changed, 8 insertions(+), 3 deletions(-)

Index: net-2.6/include/linux/if_arp.h
===
--- net-2.6.orig/include/linux/if_arp.h 2007-08-03 11:21:32.0 +0200
+++ net-2.6/include/linux/if_arp.h  2007-08-03 11:21:42.0 +0200
@@ -52,6 +52,7 @@
 #define ARPHRD_ROSE	270
 #define ARPHRD_X25 271 /* CCITT X.25   */
 #define ARPHRD_HWX25   272 /* Boards with X.25 in firmware */
+#define ARPHRD_CAN 280 /* Controller Area Network  */
 #define ARPHRD_PPP 512
 #define ARPHRD_CISCO   513 /* Cisco HDLC   */
 #define ARPHRD_HDLC	ARPHRD_CISCO
Index: net-2.6/include/linux/if_ether.h
===
--- net-2.6.orig/include/linux/if_ether.h   2007-08-03 11:21:32.0 +0200
+++ net-2.6/include/linux/if_ether.h2007-08-03 11:21:42.0 +0200
@@ -90,6 +90,7 @@
 #define ETH_P_WAN_PPP   0x0007  /* Dummy type for WAN PPP frames*/
 #define ETH_P_PPP_MP0x0008  /* Dummy type for PPP MP frames */
 #define ETH_P_LOCALTALK 0x0009 /* Localtalk pseudo type*/
+#define ETH_P_CAN  0x000C  /* Controller Area Network  */
 #define ETH_P_PPPTALK  0x0010  /* Dummy type for Atalk over PPP*/
 #define ETH_P_TR_802_2 0x0011  /* 802.2 frames */
 #define ETH_P_MOBITEX  0x0015  /* Mobitex ([EMAIL PROTECTED])  */
Index: net-2.6/include/linux/socket.h
===
--- net-2.6.orig/include/linux/socket.h 2007-08-03 11:21:32.0 +0200
+++ net-2.6/include/linux/socket.h  2007-08-03 11:21:42.0 +0200
@@ -185,6 +185,7 @@
 #define AF_PPPOX   24  /* PPPoX sockets*/
 #define AF_WANPIPE 25  /* Wanpipe API Sockets */
 #define AF_LLC 26  /* Linux LLC*/
+#define AF_CAN 29  /* Controller Area Network  */
 #define AF_TIPC		30  /* TIPC sockets */
 #define AF_BLUETOOTH   31  /* Bluetooth sockets*/
 #define AF_IUCV		32  /* IUCV sockets */
@@ -220,6 +221,7 @@
 #define PF_PPPOX   AF_PPPOX
 #define PF_WANPIPE AF_WANPIPE
 #define PF_LLC AF_LLC
+#define PF_CAN AF_CAN
 #define PF_TIPC		AF_TIPC
 #define PF_BLUETOOTH   AF_BLUETOOTH
 #define PF_IUCV		AF_IUCV
Index: net-2.6/include/linux/tty.h
===
--- net-2.6.orig/include/linux/tty.h2007-08-03 11:21:32.0 +0200
+++ net-2.6/include/linux/tty.h 2007-08-03 11:21:42.0 +0200
@@ -24,7 +24,7 @@
 #define NR_PTYS	CONFIG_LEGACY_PTY_COUNT   /* Number of legacy ptys */
 #define NR_UNIX98_PTY_DEFAULT  4096  /* Default maximum for Unix98 ptys */
 #define NR_UNIX98_PTY_MAX  (1 << MINORBITS) /* Absolute limit */
-#define NR_LDISCS  17
+#define NR_LDISCS  18
 
 /* line disciplines */
 #define N_TTY  0
@@ -45,6 +45,7 @@
 #define N_SYNC_PPP 14  /* synchronous PPP */
 #define N_HCI  15  /* Bluetooth HCI UART */
 #define N_GIGASET_M101 16  /* Siemens Gigaset M101 serial DECT adapter */
+#define N_SLCAN		17  /* Serial / USB serial CAN Adaptors */
 
 /*
  * This character is the same as _POSIX_VDISABLE: it cannot be used as
Index: net-2.6/net/core/sock.c
===
--- net-2.6.orig/net/core/sock.c2007-08-03 11:21:32.0 +0200
+++ net-2.6/net/core/sock.c 2007-08-03 11:21:42.0 +0200
@@ -153,7 +153,7 @@
   "sk_lock-AF_ASH"   , "sk_lock-AF_ECONET"   , "sk_lock-AF_ATMSVC"   ,
   "sk_lock-21"   , "sk_lock-AF_SNA"  , "sk_lock-AF_IRDA" ,
   "sk_lock-AF_PPPOX" , "sk_lock-AF_WANPIPE"  , "sk_lock-AF_LLC"  ,
-  "sk_lock-27"   , "sk_lock-28"  , "sk_lock-29"  ,
+  "sk_lock-27"   , "sk_lock-28"  , "sk_lock-AF_CAN"  ,
   "sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV",
   "sk_lock-AF_RXRPC" , "sk_lock-AF_MAX"
 };
@@ -167,7 +167,7 @@
   "slock-AF_ASH"   , "slock-AF_ECONET"   , "slock-AF_ATMSVC"   ,
   "slock-21"   , "slock-AF_SNA"  , "slock-AF_IRDA" ,
   "slock-AF_PPPOX" , "slock-AF_WANPIPE"  , "slock-AF_LLC"  ,
-  "slock-27"   , "slock-28"  , "slock-29"  ,
+  "slock-27"   , "slock-28"  , "slock-AF_CAN"  ,
   

[patch 7/7] CAN: Add documentation

2007-08-03 Thread Urs Thuermann
This patch adds documentation for the PF_CAN protocol family.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 Documentation/networking/00-INDEX |2 
 Documentation/networking/can.txt  |  635 ++
 2 files changed, 637 insertions(+)

Index: net-2.6/Documentation/networking/can.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/Documentation/networking/can.txt	2007-08-03 11:21:58.0 +0200
@@ -0,0 +1,635 @@
+
+
+can.txt
+
+Readme file for the Controller Area Network Protocol Family (aka Socket CAN)
+
+This file contains
+
+  1 Overview / What is Socket CAN
+
+  2 Motivation / Why using the socket API
+
+  3 Socket CAN concept
+3.1 receive lists
+3.2 loopback
+3.3 network security issues (capabilities)
+3.4 network problem notifications
+
+  4 How to use Socket CAN
+4.1 RAW protocol sockets with can_filters (SOCK_RAW)
+  4.1.1 RAW socket option CAN_RAW_FILTER
+  4.1.2 RAW socket option CAN_RAW_ERR_FILTER
+  4.1.3 RAW socket option CAN_RAW_LOOPBACK
+  4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+4.3 connected transport protocols (SOCK_SEQPACKET)
+4.4 unconnected transport protocols (SOCK_DGRAM)
+
+  5 Socket CAN core module
+5.1 can.ko module params
+5.2 procfs content
+5.3 writing own CAN protocol modules
+
+  6 CAN network drivers
+6.1 general settings
+6.2 loopback
+6.3 CAN controller hardware filters
+6.4 currently supported CAN hardware
+6.5 todo
+
+  7 Credits
+
+
+
+1. Overview / What is Socket CAN
+
+
+The socketcan package is an implementation of CAN protocols
+(Controller Area Network) for Linux.  CAN is a networking technology
+which has widespread use in automation, embedded devices, and
+automotive fields.  While there have been other CAN implementations
+for Linux based on character devices, Socket CAN uses the Berkeley
+socket API and the Linux network stack, and implements the CAN device
+drivers as network interfaces.  The CAN socket API has been designed
+to be as similar as possible to the TCP/IP protocols so that
+programmers familiar with network programming can easily learn how
+to use CAN sockets.
+
+2. Motivation / Why using the socket API
+
+
+There have been CAN implementations for Linux before Socket CAN, so the
+question arises why we started another project.  Most existing
+implementations come as a device driver for some CAN hardware; they
+are based on character devices and provide comparatively little
+functionality.  Usually, there is only a hardware-specific device
+driver which provides a character device interface to send and
+receive raw CAN frames, directly to/from the controller hardware.
+Queueing of frames and higher-level transport protocols like ISO-TP
+have to be implemented in user space applications.  Also, most
+character-device implementations allow only a single process to
+open the device at a time, similar to a serial interface.  Exchanging
+the CAN controller requires employment of another device driver and
+often the adaptation of large parts of the application to the
+new driver's API.
+
+Socket CAN was designed to overcome all of these limitations.  A new
+protocol family has been implemented which provides a socket interface
+to user space applications and which builds upon the Linux network
+layer, so as to use all of the provided queueing functionality.  Device
+drivers for CAN controller hardware register themselves with the Linux
+network layer as network devices, so that CAN frames from the
+controller can be passed up to the network layer and on to the CAN
+protocol family module and also vice-versa.  Also, the protocol family
+module provides an API for transport protocol modules to register, so
+that any number of transport protocols can be loaded or unloaded
+dynamically.  In fact, the can core module alone does not provide any
+protocol and cannot be used without loading at least one additional
+protocol module.  Multiple sockets can be opened at the same time,
+on the same or different protocol modules, and they can listen/send
+frames on the same or different CAN IDs.  Several sockets listening on
+the same interface for frames with the same CAN ID are all passed the
+same received matching CAN frames.  An application wishing to
+communicate using a specific transport protocol, e.g. ISO-TP, just
+selects that protocol when opening the socket, and then can read and
+write application data byte streams, without having to deal with
+CAN-IDs, frames, etc.
+
+Similar functionality visible from user-space could 

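The documentation patch above states that several sockets listening on the same interface for frames with the same CAN ID are all passed the same received frames. The sketch below is a minimal user-space model of that fan-out. It assumes each socket subscribes with a single ID/mask pair and that a frame matches when (rx_id & mask) == (can_id & mask), which is the matching rule Socket CAN documents for CAN_RAW filters. The demo_ structures and functions are hypothetical and are not the receive-list code of the can core module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subscription: one socket, one ID/mask filter. */
struct demo_sub {
    int sock;          /* stand-in for an open CAN socket */
    uint32_t can_id;   /* CAN ID the socket asked for */
    uint32_t mask;     /* ID bits that must match */
};

/* Nonzero if a received CAN ID passes this subscription's filter. */
static int demo_match(const struct demo_sub *s, uint32_t rx_id)
{
    return (rx_id & s->mask) == (s->can_id & s->mask);
}

/*
 * Hand one received frame (identified by rx_id) to every matching
 * subscription. The matching sockets are written to out[], which has
 * room for out_len entries; the return value is how many sockets get
 * their own copy of the frame.
 */
static size_t demo_deliver(const struct demo_sub *subs, size_t nsubs,
                           uint32_t rx_id, int *out, size_t out_len)
{
    size_t n = 0;

    for (size_t i = 0; i < nsubs; i++) {
        if (!demo_match(&subs[i], rx_id))
            continue;
        assert(n < out_len);   /* caller must size out[] for all subs */
        out[n++] = subs[i].sock;
    }
    return n;
}
```

With two sockets subscribed to 0x123 (mask 0x7FF) and a third to 0x120 (mask 0x7F0), a frame with ID 0x123 is delivered to all three, since the third filter ignores the lowest four ID bits.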
[patch 4/7] CAN: Add broadcast manager (bcm) protocol

2007-08-03 Thread Urs Thuermann
This patch adds the CAN broadcast manager (bcm) protocol.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 include/linux/can/bcm.h |   65 +
 net/can/Kconfig |   28 
 net/can/Makefile|3 
 net/can/bcm.c   | 1755 
 4 files changed, 1851 insertions(+)

Index: net-2.6/include/linux/can/bcm.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6/include/linux/can/bcm.h 2007-08-03 11:21:51.0 +0200
@@ -0,0 +1,65 @@
+/*
+ * linux/can/bcm.h
+ *
+ * Definitions for CAN Broadcast Manager (BCM)
+ *
+ * Author: Oliver Hartkopp [EMAIL PROTECTED]
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Send feedback to [EMAIL PROTECTED]
+ *
+ */
+
+#ifndef CAN_BCM_H
+#define CAN_BCM_H
+
+/**
+ * struct bcm_msg_head - head of messages to/from the broadcast manager
+ * @opcode:   opcode, see enum below.
+ * @flags:    special flags, see below.
+ * @count:    number of frames to send before changing interval.
+ * @ival1:    interval for the first @count frames.
+ * @ival2:    interval for the following frames.
+ * @can_id:   CAN ID of frames to be sent or received.
+ * @nframes:  number of frames appended to the message head.
+ * @frames:   array of CAN frames.
+ */
+struct bcm_msg_head {
+   int opcode;
+   int flags;
+   int count;
+   struct timeval ival1, ival2;
+   canid_t can_id;
+   int nframes;
+   struct can_frame frames[0];
+};
+
+enum {
+   TX_SETUP = 1,   /* create (cyclic) transmission task */
+   TX_DELETE,      /* remove (cyclic) transmission task */
+   TX_READ,        /* read properties of (cyclic) transmission task */
+   TX_SEND,        /* send one CAN frame */
+   RX_SETUP,       /* create RX content filter subscription */
+   RX_DELETE,      /* remove RX content filter subscription */
+   RX_READ,        /* read properties of RX content filter subscription */
+   TX_STATUS,      /* reply to TX_READ request */
+   TX_EXPIRED,     /* notification on performed transmissions (count=0) */
+   RX_STATUS,      /* reply to RX_READ request */
+   RX_TIMEOUT,     /* cyclic message is absent */
+   RX_CHANGED      /* updated CAN frame (detected content change) */
+};
+
+#define SETTIMER            0x0001
+#define STARTTIMER          0x0002
+#define TX_COUNTEVT         0x0004
+#define TX_ANNOUNCE         0x0008
+#define TX_CP_CAN_ID        0x0010
+#define RX_FILTER_ID        0x0020
+#define RX_CHECK_DLC        0x0040
+#define RX_NO_AUTOTIMER     0x0080
+#define RX_ANNOUNCE_RESUME  0x0100
+#define TX_RESET_MULTI_IDX  0x0200
+#define RX_RTR_FRAME        0x0400
+
+#endif /* CAN_BCM_H */
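As a rough illustration of how struct bcm_msg_head and its trailing frames[] array fit together, the sketch below builds a TX_SETUP request for one cyclically sent frame. It assumes the semantics given in the kerneldoc above (ival1 for the first count frames, ival2 for the following ones), so with count = 0 only ival2 applies. To stay buildable without kernel headers it uses demo_ copies of the structures: a simplified stand-in for canid_t and struct can_frame, and a C99 flexible array member instead of frames[0]. These names are hypothetical, and handing the buffer to an AF_CAN socket opened with protocol CAN_BCM is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Simplified stand-ins for canid_t and struct can_frame. */
typedef uint32_t demo_canid_t;

struct demo_can_frame {
    demo_canid_t can_id;
    uint8_t can_dlc;          /* number of valid data bytes, 0..8 */
    uint8_t data[8];
};

/* Same member list as struct bcm_msg_head in the patch above. */
struct demo_bcm_msg_head {
    int opcode;
    int flags;
    int count;
    struct timeval ival1, ival2;
    demo_canid_t can_id;
    int nframes;
    struct demo_can_frame frames[];   /* C99 flexible array member */
};

/* Values copied from the enum and #defines in bcm.h above. */
#define DEMO_TX_SETUP   1
#define DEMO_SETTIMER   0x0001
#define DEMO_STARTTIMER 0x0002

/*
 * Build a TX_SETUP request for one frame sent every interval_ms
 * milliseconds. count = 0 skips the ival1 phase, so only ival2 is used.
 * Returns a calloc()ed buffer of *len_out bytes (caller frees), or NULL.
 */
static struct demo_bcm_msg_head *
demo_tx_setup(demo_canid_t id, const uint8_t *data, uint8_t dlc,
              long interval_ms, size_t *len_out)
{
    size_t len = sizeof(struct demo_bcm_msg_head)
                 + sizeof(struct demo_can_frame);   /* nframes = 1 */
    struct demo_bcm_msg_head *msg;

    assert(dlc <= 8);
    msg = calloc(1, len);
    if (msg == NULL)
        return NULL;

    msg->opcode = DEMO_TX_SETUP;
    msg->flags = DEMO_SETTIMER | DEMO_STARTTIMER;
    msg->count = 0;                          /* ival1 stays zeroed */
    msg->ival2.tv_sec = interval_ms / 1000;
    msg->ival2.tv_usec = (interval_ms % 1000) * 1000;
    msg->can_id = id;
    msg->nframes = 1;
    msg->frames[0].can_id = id;
    msg->frames[0].can_dlc = dlc;
    memcpy(msg->frames[0].data, data, dlc);

    *len_out = len;
    return msg;
}
```

The header and its frames live in one contiguous buffer, so the whole request is sized as the head plus nframes frames.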
Index: net-2.6/net/can/Kconfig
===
--- net-2.6.orig/net/can/Kconfig 2007-08-03 11:21:48.0 +0200
+++ net-2.6/net/can/Kconfig 2007-08-03 11:21:51.0 +0200
@@ -42,6 +42,34 @@
  Say Y here if you want non-root users to be able to access CAN_RAW
  sockets.
 
+config CAN_BCM
+   tristate "Broadcast Manager CAN Protocol (with content filtering)"
+   depends on CAN
+   default N
+   ---help---
+ The Broadcast Manager offers content filtering, timeout monitoring,
+ sending of RTR-frames and cyclic CAN messages without permanent user
+ interaction. The BCM can be 'programmed' via the BSD socket API and
+ informs you on demand, e.g. only on content updates or timeouts.
+ You probably want to use the bcm socket in most cases where cyclic
+ CAN messages are used on the bus (e.g. in automotive environments).
+ To use the Broadcast Manager, use AF_CAN with protocol CAN_BCM.
+
+config CAN_BCM_USER
+   bool "Allow non-root users to access CAN broadcast manager sockets"
+   depends on CAN_BCM
+   default N
+   ---help---
+ The Controller Area Network is a local field bus transmitting only
+ broadcast messages without any routing and security concepts.
+ In the majority of cases the user application has to deal with
+ raw CAN frames. Therefore it might be reasonable NOT to restrict
+ the CAN access only to the user root, as known from other networks.
+ Since CAN_BCM sockets can only send and receive frames to/from CAN
+ interfaces, this does not affect the security of other networks.
+ Say Y here if you want non-root users to be able to access CAN_BCM
+ sockets.
+
 config CAN_DEBUG_CORE
bool "CAN Core debugging messages"
depends on CAN
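The CAN_BCM help text above says the broadcast manager can inform user space only on content updates. The sketch below is a toy model of that idea under stated assumptions: a subscription remembers the last payload, always reports the first frame, and afterwards reports a frame only when payload bits selected by a mask change, or when the DLC changes if a flag modelled on RX_CHECK_DLC is set. This is an illustrative assumption, not the logic of net/can/bcm.c, and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one RX subscription's content filter. */
struct demo_rx_filter {
    uint8_t mask[8];      /* payload bits that are relevant */
    int check_dlc;        /* also report DLC changes (cf. RX_CHECK_DLC) */
    int have_last;        /* has a frame been seen yet? */
    uint8_t last_dlc;
    uint8_t last[8];      /* last payload, zero-padded to 8 bytes */
};

/*
 * Feed one received payload of dlc bytes. Returns 1 if it would be
 * reported to user space (cf. RX_CHANGED), 0 if it is filtered out.
 */
static int demo_rx_update(struct demo_rx_filter *f,
                          const uint8_t *data, uint8_t dlc)
{
    uint8_t cur[8] = { 0 };
    int changed = 0;

    assert(dlc <= 8);
    memcpy(cur, data, dlc);

    if (!f->have_last) {
        changed = 1;                       /* first reception */
    } else {
        if (f->check_dlc && dlc != f->last_dlc)
            changed = 1;
        for (int i = 0; i < 8; i++)
            if ((cur[i] ^ f->last[i]) & f->mask[i])
                changed = 1;               /* a relevant bit flipped */
    }

    memcpy(f->last, cur, sizeof(cur));
    f->last_dlc = dlc;
    f->have_last = 1;
    return changed;
}
```

With a mask that selects only byte 0, a frame whose byte 1 changes is suppressed, which is the kind of reduced user-space traffic the help text describes.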
Index: net-2.6/net/can/Makefile
===
--- net-2.6.orig/net/can/Makefile   2007-08-03 11:21:48.0 +0200
+++ net-2.6/net/can/Makefile 2007-08-03 

Re: Distributed storage.

2007-08-03 Thread Dave Dillow
On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote:
 On 7/31/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
  TODO list currently includes following main items:
  * redundancy algorithm (drop me a request of your own, but it is highly
  unlikely that Reed-Solomon based will ever be used - it is too slow
  for distributed RAID, I consider WEAVER codes)
 
 
 LDPC codes[1][2] have been replacing Turbo code[3] with regards to
 communication links and we have been seeing that transition. (maybe
 helpful, came to mind seeing the mention of Turbo code) Don't know how
 Weaver compares to LDPC, though I found some comparisons [4][5]. But
 looking at the fault tolerance figures, I guess Weaver is much better.
 
 [1] http://www.ldpc-codes.com/
 [2] http://portal.acm.org/citation.cfm?id=1240497
 [3] http://en.wikipedia.org/wiki/Turbo_code
 [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
 [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Searching Google for Dr. Plank's work at the University of TN turns up
some analysis of using LDPC codes in storage systems.

http://www.google.com/search?hl=en&q=plank+ldpc&btnG=Google+Search

Patents are an issue to watch out for around the use of Tornado/Raptor
codes. I've not researched it, but I believe there be dragons there.


[RFC 0/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-08-03 Thread Michael Chan
[BNX2]: Add iSCSI support to BNX2 devices.

Modify bnx2 and add a cnic driver to support some offload functions
needed by iSCSI.

Add a new open-iscsi driver to support iSCSI offload on bnx2 devices.

Signed-off-by: Anil Veerabhadrappa [EMAIL PROTECTED]
Signed-off-by: Michael Chan [EMAIL PROTECTED]

--

The complete patch is in:

ftp://[EMAIL PROTECTED]/0001-BNX2-Add-iSCSI-support-to-BNX2-devices.patch

I broke this into 2 patches and omitted the firmware blob in the next 2
emails for review.

---
 drivers/net/Kconfig   |   10 +
 drivers/net/Makefile  |1 +
 drivers/net/bnx2.c|  116 +-
 drivers/net/bnx2.h|   25 +-
 drivers/net/bnx2_fw.h | 7036 ++---
 drivers/net/cnic.c| 1885 
 drivers/net/cnic.diff |  363 ++
 drivers/net/cnic.h|  163 +
 drivers/net/cnic_cm.h |  555 +++
 drivers/net/cnic_if.h |  152 +
 drivers/scsi/Kconfig  |2 +
 drivers/scsi/Makefile |1 +
 drivers/scsi/bnx2i/57xx_iscsi_constants.h |  212 +
 drivers/scsi/bnx2i/57xx_iscsi_hsi.h   | 1501 ++
 drivers/scsi/bnx2i/Kconfig|7 +
 drivers/scsi/bnx2i/Makefile   |4 +
 drivers/scsi/bnx2i/bnx2i.h|  828 
 drivers/scsi/bnx2i/bnx2i_hwi.c| 1993 
 drivers/scsi/bnx2i/bnx2i_init.c   |  393 ++
 drivers/scsi/bnx2i/bnx2i_iscsi.c  | 3718 +++
 drivers/scsi/bnx2i/bnx2i_sysfs.c  |  616 +++




Re: Distributed storage.

2007-08-03 Thread Manu Abraham
On 8/4/07, Dave Dillow [EMAIL PROTECTED] wrote:
 On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote:
  On 7/31/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
   TODO list currently includes following main items:
   * redundancy algorithm (drop me a request of your own, but it is highly
   unlikely that Reed-Solomon based will ever be used - it is too slow
   for distributed RAID, I consider WEAVER codes)
 
 
  LDPC codes[1][2] have been replacing Turbo code[3] with regards to
  communication links and we have been seeing that transition. (maybe
  helpful, came to mind seeing the mention of Turbo code) Don't know how
  Weaver compares to LDPC, though I found some comparisons [4][5]. But
  looking at the fault tolerance figures, I guess Weaver is much better.
 
  [1] http://www.ldpc-codes.com/
  [2] http://portal.acm.org/citation.cfm?id=1240497
  [3] http://en.wikipedia.org/wiki/Turbo_code
  [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
  [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

 Searching Google for Dr. Plank's work at the University of TN turns up
 some analysis of using LDPC codes in storage systems.

 http://www.google.com/search?hl=en&q=plank+ldpc&btnG=Google+Search

 Patents are an issue to watch out for around the use of Tornado/Raptor
 codes. I've not researched it, but I believe there be dragons there.


We don't use the code directly in the driver [2] (in the case
that I mentioned), since decoding happens in the hardware (the demodulator
chip) [1], but we do have an interface for selecting the code rate [2]
(LDPC/BCH) for DVB-S2, and the new papers for DVB-T2 suggest that
the base decision is to use LDPC.

That said, I now see a patent application for it [3]. I'm not sure whether
it is a registered patent; I am under a non-disclosure agreement with
STM. I will ask the relevant person there whether they have it
registered (most probably they do).

There are a few people from STM on LKML; failing that, they could
possibly confirm whether the patent is registered or not.

[1] http://www2.dac.com/data2/42nd/42acceptedpapers.nsf/0c4c09c6ffa905c487256b7b007afb72/998f93e4b29e99fa87256fc400714617/$FILE/33_1.pdf
[2] http://linuxtv.org/hg/~manu/stb0899-c5/file/760cb230695c/linux/include/linux/dvb/frontend.h
[3] http://www.freepatentsonline.com/20060206779.html
    http://www.freepatentsonline.com/20060206778.html


Re: [REGRESSION] tg3 dead after s2ram

2007-08-03 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Thu, 02 Aug 2007 12:10:29 -0700

 [TG3]: Fix suspend/resume problem.
 
 Joachim Deguara [EMAIL PROTECTED] reported that tg3 devices
 would not resume properly if the device was shut down before the system
 was suspended.  In such a scenario, where the netif_running state is 0,
 tg3_suspend() would not save the PCI state and so the memory enable bit
 and bus master enable bit would be lost.
 
 We fix this by always saving and restoring the PCI state in
 tg3_suspend() and tg3_resume() regardless of netif_running() state.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Patch applied.

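The tg3 fix above (and the matching bnx2 fix that follows) comes down to ordering: save and restore the PCI state before the netif_running() early return instead of after it. The sketch below is a user-space model of why that order matters. It assumes that a power cycle between suspend and resume clears the memory enable and bus master bits, and that resume can only restore what suspend saved. The demo_ types and functions are hypothetical stand-ins, not tg3 or bnx2 driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Config-space bits that are lost across a suspend/resume cycle. */
struct demo_pci_cfg {
    bool mem_enable;
    bool bus_master;
};

struct demo_nic {
    struct demo_pci_cfg cfg;    /* live PCI config state */
    struct demo_pci_cfg saved;  /* copy taken at suspend time */
    bool running;               /* stands in for netif_running() */
};

/* Old ordering: bail out before saving when the interface is down. */
static void demo_suspend_old(struct demo_nic *n)
{
    if (!n->running)
        return;
    n->saved = n->cfg;
}

/* Old ordering: bail out before restoring when the interface is down. */
static void demo_resume_old(struct demo_nic *n)
{
    if (!n->running)
        return;
    n->cfg = n->saved;
}

/* Fixed ordering: always save, then do running-only work. */
static void demo_suspend_fixed(struct demo_nic *n)
{
    n->saved = n->cfg;
    if (!n->running)
        return;
    /* ... quiesce the running device (omitted) ... */
}

/* Fixed ordering: always restore, then do running-only work. */
static void demo_resume_fixed(struct demo_nic *n)
{
    n->cfg = n->saved;
    if (!n->running)
        return;
    /* ... restart the running device (omitted) ... */
}

/* Simulate the power cycle between suspend and resume. */
static void demo_power_cycle(struct demo_nic *n)
{
    n->cfg.mem_enable = false;
    n->cfg.bus_master = false;
}
```

For an interface that is down during suspend, the old ordering leaves both bits cleared after resume, while the fixed ordering brings them back.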

Re: [BNX2]: Fix suspend/resume problem.

2007-08-03 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 03 Aug 2007 15:32:34 -0700

 [BNX2]: Fix suspend/resume problem.
 
 The device would not resume properly if it was shut down before the system
 was suspended.  In such a scenario, where the netif_running state is 0,
 bnx2_suspend() would not save the PCI state and so the memory enable bit
 and bus master enable bit would be lost.
 
 We fix this by always saving and restoring the PCI state in
 bnx2_suspend() and bnx2_resume() regardless of netif_running() state.
 
 Update version to 1.6.4.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Also applied, thanks Michael.


Re: [PATCH] ixgbe: New driver for Pci-Express 10GbE 82598 support

2007-08-03 Thread Kok, Auke

Auke Kok wrote:

This patch adds support for the Intel 82598 PCI-Express 10GbE
chipset. Devices will be available on the market soon.


Also available through http and git:

 http://foo-projects.org/~sofar/ixgbe-20070803-submission.patch
 http://foo-projects.org/~sofar/ixgbe-20070803-submission.patch.bz2

 git://lost.foo-projects.org/~ahkok/linux-2.6#ixgbe-20070803-submission

Cheers,

Auke