Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-13 Thread Michael S. Tsirkin
Quoting r. David Miller [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 From: Michael S. Tsirkin [EMAIL PROTECTED]
 Date: Thu, 12 Oct 2006 21:12:06 +0200
 
  Quoting r. David Miller [EMAIL PROTECTED]:
   Subject: Re: Dropping NETIF_F_SG since no checksum feature.
   
   Numbers?
  
  I created two subnets on top of the same pair infiniband HCAs:
 
 I was asking for SG vs. non-SG numbers so I could see proof
 that it really does help like you say it will.
 

Dave, thanks for the clarification.
Please note that ib0 is a non-SG device with MTU 2K,
sorry that I forgot to mention that.


so, to summarize my previous mail:


interface   flags  mtubandwidth
ib0 linear(0)  2044   286.45
ibc0_F_SG  65484  782.55



If I will set both ib0 and ibc0 to 64K MTU, then
benchmark-mode with the same MTU SG is somewhat slower than non-SG 
(I tested this at some point, by some 10%, don't have the numbers at the moment 
-
do you want to see them?).  I did not claim it is faster to do SG with same MTU
and it is I think clear why linear should be faster for copy *with the same 
MTU*.
But do you really think that we will be able to allocate
even a single 64K linear skb after the machine has been active for a while?

My assumption is that if I want to reliably get MTU  PAGE_SIZE I must support 
SG.
Is it the wrong one?

If this assumption is correct, then below is my line of thinking:
- with infiniband we provably get a 2.5x speedup with MTU of 64K vs to 2K.
- to get packets of that size reliably we must declare S/G support
- infiniband verbs do not support IP checksumming
- per network algorithmics, it is better to piggyback checksum calculation
  on copying if copying takes place

For this reason, I would like to define the meaning of S/G set when checksum
bits are all clear as we support S/G but not checksum, please checksum
for us if you copy data anyway. Alternatively, add a new
NETIF_F_??_CSUM bit to mean this capability.
Does this make sense?

Thanks,

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-12 Thread Michael S. Tsirkin
Quoting r. David Miller [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 From: Michael S. Tsirkin [EMAIL PROTECTED]
 Date: Wed, 11 Oct 2006 23:23:39 +0200
 
  With my patch, there is a huge performance gain by increasing MTU to 64K.
  And it seems the only way to do this is by S/G.
 
 Numbers?
 

I created two subnets on top of the same pair infiniband HCAs:

[EMAIL PROTECTED] ~]# ifconfig ib0
ib0   Link encap:UNSPEC  HWaddr
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet addr:12.4.3.69  Bcast:12.255.255.255  Mask:255.0.0.0
  inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
  RX packets:1382531 errors:0 dropped:0 overruns:0 frame:0
  TX packets:2725206 errors:0 dropped:5 overruns:0 carrier:0
  collisions:0 txqueuelen:128
  RX bytes:71892772 (68.5 MiB)  TX bytes:5290011992 (4.9 GiB)

[EMAIL PROTECTED] ~]# ifconfig ibc0
ibc0  Link encap:UNSPEC  HWaddr
00-03-04-06-FE-80-00-00-00-00-00-00-00-00-00-00
  inet addr:11.4.3.69  Bcast:11.255.255.255  Mask:255.0.0.0
  inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:65484  Metric:1
  RX packets:115647 errors:0 dropped:0 overruns:0 frame:0
  TX packets:253403 errors:0 dropped:4 overruns:0 carrier:0
  collisions:0 txqueuelen:128
  RX bytes:6014720 (5.7 MiB)  TX bytes:16589589008 (15.4 GiB)

The other side was configured with 12.4.3.68 for MTU 65484
and 11.4.3.68 for MTU 2044. 

And then I just run netperf:
[EMAIL PROTECTED] ~]#
[EMAIL PROTECTED] ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 
12.4.3.68 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 12.4.3.68 (12.4.3.68)
port 0 AF_INET
Recv   SendSend  Utilization   Service Demand
Socket Socket  Message  Elapsed  Send Recv SendRecv
Size   SizeSize Time Throughput  localremote   local   remote
bytes  bytes   bytessecs.MBytes  /s  % S  % S  us/KB   us/KB

 87380  16384  1638410.00   286.45   40.2025.285.482   3.448

[EMAIL PROTECTED] ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 11.4.3.68
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68)
port 0 AF_INET
Recv   SendSend
Socket Socket  Message  Elapsed
Size   SizeSize Time Throughput
bytes  bytes   bytessecs.MBytes/sec

 87380  16384  1638410.01 782.55

This is all very preliminary - but I hope you get the idea -
increasing MTU is very helpful for infiniband, and infiniband adapters
handle large S/G lists without problems, but the verbs
do not include support for IP checksums, so these must be done in software.

So what we would like, is for the infiniband network device to say
I don't support checksums, I only support S/G and then for
network layer to do the checksumming for us piggybacking on data copy
at least for cases where it does perform the copy.

Does this makes sense now?

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-12 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Thu, 12 Oct 2006 21:12:06 +0200

 Quoting r. David Miller [EMAIL PROTECTED]:
  Subject: Re: Dropping NETIF_F_SG since no checksum feature.
  
  Numbers?
 
 I created two subnets on top of the same pair infiniband HCAs:

I was asking for SG vs. non-SG numbers so I could see proof
that it really does help like you say it will.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. David Miller [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 From: Michael S. Tsirkin [EMAIL PROTECTED]
 Date: Wed, 11 Oct 2006 02:13:38 +0200
 
  Maybe I can patch linux to allow SG without checksum?
  Dave, maybe you could drop a hint or two on whether this is worthwhile
  and what are the issues that need addressing to make this work?
  
  I imagine it's not just the matter of changing net/core/dev.c :).
 
 You can't, it's a quality of implementation issue.  We sendfile()
 pages directly out of the filesystem page cache without any
 blocking of modifications to the page contents, and the only way
 that works is if the card computes the checksum for us.
 
 If we sendfile() a page directly, we must compute a correct checksum
 no matter what the contents.  We can't do this on the cpu before the
 data hits the device because another thread of execution can go in and
 modify the page contents which would invalidate the checksum and thus
 invalidating the packet.  We cannot allow this.
 
 Blocking modifications is too expensive, so that's not an option
 either.
 

But copying still works fine, does it not?
Dave, could you clarify this please?

ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
 size_t size, int flags)
{
ssize_t res;
struct sock *sk = sock-sk;

if (!(sk-sk_route_caps  NETIF_F_SG) ||
!(sk-sk_route_caps  NETIF_F_ALL_CSUM))
return sock_no_sendpage(sock, page, offset, size, flags);


So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
data will be copied over rather than sent directly.
So why does dev.c have to force set NETIF_F_SG to off then?

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Steven Whitehouse
Hi,

On Wed, Oct 11, 2006 at 11:05:04AM +0200, Michael S. Tsirkin wrote:
 Quoting r. David Miller [EMAIL PROTECTED]:
  Subject: Re: Dropping NETIF_F_SG since no checksum feature.
  
  From: Michael S. Tsirkin [EMAIL PROTECTED]
  Date: Wed, 11 Oct 2006 02:13:38 +0200
  
   Maybe I can patch linux to allow SG without checksum?
   Dave, maybe you could drop a hint or two on whether this is worthwhile
   and what are the issues that need addressing to make this work?
   
   I imagine it's not just the matter of changing net/core/dev.c :).
  
  You can't, it's a quality of implementation issue.  We sendfile()
  pages directly out of the filesystem page cache without any
  blocking of modifications to the page contents, and the only way
  that works is if the card computes the checksum for us.
  
  If we sendfile() a page directly, we must compute a correct checksum
  no matter what the contents.  We can't do this on the cpu before the
  data hits the device because another thread of execution can go in and
  modify the page contents which would invalidate the checksum and thus
  invalidating the packet.  We cannot allow this.
  
  Blocking modifications is too expensive, so that's not an option
  either.
  


I would argue that SG does make sense without checksum for protocols that
don't need/use a checksum. DECnet for example could do zero-copy without
caring about the checksum since it doesn't have one. One of these days
I'll get around to writing that bit of code :-)
 
 But copying still works fine, does it not?
 Dave, could you clarify this please?
 
 ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
  size_t size, int flags)
 {
 ssize_t res;
 struct sock *sk = sock-sk;
 
 if (!(sk-sk_route_caps  NETIF_F_SG) ||
 !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
 return sock_no_sendpage(sock, page, offset, size, flags);
 
 
 So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
 data will be copied over rather than sent directly.
 So why does dev.c have to force set NETIF_F_SG to off then?

I agree with that analysis,

Steve.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 11:05:04 +0200

 So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
 data will be copied over rather than sent directly.
 So why does dev.c have to force set NETIF_F_SG to off then?

Because it's more efficient to copy into a linear destination
buffer of an SKB than page sub-chunks when doing checksum+copy.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. David Miller [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 From: Michael S. Tsirkin [EMAIL PROTECTED]
 Date: Wed, 11 Oct 2006 11:05:04 +0200
 
  So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
  data will be copied over rather than sent directly.
  So why does dev.c have to force set NETIF_F_SG to off then?
 
 Because it's more efficient to copy into a linear destination
 buffer of an SKB than page sub-chunks when doing checksum+copy.
 

Thanks for the explanation.
Obviously its true as long as you can allocate the skb that big.
I think you won't realistically be able to get 64K in a
linear SKB on a busy system, though, is not that right?

OTOH, having large MTU (e.g. 64K) helps performance a lot since it
reduces receive side processing overhead.

So, if I understand what you are saying correctly,
things do work correctly (just slower for small skb) if NETIF_F_SG is set bug
clear, it seems that all we need to do is drop the following in dev.c:

/* Fix illegal SG+CSUM combinations. */
if ((dev-features  NETIF_F_SG) 
!(dev-features  NETIF_F_ALL_CSUM)) {
printk(KERN_NOTICE %s: Dropping NETIF_F_SG since no checksum 
feature.\n,
   dev-name);
dev-features = ~NETIF_F_SG;
}

is that right?

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting Steven Whitehouse [EMAIL PROTECTED]:
  ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
   size_t size, int flags)
  {
  ssize_t res;
  struct sock *sk = sock-sk;
  
  if (!(sk-sk_route_caps  NETIF_F_SG) ||
  !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
  return sock_no_sendpage(sock, page, offset, size, flags);
  
  
  So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
  data will be copied over rather than sent directly.
  So why does dev.c have to force set NETIF_F_SG to off then?
 
 I agree with that analysis,

So, would you Ack something like the following then?

==

Enabling NETIF_F_SG without NETIF_F_ALL_CSUM actually seems to work fine by
doing an old-fashioned data copy in tcp_sendpage.
And for devices that do not calculate IP checksum in hardware (e.g. InfiniBand)
calculating the checksum for all packets in network driver is worse than have
the CPU piggyback the checksum compitation with the copy process.
Finally, note that NETIF_F_SG is necessary to be able to allocate skbs 
PAGE_SIZE on busy systems.

So, let's allow that combination, again, for drivers that want it.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..2d731a0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2930,14 +2930,6 @@ #endif
}
}
 
-   /* Fix illegal SG+CSUM combinations. */
-   if ((dev-features  NETIF_F_SG) 
-   !(dev-features  NETIF_F_ALL_CSUM)) {
-   printk(KERN_NOTICE %s: Dropping NETIF_F_SG since no checksum 
feature.\n,
-  dev-name);
-   dev-features = ~NETIF_F_SG;
-   }
-
/* TSO requires that SG is present as well. */
if ((dev-features  NETIF_F_TSO) 
!(dev-features  NETIF_F_SG)) {

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael Krause

At 02:46 AM 10/11/2006, Michael S. Tsirkin wrote:

Quoting r. David Miller [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.

 From: Michael S. Tsirkin [EMAIL PROTECTED]
 Date: Wed, 11 Oct 2006 11:05:04 +0200

  So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
  data will be copied over rather than sent directly.
  So why does dev.c have to force set NETIF_F_SG to off then?

 Because it's more efficient to copy into a linear destination
 buffer of an SKB than page sub-chunks when doing checksum+copy.


Thanks for the explanation.
Obviously its true as long as you can allocate the skb that big.
I think you won't realistically be able to get 64K in a
linear SKB on a busy system, though, is not that right?

OTOH, having large MTU (e.g. 64K) helps performance a lot since it reduces 
receive side processing overhead.


One thing to keep in mind is while it may help performance in a 
micro-benchmark, the system performance or the QoS impacts to other flows 
can be negatively impacted depending upon implementation.  For example, 
consider multiple messages interleaving (heaven help implementations that 
are not able to interleave multiple messages) on either the transmit or 
receive HCA / RNIC and how the time-to-completion of any message is 
extended out in time as a result of the interleave.  The effective 
throughput in terms of useful units of work can be lower as a result.   The 
same effect can be observed when there are a significant number connections 
in a device being simultaneously processed.


Also, if the copy-checksum is not performed on the processor where the 
application resides, then the performance can also be negatively impacted 
(want to have the right cache hot when initiated or concluded).  While the 
aggregate computational performance of systems may be increasing at a 
significant rate (set aside the per core vs. aggregate core debate), the 
memory performance gains are much less.  If you examine the longer term 
trends, there may be a flattening out of memory performance improvements by 
2009/10 without some major changes in the way controllers and subsystems 
are designed.


Mike 



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Steven Whitehouse
Hi,

On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
 Quoting Steven Whitehouse [EMAIL PROTECTED]:
   ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
   {
   ssize_t res;
   struct sock *sk = sock-sk;
   
   if (!(sk-sk_route_caps  NETIF_F_SG) ||
   !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
   return sock_no_sendpage(sock, page, offset, size, flags);
   
   
   So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
   data will be copied over rather than sent directly.
   So why does dev.c have to force set NETIF_F_SG to off then?
  
  I agree with that analysis,
 
 So, would you Ack something like the following then?


In so far as I'm able to ack it, then yes, but with the following
caveats: that you also need to look at the tcp code's checks for
NETIF_F_SG (aside from the interface to tcp_sendpage which I think
we've agreed is ok) and ensure that this patch will not change their
behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
in particular - there may be others but thats the only one I can think
of off the top of my head. I think this is what davem was getting at
with his comment about copy  sum for smaller packets.

Also all subject to approval by davem and shemminger of course :-)

My general feeling is that devices should advertise the features that
they actually have and that the protocols should make the decision
as to which ones to use or not depending on the combinations available
(which I think is pretty much your argument).

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 17:01:03 +0200

 Quoting Steven Whitehouse [EMAIL PROTECTED]:
   ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
   {
   ssize_t res;
   struct sock *sk = sock-sk;
   
   if (!(sk-sk_route_caps  NETIF_F_SG) ||
   !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
   return sock_no_sendpage(sock, page, offset, size, flags);
   
   
   So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
   data will be copied over rather than sent directly.
   So why does dev.c have to force set NETIF_F_SG to off then?
  
  I agree with that analysis,
 
 So, would you Ack something like the following then?

I certainly don't.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. Steven Whitehouse [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 Hi,
 
 On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
  Quoting Steven Whitehouse [EMAIL PROTECTED]:
ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
 size_t size, int flags)
{
ssize_t res;
struct sock *sk = sock-sk;

if (!(sk-sk_route_caps  NETIF_F_SG) ||
!(sk-sk_route_caps  NETIF_F_ALL_CSUM))
return sock_no_sendpage(sock, page, offset, size, 
flags);


So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
data will be copied over rather than sent directly.
So why does dev.c have to force set NETIF_F_SG to off then?
   
   I agree with that analysis,
  
  So, would you Ack something like the following then?
 
 
 In so far as I'm able to ack it, then yes, but with the following
 caveats: that you also need to look at the tcp code's checks for
 NETIF_F_SG (aside from the interface to tcp_sendpage which I think
 we've agreed is ok) and ensure that this patch will not change their
 behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
 in particular - there may be others but thats the only one I can think
 of off the top of my head. I think this is what davem was getting at
 with his comment about copy  sum for smaller packets.

Will do - thanks for the tips.

 Also all subject to approval by davem and shemminger of course :-)

Goes without saying :)

 My general feeling is that devices should advertise the features that
 they actually have and that the protocols should make the decision
 as to which ones to use or not depending on the combinations available
 (which I think is pretty much your argument).
 
 Steve.

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Stephen Hemminger
On Wed, 11 Oct 2006 21:11:38 +0100
Steven Whitehouse [EMAIL PROTECTED] wrote:

 Hi,
 
 On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
  Quoting Steven Whitehouse [EMAIL PROTECTED]:
ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
 size_t size, int flags)
{
ssize_t res;
struct sock *sk = sock-sk;

if (!(sk-sk_route_caps  NETIF_F_SG) ||
!(sk-sk_route_caps  NETIF_F_ALL_CSUM))
return sock_no_sendpage(sock, page, offset, size, 
flags);


So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
data will be copied over rather than sent directly.
So why does dev.c have to force set NETIF_F_SG to off then?
   
   I agree with that analysis,
  
  So, would you Ack something like the following then?
 
 
 In so far as I'm able to ack it, then yes, but with the following
 caveats: that you also need to look at the tcp code's checks for
 NETIF_F_SG (aside from the interface to tcp_sendpage which I think
 we've agreed is ok) and ensure that this patch will not change their
 behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
 in particular - there may be others but thats the only one I can think
 of off the top of my head. I think this is what davem was getting at
 with his comment about copy  sum for smaller packets.
 
 Also all subject to approval by davem and shemminger of course :-)
 
 My general feeling is that devices should advertise the features that
 they actually have and that the protocols should make the decision
 as to which ones to use or not depending on the combinations available
 (which I think is pretty much your argument).
 
 Steve.
 

You might want to try ignoring the check in dev.c and testing
to see if there is a performance gain.  It wouldn't be hard to test
a modified version and validate the performance change.

You could even do what I suggested and use skb_checksum_help()
to do inplace checksumming, as a performance test.


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 23:23:39 +0200

 With my patch, there is a huge performance gain by increasing MTU to 64K.
 And it seems the only way to do this is by S/G.

Numbers?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. David Miller [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 From: Michael S. Tsirkin [EMAIL PROTECTED]
 Date: Wed, 11 Oct 2006 17:01:03 +0200
 
  Quoting Steven Whitehouse [EMAIL PROTECTED]:
ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
 size_t size, int flags)
{
ssize_t res;
struct sock *sk = sock-sk;

if (!(sk-sk_route_caps  NETIF_F_SG) ||
!(sk-sk_route_caps  NETIF_F_ALL_CSUM))
return sock_no_sendpage(sock, page, offset, size, 
flags);


So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
data will be copied over rather than sent directly.
So why does dev.c have to force set NETIF_F_SG to off then?
   
   I agree with that analysis,
  
  So, would you Ack something like the following then?
 
 I certainly don't.
 

Dave, sorry, you lost me.
Would a new flag NETIF_F_SW_CSUM be better, to tell network core that device
computes checksums in software, so we should piggyback the checksum
computation with the copy process if possible?

Or is there some other issue?

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. Stephen Hemminger [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 O
   
   You might want to try ignoring the check in dev.c and testing
   to see if there is a performance gain.  It wouldn't be hard to test
   a modified version and validate the performance change.
  
  Yes. With my patch, there is a huge performance gain by increasing MTU to 
  64K.
  And it seems the only way to do this is by S/G.
  
   You could even do what I suggested and use skb_checksum_help()
   to do inplace checksumming, as a performance test.
  
  I can. But as network algorithmics says (chapter 5)
  Since such bus reads are expensive, the CPU might as well piggyback
  the checksum computation with the copy process.
  
  It speaks about onboard the adapter buffers, but memory bus reads are also 
  much slower
  than CPU nowdays.  So I think even if this works well in benchmark in real 
  life
  single copy should better.
  
 
 The other alternative might be to make copy/checksum code smarter about using
 fragments rather than allocating a large buffer. It should avoid second order
 allocations (effective size  PAGESIZE).
 
In my testing, it seems quite smart already - once I declare F_SG and clear 
F_...CSUM
I start getting properly checksummed packets with lots of s/g fragments.
But I'm not catching the drift - an alternative to what this would be?

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. Stephen Hemminger [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 On Wed, 11 Oct 2006 21:11:38 +0100
 Steven Whitehouse [EMAIL PROTECTED] wrote:
 
  Hi,
  
  On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
   Quoting Steven Whitehouse [EMAIL PROTECTED]:
 ssize_t tcp_sendpage(struct socket *sock, struct page *page, int 
 offset,
  size_t size, int flags)
 {
 ssize_t res;
 struct sock *sk = sock-sk;
 
 if (!(sk-sk_route_caps  NETIF_F_SG) ||
 !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
 return sock_no_sendpage(sock, page, offset, size, 
 flags);
 
 
 So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
 data will be copied over rather than sent directly.
 So why does dev.c have to force set NETIF_F_SG to off then?

I agree with that analysis,
   
   So, would you Ack something like the following then?
  
  
  In so far as I'm able to ack it, then yes, but with the following
  caveats: that you also need to look at the tcp code's checks for
  NETIF_F_SG (aside from the interface to tcp_sendpage which I think
  we've agreed is ok) and ensure that this patch will not change their
  behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
  in particular - there may be others but thats the only one I can think
  of off the top of my head. I think this is what davem was getting at
  with his comment about copy  sum for smaller packets.
  
  Also all subject to approval by davem and shemminger of course :-)
  
  My general feeling is that devices should advertise the features that
  they actually have and that the protocols should make the decision
  as to which ones to use or not depending on the combinations available
  (which I think is pretty much your argument).
  
  Steve.
  
 
 You might want to try ignoring the check in dev.c and testing
 to see if there is a performance gain.  It wouldn't be hard to test
 a modified version and validate the performance change.

Yes. With my patch, there is a huge performance gain by increasing MTU to 64K.
And it seems the only way to do this is by S/G.

 You could even do what I suggested and use skb_checksum_help()
 to do inplace checksumming, as a performance test.

I can. But as network algorithmics says (chapter 5)
Since such bus reads are expensive, the CPU might as well piggyback
the checksum computation with the copy process.

It speaks about onboard the adapter buffers, but memory bus reads are also much 
slower
than CPU nowdays.  So I think even if this works well in benchmark in real life
single copy should better.

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Stephen Hemminger
O
  
  You might want to try ignoring the check in dev.c and testing
  to see if there is a performance gain.  It wouldn't be hard to test
  a modified version and validate the performance change.
 
 Yes. With my patch, there is a huge performance gain by increasing MTU to 64K.
 And it seems the only way to do this is by S/G.
 
  You could even do what I suggested and use skb_checksum_help()
  to do inplace checksumming, as a performance test.
 
 I can. But as network algorithmics says (chapter 5)
 Since such bus reads are expensive, the CPU might as well piggyback
 the checksum computation with the copy process.
 
 It speaks about onboard the adapter buffers, but memory bus reads are also 
 much slower
 than CPU nowdays.  So I think even if this works well in benchmark in real 
 life
 single copy should better.
 

The other alternative might be to make copy/checksum code smarter about using
fragments rather than allocating a large buffer. It should avoid second order
allocations (effective size  PAGESIZE).

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Michael S. Tsirkin
Quoting r. Stephen Hemminger [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 On Mon, 9 Oct 2006 19:47:05 +0200
 Michael S. Tsirkin [EMAIL PROTECTED] wrote:
 
  Hi!
  I'm trying to build a network device driver supporting a very large MTU 
  (around 64K)
  on top of an infiniband connection, and I've hit a couple of issues I'd
  appreciate some feedback on:
  
  1. On the send side,
 I've set NETIF_F_SG, but hardware does not support checksum offloading,
 and I see dropping NETIF_F_SG since no checksum feature warning,
 and I seem to be getting large packets all in one chunk.
 The reason I've set NETIF_F_SG, is because I'm concerned that under real 
  life
 stress Linux won't be able to allocate 64K of continuous memory.
  
 Is this concern of mine valid? I saw in-tree drivers allocating at least 
  8K.
 What's the best way to enable S/G on send side?
 Is checksum offloading really required for S/G?
 
 Yes, in the current implementation, Linux needs checksum offload. But there
 is no reason, your driver can't compute the checksum in software.

Are there drivers that do this already? Couldn't find any such beast ...

I'm worried whether an extra pass over data won't eat up all of
the performance gains I get from the large MTU ...

 What are the helpers legal for fragmented skb?

BTW, I found skb_put_frags in sky2 which seems generic enough - I even wander
why isn't this in net/core.

Thanks!

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Stephen Hemminger
On Tue, 10 Oct 2006 16:43:30 +0200
Michael S. Tsirkin [EMAIL PROTECTED] wrote:

 Quoting r. Stephen Hemminger [EMAIL PROTECTED]:
  Subject: Re: Dropping NETIF_F_SG since no checksum feature.
  
  On Mon, 9 Oct 2006 19:47:05 +0200
  Michael S. Tsirkin [EMAIL PROTECTED] wrote:
  
   Hi!
   I'm trying to build a network device driver supporting a very large MTU 
   (around 64K)
   on top of an infiniband connection, and I've hit a couple of issues I'd
   appreciate some feedback on:
   
   1. On the send side,
  I've set NETIF_F_SG, but hardware does not support checksum offloading,
  and I see dropping NETIF_F_SG since no checksum feature warning,
  and I seem to be getting large packets all in one chunk.
  The reason I've set NETIF_F_SG, is because I'm concerned that under 
   real life
  stress Linux won't be able to allocate 64K of continuous memory.
   
  Is this concern of mine valid? I saw in-tree drivers allocating at 
   least 8K.
  What's the best way to enable S/G on send side?
  Is checksum offloading really required for S/G?
  
  Yes, in the current implementation, Linux needs checksum offload. But there
  is no reason, your driver can't compute the checksum in software.

 Are there drivers that do this already? Couldn't find any such beast ...

dev_queue_xmit() does it, all you need to do in your driver is:

/* If packet is not checksummed and device does not support
 * checksumming for this protocol, complete checksumming here.
 */
if (skb-ip_summed == CHECKSUM_PARTIAL) {
if (skb_checksum_help(skb))
goto error_recovery
}


 I'm worried whether an extra pass over data won't eat up all of
 the performance gains I get from the large MTU ...

Yup, the cost is in touching the data, not in the copy.

  What are the helpers legal for fragmented skb?
 
 BTW, I found skb_put_frags in sky2 which seems generic enough - I even wander
 why isn't this in net/core.
 

Only because I just wrote it for my needs. If you need it, then it
can be moved to skbuff.c



-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Roland Dreier
Michael Maybe I can patch linux to allow SG without checksum?
Michael Dave, maybe you could drop a hint or two on whether this
Michael is worthwhile and what are the issues that need
Michael addressing to make this work?

What do you really gain by allowing SG without checksum?  Someone has
to do the checksum anyway, so I don't see that much difference between
doing it in the networking core before passing the data to/from the
driver, or down in the driver itself.

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Michael S. Tsirkin
Quoting r. Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: Dropping NETIF_F_SG since no checksum feature.
 
 Michael Maybe I can patch linux to allow SG without checksum?
 Michael Dave, maybe you could drop a hint or two on whether this
 Michael is worthwhile and what are the issues that need
 Michael addressing to make this work?
 
 What do you really gain by allowing SG without checksum?  Someone has
 to do the checksum anyway, so I don't see that much difference between
 doing it in the networking core before passing the data to/from the
 driver, or down in the driver itself.

My guess was, an extra pass over data is likely to be expensive - dirtying the
cache if nothing else. But I do plan to measure that, and see.

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 02:13:38 +0200

 Maybe I can patch linux to allow SG without checksum?
 Dave, maybe you could drop a hint or two on whether this is worthwhile
 and what are the issues that need addressing to make this work?
 
 I imagine it's not just the matter of changing net/core/dev.c :).

You can't, it's a quality of implementation issue.  We sendfile()
pages directly out of the filesystem page cache without any
blocking of modifications to the page contents, and the only way
that works is if the card computes the checksum for us.

If we sendfile() a page directly, we must compute a correct checksum
no matter what the contents.  We can't do this on the cpu before the
data hits the device because another thread of execution can go in and
modify the page contents which would invalidate the checksum and thus
invalidating the packet.  We cannot allow this.

Blocking modifications is too expensive, so that's not an option
either.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Roland Dreier
Michael My guess was, an extra pass over data is likely to be
Michael expensive - dirtying the cache if nothing else. But I do
Michael plan to measure that, and see.

I don't get it -- where's the extra pass?  If you can't compute the
checksum on the NIC then you have to compute sometime it on the CPU
before passing the data to the NIC.

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 10 Oct 2006 20:33:46 -0700

 Michael My guess was, an extra pass over data is likely to be
 Michael expensive - dirtying the cache if nothing else. But I do
 Michael plan to measure that, and see.
 
 I don't get it -- where's the extra pass?  If you can't compute the
 checksum on the NIC then you have to compute sometime it on the CPU
 before passing the data to the NIC.

Also, if you don't do checksumming on the card we MUST copy
the data (be it from a user buffer, or from a filesystem page
cache page) into a private buffer since if the data changes
the checksum would become invalid, as I mentioned in another
email earlier.

Therefore, since we have to copy anyways, it always is better
to checksum in parallel with the copy.

So the whole idea of SG without hw-checksum support is without
much merit at all.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Roland Dreier
David Also, if you don't do checksumming on the card we MUST copy
David the data (be it from a user buffer, or from a filesystem
David page cache page) into a private buffer since if the data
David changes the checksum would become invalid, as I mentioned
David in another email earlier.

Yes, I get that now -- I replied to Michael's email before I read yours.

David Therefore, since we have to copy anyways, it always is
David better to checksum in parallel with the copy.

Yes.

David So the whole idea of SG without hw-checksum support is
David without much merit at all.

Well, on IB it is possible to implement a netdevice (IPoIB connected
mode, I assume that's what Michael is working on) with a large MTU
(64KB is a number thrown around, but really there's not any limit) but
no HW checksum capability.  Doing that in a practical way means we
need to allow non-linear skbs to be passed in.

On the other hand I'm not sure how useful such a netdevice would be --
will non-sendfile() paths generate big packets even if the MTU is 64KB?

Maybe GSO gives us all the real advantages of this anyway?

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 10 Oct 2006 20:42:20 -0700

 On the other hand I'm not sure how useful such a netdevice would be --
 will non-sendfile() paths generate big packets even if the MTU is 64KB?

non-sendfile() paths will generate big packets just fine, as long
as the application is providing that much data.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread Roland Dreier
David non-sendfile() paths will generate big packets just fine,
David as long as the application is providing that much data.

OK, cool.  Will the big packets be non-linear skbs?

Because then it would make sense for a device with a huge MTU to want
to accept them without linearizing them, even if it had to copy them
to checksum the data.  Otherwise with fragmented memory it would be
impossible to handle such big packets at all.

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 10 Oct 2006 20:49:09 -0700

 David non-sendfile() paths will generate big packets just fine,
 David as long as the application is providing that much data.
 
 OK, cool.  Will the big packets be non-linear skbs?

If you had SG enabled (and thus checksumming offload too) then yes
you'll get a non-linear SKB.  Otherwise, without SG, you'll get a
fully linear SKB.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Dropping NETIF_F_SG since no checksum feature.

2006-10-09 Thread Michael S. Tsirkin
Hi!
I'm trying to build a network device driver supporting a very large MTU (around 
64K)
on top of an infiniband connection, and I've hit a couple of issues I'd
appreciate some feedback on:

1. On the send side,
   I've set NETIF_F_SG, but hardware does not support checksum offloading,
   and I see dropping NETIF_F_SG since no checksum feature warning,
   and I seem to be getting large packets all in one chunk.
   The reason I've set NETIF_F_SG, is because I'm concerned that under real life
   stress Linux won't be able to allocate 64K of continuous memory.

   Is this concern of mine valid? I saw in-tree drivers allocating at least 8K.
   What's the best way to enable S/G on send side?
   Is checksum offloading really required for S/G?

2. On the receive side, what's the best/right way to create an skb that
   is larger than PAGE_SIZE?
   Do I allocate with alloc_page and fill in nr_frags with skb_fill_page_desc?
   Some drivers seem to fill in frag_list - which is better?
   I see than even skb_put only works properly on linear skb.
   What are the helpers legal for fragmented skb?

Suggestions would be appreciated.

Thanks,

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-09 Thread Stephen Hemminger
On Mon, 9 Oct 2006 19:47:05 +0200
Michael S. Tsirkin [EMAIL PROTECTED] wrote:

 Hi!
 I'm trying to build a network device driver supporting a very large MTU 
 (around 64K)
 on top of an infiniband connection, and I've hit a couple of issues I'd
 appreciate some feedback on:
 
 1. On the send side,
I've set NETIF_F_SG, but hardware does not support checksum offloading,
and I see dropping NETIF_F_SG since no checksum feature warning,
and I seem to be getting large packets all in one chunk.
The reason I've set NETIF_F_SG, is because I'm concerned that under real 
 life
stress Linux won't be able to allocate 64K of continuous memory.
 
Is this concern of mine valid? I saw in-tree drivers allocating at least 
 8K.
What's the best way to enable S/G on send side?
Is checksum offloading really required for S/G?

Yes, in the current implementation, Linux needs checksum offload. But there
is no reason, your driver can't compute the checksum in software.

 2. On the receive side, what's the best/right way to create an skb that
is larger than PAGE_SIZE?
Do I allocate with alloc_page and fill in nr_frags with skb_fill_page_desc?
Some drivers seem to fill in frag_list - which is better?
I see than even skb_put only works properly on linear skb.


Allocating large buffers is problematic on busy systems.
See lastest e1000 or sky2 that use frag_list.

What are the helpers legal for fragmented skb?
Read the source. Setting up fragmented buffers has less helper
functions, but isn't that hard.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html