On Thursday 02 February 2006 04:19, Greg Banks wrote:
> On Thu, 2006-02-02 at 14:13, David S. Miller wrote:
> > From: Greg Banks <[EMAIL PROTECTED]>
> > Date: Thu, 02 Feb 2006 14:06:06 +1100
> >
> > > On Thu, 2006-02-02 at 13:46, David S. Miller wrote:
> > > > I know SAMBA is using sendfile() (when the client has the oplock
> > > > held, which basically is "always"), is NFS doing so as well?
> > >
> > > NFS is an in-kernel server, and uses sock->ops->sendpage directly.
> >
> > Great.
> >
> > Then where's all the TX overhead for NFS?  All the small transactions
> > and the sunrpc header munging?
>
> Multiple trips down through TCP, qdisc, and the driver for each
> NFS packet sent:

Normally TSO was supposed to fix that.

I was playing with a design some time ago to let TCP batch
the lower-level transactions even without that. The idea
was that, instead of calling down into IP and dev_queue_xmit et al.
for each packet generated by TCP, we would first generate a list of
packets in sendmsg/sendpage and then just hand the list down
through all layers into the driver.
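
As a rough illustration of the difference, here is a user-space toy,
not kernel code. struct pkt, struct tx_stats and all function names
are made up for this sketch, and the "descent" counter only stands in
for the locks taken on a trip through IP, the qdisc and the driver:

```c
#include <stddef.h>

/* Hypothetical toy packet; NOT the kernel's struct sk_buff. */
struct pkt {
    struct pkt *next;   /* link used to chain packets into a list */
    size_t len;
};

/* Counters standing in for the real cost of going down the stack. */
struct tx_stats {
    unsigned long descents; /* trips down through the lower layers */
    unsigned long packets;  /* packets that reached the "driver" */
};

/* One trip down the stack carrying a whole list of packets.
 * In this model the per-trip cost (locks etc.) is paid once here. */
static void xmit_list(struct tx_stats *st, struct pkt *head)
{
    st->descents++;
    for (struct pkt *p = head; p != NULL; p = p->next)
        st->packets++;
}

/* Per-packet path: every packet makes its own trip down. */
static void send_per_packet(struct tx_stats *st, struct pkt *pkts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pkts[i].next = NULL;
        xmit_list(st, &pkts[i]);
    }
}

/* Batched path: chain all packets first, then go down once. */
static void send_batched(struct tx_stats *st, struct pkt *pkts, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i + 1 < n; i++)
        pkts[i].next = &pkts[i + 1];
    pkts[n - 1].next = NULL;
    xmit_list(st, pkts);
}
```

With five packets (say, one header plus four pages), send_per_packet()
counts five descents while send_batched() counts one, with the same
five packets delivered either way. The sketch deliberately leaves out
the hard part, which is deciding when the accumulated list gets flushed.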

It was inspired by Andrew Morton's 2.5 work in the VM layer,
where he used this trick very successfully with pages and BHs.

But I didn't pursue it further when it turned out that all the
interesting hardware was using TSO already, which does a similar thing.

There was also some trickiness about exactly when to do the flush.

> one for the header and one for each page.  Lots
> of locks need to be taken and dropped, all this while multiple nfsds
> on multiple CPUs are all trying to reply to NFS RPCs at the same
> time.  And in the particular case of the SN2 architecture, time
> spent flushing PCI writes in the driver (less of an issue now that
> host send rings are the default in tg3).

Hmm, maybe it would still be worth it for your case, with multiple
connections going on at the same time. But accumulating
the packet list somewhere between different connections
would be a natural congestion point and a potential scalability
issue.

-Andi