Re: mmap of a network buffer

1999-05-25 Thread Mike Smith
   I really do not know how to describe the problem, but a friend here
   asked me how to mmap a network buffer so that there is no need to copy
   data from user space to kernel space.  We are not sure whether FreeBSD
   can create a device file (mknod) for a network card, and if so, whether
   we can then use mmap() on it, since mmap() requires a file descriptor.
   We assume that the file descriptor can be acquired by opening the
   network device.  If this is infeasible, is there another way to
   accomplish the same goal?
  
  Use sendfile() for zero-copy file transmission; in all other cases it's 
  necessary to copy data into the kernel.  Memory-mapping a network 
  buffer makes no sense if you just think about it for a moment...
  
  There's also very little need for this under real circumstances; some 
  simple tests have demonstrated we can sustain about 800Mbps throughput 
  (UDP), and the bottleneck here seems to be checksum calculations, not 
  copyin/out.
  
 
 Oddly enough, I was just getting ready to implement something like this. 
 Not because of copyin performance issues, but because async io for sockets
 could be done better if I didn't have to do a copyin.  copyin has to have
 curproc==(proc with the buffer from which to copy)

That's basically right.  You have three options:

 - Switch to process context to access process data; this allows you to 
   take page faults in controlled circumstances (eg. copyin).
 - Wire the process' pages into the kernel so you don't have to fault.
 - Copy the user data into kernel space in an efficient fashion.
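
To make the third option concrete, it's roughly this (untested sketch,
helper name invented; copyin(9)/malloc(9) as in the current tree):

    /* Copy user data into kernel space.  copyin(9) may page-fault on
     * the user buffer, so this must run in the context of the process
     * that owns uaddr (curproc). */
    static int
    grab_user_data(void *uaddr, size_t len, caddr_t *kbufp)
    {
            caddr_t kbuf;
            int error;

            kbuf = malloc(len, M_TEMP, M_WAITOK);   /* kernel malloc(9) */
            error = copyin(uaddr, kbuf, len);
            if (error != 0) {
                    free(kbuf, M_TEMP);
                    return (error);
            }
            *kbufp = kbuf;
            return (0);
    }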

 which means that I have
 to do a context switch for every socket-buffer-sized chunk (best case) or
 every io op (worst case).

It sounds like your buffering is not efficient.

 My hope was to map the user's buffer into kernel space so that I could do
 event driven io on the socket without having to context switch to an aiod
 for every io operation.  Is this really a bad idea?  I am a little
 concerned about running out of kernel address space, but I don't think
 that's an immediate problem.

If you map into the kernel, you still have to context switch unless you 
wire the data down.  Excessive wiring can be expensive.  Have a look at
how physio() does its thing.
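
The pattern there, compressed (3.x-era interfaces approximated from
memory--treat this as pseudocode rather than something to paste in):

    /* Wire the user buffer and map it for the duration of one transfer. */
    bp->b_data = uio->uio_iov->iov_base;    /* user virtual address */
    bp->b_bcount = uio->uio_iov->iov_len;
    vmapbuf(bp);            /* wire the user pages, map them into KVM */
    /* ... start the transfer, biowait(bp) ... */
    vunmapbuf(bp);          /* unmap and unwire on completion */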

 Such an implementation would lend itself to doing zero-copy async writes
 with some frobbing of the send routines.  It would also bypass some of
 the messing around done to do socket buffers--that is, there would not be
 a limit per se on socket buffering for writes since they would be in
 mapped user space.  One might want to put arbitrary limits in place to
 ensure that an unreasonable amount of memory isn't locked.
 
 Thoughts? 

Sounds a lot like sendfile.  See if you can't improve on it to do eg. 
sendmem().
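
(Purely hypothetical shape for it, by analogy with the sendfile(2)
prototype--no such call exists today:

    int sendmem(int s, const void *base, size_t len, off_t *sbytes,
                int flags);

i.e. transmit a wired region of the caller's memory without copying.)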

-- 
\\  The mind's the standard   \\  Mike Smith
\\  of the man.   \\  msm...@freebsd.org
\\-- Joseph Merrick   \\  msm...@cdrom.com







Re: mmap of a network buffer

1999-05-25 Thread Christopher Sedore


On Mon, 24 May 1999, Mike Smith wrote:

   There's also very little need for this under real circumstances; some 
   simple tests have demonstrated we can sustain about 800Mbps throughput 
   (UDP), and the bottleneck here seems to be checksum calculations, not 
   copyin/out.
   
  
  Oddly enough, I was just getting ready to implement something like this. 
  Not because of copyin performance issues, but because async io for sockets
  could be done better if I didn't have to do a copyin.  copyin has to have
  curproc==(proc with the buffer from which to copy)
 
 That's basically right.  You have three options:
 
  - Switch to process context to access process data; this allows you to 
    take page faults in controlled circumstances (eg. copyin).
  - Wire the process' pages into the kernel so you don't have to fault.
  - Copy the user data into kernel space in an efficient fashion.

Glad to know that my understanding wasn't too far off-base.

  which means that I have
  to do a context switch for every socket-buffer-sized chunk (best case) or
  every io op (worst case).
 
 It sounds like your buffering is not efficient.

Well, I'd be happy to be convinced that buffering were the problem, but
protocols like HTTP and NNTP, with their short, rapid-fire (sometimes
lock-step) command sequences, don't give the buffering much to work with.
This means that reading commands off an incoming connection causes many,
many context switches between an aiod and the main process doing async io.

On the outgoing side, in the optimal case of sending large blocks of data,
I don't have control over the buffering--the aiod essentially impersonates
my process, going to sleep in the socket write routines, with a context
switch to it whenever a copyin is necessary.  I could exert more control
by metering my writes so that they fit into socket buffers and avoid the
switches, but that increases the number of system calls, so I'm not sure
how much of a win it ends up being.  Plus, I hope to get some added
advantage out of constructing a zero-copy write (not that I've had any
throughput troubles).
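
(For concreteness, the metering would look something like this untested
sketch--'s', 'buf' and 'total' assumed already set up; query SO_SNDBUF
once and never write more than that at a go:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <err.h>
    #include <unistd.h>

    int sndbuf, n;
    int optlen = sizeof(sndbuf);
    off_t off;

    if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &optlen) == -1)
            err(1, "getsockopt");
    for (off = 0; off < total; off += n) {
            n = total - off > sndbuf ? sndbuf : total - off;
            if (write(s, buf + off, n) == -1)   /* one syscall per chunk */
                    err(1, "write");
    }

so each write fits the send buffer, at the cost of more syscalls.)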

  My hope was to map the user's buffer into kernel space so that I could do
  event driven io on the socket without having to context switch to an aiod
  for every io operation.  Is this really a bad idea?  I am a little
  concerned about running out of kernel address space, but I don't think
  that's an immediate problem.
 
 If you map into the kernel, you still have to context switch unless you 
 wire the data down.  Excessive wiring can be expensive.  Have a look at
 how physio() does its thing.

Will do.  There's some of that code in the async io routines now, for
dealing with raw io operations--I hoped to borrow from that to implement
my stuff.

  Such an implementation would lend itself to doing zero-copy async writes
  with some frobbing of the send routines.  It would also bypass some of
  the messing around done to do socket buffers--that is, there would not be
  a limit per se on socket buffering for writes since they would be in
  mapped user space.  One might want to put arbitrary limits in place to
  ensure that an unreasonable amount of memory isn't locked.
  
  Thoughts? 
 
 Sounds a lot like sendfile.  See if you can't improve on it to do eg. 
 sendmem().

Yes.  I'd like mine to be async rather than synchronous, though.  I've
considered creating an async sendfile too.  (Actually, I've been thinking
about extending the async io code to allow calling any syscall async, but
there are other complexities there...).  
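
(If an async sendfile happened, the natural shape would be something
aiocb-driven--purely hypothetical, nothing like it exists:

    int aio_sendfile(struct aiocb *iocb, int fd, off_t offset,
                     size_t nbytes);
)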

-Chris






Re: mmap of a network buffer

1999-05-21 Thread Mike Smith
 
 I really do not know how to describe the problem, but a friend here
 asked me how to mmap a network buffer so that there is no need to copy
 data from user space to kernel space.  We are not sure whether FreeBSD
 can create a device file (mknod) for a network card, and if so, whether
 we can then use mmap() on it, since mmap() requires a file descriptor.
 We assume that the file descriptor can be acquired by opening the
 network device.  If this is infeasible, is there another way to
 accomplish the same goal?

Use sendfile() for zero-copy file transmission; in all other cases it's 
necessary to copy data into the kernel.  Memory-mapping a network 
buffer makes no sense if you just think about it for a moment...

There's also very little need for this under real circumstances; some 
simple tests have demonstrated we can sustain about 800Mbps throughput 
(UDP), and the bottleneck here seems to be checksum calculations, not 
copyin/out.
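
For reference, using it is about this simple (minimal sketch along the
lines of the manpage; 'filefd' and 'sock' assumed already opened, error
handling elided):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <err.h>

    off_t sbytes;

    /* nbytes == 0 means send until end of file. */
    if (sendfile(filefd, sock, 0, 0, NULL, &sbytes, 0) == -1)
            err(1, "sendfile");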

-- 
\\  The mind's the standard   \\  Mike Smith
\\  of the man.   \\  msm...@freebsd.org
\\-- Joseph Merrick   \\  msm...@cdrom.com







Re: mmap of a network buffer

1999-05-21 Thread Christopher Sedore


On Fri, 21 May 1999, Mike Smith wrote:

  
  I really do not know how to describe the problem, but a friend here
  asked me how to mmap a network buffer so that there is no need to copy
  data from user space to kernel space.  We are not sure whether FreeBSD
  can create a device file (mknod) for a network card, and if so, whether
  we can then use mmap() on it, since mmap() requires a file descriptor.
  We assume that the file descriptor can be acquired by opening the
  network device.  If this is infeasible, is there another way to
  accomplish the same goal?
 
 Use sendfile() for zero-copy file transmission; in all other cases it's 
 necessary to copy data into the kernel.  Memory-mapping a network 
 buffer makes no sense if you just think about it for a moment...
 
 There's also very little need for this under real circumstances; some 
 simple tests have demonstrated we can sustain about 800Mbps throughput 
 (UDP), and the bottleneck here seems to be checksum calculations, not 
 copyin/out.
 

Oddly enough, I was just getting ready to implement something like this. 
Not because of copyin performance issues, but because async io for sockets
could be done better if I didn't have to do a copyin.  copyin has to have
curproc==(proc with the buffer from which to copy) which means that I have
to do a context switch for every socket-buffer-sized chunk (best case) or
every io op (worst case).  In my testing, I've seen 3000 context
switches/second just verifying a simple configuration--real load will be
20-100x current load, and I'm not sure that 60,000-300,000 context
switches/sec is desirable.
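
(The userland side here is plain POSIX aio against the socket--minimal
sketch, 's', 'buf' and 'len' assumed set up, error handling elided:

    #include <aio.h>
    #include <string.h>
    #include <err.h>

    struct aiocb cb;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = s;          /* connected socket */
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    if (aio_write(&cb) == -1)
            err(1, "aio_write");
    /* ... later: aio_error(&cb), then aio_return(&cb) ... */
)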

My hope was to map the user's buffer into kernel space so that I could do
event driven io on the socket without having to context switch to an aiod
for every io operation.  Is this really a bad idea?  I am a little
concerned about running out of kernel address space, but I don't think
that's an immediate problem.

Such an implementation would lend itself to doing zero-copy async writes
with some frobbing of the send routines.  It would also bypass some of
the messing around done to do socket buffers--that is, there would not be
a limit per se on socket buffering for writes since they would be in
mapped user space.  One might want to put arbitrary limits in place to
ensure that an unreasonable amount of memory isn't locked.

Thoughts? 

-Chris







RE: mmap of a network buffer

1999-05-21 Thread Constantine Shkolnyy
 My hope was to map the user's buffer into kernel space so that I could do
 event driven io on the socket without having to context switch to an aiod
 for every io operation.  Is this really a bad idea?  I am a little
 concerned about running out of kernel address space, but I don't think
 that's an immediate problem.
 
 Such an implementation would lend itself to doing zero-copy async writes
 with some frobbing of the send routines.  It would also bypass some of
 the messing around done to do socket buffers--that is, there would not be
 a limit per se on socket buffering for writes since they would be in
 mapped user space.  One might want to put arbitrary limits in place to
 ensure that an unreasonable amount of memory isn't locked.
 
 Thoughts? 

In my view, the problem can be described like this.

Some applications need to process data from their VA space on some
devices.  If the data is going to or from a file, it seems perfectly
reasonable to copy it into kernel buffers, since the kernel does caching
and improves disk I/O performance.  However, there are cases where the
kernel can't be concerned with the data.  For example, I have an
encryption/compression processor on a PCI board.  For each operation,
this processor needs two separate data buffers and performs busmaster
DMA.  The user program is supposed to prepare the buffers and
communicate their location to the kernel-mode driver via IOCTL.
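
(The user side of that looks something like this--every name here is
invented, just to fix ideas:

    #include <sys/ioctl.h>
    #include <err.h>

    struct xproc_op {
            void    *src;       /* user input buffer */
            void    *dst;       /* user output buffer */
            size_t   len;
    };

    struct xproc_op op = { inbuf, outbuf, len };

    if (ioctl(devfd, XPROC_RUN, &op) == -1)     /* hypothetical request */
            err(1, "ioctl");
)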

What is more efficient - copy the data to/from the locked kernel
buffers or lock the user buffers in place and do processing?
(In my case, I don't even need to _remap_ the buffers, I only need
physical addresses).

I'd prefer the latter, but I don't have sufficient FreeBSD knowledge
to insist that I'm right.  There may be principles of this O/S that I
don't currently see and that doing this would violate.

It would be nice if somebody could give an analysis of the problem.

Stan






Re: RE: mmap of a network buffer

1999-05-21 Thread Matthew Dillon
:In my view, the problem can be described like this.
:
:Some applications need to process data from their VA space on some
:devices.  If the data is going to or from a file, it seems perfectly
:reasonable to copy it into kernel buffers, since the kernel does caching
:and improves disk I/O performance.  However, there are cases where the
:kernel can't be concerned with the data.  For example, I have an
:encryption/compression processor on a PCI board.  For each operation,
:this processor needs two separate data buffers and performs busmaster
:DMA.  The user program is supposed to prepare the buffers and
:communicate their location to the kernel-mode driver via IOCTL.
:
:What is more efficient - copy the data to/from the locked kernel
:buffers or lock the user buffers in place and do processing?
:(In my case, I don't even need to _remap_ the buffers, I only need
:physical addresses).
:
:I'd prefer the latter, but I don't have sufficient FreeBSD knowledge
:to insist that I'm right.  There may be principles of this O/S that I
:don't currently see and that doing this would violate.
:
:It would be nice if somebody could give an analysis of the problem.
:
:Stan

Well, all the system buffer paradigm does is wire the pages and
associate them with a struct buf.  You do not have to map the pages
into KVM.  It also usually write-protects pages in user space for
the duration of the I/O.  Even if the pages are mapped into KVM,
the overhead is virtually nil if you do not actually touch the associated
KVM.  I don't think you would notice the difference between using 
the existing buffer code and rolling something custom.
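
    A compressed illustration of that pattern (3.x-era interfaces from
    memory--double-check the exact argument lists against sys/vm before
    relying on this; 'p', 'udata', 'ulen' and 'pa' are assumed set up):

        /* Wire the user range, pull a physical address for the DMA
         * engine, then unwire.  No KVM mapping is needed if the device
         * only wants physical addresses. */
        vslock(udata, ulen);                    /* fault in and wire */
        pa = pmap_extract(&p->p_vmspace->vm_pmap, (vm_offset_t)udata);
        /* ... hand pa (and the following pages) to the board, wait ... */
        vsunlock(udata, ulen);                  /* unwire when done */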

-Matt
Matthew Dillon 
dil...@backplane.com

