Re: Zero Copy IO
This answer is longish and I send this only to linux-scsi to save linux-kernel bandwidth.

On Sun, 8 Apr 2001, Alex Q Chen wrote:
> I am trying to find a way to pin down user space memory from kernel, so
> that these user space buffer can be used for direct IO transfer or
> otherwise known as "zero copying IO". Searching through the Internet and
> reading comments on various news groups, it would appear that most
> developers including Linus himself doesn't believe in the benefit of "zero
> copying IO". Most of the discussion however was based on network card
> drivers. For certain other drivers such as SCSI Tape driver, which need to
> handle great deal of data transfer, it would seemed still be more
> advantageous to enable zero copy IO than copy_from_user() and
> copy_to_user() all the data. Other OS such as AIX and OS2 have kernel
> functions that

Whether zero-copy for tapes is more efficient than the current method using a kernel buffer depends on several factors, and there is no unique answer.

If we want to maximize the usage of a CPU in a multitasking system and don't care about the speed of the task writing to tape, zero-copy may be better. This assumes the transfers are long enough compared to the increased overhead. (I would like to see someone provide a quantitative answer to what the minimum transfer size is at which zero-copy becomes advantageous.)

On the other hand, if we want to maximize the throughput of the task writing to tape, this gets more complicated. Memory-memory transfers are much faster than memory-tape buffer transfers. If we have some spare bus cycles, the extra transfer is not a decisive factor.

The current tape driver uses by default so-called "asynchronous writes". This means that a SCSI write is started and the write() function does not wait for it to finish. The status is checked at the next write(). The user task can use the SCSI transaction time to collect more data for writing.
This is not possible with zero-copy if we want to retain the Unix write semantics. (Well, we are not completely obeying the semantics with asynchronous writes because error reporting is delayed. However, this is the case anyway if we enable drive buffering. Without drive buffering we don't get decent throughput.)

In fixed block mode the driver can buffer several write()s' worth of data, and this may increase throughput. The same applies to reading in fixed block mode. Reading in variable block mode is a case where no speedup is used.

If we use kernel space buffers, we can increase the parallelism without changes to the user programs. How far this is useful is another question. When using zero-copy, a program can still exploit parallelism, but this requires changes to the program (multiple buffers and some kind of asynchronous i/o framework).

Earlier it has been easy to postpone any discussion about using zero-copy in the drivers. As you have seen from the answers from others, this is changing, and fairly soon we will have the necessary support for zero-copy in the kernel. I am not against modifying the tape driver to use zero-copy, but before doing it I would like to see/do a quantitative analysis of the advantages and disadvantages in the common and not so common cases.

Kai

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
Re: Zero Copy IO
Douglas Gilbert wrote:
> "Alex Q Chen" <[EMAIL PROTECTED]> wrote:
> > I am trying to find a way to pin down user space
> > memory from kernel, so that these user space buffer
> > can be used for direct IO transfer or otherwise
> > known as "zero copying IO". Searching through the
> > Internet and reading comments on various news groups,
> > it would appear that most developers including Linus
> > himself doesn't believe in the benefit of "zero
> > copying IO". Most of the discussion however was based
> > on network card drivers. For certain other drivers
> > such as SCSI Tape driver, which need to handle great
> > deal of data transfer, it would seemed still be more
> > advantageous to enable zero copy IO than copy_from_user()
> > and copy_to_user() all the data. Other OS such as AIX
> > and OS2 have kernel functions that can be used to
> > accomplish such a task. Has any ground work been done
> > in Linux 2.4 to enable "zero copying IO"?
>
> Alex,
> The kiobufs mechanism in the 2.4 series is the appropriate
> tool for avoiding copy_from_user() and copy_to_user().
> The definitive driver is in drivers/char/raw.c which
> does synchronous IO to block devices such as disks
> (but is probably not appropriate for tapes).
>
> The SCSI generic (sg) driver supports direct IO. The driver
> in lk 2.4.3 has the direct IO code commented out while
> a version that I'm currently testing (sg 3.1.18 at
> www.torque.net/sg) has its direct IO code activated. I have
> a web page comparing throughput times and CPU utilizations
> at http://www.torque.net/sg/rbuf_tbl.html . My testing
> indicates that the kiobufs mechanism is now working
> quite well. For various reasons I still think that it
> is best to default to indirect IO and let speed hungry
> users enable dio (which is done in sg via procfs). Even
> when the user selects direct IO is should be possible to
> fall back to indirect IO. [Sg does this when a SCSI
> adapter can't support direct IO (e.g. an ISA adapter).]
> Since the SCSI tape (st) driver is structurally similar
> to sg, it should be possible to add direct IO support
> to st.
>
> One thing to note is that when you let the user provide
> the buffer for direct IO (e.g. with malloc) then on
> the i386 it won't be contiguous from a bus address POV.
> This means large scatter gather lists (typically with
> each element 4 KB on i386) which can be time consuming
> to load on some SCSI adapters. One way around this would
> be for a driver to provide "malloc/free" like ioctls.

I'm not very knowledgeable, but doesn't the sound driver use mmap() to do this?

Either way, AGP GART is basically a paged MMU allowing non-contiguous physical memory to be made to look contiguous from the AGP *and* from PCI (on most chipsets?). Perhaps this would be helpful.

Large contiguous physical memory seems to be difficult to get currently, but would be nice, as it would allow use of larger MMU pages with many CPUs. Someone mentioned reverse page table support would be required first...

> Doug Gilbert
Re: Zero Copy IO
"Alex Q Chen" <[EMAIL PROTECTED]> wrote:
> I am trying to find a way to pin down user space
> memory from kernel, so that these user space buffer
> can be used for direct IO transfer or otherwise
> known as "zero copying IO". Searching through the
> Internet and reading comments on various news groups,
> it would appear that most developers including Linus
> himself doesn't believe in the benefit of "zero
> copying IO". Most of the discussion however was based
> on network card drivers. For certain other drivers
> such as SCSI Tape driver, which need to handle great
> deal of data transfer, it would seemed still be more
> advantageous to enable zero copy IO than copy_from_user()
> and copy_to_user() all the data. Other OS such as AIX
> and OS2 have kernel functions that can be used to
> accomplish such a task. Has any ground work been done
> in Linux 2.4 to enable "zero copying IO"?

Alex,

The kiobufs mechanism in the 2.4 series is the appropriate tool for avoiding copy_from_user() and copy_to_user(). The definitive driver is in drivers/char/raw.c, which does synchronous IO to block devices such as disks (but is probably not appropriate for tapes).

The SCSI generic (sg) driver supports direct IO. The driver in lk 2.4.3 has the direct IO code commented out, while a version that I'm currently testing (sg 3.1.18 at www.torque.net/sg) has its direct IO code activated. I have a web page comparing throughput times and CPU utilizations at http://www.torque.net/sg/rbuf_tbl.html . My testing indicates that the kiobufs mechanism is now working quite well. For various reasons I still think that it is best to default to indirect IO and let speed-hungry users enable dio (which is done in sg via procfs). Even when the user selects direct IO it should be possible to fall back to indirect IO. [Sg does this when a SCSI adapter can't support direct IO (e.g. an ISA adapter).]
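For reference, the kiobuf calls a 2.4 driver would use to pin a user buffer look roughly like the following. This is a non-runnable kernel-context sketch from memory of the 2.4 interface, with error handling abbreviated; check drivers/char/raw.c or sg for the authoritative usage.

```c
/* Kernel-context sketch (2.4-era kiobuf interface): pin a user buffer,
 * do the transfer against the pinned pages, then release them. */
struct kiobuf *iobuf;
int err;

err = alloc_kiovec(1, &iobuf);                 /* get a kiobuf */
if (err)
        return err;

err = map_user_kiobuf(READ, iobuf,             /* pin the user pages */
                      (unsigned long)user_buf, count);
if (err) {
        free_kiovec(1, &iobuf);
        return err;
}

/*
 * iobuf->nr_pages entries of iobuf->maplist[] now hold the pinned
 * struct page pointers; the driver builds its scatter-gather list
 * from them and issues the SCSI command against those pages directly.
 */

unmap_kiobuf(iobuf);                           /* unpin */
free_kiovec(1, &iobuf);
```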
Since the SCSI tape (st) driver is structurally similar to sg, it should be possible to add direct IO support to st.

One thing to note is that when you let the user provide the buffer for direct IO (e.g. with malloc), then on the i386 it won't be contiguous from a bus address POV. This means large scatter gather lists (typically with each element 4 KB on i386), which can be time consuming to load on some SCSI adapters. One way around this would be for a driver to provide "malloc/free"-like ioctls.

Doug Gilbert