Re: [linux-audio-dev] Re: File writes with O_SYNC slow

2000-04-20 Thread Steve Lord


> 
> I tried O_SYNC and even O_DSYNC on the SGI (Origin 2k),
> (O_DSYNC syncs only data blocks but not metadata blocks)
> both only delivering 3.5MBytes/sec. (plain buffered writes were about
> 15-16MB/sec).


I should comment on this one, since that would be on XFS.

The difference between O_SYNC and O_DSYNC on XFS only shows up if the
file size does not change during the write. An O_DSYNC write which extends
the file still does a synchronous transaction (a bad thing). XFS tries hard
to do the right thing in these cases; we also have customers who use XFS
inside of things like MRI scanners, and they insist that if they write data
to the disk, it had better stay there, even if the power goes out.

I realize with audio recording it is difficult to know the size of the
file in advance. But you would get better results if you bumped up
the filesize periodically rather than on every write. Preallocation
would also help.
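
For what it's worth, a rough userspace sketch of what I mean (hypothetical
code, not anything from XFS; the plain ftruncate() here just stands in for
whatever size-bumping or preallocation call you end up using):

/* Hypothetical sketch: grow an O_SYNC recording file in large steps so
 * that most writes do not change the file size, and so avoid the
 * synchronous metadata transaction described above. */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK      (64 * 1024)          /* one audio write                   */
#define EXTEND_BY  (16 * 1024 * 1024)   /* bump the file size 16MB at a time */

static int write_chunk(int fd, const char *buf, off_t *written, off_t *reserved)
{
    if (*written + CHUNK > *reserved) {
        *reserved += EXTEND_BY;
        if (ftruncate(fd, *reserved) < 0)   /* one size change per 16MB */
            return -1;
    }
    if (write(fd, buf, CHUNK) != CHUNK)
        return -1;
    *written += CHUNK;
    return 0;
}

When recording stops you would truncate back down to the number of bytes
actually written.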

Steve


> 
> Benno.
> 





Re: O_DIRECT architecture (was Re: info point on linux hdr)

2000-04-18 Thread Steve Lord

> Hi,
> 
> On Tue, Apr 18, 2000 at 07:56:04AM -0500, Steve Lord wrote:
> 
> > I said basic implementation because it is currently paying no attention
> > to cached data. The Irix approach to this was to flush or toss cached
> > data which overlapped a direct I/O, I am leaning towards keeping them
> > as part of the I/O.
> 
> The big advantage of the scheme where I map the kiobuf pages into the
> real page cache before the I/O, and unmap after, is that cache
> coherency at the beginning of the I/O and all the way through it is
> guaranteed.  The cost is that the direct I/O may end up doing copies
> if there is other I/O going on at the same time to the same page, but
> I don't see that as a problem!

I was thinking along these lines.

So I guess the question here is how you plan on keeping track of the
origin of the pages - which ones were originally part of the kernel cache
and thus need copying up to user space? It does not seem hard, I am just
wondering what you had in mind. Also, I presume that if the page was already
present and up to date, then on a read you would not refill it from disk,
since it may be more recent than the on-disk data; existing buffer heads
would give you this information.

> 

> 
> Ultimately we are going to have to review the whole device driver 
> interface.  We need that both to do things like >2TB block devices, and
> also to achieve better efficiency than we can attain right now with a
> separate buffer_head for every single block in the I/O.  It's just using
> too much CPU; being able to pass kiobufs directly to ll_rw_block along
> with a block address list would be much more efficient.

Agreed, XFS was getting killed by this (and by the fixed block size
requirement of the interface). We have 512 byte I/O requests we need to do
for some metadata; having to impose that size on all I/O and create 8 buffer
heads for each 4K page was just nasty.

> 
> > So if O_ALIAS allows user pages to be put in the cache (provided you use
> > O_UNCACHE with it), you can do this.
> 
> Yes.
> 
> > However, O_DIRECT would be a bit more
> > than this - since if there already was cached data for part of the I/O
> > you still need to copy those pages up into the user pages which did not
> > get into cache. 
> 
> That's the intention --- O_ALIAS _allows_ the user page to be mapped 
> into the cache, but if existing cached data or alignment constraints
> prevent that, it will fall back to doing a copy.
> 
> One consequence is that O_DIRECT I/O from a file which is already cached
> will always result in copies, but I don't mind that too much.

So maybe an O_CLEANCACHE (or something similar) could be used to indicate
that anything which is found cached should be moved out of the way (flushed
to disk or tossed, depending on what is happening). Some other sort of API,
such as an fsync variant or that fadvise call which was mentioned recently,
could be used to clean the cache for a file. This would let those apps which
really want direct disk <-> user memory I/O get what they want.
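
As a rough userspace illustration of the "clean the cache for a file" idea
(a hedged sketch only; it uses the posix_fadvise() call as it exists today,
which is not the interface being discussed in this thread):

/* Hypothetical sketch: flush dirty pages for a file and then ask the
 * kernel to drop its cached copies, approximating the O_CLEANCACHE /
 * fsync-variant idea above. */
#include <fcntl.h>
#include <unistd.h>

static int clean_file_cache(int fd)
{
    if (fsync(fd) < 0)          /* make sure dirty data reaches the disk */
        return -1;
    /* len == 0 means "from the offset to the end of the file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}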

> 
> The pagebuf stuff sounds like it is fairly specialised for now.  As
> long as all of the components that we are talking about can pass kiobufs
> between themselves, we should be able to make them interoperate pretty
> easily.
> 
> Is the pagebuf code intended to be core VFS functionality or do you
> see it being an XFS library component for the forseeable future?

We had talked about trying to use it on some other filesystem to see what
happened, but we don't really have the bandwidth to do that. We don't see
it as being just there for XFS - although, for existing Linux filesystems,
there may not be benefits to switching over to it.

> 
> --Stephen


Steve





Re: O_DIRECT architecture (was Re: info point on linux hdr)

2000-04-18 Thread Steve Lord

> Hi,
> 
> On Mon, Apr 17, 2000 at 05:58:48PM -0500, Steve Lord wrote:
> > 
> > O_DIRECT on Linux XFS is still a work in progress, we only have
> > direct reads so far. A very basic implementation was made available
> > this weekend.
> 
> Care to elaborate on how you are doing O_DIRECT?


XFS is using the pagebuf code we wrote (or I should say are writing - it
needs a lot of work yet). This uses kiobufs to represent data in a set of
pages. So, we have the infrastructure to take a kiobuf and read or write
it from disk (OK, it uses buffer heads under the covers). I glued this
together with the map_user_kiobuf() and unmap_kiobuf() calls from your raw
I/O driver and that was about it.

We only build these kiobufs for data which is sequential on disk, not for
the whole user request. The sequence we do things in is a bit different;
basically:

while data left to copy

    obtain bmap from filesystem representing location of next
    chunk of data (sequential on disk)

    for buffered I/O

        go find pages covering this range - create if they
        do not exist.

        issue blocking read for pages which are not uptodate

        copy out to user space

    for direct I/O

        map user pages into a kiobuf

        issue blocking read for pages

        unmap pages
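
To make the direct leg a little more concrete, the glue looks roughly like
this (a hedged sketch, not the actual XFS code; pagebuf_read_kiobuf() is a
made-up stand-in for the pagebuf call that issues the blocking read, while
the kiobuf helpers are the ones from the raw I/O driver):

#include <linux/fs.h>
#include <linux/iobuf.h>

static int direct_read_chunk(struct inode *inode, unsigned long uaddr,
                             size_t len, loff_t offset)
{
    struct kiobuf *iobuf;
    int err;

    err = alloc_kiovec(1, &iobuf);
    if (err)
        return err;

    /* map user pages into a kiobuf */
    err = map_user_kiobuf(READ, iobuf, uaddr, len);
    if (!err) {
        /* issue blocking read for the pages (stand-in call) */
        err = pagebuf_read_kiobuf(inode, iobuf, offset, len);
        /* unmap pages */
        unmap_kiobuf(iobuf);
    }
    free_kiovec(1, &iobuf);
    return err;
}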

I said basic implementation because it is currently paying no attention
to cached data. The Irix approach to this was to flush or toss cached
data which overlapped a direct I/O; I am leaning towards keeping those
pages as part of the I/O.

Other future possibilities I see are:

  o Using caching to remove the alignment restrictions on direct I/O by
    doing unaligned head and tail processing via buffered I/O (a rough
    sketch of this follows below).

  o Automatically switching to direct I/O under conditions where the
    I/O would flush too much cache.
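
For the first item, the split might look something like this (hypothetical
sketch; do_buffered_read() and do_direct_read() are made-up names, and the
alignment of the user buffer itself is ignored for simplicity):

/* Hypothetical sketch: satisfy the unaligned head and tail of a request
 * with buffered I/O and only the 512-byte-aligned middle with direct I/O. */
#include <sys/types.h>

#define DIO_ALIGN 512

/* made-up stand-ins for the two I/O paths */
static ssize_t do_buffered_read(int fd, char *buf, size_t len, off_t pos);
static ssize_t do_direct_read(int fd, char *buf, size_t len, off_t pos);

static ssize_t read_mixed(int fd, char *buf, size_t len, off_t pos)
{
    off_t head_end = (pos + DIO_ALIGN - 1) & ~((off_t)DIO_ALIGN - 1);
    off_t tail_beg = (pos + len) & ~((off_t)DIO_ALIGN - 1);
    ssize_t done = 0;

    if (tail_beg <= head_end)               /* too small: all buffered */
        return do_buffered_read(fd, buf, len, pos);

    if (head_end > pos)                     /* unaligned head */
        done += do_buffered_read(fd, buf, head_end - pos, pos);
    /* aligned middle goes direct, avoiding the cache */
    done += do_direct_read(fd, buf + done, tail_beg - head_end, head_end);
    if (pos + (off_t)len > tail_beg)        /* unaligned tail */
        done += do_buffered_read(fd, buf + done, pos + len - tail_beg, tail_beg);
    return done;
}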



> 
> It's something I've been thinking about in the general case.  Basically
> what I want to do is this:
> 
> Augment the inode operations with a new operation, "rw_kiovec" which
> performs reads and writes on vectors of kiobufs.  

You should probably take a look at what we have been doing to the ops,
although our extensions are really biased towards extent based filesystems.
Rather than using getblock to identify individual blocks of file data, we
added a bmap interface to return a larger range - this requires different
locking semantics than getblock, since the mapping we return covers multiple
pages. I suspect that any approach which assembles multiple pages in advance
is going to have similar issues.
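
To illustrate the difference (a hedged sketch; the names and structure
layout here are made up for the example, not the actual XFS or VFS
interface):

/* getblock-style: one call maps one filesystem block. */
int (*get_block)(struct inode *inode, long iblock,
                 struct buffer_head *bh_result, int create);

/* extent/bmap-style: one call returns a mapping covering a larger range,
 * possibly spanning many pages, so the caller takes the relevant locks
 * once per extent rather than once per block. */
struct block_map {
    loff_t  offset;         /* file offset the mapping starts at     */
    size_t  length;         /* bytes covered by this mapping         */
    long    start_block;    /* starting disk block, or -1 for a hole */
};

int (*bmap_extent)(struct inode *inode, loff_t offset, size_t count,
                   struct block_map *map, int create);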


> 
> Provide a generic_rw_kiovec() function which uses the existing page-
> oriented IO vectors to set up page mappings much as generic_file_{read,
> write} do, but honouring the following flags in the file descriptor:
> 
>  * O_ALIAS
>
>Allows the write function to install the page in the kiobuf 
>into the page cache if the data is correctly aligned and there is
>not already a page in the page cache.
> 
>For read, the meaning is different: it allows existing pages in 
>the page cache to be installed into the kiobuf.
> 
>  * O_UNCACHE
> 
>If the IO created a new page in the page cache, then attempt to
>unlink the page after the IO completes.
> 
>  * O_SYNC
> 
>Usual meaning: wait for synchronous write IO completion.
> 
> O_DIRECT becomes no more than a combination of these options.

So if O_ALIAS allows user pages to be put in the cache (provided you use
O_UNCACHE with it), you can do this. However, O_DIRECT would be a bit more
than this - since if there already was cached data for part of the I/O
you still need to copy those pages up into the user pages which did not
get into cache. 


> 
> Furthermore, by implementing this mechanism with kiobufs, we can go
> one step further and perform things like Larry's splice operations by
> performing reads and writes in kiobufs.  Using O_ALIAS kiobuf reads and
> writes gives us copies between regular files entirely in kernel space
> with the minimum possible memory copies.  sendfile() between regular
> files can be optimised to use this mechanism.  The data never has to
> hit user space.
> 
> As an example of the flexibility of the interface, you can perform
> an O_ALIAS, O_UNCACHE sendfile to copy one file to another, with full
> readahead still being performed on the input file but with no memory 
> copies at all.  You can also choose not to have O_UNCACHE and O_SYNC
> on the writes, in which case you have both readahead and writebehind
> with zero copy.
> 
> This is all fairly easy to implement (at least for ext2), and gives
> us much more than just O_DIRECT for no extra w

Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-17 Thread Steve Lord


O_DIRECT on Linux XFS is still a work in progress; we only have
direct reads so far. A very basic implementation was made available
this weekend.

We also have a preallocation interface available via an ioctl call; it
should be fast, as XFS is an extent based filesystem. Until direct write is
implemented, though, it is not too useful for this application - all writes
will go through the buffer cache and be flushed to disk in a manner similar
to ext2 data.
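
From userspace, preallocation through that ioctl looks roughly like the
following (a hedged sketch; the ioctl name and structure are the ones from
the XFS headers and may not match exactly what is in the current snapshot):

/* Hypothetical sketch: reserve disk space for a file up front via the
 * XFS preallocation ioctl.  XFS_IOC_RESVSP64 / xfs_flock64_t are the
 * names from the XFS headers; check the current snapshot for details. */
#include <sys/ioctl.h>
#include <xfs/xfs.h>

static int preallocate(int fd, long long bytes)
{
    xfs_flock64_t fl = { 0 };

    fl.l_whence = 0;        /* offsets relative to the start of the file */
    fl.l_start  = 0;
    fl.l_len    = bytes;    /* number of bytes to reserve */
    return ioctl(fd, XFS_IOC_RESVSP64, &fl);
}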

For more details on what is available see http://oss.sgi.com/projects/xfs/

I should stress that XFS is still a work in progress on Linux, especially
the read/write path which is being rewritten from the Irix version.

Steve Lord

> Hi,
> 
> On Sat, Apr 15, 2000 at 06:50:48PM +0200, Benno Senoner wrote:
> > 
> > Anyway does anyone know if implementing O_DIRECT would be a big amount
> > of work in kernel 2.3.x ?
> 
> I'll be doing it, and it should be fairly straightforward.  There are
> one or two infrastructure changes required, however, so it won't make 
> it into 2.4 I expect.
> 
> > Is the O_DIRECT handling a filesystem issue or a block device driver issue.
> 
> No, it's a page cache issue.  Most filesystems will be able to use
> common page cache code for O_DIRECT.
> 
> Do you know if XFS for Linux will support this (if the O_DIRECT is a filesystem
> issue, otherwise ignore this question) ?
> 
> Count on it, as XFS is highly oriented towards high-performance
> I/O.
> 
> --Stephen
>