Re: [linux-audio-dev] Performance and SCSI

2000-04-22 Thread Paul Barton-Davis

I was trying to understand why the performance you get is so much better than
what Benno and I get, and I'm now fairly convinced it's the hardware.

  [ ... ]

yes, but a WARNING: anyone thinking of moving to 2.3.99pre5 or above:
there appears to be a serious bug that will kill ardour's performance
(as well as Benno's test programs). I've posted a followup on
linux-kernel to the person who first noticed it. Basically, writing a
few hundred MB causes the box to slow to a crawl as memory utilization
goes through the roof. This was not the case in 2.3.51. Right now, I
cannot run Benno's hdtest program (it will never complete), and ardour
cannot record more than a minute or so of a handful of tracks without
a dropout. Again, this appears to be a new bug introduced in 2.3.99
somewhere.

--p



Re: [linux-audio-dev] more preallocation vs no prealloc / async vs sync tests.

2000-04-21 Thread Paul Barton-Davis

./hdtest 500 async trunc
SINGLE THREADED:  5.856 MByte/sec 
MULTI-THREADED:  6.096 MByte/sec

./hdtest 500 async notrunc (rewrite to preallocated files)
SINGLE THREADED: 4.040 MByte/sec
MULTI-THREADED: 4.766 MByte/sec

./hdtest 150 sync trunc 

SINGLE THREADED: 1.442 MByte/sec
MULTI-THREADED: 0.121 MByte/sec   (floppy-like performance :-)  )

./hdtest 150 sync notrunc
SINGLE THREADED:  4.788 MByte/sec
MULTI-THREADED: 1.984 MByte/sec

PS: Paul, run it on your 10k rpm SCSI disk so that we can do some comparison.

I hope you are ready for some *very* different numbers.

/tmp/hdtest 500 async trunc
SINGLE THREADED: 12.788 MByte/sec
MULTI-THREADED: 12.788 MByte/sec

/tmp/hdtest 500 async notrunc
SINGLE THREADED: 6.096 MByte/sec
MULTI-THREADED:  6.168 MByte/sec

/tmp/hdtest 150 sync trunc
SINGLE THREADED: 11.292 MByte/sec
MULTI-THREADED: 12.233 MByte/sec

/tmp/hdtest 150 sync notrunc
SINGLE THREADED: 5.437 MByte/sec
MULTI-THREADED:  6.383 MByte/sec

A few notes.

In the source you sent, you are not doing 256kB writes, but 1MB
writes, since you defined MYSIZE as (262144*4). This is puzzling.
However, changing it to 256kB doesn't change the results in any
significant way, as far as I can tell.

It troubles me that the ongoing rate display is always significantly
higher than the eventual effective speed. I understand the reason for
the initially very high rate, but I typically see final rates from the
ongoing display that are very much higher than in your effective rate
display (e.g. 13MB/sec versus 5.5MB/sec, 20MB/sec versus 12MB/sec).
I don't have the time to stare at the source and figure out why this
is.

It's very interesting that writing to pre-allocated files is 50%
slower for me. This is even though your pre-allocation strategy causes
block-interleaving of the files. I suspect, but at this time cannot
prove, that this is due (in my case at least) to fs fragmentation. I
will try the benchmark on a clean 18GB disk the next time I'm over at
the studio.

Stephen Tweedie or someone else would know the answer to my last
question: I am wondering if contiguous allocation of fs blocks to a
file reduces the amount of metadata updating ? Does metadata belong to
a fixed-sized unit, or an inode, or a variable-sized unit, or some
combination ? I ask this because I see some visual indication of the
disk stalls you have talked about when running your hdtest program (it
may just be paging issues, however - hard to tell), and I still have
not seen them in ardour. Assuming for a second that these are real
stalls, one obvious difference is that your preallocation strategy
does not produce contiguous files.

--p






Re: [linux-audio-dev] Re: More results and thoughts on disk writing performance

2000-04-20 Thread Paul Barton-Davis

 
 Incidentally, on the systems I tested it appears that preallocation *slows down*
 data writing. Paul, have you compared your system with and without using
 preallocation? What speed difference do you see?

EXACTLY !
I am experiencing the same !
After Paul praised the preallocation so much, I decided to test it, and I
get about 20% performance slowdown over the case when running without
preallocation.

.
.
.
I can't explain why we do experience the slowdown with preallocation.

No big surprise here. Suppose you write the files in 256kB chunks, and
re-read them in the same way. If ext2 behaves the way I would expect (*)
it to, you end up with somewhat-to-totally block-interleaved files
that are read with no-or-little seeking (because the read pattern will
exactly match the write pattern).

The problem with not preallocating occurs only on the first write, and to
be honest, my preallocation scheme should be changed to mirror the
actual actions of a true "first write" by block-interleaving the files
instead of aiming for complete per-file contiguity. The one difficulty
with this is that if you change the size of the i/o requests, you may
get *worse* performance. At one time, I imagined this size to be
rather fluid, but it now appears to be likely to assume a fairly
constant value across all disks on all systems (and certainly on any
particular system). This removes my only real objection to
block-interleaving the files. 

I will change the way the files are pre-allocated and see if it speeds
things up even more.

Again, just in case anyone missed it: I have never encountered the
problems Benno has had (or at least, not the same underlying causes -
I used to have disk i/o performance problems), probably due to my use
of SCSI h/w, and ardour is a working multichannel hard-disk recorder.

No, you are wrong here:
the audio thread requires higher priority because it needs finer granularity
(we want low-latency response from our HD recorder).
The audio thread releases the CPU during blocking write()s to the audio device,
giving the disk thread all the time it needs to perform large disk I/Os, which
block the disk thread almost all the time.
Therefore you gain NOTHING (= zero disk performance increase) by giving the disk
thread higher priority than the audio thread, except that the audio will
drop out sometimes.

Fully agreed.

--p

(*) ext2 filesystems have a pre-allocate distance which someone
mentioned. I am hoping that allocating 256kB at a time makes this
figure irrelevant, but I am not sure at all that this is true.



Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-18 Thread Paul Barton-Davis

 2) Why am I not having any of these problems ? Unlike Benno's code, I
have a working application that runs just fine. I get smooth
throughput from the disk subsystem too. 

What do you mean exactly with "unlike Benno's code" ?
My code just tries to simulate the operation of a busy harddisk recorder
using the sorting algorithm to support variable speed.
I read in 256kb chunks too, therefore I don't see a big difference between
my code and yours from a disk IO subsystem POV.

In your own words, your code "simulates the operations of a busy
harddisk recorder". Mine *is* a busy harddisk recorder. There's lot of
stuff in my code that isn't in yours because I have a bunch of real
world stuff like a tape transport mechanism, extra intra-thread
communication, MTC delivery, an audio thread event/play list, and
more. Yet, despite all this, I have never run into the problems you
describe. As Stephen has noted, this may very well be because of my
use of SCSI h/w.

Therefore it would be useful if you could run my benchmark on your disk
to see whether your approach (or mine) gets better performance out of the disk,
and with how much buffer utilization.

When I get a minute or 30, I will.

In both cases, without O_SYNC, or anything else but preallocation
and careful design, I seem to be able to get smooth disk throughput
at significantly above the rate I need (9MB/sec; I get up to
17MB/sec from the UltraStar)

17MB/sec using hdparm or linear large reads/writes ( large cat / cp etc) or
17MB/sec within your harddisk recording app where 
num_tracks * datarate_of_each_track = 17MB/sec  
(if it's the latter then I doubt it, because seeking kills some of the throughput;
that's almost unavoidable, at least on my EIDE UDMA disks)

No, I do mean the latter: 17MB/sec from within ardour. You can doubt
it all you want, but I get it regularly. Actually, the real numbers
look more like (from memory, each line is one iteration of disk i/o
across all tracks, so 24*256kB of data):

 15MB/sec
 450MB/sec
 10MB/sec
 14MB/sec
 567MB/sec
 16MB/sec
 19MB/sec
 8MB/sec
 378MB/sec

The super-high numbers, I assume, are because of the read-ahead being
done by the kernel, which helps us out every so often. Remember, these
files are as contiguous as I can make 'em with ext2. And keep in mind
that my disks have a maximum transfer rate of 35MB/sec (nothing to do
with U2W - just that they are among the latest disks).

I have a very small, standalone single-threaded test app that gets
similar rates, even though it does random sized seeks across the whole
disk. I've posted that program before on LAD, and so I trust the
numbers.

--p



Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-17 Thread Paul Barton-Davis

 From: Paul Barton-Davis [EMAIL PROTECTED]
 i mentioned in some remarks to benno how important i thought it was to
 preallocate the files used for hard disk recording under linux.

Preallocation will make little difference.  The real issue is that the
buffer cache is doing write-behind, ie. it is batching up the writes into
big chunks which get blasted to disk once every five seconds or so,
causing large IO request queues to accumulate when that happens.

That's great for normal use because it means that trickles of write
activity don't tie up the spindles the whole time, but it's not ideal
for audio recording.

Acknowledging your much greater wisdom in this area than mine, I don't
understand the above given that, in my experience:

1) pre-allocation takes a *long* time. Allocating 24 203MB files on a
   clean ext2 partition of 18GB takes many, many minutes, for example.
   Presumably, the same overhead is being incurred when block
   allocation happens "on the fly". If so, then even if pre-allocation
   doesn't solve the buffer cache write batching problem, it certainly
   gets rid of what appears to be an onerous task.

2) Why am I not having any of these problems ? Unlike Benno's code, I
   have a working application that runs just fine. I get smooth
   throughput from the disk subsystem too. My configuration (there are two):

   Kernel 2.3.52                       Kernel 2.3.52
   Dual PII-450                        Dual PII-450
   on-board Adaptec 7890               on-board Adaptec 7890
   Seagate 4.5GB Cheetah U2W 10K rpm   IBM 9GB UltraStar U2W 10K rpm
   Quantum 4.5GB Viking U2W 7.5K rpm   3 x IBM 18GB UltraStar
   
   In both cases, without O_SYNC, or anything else but preallocation
   and careful design, I seem to be able to get smooth disk throughput
   at significantly above the rate I need (9MB/sec; I get up to
   17MB/sec from the UltraStar)

   In every case, I am doing disk I/O from a dedicated thread to a
   single disk.

I'm confused. Is it just that I'm running on a genuine SMP system ?

--p



Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-17 Thread Paul Barton-Davis

 2) Why am I not having any of these problems ? Unlike Benno's code, I
Seagate 4.5GB Cheetah U2W 10K rpm    IBM 9GB UltraStar U2W 10K rpm
Quantum 4.5GB Viking U2W 7.5K rpm    3 x IBM 18GB UltraStar

Ahh --- SCSI.  The request queuing for SCSI is very different to 
that for non-SCSI devices.  

Different enough that you feel it's likely to explain the significant
difference between my experience and that of both Benno and Juhana
when trying to record to disk ? Is it different enough to explain why
the buffer cache write-behind batching doesn't seem to show up as a
problem for me ?

Stephen - thanks for paying attention and giving us time on this. I
know you have a lot to work on, and that HDR is not what most people
consider to be a hot Linux application; our little minority is pretty
fanatical :) The application I have ("ardour") stands in relation to
existing HDR systems in the same way that Linux or *BSD-based routers
stand to Cisco h/w, with the difference being that very few people own
the dedicated h/w yet, and so we have a real chance to provide a
genuine, new service for people by getting this to work.

--p



Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-17 Thread Paul Barton-Davis

It depends.  Using more threads can lead to more conflicting IO seeks
unless you can schedule enough IOs at once to give the scheduler a good
chance to sort the IOs into decent-sized blocks.

The objective should probably be to make sure that you have a few hundred
KB of outstanding IO requests on each stream at any one time.  That can
be done either with lots of threads submitting small IOs, or a few
threads submitting large IOs.  Just adding a few more threads but 
still performing small IOs will definitely not help.

Right now, I'm using 1 thread with 256kB IO requests, since 256kB
seems to give me optimal throughput. Obviously, measuring it is the
way to go, but do you have any sense of whether it would be worth
submitting 24 of these at more or less the same time, or 12, or just 1 ?

--p



Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-15 Thread Paul Barton-Davis

It would be interesting to compare filesystem latencies in the HD recording
case. As I said, it's amazing how long the disk thread can get blocked during a
large buffer flush/metadata update; on a PII I watched the disk thread
blocking for several seconds (up to 8 secs in the worst case).
That means at a datarate of 200kb/sec per audio track you would sometimes
need up to 2MB of ringbuffer per track. Multiply this value by 50 tracks, and
you get 100MB of precious RAM wasted for doing only buffering.
I am convinced that we can do the job with 0.5MB per track when using RAW IO.
(Windoze hdrecording apps seem to work well with these amounts of buffering.)

Well, first of all, as I've mentioned already on LAD, by preallocating
the files, this problem appears to vanish. Second, it's unclear whether
the Windows/MacOS apps do preallocation or not (though it is clear
that they are very fragmentation sensitive). Thirdly, the amount of
user-space buffering that's needed is not just a function of jitter in
the apparent disk throughput rate.

Benno and I have been through this before with respect to audio h/w
buffer usage, and we concluded that it's very advantageous to use 3
fragments there, precisely to protect against jitter. I think that the
same applies here - you want the user space buffer divided into at
least 3 "fragments". When fragment N is in use by the audio thread,
the "previous" fragment is being handled by the butler thread (for
read-ahead and flush to disk). In theory, 2 would be enough, but if
for any reason, the butler is ever slowed down for one iteration, so
that it fails to finish processing its buffer before the audio thread
needs it again, having 3 provides a way to avoid problems.

If that's correct, then we next have to look at the appropriate size of
those fragments. Since they are intended to correspond to single disk
i/o requests, they need to be sized so that we get maximal disk
throughput. My experiments have suggested that a 256kB disk i/o
request seems to be the smallest size that gives the maximal
throughput. If so, then the lower bound on the amount of buffering is
not really "optional", but is 3 x 256kB.

You can substitute other numbers in here for the number of fragments
and the size of the disk i/o requests, but the principle will remain:
the amount of buffering is not really a function of time (seconds),
but of the way you subdivide the buffer for disk i/o and the optimal
disk i/o request size.

--p



Re: [Fwd: [linux-audio-dev] info point on linux hdr]

2000-04-14 Thread Paul Barton-Davis

Unfortunately efficient preallocation is rather hard with the current
ext2. To do it efficiently you just want to allocate the blocks in the
bitmaps without writing into the actual allocated blocks (otherwise
it would be as slow as the manual write-every-block-from-userspace trick)

yes, it's slow, but it's not hard to design an application so that it's
rare that it needs to be done.

that said, when i was creating a 24 track, 40 minute "tape" for ardour
the other day, i managed to reheat lunch, eat it, and play with my
friend's son for a while during the creation process :)

--p