Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-23 Thread Peter


On Wed, 22 Jun 2005, Eli Billauer wrote:


Insights, anybody?


Reliable high-speed continuous writes on a multitasking system like 
Linux are not possible. The next time there is network load or a daemon 
wakes up, your task will be scheduled out. You can do it with RTLinux or 
VMware or QNX. The closest you can get on Linux is to use a SCHED_RR 
scheduled process with high priority as root, poll or select on the 
relevant file descriptors, all set to unbuffered mode (O_NDELAY), and 
lock the buffer and the program in memory. If you have to maintain the 
speed you must buy an AV disk, which is specifically rated not to 
recalibrate from time to time. Look at the source code of cdrecord, for 
example, for clues. It uses all the methods I have enumerated, plus mmap. 
Using these methods it is possible to raise the priority of a single task 
so high that the system will be unusable for anything else (it appears 
frozen except for the high-priority task). You must find your tradeoff. 
TiVo boxes and the like do this (they don't do much besides recording and 
playing video, so it's OK).


Peter

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

char junkdata[32768];

int main(int argc, char **argv) {
  FILE *f;

  struct timeval now, sleeptime;
  struct timezone junk;
  long prev_sec, prev_usec, deltat;
  int i;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s output-filename\n", argv[0]);
    return 1;
  }

  f = fopen(argv[1], "wb");

  if (f == NULL) {
    fprintf(stderr, "Failed to open output file %s\n", argv[1]);
    return 1;
  }

  if (gettimeofday(&now, &junk) != 0) {
    fprintf(stderr, "gettimeofday() failed!\n");
    return 1;
  }

  prev_sec = now.tv_sec; prev_usec = now.tv_usec;

  for (i = 0; i < 2047*32; i++) { // Almost 2 GB

    // Time to sleep between writes. Check this program's output values
    // to see what you actually got as typical values. They may be
    // significantly longer than requested due to context switch overhead
    // when the desired time is very short.
    sleeptime.tv_sec = 0;
    sleeptime.tv_usec = 100; // Yeah, right. The real value will be much longer

    // Next line should be commented out for full-speed
    // select(0, NULL, NULL, NULL, &sleeptime);

    if (fwrite(junkdata, 1, sizeof(junkdata), f) != sizeof(junkdata)) {
      fprintf(stderr, "Data write failed!\n");
      return 1;
    }

    if (gettimeofday(&now, &junk) != 0) {
      fprintf(stderr, "gettimeofday() failed!\n");
      return 1;
    }

    deltat = now.tv_sec - prev_sec;
    deltat = deltat * 1000000;
    deltat += now.tv_usec - prev_usec;

    prev_sec = now.tv_sec; prev_usec = now.tv_usec;

    printf("%ld\n", deltat);
  }

  fclose(f);
  return 0;
}


Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread Muli Ben-Yehuda
On Wed, Jun 22, 2005 at 02:41:39AM +0200, Eli Billauer wrote:

 And finally: Does a RAM FIFO help? Surprisingly, the answer is no. I 
 did the following:
 
 mknod mypipe p
 mbuffer -i mypipe -o /fatfs/output-file &
 ./writefat mypipe > listfile

Until 2.6.mumble, pipes only used a single page in memory. Since
2.6.mumble we're using up to 16 pages and flipping between consumer
and producer, which should give much better pipe utilization for large
writes.

 Insights, anybody?

Yeah, how about you cut out the various middlemen from the code? At
least it's not in Java... 

- use write, not fwrite!!!
- use O_DIRECT to bypass kernel caching
- use the appropriate IO elevator
- verify that your disk drivers are tuned for whatever you want to do
(is DMA on?)
- what else is the system doing? is it idle? busy? is anything else
interfering with the scheduling of your program?

Linux is a general purpose OS, which means it's good for a lot of
things and optimal for none. If you want it to be optimal for your
specific usage, you should spend some time optimizing and tuning it
for it. And that's true regardless of what your usage happens to be.

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/


--
Haifa Linux Club Mailing List (http://www.haifux.org)
To unsub send an empty message to [EMAIL PROTECTED]




Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread Oron Peled
First, thanks for an interesting test plan.
Just a quick note (I currently don't have time for retesting):
  On Wednesday 22 June 2005 03:41, Eli Billauer wrote:
   And finally: Does a RAM FIFO help? Surprisingly, the answer is no.

Since you use stdio (fwrite), which by default does full buffering
in user space (cf. setvbuf(3)), this does not surprise me.
Repeating the test with open/write/close etc. would give more
significant results (although I suspect they would only be worse :-( ).

Tzahi mentioned XFS. While I'm not sure XFS would help with small
write chunks (Reiserfs seems like a better candidate for these),
I'd like to mention a related feature the original XFS had on Irix
(I think this feature wasn't ported to Linux) -- You could assign
a special sub-volume in the filesystem as real-time. The kernel
would then give absolute priority to I/O requests related to that
sub-volume -- In marketing speech this would give you guaranteed
I/O response time (although I don't remember seeing any specific
constraints on this [that's why I call it marketing]).

Any other ideas anybody?

-- 
Oron Peled Voice/Fax: +972-4-8228492
[EMAIL PROTECTED]  http://www.actcom.co.il/~oron
ICQ UIN: 16527398

If I have been able to see farther,
it was only because I stood on the shoulders of giants.
-- Sir Isaac Newton





Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread Eli Billauer




Muli Ben-Yehuda wrote:

  
mknod mypipe p
mbuffer -i mypipe -o /fatfs/output-file &
./writefat mypipe > listfile

  
  
Until 2.6.mumble, pipes only used a single page in memory. Since
2.6.mumble we're using up to 16 pages and flipping between consumer
and producer, which should give much better pipe utilization for large
writes.
  

Note that mbuffer is the RAM FIFO, and it was empty all the time (as
one could expect). Since mbuffer never blocked, I don't think it
matters how good the pipe between them is. This is why I found it weird
that I got delays at all, using a RAM FIFO.


  
- use write, not fwrite!!!
- use O_DIRECT to bypass kernel caching
- use the appropriate IO elevator

Are these general guidelines for writing fast I/O, or are there good
reasons to suspect that one of these causes the occasional long blocks?
Keep in mind that 3 MB/sec isn't fast at all. It's not the long average
delay I care about. It's the peaks.

Besides, it's all nice when I write the application myself. But usually
what we do is to use some prewritten software. In my case I could hack
it (as I've already done for other reasons). But still this looks like
a kernel problem to me.

  
- verify that your disk drivers are tuned for whatever you want to do
(is DMA on?)
- what else is the system doing? is it idle? busy? is anything else
interfering with the scheduling of your program?

Yes, the DMA is on for both computers. And at least on the laptop,
there shouldn't be anything running (not even X).

  Linux is a general purpose OS, which means it's good for a lot of
things and optimal for none.

Well, as it turns out, it's not so good for a rather mainstream
multimedia recording task. At least not on my computers. I don't need
optimal. I need reasonable.

It would be nice if some of you tried the program I sent, so we can see
whether you get the same results. Note that the real action begins when
the partition you write to is getting full.

Regards,
  Eli

-- 
Web: http://www.billauer.co.il





Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread guy keren


On Wed, 22 Jun 2005, Eli Billauer wrote:


It turns out that it's not a FAT issue: the same problem occurs
on ext3 systems as well. I've written a small program to test the delays
between writes, and the results are not very encouraging, especially when
the disk gets full (it always does, doesn't it?).


i don't think it's that the disk gets full. i think it's the page-cache
getting full. try this: get a partition that is already quite full, and
run the test on it. you will not see this problem.

or: get a very large partition (e.g. 30GB free space) and run it
there - you'll notice the problem when the amount of data you wrote
_lately_ gets large.

the page cache of the linux system is, by default, tuned for overall
throughput, not for worst-case-per-I/O. i don't think that changing the
elevator algorithm will help. things i would try:

1. what muli said: using O_DIRECT would help, but it turns all the I/O
  synchronous, which might leave you with too slow a throughput (can't
  tell without trying). to overcome this, you will need a combination of
  application-based buffering and direct/raw I/O, e.g. taking the source
  of mbuffer and making sure it works with O_DIRECT, or even with a 'raw'
  device (available on some linux distributions, not all).

yes, this is a problem with the linux system (as a whole).

--guy



In my opinion, this should concern anyone who wants to use a Linux box
for storing a data stream (audio, video, whatever).

I've attached the source of the program I used. Basically, it loops on
writing 32kB chunks of data to a file, creating a list of numbers telling
how much time (in microseconds) elapsed since the last loop (to stdout).
There are two modes of testing: One is to let it write as fast as
possible, and the second is to put delays between writes, which
simulates waiting for incoming data. If there is enough room, the
program will write slightly less than 2 GB (guess why?).

Since Linux is a multitasking system, the results are not exactly
repeatable. But the general impression is that writing to FAT or to
ext3, on my laptop or on my desktop, behaves more or less the same.

The first test regards full-speed writes: data was simply written as fast
as possible. For a non-full partition, the write operation typically
dwelled 5.5 ms, with occasional bursts of 0.7-0.9 *seconds* delay on
the write operation. When the partition gets full, things get even
nastier. Blocking of several seconds was observed; delays of 5 and up to
14 seconds typically appeared a few times in a 2 GB writing session.

Then I added a short sleep in the loop, in order to simulate data
written at ~3 MB/sec (which is reasonable for video capturing). This is
far below the disk's physical capacity. The disk LED showed occasional
flushes.

Results: For a non-full partition, occasional peaks of up to 60 ms were
observed, which is something one can probably live with. At 3 MB/s this
means 180 kB stuck in the buffer. But when the partition started to get
full, peaks of 0.2-0.3 seconds started to appear. The latter means 900
kB waiting to go out, and this may explain why I originally had problems.

If you want to see how your system behaves, just compile the attached
code and go:
./writefat output-junk-data-file > listfile

The list of loop timings will be in listfile. Use your favourite number
cruncher to view graphs. (The program's name is due to historic
reasons...) If you want to test the slower writing speeds, check the
typical delay in the listfile, or see how fast the output file grows.
The sleep period defined in the program itself is not reliable, since
the operating system may not be able to sleep for such short periods.

And finally: Does a RAM FIFO help? Surprisingly, the answer is no. I
did the following:

mknod mypipe p
mbuffer -i mypipe -o /fatfs/output-file &
./writefat mypipe > listfile

and was quite surprised to find delays of 0.2 sec. BTW, mbuffer seems to
force the data to be flushed to disk much more often. The disk LED
showed that writes occurred all the time, unlike the direct write to
file, in which flushes occurred only occasionally. And mypipe and listfile
are on an ext3, while output-file is on FAT.

Insights, anybody?

  Eli

--
Web: http://www.billauer.co.il




--
guy

For world domination - press 1,
or dial 0, and please hold, for the creator. -- nob o. dy


--
Haifa Linux Club Mailing List (http://www.haifux.org)
To unsub send an empty message to [EMAIL PROTECTED]




Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread Eli Billauer

guy keren wrote:


I don't think it's that the disk gets full. I think it's the page-cache
getting full. Try this: get a partition that is already quite full, and run the
test on it. You will not see this problem.


Well, you may get other results if you test it, but what I saw was that 
if the partition was about to be full, I got one behaviour. I ran the same 
test after deleting some gigabytes of data from the partition and got 
something much better. Back and forth. This is how I reached the conclusion.


The question I find appealing in this context is when the filesystem 
looks for free blocks. If it does so only on demand, this would explain 
what happens. IMHO, it would make sense to fire off some tasklet (?) 
whenever the pool of free blocks starts to get empty, but I have no idea 
how it really works.


   Eli

--
Web: http://www.billauer.co.il







Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread Eli Billauer

Muli Ben-Yehuda wrote:


Where can I find the source for mbuffer?


http://www.rcs.ei.tum.de/~maierkom/privat/software/mbuffer/

I downloaded 20011008 (the latest version didn't compile).


Which kernel are you using?


I'm on 2.4.22 and 2.4.21 (yeah, yeah, retro).

As for the results you posted: It's the peaks I'm after, not the tail. 
The peaks appear anywhere in the list. So the best thing is to draw a 
graph of these numbers.


Thanks,
  Eli

--
Web: http://www.billauer.co.il







Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread Muli Ben-Yehuda
On Wed, Jun 22, 2005 at 03:08:45PM +0200, Eli Billauer wrote:

 As for the results you posted: It's the peaks I'm after, not the tail. 
 The peaks appear anywhere in the list. So the best thing is to draw a 
 graph of these numbers.

This is the tail of the distribution - i.e. the peaks (generated via
sort -n $file | tail -15).

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/






Re: [Haifux] Real-time write on *ANY* filesystem

2005-06-22 Thread guy keren

On Wed, 22 Jun 2005, Eli Billauer wrote:

 guy keren wrote:

  I don't think it's that the disk gets full. i think it's the page-cache
  getting full. try this: get a partition that is already quite full, and run the
  test on it. you will not see this problem.

 Well, you may get other results if you test it, but what I saw was that
 if the partition was about to be full, I got one behaviour. Ran the same
 test after deleting some gigabytes of data from the partition, got something
 much better. Back and forth. This is how I reached the conclusion.

so it _could_ be that, due to fragmentation, instead of writing a large set
of data consecutively, the system wrote it in several write commands to
different parts of the hard drive.

 The question I find appealing in this context is when the filesystem
 looks for free blocks. If it does so only on demand, this would explain
 what happens.

the file system contains a list of all free blocks. it looks for a free
block _from this list_ when there is a need for a new free block.
furthermore, it usually does not allocate a single block - rather, it
tries to pre-allocate several consecutive blocks, assuming they'll soon be
needed. it does this in order to avoid spreading the file all over the
disk.

-- 
guy

For world domination - press 1,
 or dial 0, and please hold, for the creator. -- nob o. dy





RE: [Haifux] Real-time write on *ANY* filesystem

2005-06-21 Thread Tzahi Fadida
General insight: try XFS; it avoids disk operations as much as
possible and should outperform ext3 on large files.

Regards,
tzahi.

 -Original Message-
 From: Haifux - Haifa Linux Club 
 [mailto:[EMAIL PROTECTED] On Behalf Of Eli Billauer
 Sent: Wednesday, June 22, 2005 2:42 AM
 To: Haifa Linux Club Mailing list
 Subject: [Haifux] Real-time write on *ANY* filesystem
 
 
 Hello again,
 
 It turns out that it's not a FAT issue, but that the same 
 problem occurs 
 on ext3 systems as well. I've written a small program to test 
 the delays 
 between writes, and the results are not very encouraging. 
 Especially when 
 the disk gets full (it always does, doesn't it?).
 
 In my opinion, this should concern anyone who wants to use a Linux box 
 for storing a data stream (audio, video, whatever).
 
 I've attached the source of the program I used. Basically, it 
 loops on 
 writing 32kB chunks of data to a file, creating a list of 
 numbers telling 
 how much time (in microseconds) elapsed since the last loop 
 (to stdout). 
 There are two modes of testing: One is to let it write as fast as 
 possible, and the second is to put delays between writes, which 
 simulates waiting for incoming data. If there is enough room, the 
 program will write slightly less than 2 GB (guess why?).
 
 Since Linux is a multitasking system, the results are not exactly 
 repeatable. But the general impression is that writing to FAT or on 
 ext3, on my laptop or on my desktop, they all behave more or 
 less the same.
 
 First test regards full-speed write. Data was simply written 
 as fast as 
 possible. For a non-full partition, the write operation dwelled 
 typically 5.5 ms, with occasional bursts of 0.7-0.9 *seconds* 
 delay on 
 the write operation. When the partition gets full, things get even 
 nastier. Several seconds of blocking was observed. 5 seconds, 
 and up to 
 14 seconds delay typically appeared a few times for a 2 GB 
 writing session.
 
 Then I added a short sleep in the loop, in order to simulate data 
 written at ~ 3MB/sec (which is reasonable for video 
 capturing). This is 
 far below the disk's physical capacity. The disk LED showed 
 occasional 
 flushes.
 
 Results: For a non-full partition, occasional peaks of up to 
 60ms were 
 observed, which is something one can live with, probably. At 
 3 MB/s this 
 means 180 kB stuck in the buffer. But when the partition 
 started to get 
 full, peaks of 0.2-0.3 seconds started to appear. The latter 
 means 900 
 kB waiting to go out, and this may explain why I 
 originally had problems.
 
 If you want to see how your system behaves, just compile the attached 
 code and go:
 ./writefat output-junk-data-file > listfile
 
 The list of loop timings will be in listfile. Use your 
 favourite number 
 cruncher to view graphs. (The program's name is due to historic 
 reasons...) If you want to test the slower writing speeds, check the 
 typical delay in the listfile, or see how fast the output file grows. 
 The sleep period defined in the program itself is not reliable, since 
 the operating system may not be able to sleep for too short periods.
 
 And finally: Does a RAM FIFO help? Surprisingly, the answer is no. I 
 did the following:
 
 mknod mypipe p
 mbuffer -i mypipe -o /fatfs/output-file &
 ./writefat mypipe > listfile
 
 and was quite surprised to find delays of 0.2 sec. BTW, 
 mbuffer seems to 
 force the data to be flushed to disk much more often. The disk LED 
 showed that writes occurred all the time, unlike the direct write to 
 file, in which flushes occurred only occasionally. And mypipe and 
 listfile 
 are on an ext3, while output-file is on FAT.
 
 Insights, anybody?
 
 Eli
 
 -- 
 Web: http://www.billauer.co.il
 
 


