Re: Pipes, cat buffer size
On Sunday 19 October 2008 02:50:22 Dan Nelson wrote:
> > But if it works in general, it may simply be that it isn't really
> > applicable to my purpose (and I should modify the reader to read
> > multiple blocks).
>
> That's my suggestion, yes. That way your program would also work when
> passed data from an internet socket (where you will get varying read()
> sizes too). It wouldn't add more than 10 lines to wrap your read in a
> loop that exits when your preferred size has been reached.

Since you mention a socket, would this patch be a good idea, so that
kqueue can be used to read from the pipe? I would think that having the
kernel fill the buffer, rather than busy-looping between kernel and
userland, would improve speed, but I'm not familiar enough with the code
to know whether this causes any problems.

diff -u -r1.191.2.3 sys_pipe.c
--- sys_pipe.c  6 Jun 2008 12:17:28 -  1.191.2.3
+++ sys_pipe.c  20 Oct 2008 14:04:18 -
@@ -1594,7 +1609,10 @@
                PIPE_UNLOCK(rpipe);
                return (1);
        }
-       ret = kn->kn_data > 0;
+       if ( kn->kn_sfflags & NOTE_LOWAT)
+               ret = kn->kn_data >= kn->kn_sdata;
+       else
+               ret = kn->kn_data > 0;
        PIPE_UNLOCK(rpipe);
        return ret;
 }

-- 
Mel
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Pipes, cat buffer size
>>> ... the writer could write 1-byte buffers and the reader will be
>>> forced to read each byte individually.
>>
>> No; take a look at /sys/kern/sys_pipe.c . Depending on how much data
>> is in the pipe, it switches between async in-kernel buffering (<8192
>> bytes), and direct page wiring between sender and receiver (basically
>> zero-copy).
>
> Ok, maybe it's just not behaving as I thought it should. See this
> test program:

However, when I add "sleep(1)" to your test program, I see the
following output:

$ dd bs=1 if=/dev/zero | ./reader
read 2282 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes
read 65536 bytes

In your original example, because your sample reader does nothing, it
sees each write separately. If the reader was actually doing some work
then the pipe would buffer up data while your reader was busy.

This looks like exactly the right behavior: The reader will only block
if there is no data in the pipe at all; the writer will only block if it
gets "too far ahead" of the reader. Except in those cases, each program
gets to do I/O as fast as it can.

If your program needs larger blocks, it should keep reading until it
gets enough data. (The -B option of GNU tar is an example of this sort
of behavior.)

Tim
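The buffering described above can be observed without a second process. The following is a minimal sketch, not code from the thread: it queues a number of 1-byte writes in a pipe and then issues a single large read(), which is roughly what a busy reader sees when it finally returns to read(). The function name coalesced_read_size and the 4096-byte buffer are assumptions made for illustration.

```c
#include <unistd.h>

/*
 * Write `count` single bytes into a fresh pipe, then issue one large
 * read() and return how many bytes that single call delivered.
 * Returns -1 on error or if count is outside 1..4096.
 * coalesced_read_size is a hypothetical name used only for this sketch.
 */
ssize_t
coalesced_read_size(int count)
{
	int fds[2];
	char buf[4096];
	ssize_t n;

	/* Stay within buf and below the pipe capacity so write() cannot block. */
	if (count < 1 || count > (int)sizeof(buf))
		return (-1);
	if (pipe(fds) == -1)
		return (-1);

	/* The "writer": many tiny writes, as with dd bs=1. */
	for (int i = 0; i < count; i++) {
		if (write(fds[1], "x", 1) != 1) {
			close(fds[0]);
			close(fds[1]);
			return (-1);
		}
	}

	/* The "busy reader" finally reads: everything queued arrives at once. */
	n = read(fds[0], buf, sizeof(buf));
	close(fds[0]);
	close(fds[1]);
	return (n);
}
```

Calling coalesced_read_size(100) should report a single 100-byte read rather than a hundred 1-byte reads, since the kernel has already accumulated the data by the time read() is called.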
Re: Pipes, cat buffer size
In the last episode (Oct 19), Ivan Voras said:
> 2008/10/19 Dan Nelson <[EMAIL PROTECTED]>:
> > In the last episode (Oct 19), Ivan Voras said:
> >> Of course. But that's not the point :) From what I see (didn't
> >> look at the code), Linux for example does some kind of internal
> >> buffering that decouples how the reader and the writer interact. I
> >> think that with FreeBSD's current behaviour the writer could write
> >> 1-byte buffers and the reader will be forced to read each byte
> >> individually. I don't know if there's some ulterior reason for
> >> this.
> >
> > No; take a look at /sys/kern/sys_pipe.c . Depending on how much
> > data is in the pipe, it switches between async in-kernel buffering
> > (<8192 bytes), and direct page wiring between sender and receiver
> > (basically zero-copy).
>
> Ok, maybe it's just not behaving as I thought it should. See this
> test program:

[ program that prints the amount of data in each read() ]

> and this command line:
>
> > dd bs=1 if=/dev/zero| ./reader
>
> The output of this on RELENG_7 is:
>
> read 8764 bytes
> read 1 bytes
[..]
> read 1 bytes
> read 1 bytes
> ...
>
> The first value puzzles me - so it actually is doing some kind of
> buffering. Linux isn't actually much better, but the intention is
> there:
>
> $ dd if=/dev/zero bs=1 | ./bla
> read 1 bytes
> read 38 bytes
> read 8 bytes
> read 2 bytes
[..]
> read 2 bytes
> read 3 bytes
> read 3 bytes
> read 112 bytes
> read 2 bytes
> read 2 bytes
> ...
>
> Maybe FreeBSD switches between the writer and the reader too soon so
> the buffer doesn't get filled?

If your reader isn't doing any real work between reads, it is always
reading, so the pipe will never fill up. The delay in FreeBSD was
probably due to the shell spawning the writer first, so it buffered up
8k of data before the reader was ready. After that, the reader was able
to pull data as fast as the writer pushed.

> Using cat (which started all this), FreeBSD consistently processes
> 4096 byte buffers, while Linux's sizes are all over the place - from
> 4 kB to 1 MB, randomly fluctuating. My goal would be (if it's
> possible - it might not be) to maximize coalescing in an environment
> where the reader does something with the data (e.g. compression) so
> there should be a reasonable amount of backlogged input data.

Remember that increasing coalescing also increases latency and
decreases the parallelism between reader and writer (since if you
coalesce you cause the reader to wait for data that's already been
written, in the hopes that the writer will write again soon).

> But if it works in general, it may simply be that it isn't really
> applicable to my purpose (and I should modify the reader to read
> multiple blocks).

That's my suggestion, yes. That way your program would also work when
passed data from an internet socket (where you will get varying read()
sizes too). It wouldn't add more than 10 lines to wrap your read in a
loop that exits when your preferred size has been reached.

> Though it won't help me, I still think that modifying cat is worth it :)

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Pipes, cat buffer size
2008/10/19 Dan Nelson <[EMAIL PROTECTED]>:
> In the last episode (Oct 19), Ivan Voras said:
>> Of course. But that's not the point :) From what I see (didn't look at
>> the code), Linux for example does some kind of internal buffering that
>> decouples how the reader and the writer interact. I think that with
>> FreeBSD's current behaviour the writer could write 1-byte buffers and
>> the reader will be forced to read each byte individually. I don't know
>> if there's some ulterior reason for this.
>
> No; take a look at /sys/kern/sys_pipe.c . Depending on how much data
> is in the pipe, it switches between async in-kernel buffering (<8192
> bytes), and direct page wiring between sender and receiver (basically
> zero-copy).

Ok, maybe it's just not behaving as I thought it should. See this test
program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BSIZE (1024*1024)

int main(void) {
        int r;
        char buf[BSIZE];

        /* Report the size of every read() on stdin until EOF or error. */
        while (1) {
                r = read(0, buf, BSIZE);
                fprintf(stderr, "read %d bytes\n", r);
                if (r <= 0)
                        break;
        }
        return 0;
}

and this command line:

> dd bs=1 if=/dev/zero| ./reader

The output of this on RELENG_7 is:

read 8764 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
read 1 bytes
...

The first value puzzles me - so it actually is doing some kind of
buffering. Linux isn't actually much better, but the intention is there:

$ dd if=/dev/zero bs=1 | ./bla
read 1 bytes
read 38 bytes
read 8 bytes
read 2 bytes
read 2 bytes
read 2 bytes
read 2 bytes
read 4 bytes
read 3 bytes
read 2 bytes
read 2 bytes
read 2 bytes
read 2 bytes
read 2 bytes
read 2 bytes
read 3 bytes
read 3 bytes
read 112 bytes
read 2 bytes
read 2 bytes
...

Maybe FreeBSD switches between the writer and the reader too soon so
the buffer doesn't get filled?

Using cat (which started all this), FreeBSD consistently processes 4096
byte buffers, while Linux's sizes are all over the place - from 4 kB to
1 MB, randomly fluctuating. My goal would be (if it's possible - it
might not be) to maximize coalescing in an environment where the reader
does something with the data (e.g. compression) so there should be a
reasonable amount of backlogged input data.

But if it works in general, it may simply be that it isn't really
applicable to my purpose (and I should modify the reader to read
multiple blocks).

Though it won't help me, I still think that modifying cat is worth it :)
Re: Pipes, cat buffer size
In the last episode (Oct 19), Ivan Voras said:
> Dan Nelson wrote:
> > In the last episode (Oct 18), Ivan Voras said:
> >> I'm working on a program that's intended to be used as a "filter",
> >> as in "something | myprogram > file". I'm trying it with cat and
> >> I'm seeing my read()s return small blocks, 64 kB in size. I
> >> suppose this is because cat writes in 64 kB blocks. So:
> >>
> >> a) Is there a way to programmatically, per-process, set the pipe buffer
> >> size? The program in question is a compressor and it's particularly
> >> inefficient when given small blocks and I'm wondering if the system can
> >> buffer enough data for it.
> >
> > Why not keep reading until you reach your desired compression block
> > size? Bzip2's default blocksize is 900k, for example.
>
> Of course. But that's not the point :) From what I see (didn't look at
> the code), Linux for example does some kind of internal buffering that
> decouples how the reader and the writer interact. I think that with
> FreeBSD's current behaviour the writer could write 1-byte buffers and
> the reader will be forced to read each byte individually. I don't know
> if there's some ulterior reason for this.

No; take a look at /sys/kern/sys_pipe.c . Depending on how much data is
in the pipe, it switches between async in-kernel buffering (<8192
bytes), and direct page wiring between sender and receiver (basically
zero-copy).

> >> b) Is there any objection to the following patch to cat:
> >
> > It might be simpler to just use "dd if=myfile obs=1m" instead of
> > patching cat.
>
> I believe patching cat to bring its block size into the century of the
> fruitbat has its own benefits.

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Pipes, cat buffer size
Dan Nelson wrote:
> In the last episode (Oct 18), Ivan Voras said:
>> I'm working on a program that's intended to be used as a "filter", as
>> in "something | myprogram > file". I'm trying it with cat and I'm
>> seeing my read()s return small blocks, 64 kB in size. I suppose this
>> is because cat writes in 64 kB blocks. So:
>>
>> a) Is there a way to programmatically, per-process, set the pipe buffer
>> size? The program in question is a compressor and it's particularly
>> inefficient when given small blocks and I'm wondering if the system can
>> buffer enough data for it.
>
> Why not keep reading until you reach your desired compression block
> size? Bzip2's default blocksize is 900k, for example.

Of course. But that's not the point :) From what I see (didn't look at
the code), Linux for example does some kind of internal buffering that
decouples how the reader and the writer interact. I think that with
FreeBSD's current behaviour the writer could write 1-byte buffers and
the reader will be forced to read each byte individually. I don't know
if there's some ulterior reason for this.

>> b) Is there any objection to the following patch to cat:
>
> It might be simpler to just use "dd if=myfile obs=1m" instead of
> patching cat.

I believe patching cat to bring its block size into the century of the
fruitbat has its own benefits.
Re: Pipes, cat buffer size
In the last episode (Oct 18), Ivan Voras said:
> I'm working on a program that's intended to be used as a "filter", as
> in "something | myprogram > file". I'm trying it with cat and I'm
> seeing my read()s return small blocks, 64 kB in size. I suppose this
> is because cat writes in 64 kB blocks. So:
>
> a) Is there a way to programmatically, per-process, set the pipe buffer
> size? The program in question is a compressor and it's particularly
> inefficient when given small blocks and I'm wondering if the system can
> buffer enough data for it.

Why not keep reading until you reach your desired compression block
size? Bzip2's default blocksize is 900k, for example.

> b) Is there any objection to the following patch to cat:

It might be simpler to just use "dd if=myfile obs=1m" instead of
patching cat.

-- 
Dan Nelson
[EMAIL PROTECTED]
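
The "keep reading until you reach your desired block size" suggestion can be sketched as a small wrapper around read(). This is one possible shape, not code posted in the thread; the name read_full is hypothetical, and retrying on EINTR is an added assumption about how the filter wants to treat signals.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Keep calling read() until `want` bytes have arrived, EOF is reached,
 * or a real error occurs. Returns the number of bytes stored in buf
 * (short only at EOF), or -1 on error; bytes gathered before an error
 * are not reported. read_full is a hypothetical helper name, not an
 * existing libc function.
 */
ssize_t
read_full(int fd, void *buf, size_t want)
{
	size_t got = 0;

	while (got < want) {
		ssize_t n = read(fd, (char *)buf + got, want - got);

		if (n > 0)
			got += (size_t)n;	/* partial block: keep going */
		else if (n == 0)
			break;			/* EOF: hand back what we have */
		else if (errno != EINTR)
			return (-1);		/* real error */
		/* EINTR: interrupted by a signal, retry the read */
	}
	return ((ssize_t)got);
}
```

A compressor would call read_full(0, block, blocksize) in place of a bare read() and treat a short return as the final block. Because the loop does not care how the writer chunked its output, the same code works for pipes, sockets, and files.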