Pádraig Brady <[email protected]> writes:

>> I only had a quick skim over the patch, but it generally looks good.
>> It reminded me of one thing though. You used a buffer like this:
>>     static char line_in[IO_BUFSIZE];
>> I think this was a mistake in my multibyte 'fold' implementation. I'm
>> not really sure why I chose to use IO_BUFSIZE. It is meant to minimize
>> system call overhead, but since we are using fread/fwrite, libc chooses
>> how much to read and write per system call. For example on glibc:
>>      $ strace -e trace='/read|write' ./src/cut -f 1 /dev/zero 2>&1 | head
>>      [...]
>>      read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
>>      write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
>>      read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
>>      write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
>>      write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 258048) = 258048
>>      read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
>>      write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
>> I think it probably makes sense to just use BUFSIZ there. Likewise
>> for 'fold' and 'expand':
>>      $ sed -i 's/IO_BUFSIZE/BUFSIZ/g' src/cut.c && make
>>      $ timeout 10 taskset 1 ./src/cut-iobufsize -f 1 /dev/zero | taskset 2 pv -r > /dev/null
>>      [1.80GiB/s]
>>      $ timeout 10 taskset 1 ./src/cut -f 1 /dev/zero | taskset 2 pv -r > /dev/null
>>      [2.18GiB/s]
>> 
>
> You're right that stdio picks the I/O sizes, but the fread/fwrite size
> is a big hint to the I/O size to use, and using IO_BUFSIZE is always a
> win for me.
> We can also force the stdio I/O size with stdbuf (or setvbuf),
> so testing various combinations here:
>
> $ timeout 5 taskset 1 ./src/cut-bufsiz -f 1 /dev/zero | taskset 1 pv -r >/dev/null
> [ 649MiB/s]
> $ timeout 5 taskset 1 ./src/cut-bufsiz -f 1 /dev/zero | taskset 2 pv -r >/dev/null
> [1.29GiB/s]
> $ timeout 5 taskset 1 ./src/cut-iobufsize -f 1 /dev/zero | taskset 1 pv -r >/dev/null
> [2.51GiB/s]
> $ timeout 5 taskset 1 ./src/cut-iobufsize -f 1 /dev/zero | taskset 2 pv -r >/dev/null
> [1.98GiB/s]
> $ timeout 5 taskset 1 stdbuf -o 262144 ./src/cut-iobufsize -f 1 /dev/zero | taskset 1 pv -r >/dev/null
> [2.38GiB/s]
> $ timeout 5 taskset 1 stdbuf -o 262144 ./src/cut-iobufsize -f 1 /dev/zero | taskset 2 pv -r >/dev/null
> [1.89GiB/s]
>
> So in summary the current IO_BUFSIZE performs best for me.
> I was considering setting setvbuf to IO_BUFSIZE,
> but there seems to be no need given the above results.

Oh, interesting. The difference between BUFSIZ and IO_BUFSIZE is also
quite large on your system. I guess using IO_BUFSIZE consistently is
reasonable then, since this stuff will vary across environments.

> I also did a full perf run with all previously shown option combinations,
> and all performed a bit better with the larger I/O size.

Previously, I was unsure if we could assume that 256KiB, or more if
using multiple IO_BUFSIZE-byte buffers, could be allocated on the stack.
Lasse Collin, whom I added to CC, also mentioned this when he reviewed my
wc_neon.c patch. Do you know if we can assume it is safe?

In 2009, Bruno mentioned that the default stack size for new threads is
quite small on some systems [1]. But I'm not sure about the main thread,
and those 2009 limits are probably quite different from those on today's
platforms. Likewise, in 2009 you were wary of 128KiB allocated on the
stack [2].

I probably would have mentioned this sooner, but I figured anyone with a
tiny stack could lower the value of IO_BUFSIZE and recompile. So I forgot
to bring it up on list.

Collin

[1] https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
[2] https://lists.gnu.org/archive/html/bug-coreutils/2009-02/msg00152.html
