On Sun, Dec 4, 2016 at 11:27 PM, Laslo Hunhold <d...@frign.de> wrote:
> On Sun, 4 Dec 2016 21:55:02 -0800
> Michael Forney <mfor...@mforney.org> wrote:
>
> Dear Michael,
>
>> I finally got around to addressing the issues with fread I raised at
>> the end of http://lists.suckless.org/dev/1504/26580.html.
>>
>> This patch series eliminates several uses of fread/fwrite where
>> plain read/write are more appropriate. To help with the common case
>> of writing an entire buffer to a file descriptor, I added a writeall
>> function (which works similarly to plan9's standard write function
>> and the Write/write_all functions in go/rust). All users of concat
>> were better suited to working on plain file descriptors except for
>> tail, so I switched concat to a read/writeall loop, and tail to an
>> fgetc/fputc loop.
>
> fgetc/fputc really can slow down your program, which I noticed when I
> wrote farbfeld-filters. Working on a buffer basis rather than a
> char-by-char basis really speeds things up a lot (for me it was
> around 50%-70%).
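For reference, the writeall() helper described in the patch summary
might look something like this. This is my own sketch of the idea
(retry write(2) until the whole buffer is out, in the spirit of Plan
9's write() and Rust's write_all()); sbase's actual implementation
may differ in details:

```c
#include <unistd.h>

/*
 * Sketch of a writeall() helper: keep calling write(2) until the
 * whole buffer has been written, or an error occurs. Returns the
 * number of bytes written (== count) on success, -1 on error with
 * errno set by write(2). Illustrative only.
 */
ssize_t
writeall(int fd, const void *buf, size_t count)
{
	const char *p = buf;
	size_t left = count;
	ssize_t w;

	while (left > 0) {
		if ((w = write(fd, p, left)) < 0)
			return -1; /* errno set by write(2) */
		p += w;
		left -= w;
	}
	return count;
}
```

The point of factoring this out is that a single write(2) may write
fewer bytes than requested (e.g. on pipes and terminals), so every
caller would otherwise need its own retry loop.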
Internally, fgetc/fputc still use the buffers behind the FILE
structures, so the only overhead I can imagine is from the repeated
function calls. The previous tail -f code used getline/fputs. We
could fix the NUL byte issue with fwrite instead of fputs, but I'd
like to make sure it actually makes a noticeable difference before
trying this.

> Can you provide some performance information, e.g. for cat(1)?
> Does it get any faster with the writeall()-loop? What other
> benefits do we see here?

I compared the time it took to cat a 4.3G file to /dev/null. After
running several times to let any file caching settle down, I see

  before: 0m00.81s real   0m00.14s user   0m00.67s system
  after:  0m00.66s real   0m00.03s user   0m00.63s system

Sidenote: this is more like 1.4s real when I build with musl, because
musl's BUFSIZ is 1024 and glibc's is 8192. I'm thinking concat should
use a fixed buffer length like 8192 rather than BUFSIZ. go's Copy
uses 32*1024, and rust's uses 8*1024. But this is outside the scope
of this patch set; I might send another patch later.

> Can you elaborate in this thread again why we should do
> raw I/O? In a naïve sense, the operating system can "schedule"
> reads and writes more efficiently with the buffered I/O.

For the concat case, previously what was happening was essentially
this:

  fread(BUFSIZ)
    -> read(BUFSIZ) = r1, read(BUFSIZ - r1) = r2,
       read(BUFSIZ - r1 - r2) = r3, ...
    -> copy data from FILE buffer to provided buffer
  fwrite(BUFSIZ)
    -> copy data from provided buffer to FILE buffer
    -> write(BUFSIZ) = w1, write(BUFSIZ - w1) = w2,
       write(BUFSIZ - w1 - w2) = w3, ...

Now, we do this:

  read(BUFSIZ) = r1
  write(r1), ...
  read(BUFSIZ) = r2
  write(r2), ...
  read(BUFSIZ) = r3
  write(r3), ...

So in terms of system calls, it is nearly identical, and we no longer
have to copy the data twice. The only thing that could make this
slower than before is if the reads return tiny chunks, so that we end
up making more write calls.
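In code, the new read/write loop is essentially the following. This
is a hedged sketch, not the actual sbase concat (which also takes the
stream names for error reporting and uses the writeall() helper; here
partial writes are retried inline to keep the example self-contained):

```c
#include <stdio.h>  /* BUFSIZ */
#include <unistd.h>

/*
 * Sketch of the new concat loop: raw BUFSIZ-sized reads, with each
 * chunk written out in full before the next read. Partial writes
 * are retried inline; the patch series factors that retry loop into
 * a writeall() helper instead. Returns 0 on EOF, -1 on error.
 */
int
concat(int ifd, int ofd)
{
	char buf[BUFSIZ];
	ssize_t n, w, off;

	while ((n = read(ifd, buf, sizeof(buf))) > 0) {
		for (off = 0; off < n; off += w) {
			if ((w = write(ofd, buf + off, n - off)) < 0)
				return -1; /* write error */
		}
	}
	return n; /* 0 on EOF, -1 on read error */
}
```

Note there is exactly one user-space buffer here, so the data is
copied zero extra times between read(2) and write(2).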
We could fix this in concat by making sure to read at least some
amount of data before writing, perhaps controlled by an argument we
can pass when cat's -u is set. I didn't bother with this for now
because openbsd's cat doesn't do it. If your input file descriptor is
only returning data in small chunks, it probably isn't coming in very
fast anyway, so ensuring that writes happen in big chunks doesn't
matter much.

For the cases where we are just processing the incoming data, like
*sum and od, the raw read really is the right way to go. Both fread
and the new code make the same read calls into a buffer of size
BUFSIZ. The only difference is that we may go through more loop
iterations if the reads come in small chunks, rather than doing the
extra copy into the provided buffer.

I think a good rule of thumb is: if you are reading data in and don't
care what size chunks it arrives in, use read; fread offers you no
advantage there. Otherwise, use fread and let stdio take care of the
buffering behind the scenes.

>> The result should be more efficient (no copy between read/write
>> buffers, and data is processed as it comes in), and is roughly the
>> same complexity as before. In most cases, we now pull in less of
>> stdio, so statically linked binaries get smaller. Additionally,
>> cat/tee on a pipe/terminal is now usable.
>
> That makes sense.
>
>> Along the way, I found and fixed several bugs, mostly dealing with
>> error checking.
>> I've been running with these patches for a couple of days now and
>> haven't noticed any regressions.
>
> That is great! I'm glad you test your patches before submitting,
> which I have been guilty of neglecting so many times.