Re: RFE: enable buffering on null-terminated data

Carl Edquist via GNU coreutils General Discussion Thu, 14 Mar 2024 08:15:13 -0700


On Mon, 11 Mar 2024, Zachary Santer wrote:

> On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist wrote:
>> (In my coprocess management library, I effectively run every coproc with --output=L by default, by eval'ing the output of 'env -i stdbuf -oL env', because most of the time for a coprocess, that's what's wanted/necessary.)
> Surrounded by 'set -a' and 'set +a', I guess? Now that's interesting.

Ah, no - I use the 'VAR=VAL command line' syntax so that it's specific to the command (it's not left exported to the shell).


Effectively the coprocess commands are run with

        LD_PRELOAD=... _STDBUF_O=L command line

This allows the command line to be a shell function, which will then get the desired stdbuf behavior. That matters because you can't pass a shell function (within the context of the current shell) as the command to stdbuf.

As far as I can tell, the stdbuf tool sets LD_PRELOAD (to point to libstdbuf.so) and your custom buffering options in _STDBUF_{I,O,E}, in the environment for the program it runs. The double-env thing there is just a way to cleanly get exactly the env vars that stdbuf sets. The values don't change, but since they are an implementation detail of stdbuf, it's a bit more portable to grab the values this way rather than hard-code them. This is done only once per shell session to extract the values and save them to a private variable, and then they are used for the command line as shown above.

Of course, if "command line" starts with "stdbuf --output=0" or whatever, that will override the new line-buffered default.
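Here is a minimal bash sketch of the extract-once, apply-per-command idea described above. It is not the coprocess library's actual code. The array `_stdbuf_assigns` and the helpers `run_linebuffered` and `tagged` are hypothetical names. It also differs from the 'VAR=VAL command line' prefix: it exports the captured assignments inside a subshell. That still keeps them out of the calling shell, and it still works when the command is a shell function.

```shell
#!/usr/bin/env bash
# Once per session: capture exactly the VAR=VAL pairs stdbuf(1) would set
# for -oL (LD_PRELOAD and _STDBUF_O), one pair per array element.
mapfile -t _stdbuf_assigns < <(env -i stdbuf -oL env)

# Hypothetical helper: run a command line, which may be a shell function,
# with those variables in its environment. The body is a subshell, so the
# exports never leak back into the calling shell.
run_linebuffered () (
    # Guard: a bare 'export' with no arguments would just list the environment.
    (( ${#_stdbuf_assigns[@]} )) && export "${_stdbuf_assigns[@]}"
    "$@"
)

# Hypothetical shell function; its stdio-based child (sed) inherits the
# line-buffered stdout setting.
tagged () { sed 's/^/[tagged] /'; }

printf '%s\n' one two | run_linebuffered tagged
```

Because the command runs in a subshell, it can't change the caller's variables. That is usually fine for something that is going to become a coprocess anyway.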

You can definitely export it to your shell though, either with 'set -a' like you said, or with the export command. After that, everything you run should get line-buffered stdio by default.
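Here is the 'set -a' route as a bash sketch. It assumes the values stdbuf(1) prints contain no whitespace or shell metacharacters. That holds for a typical libstdbuf.so path, but it is still an assumption.

```shell
#!/usr/bin/env bash
# 'set -a' marks every variable assigned from here on for export, so the
# LD_PRELOAD=... and _STDBUF_O=L lines eval'd below land in the environment.
# Assumes the values contain no whitespace or shell metacharacters.
set -a
eval "$(env -i stdbuf -oL env)"
set +a

# Programs started from here on inherit both variables, e.g.:
env | grep '^_STDBUF_'
```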

> I just added that to a script I have that prints lines output by another command that it runs, generally a build script, to the command line, but updating the same line over and over again. I want to see if it updates more continuously like that.

So, a lot of times build scripts run a bunch of individual commands. Each of those commands has an implied flush when it terminates, so you will get the output from each of them promptly (as each command completes), even without using stdbuf.

Where things get sloppy is if you add some stuff in a pipeline after your build script, which results in things getting block-buffered along the way:


        $ ./build.sh | sed s/what/ever/ | tee build.log

And there you will definitely see a difference.


        sloppy () {
                for x in {1..10}; do sleep .2; echo $x; done |
                sed s/^/:::/ | cat
        }

        {
                echo before:
                sloppy
                echo

                export $(env -i stdbuf -oL env)

                echo after:
                sloppy
        }

> Yeah, there's really no way to break what I'm doing into a standard pipeline.


I admit I'm curious what you're up to  :)

> Of course, using line-buffered or unbuffered output in this situation makes no sense. Where it might be useful in a pipeline is when an earlier command in a pipeline might only print things occasionally, and you want those things transformed and printed to the command line immediately.

Right ... And in that case, losing the performance benefit of a larger block buffer is a smaller price to pay.
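Here is one way to see the difference in that "prints things occasionally" case, as a bash sketch. `slow_producer` and `first_line_within` are hypothetical helpers, and sed stands in for any stdio-based filter. The consumer waits at most one second for its first line, while the producer pauses for two.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the first line that arrives within $1 seconds.
first_line_within () {
    local line
    if IFS= read -r -t "$1" line; then
        printf 'early: %s\n' "$line"
    else
        echo 'nothing yet'
    fi
}

# Hypothetical producer: one line, a pause, then another line.
slow_producer () { echo first; sleep 2; echo second; }

# sed's stdout is a pipe, so stdio block-buffers ":::first" until EOF.
slow_producer | sed 's/^/:::/' | first_line_within 1

# With stdbuf -oL, sed flushes each line as soon as it has written it.
slow_producer | stdbuf -oL sed 's/^/:::/' | first_line_within 1
```

On a GNU/Linux system I'd expect the first pipeline to report `nothing yet` and the second to report `early: :::first`.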

> My assumption is that line-buffering through setbuf(3) was implemented for printing to the command line, so its availability to stdbuf(1) is just a useful side effect.


Right, stdbuf(1) leverages setbuf(3).

setbuf(3) tweaks the buffering behavior of stdio streams (stdin, stdout, stderr, and anything else you open with, eg, fopen(3)). It's not really limited to terminal applications, but yeah it makes it easier to ensure that your calls to printf(3) actually get output after each line (whether that's to a file or a pipe or a tty), without having to call an explicit fflush(3) of stdout every time.

stdbuf(1) sets LD_PRELOAD to libstdbuf.so for your program, causing it to call setvbuf(3) (documented on the setbuf(3) man page) at program startup, based on the values of _STDBUF_* in the environment (which stdbuf(1) also sets).


(That's my read of it anyway.)
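To see that mechanism directly, a small bash loop can print exactly what stdbuf(1) adds to an otherwise empty environment. The list of modes below is arbitrary.

```shell
#!/usr/bin/env bash
# env -i starts stdbuf with an empty environment, so whatever the inner
# env prints was added by stdbuf itself before it exec'd the command.
for mode in -oL -o0 -eL -i0; do
    printf '%s:\n' "$mode"
    env -i stdbuf "$mode" env | sed 's/^/    /'
done
```

Each mode should show an LD_PRELOAD line pointing at libstdbuf.so, plus one _STDBUF_I, _STDBUF_O or _STDBUF_E entry. The exact library path varies between distributions.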

> In the BUGS section in the man page for stdbuf(1), we see: "On GLIBC platforms, specifying a buffer size, i.e., using fully buffered mode will result in undefined operation."


Eheh xD

Oh, I imagine "undefined operation" means something more like "unspecified" here. stdbuf(1) uses setbuf(3), so the behavior you'll get should be whatever the setbuf(3) from the libc on your system does.

I think all this means is that the C/POSIX standards are a bit loose about what is required of setbuf(3) when a buffer size is specified, and there is room in the standard for it to be interpreted as only a hint.

> If I'm not mistaken, then buffer modes other than 0 and L don't actually work. Maybe I should count my blessings here. I don't know what's going on in the background that would explain glibc not supporting any of that, or stdbuf(1) implementing features that aren't supported on the vast majority of systems where it will be installed.


Hey try it right?

Works for me (on glibc-2.23)

        $ for s in 8k 16k 32k 1M; do
            echo ::: $s :::
            { stdbuf -o$s strace -ewrite tr 1 2
            } < /dev/zero 2>&1 > /dev/null | head -3
            echo
          done

        ::: 8k :::
        write(1, "\0\0\0\0\0\0\0\0"..., 8192) = 8192
        write(1, "\0\0\0\0\0\0\0\0"..., 8192) = 8192
        write(1, "\0\0\0\0\0\0\0\0"..., 8192) = 8192

        ::: 16k :::
        write(1, "\0\0\0\0\0\0\0\0"..., 16384) = 16384
        write(1, "\0\0\0\0\0\0\0\0"..., 16384) = 16384
        write(1, "\0\0\0\0\0\0\0\0"..., 16384) = 16384

        ::: 32k :::
        write(1, "\0\0\0\0\0\0\0\0"..., 32768) = 32768
        write(1, "\0\0\0\0\0\0\0\0"..., 32768) = 32768
        write(1, "\0\0\0\0\0\0\0\0"..., 32768) = 32768

        ::: 1M :::
        write(1, "\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
        write(1, "\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
        write(1, "\0\0\0\0\0\0\0\0"..., 1048576) = 1048576

>> It may just be that nobody has actually had a real need for it. (Yet?)
> I imagine if anybody has, they just set --output=0 and moved on. Bash scripts aren't the fastest thing in the world, anyway.


Ouch.  Ouch.  Ouuuuch.  :)

While that's true if you're talking about bash itself doing the actual computation and data processing, the main work of the shell is making it easy to set up pipelines for other (very fast) programs to pass their data around.

The stdbuf tool is not meant for the shell! It's meant for those very fast programs that the shell stands up.

Using stdbuf to tweak a very fast program, causing it to output more often (at newlines over pipes rather than at block boundaries), does slow down those programs somewhat. But as we've discussed, this is necessary for certain pipelines that have two-way communication (including coprocesses), or in general any time you want the output immediately.

What may not be obvious is that the shell does not need to get involved with writing input for a coprocess or reading its output - the shell can start other (very fast) programs with input/output redirected to/from the coprocess pipes to do that processing.
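Here is a minimal bash sketch of that idea. The function `demo_coproc` is hypothetical, and sed stands in for whatever the coprocess really runs. seq feeds the coprocess and head reads its replies, so the shell itself only wires up file descriptors.

```shell
#!/usr/bin/env bash
# Hypothetical demo: a line-buffered coprocess whose input and output are
# handled by external programs (seq and head) instead of shell loops.
demo_coproc () {
    local wfd
    # sed stands in for any stdio-based filter. Without stdbuf -oL its replies
    # would sit in a block buffer and the head below would wait forever.
    coproc FILTER { stdbuf -oL sed 's/^/:::/'; }
    wfd=${FILTER[1]}

    # An external program writes requests into the coprocess...
    seq 3 >&"$wfd"
    # ...and another reads the replies, which arrive even though the shell
    # still holds the coprocess's input open.
    head -n 3 <&"${FILTER[0]}"

    # Close our write end so sed sees EOF and exits, then reap it.
    exec {wfd}>&-
    wait
}

demo_coproc
```

If you drop the `stdbuf -oL`, head should block indefinitely. sed's replies stay in its block buffer, and the shell still holds the coprocess's input open.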

My point earlier, though, was that a null-terminated record buffering mode, as useful as it sounds on the surface (for null-terminated paths), may actually be something _nobody_ has ever needed for a real (not contrived) workflow.


But then again I say "Yet?" - because, never say never.
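For what it's worth, here is a bash sketch of why the request comes up. With NUL-terminated records, -oL never triggers a flush, so the --output=0 fallback mentioned above is what gets each record through promptly. `nul_producer` and `first_record_within` are hypothetical helpers. tr is used because the strace experiment above shows it honours stdbuf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the first NUL-terminated record seen within $1 s.
first_record_within () {
    local rec
    if IFS= read -r -d '' -t "$1" rec; then
        printf 'early: %s\n' "$rec"
    else
        echo 'nothing yet'
    fi
}

# Hypothetical producer: one NUL-terminated record, a pause, then another.
nul_producer () { printf 'a\0'; sleep 2; printf 'b\0'; }

# Line buffering only flushes on '\n', so "A\0" waits for EOF here.
nul_producer | stdbuf -oL tr a-z A-Z | first_record_within 1

# Unbuffered output gets the record out immediately, at the cost of
# turning every stdio write call into its own write(2).
nul_producer | stdbuf -o0 tr a-z A-Z | first_record_within 1
```

I'd expect `nothing yet` from the -oL run and `early: A` from the -o0 run.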


Happy line-buffering  :)

Carl
