Re: [dev] ii: how to process out in a pipeline and still page with less
Hello,

Does nobuf(1) help? http://jdebp.uk/Softwares/djbwares/guide/nobuf.html

Note: it tackles exactly the POSIX behaviour of line-buffering output to
ttys, by providing a tty to the program in the pipeline, but without
using any shared-object magic. Have not used it (yet) though.

Best Regards,
Georg

On 5/30/22 08:29, Josuah Demangeon wrote:
> Rodrigo Martins wrote:
> > What if instead of changing every program we changed the standard
> > library? We could make stdio line buffered by setting an environment
> > variable.
>
> I applaud this idea! Environment variables seem to be the right spot
> for any config a library could need: they are unobtrusive, can be set
> by the calling program, yet keep each program configurable by default.
>
> Markus Wichmann wrote:
> > The problem you run into here is that there is more than one
> > standard library.
>
> The problem was stated here for libc's stdio, but I still like the
> idea for new libraries: a call to getenv() (which never fails) inside
> xyz_init(). What about $DISPLAY, $MALLOC_OPTIONS (OpenBSD), or
> $LIBV4LCONTROL_FLAGS [1], or some LIBXYZ_DEFAULT_DEV_FILE=/dev/xyz3?
>
> Insane and breaking some important concept? To keep for debugging
> purposes only? Not to use for every kind of configuration? Better to
> let the programmer control the entirety of the library? Although, if a
> library (or any program really) does not *require* any configuration
> or environment variable and always works without it, I like it even
> better.
>
> [1]: https://github.com/philips/libv4l/blob/cdfd29/libv4lconvert/control/libv4lcontrol.c#L369-L371
Re: [dev] ii: how to process out in a pipeline and still page with less
Rodrigo Martins wrote:
> What if instead of changing every program we changed the standard
> library? We could make stdio line buffered by setting an environment
> variable.

I applaud this idea! Environment variables seem to be the right spot for
any config a library could need: they are unobtrusive, can be set by the
calling program, yet keep each program configurable by default.

Markus Wichmann wrote:
> The problem you run into here is that there is more than one standard
> library.

The problem was stated here for libc's stdio, but I still like the idea
for new libraries: a call to getenv() (which never fails) inside
xyz_init(). What about $DISPLAY, $MALLOC_OPTIONS (OpenBSD), or
$LIBV4LCONTROL_FLAGS [1], or some LIBXYZ_DEFAULT_DEV_FILE=/dev/xyz3?

Insane and breaking some important concept? To keep for debugging
purposes only? Not to use for every kind of configuration? Better to let
the programmer control the entirety of the library? Although, if a
library (or any program really) does not *require* any configuration or
environment variable and always works without it, I like it even better.

[1]: https://github.com/philips/libv4l/blob/cdfd29/libv4lconvert/control/libv4lcontrol.c#L369-L371
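The pattern described here, a library initializer that consults an
environment variable but has a working default when it is unset, can be
sketched in a few lines of C. All names below (xyz_init, the /dev/xyz0
default) are made up for illustration, not taken from any real library:

```c
#include <stdlib.h>

struct xyz_config {
    const char *dev_file; /* device file the library will open */
};

/* Initialize the library config. getenv() itself cannot fail: it
 * returns NULL when the variable is unset, in which case we fall back
 * to a built-in default, so the library still "always works". */
void xyz_init(struct xyz_config *cfg)
{
    const char *dev = getenv("LIBXYZ_DEFAULT_DEV_FILE");
    cfg->dev_file = (dev && *dev) ? dev : "/dev/xyz0";
}
```

The calling program stays in control: it can setenv() before calling
xyz_init(), or simply ignore the variable and accept the default.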
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sun, May 29, 2022 at 10:20:05PM +, Rodrigo Martins wrote:
> It was thus said that the Great Markus Wichmann once stated:
> > And you fundamentally cannot change anything about the userspace of
> > another program, at least not in UNIX.
>
> When I open file descriptors and exec(3) the new program inherits
> those. Is that not changing the userspace of another process?

Program, not process. And no, it is changing the kernelspace of another
program. For userspace, file descriptors are just numbers. They attain
meaning only from the kernel interface.

> It was thus said that the Great Markus Wichmann once stated:
> > Having one special-case program is better than changing all the
> > general ones, right?
>
> Sure is. Too bad stdbuf(1) uses such a fragile mechanism.

Well, I cannot think of anything else they could have done.
Fundamentally, setting environment variables and hoping the target
program will interpret them correctly is about the extent of what an
external filter is capable of, here.

> What if instead of changing every program we changed the standard
> library? We could make stdio line buffered by setting an environment
> variable.

The problem you run into here is that there is more than one standard
library, and indeed it is even thinkable that some programming language
may shirk libc entirely. Golang has been trying their damnedest at that
for a long time, just didn't go all the way and still wanted to use
libpthread. Haskell/GHC would be another candidate, as would be Pascal.

The only way to roll out a change that would affect all programs at the
same time would be a kernel update, but as discussed, this is a
userspace problem to solve. Plus, the environment variable idea breaks
with programs with elevated privilege, but that is probably a good
thing here.

Ciao,
Markus
Re: [dev] ii: how to process out in a pipeline and still page with less
It was thus said that the Great Markus Wichmann once stated:
> And you fundamentally cannot change anything about the userspace of
> another program, at least not in UNIX.

When I open file descriptors and exec(3) the new program inherits those.
Is that not changing the userspace of another process?

It was thus said that the Great Markus Wichmann once stated:
> Having one special-case program is better than changing all the
> general ones, right?

Sure is. Too bad stdbuf(1) uses such a fragile mechanism.

What if instead of changing every program we changed the standard
library? We could make stdio line buffered by setting an environment
variable.

Rodrigo.
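The proposal can be prototyped inside a single program today; the catch,
as the rest of the thread establishes, is that this code has to run in
the *target* program, since no external wrapper can reach another
process's stdio state. The variable name STDIO_LINEBUF below is made up
for illustration, not an existing convention:

```c
#include <stdio.h>
#include <stdlib.h>

/* If the (hypothetical) STDIO_LINEBUF variable is set, switch stdout
 * to line buffering. This must run before the program's first write to
 * stdout: setvbuf() is only specified to work before any other
 * operation on the stream. A main() would call it first thing:
 *   int main(void) { apply_env_buffering(); ... } */
static void apply_env_buffering(void)
{
    if (getenv("STDIO_LINEBUF"))
        setvbuf(stdout, NULL, _IOLBF, 0); /* line buffered, libc-chosen size */
}
```

With such a hook in every program, `STDIO_LINEBUF=1 tail -f out | tr a A
| less` would flush at each newline regardless of the pipe, which is
exactly what the thread is after.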
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sat, May 28, 2022 at 08:32:57PM +0200, Markus Wichmann wrote:
> ultimately terminates on the terminal. But who knows if that is the
> case? Pipelines ending in a call to "less" will terminate on the
> terminal, pipelines ending in a call to "nc" will not. So the shell
> can't know, only the last command can know.

The only solution would be to allow buffering to be passed through
exec/posix_spawn, with a way to signal tools not to go through their
heuristic logic. Then the shell could have a syntax to signal the
pipeline buffering mode. Basically, another POSIX problem that can't be
fixed =).

[BLOG POST WARNING] A better solution would require departing from
UNIX/POSIX and its minimalist API of "input: stdin/argv/signals in,
output: stdout/stderr/return code out and FS for both". You know, years
ago I found myself laughing at stuff like Lisp OSs or Spring as overly
complex academic drivel, but the more I "progress" in computing, the
more I learn of my error and understand UNIX as reactionary: rightly
so, considering Multics and PL/I and the hardware of the time; but the
reasons that made sense then don't now. Now that I'm infatuated with
Common Lisp, functions instead of executables make perfect sense to me.
In that case there would be no pipe and no buffering problem at all,
since memory is shared and you're passing objects/pointers around
instead of copying massive amounts of it via the kernel.
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sat, May 28, 2022 at 07:19:24PM +, Rodrigo Martins wrote:
> Hello, Markus,
>
> Thanks for filling in the details. I should do more research next
> time.
>
> I tried to write a program that does the same as stdbuf(1), but using
> setbuf(3). Unfortunately it seems the buffering mode is reset across
> exec(3), since my program did not work. If it did, that would be a
> clean solution.

But it cannot possibly happen that way, because the buffering set with
setbuf(3) is solely in userspace. And you fundamentally cannot change
anything about the userspace of another program, at least not in UNIX.

> Does the buffering happen on the input or on the output side? Or is
> that not the right way to look at it? Are these programs doing
> something wrong or is this a limitation of the specifications?

There is too much buffering and changing according to file mode going
on here. I had a program I called "syslogd" (on Windows) that would
simply listen to the syslog port on UDP and print all the packets that
arrived. Running just "syslogd" on its own would print all packets as
they came in, but running "syslogd | tr a A" would print blocks of data
long after the fact, making it useless for my use case.

Why? Because for one, syslogd's output buffering mode had changed to
"fully buffered", now that the output was a pipe rather than a
terminal. tr's input buffering mode was also fully buffered now, but
that doesn't much matter, since the data is usually passed on quickly.
It's just that the data is only actually sent on from syslogd when
syslogd's buffer is full. Cygwin by default defines a BUFSIZ of 1024,
so that's the buffer that has to be filled first. tr's buffer on input
doesn't much matter, because the input buffer is refilled on underflow,
and then filled only as far as is possible in one go. And tr's output
buffer is line buffered in the above application, making it perfect for
the application.

No, the problem was the change in output mode for my own application,
happening as part of being in the middle of a pipeline.

> Is modifying each program the best solution we have? Granted, it is
> not an invasive change, especially for simple pipeline-processing
> programs, but making such extensions could bring portability issues.

Modifying all programs is typically a bad solution. It is what the
systemd people are doing, and most here despise them for that if
nothing else. It just appears there is no simple solution for this
problem other than writing specialized programs. Having one
special-case program is better than changing all the general ones,
right?

Ciao,
Markus
Re: [dev] ii: how to process out in a pipeline and still page with less
Hello, Markus,

Thanks for filling in the details. I should do more research next time.

I tried to write a program that does the same as stdbuf(1), but using
setbuf(3). Unfortunately it seems the buffering mode is reset across
exec(3), since my program did not work. If it did, that would be a
clean solution.

Does the buffering happen on the input or on the output side? Or is
that not the right way to look at it? Are these programs doing
something wrong or is this a limitation of the specifications?

Is modifying each program the best solution we have? Granted, it is not
an invasive change, especially for simple pipeline-processing programs,
but making such extensions could bring portability issues.

Rodrigo.
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sat, May 28, 2022 at 06:09:04PM +, Hadrien Lacour wrote:
> Now, I wonder how it'd be fixed ("it" being how does the read end of
> the pipe signal to the write one the kind of buffering it wants) in a
> perfect world.

The problem ultimately stems from the mistaken idea that buffering is
invisible to the user. Which is true if the pipeline ultimately
terminates in a disk file or some such, but not if the pipeline
ultimately terminates on the terminal. But who knows if that is the
case? Pipelines ending in a call to "less" will terminate on the
terminal, pipelines ending in a call to "nc" will not. So the shell
can't know, only the last command can know.

So to make this work automatically, the last command would have to be
able to somehow inform all commands in the pipeline of its intentions.
Sadly, pipes are unidirectional, and in general it is impossible to
figure out the process on the other side of the pipe. But even if that
were possible, now what? Send a signal to the other side to please
unbuffer your output? That might actually work, but would require each
and every program to make intelligent decisions about how to handle
that signal. More importantly, it would require each and every UNIX
programmer to agree. Both on a signal and the behavior there, and on
the necessity of it all. Frankly, I have little hope for that ever
happening.

In a perfect world, yes, it could be done, but in a perfect world we'd
have brain-computer interfaces so that the machines understand our
intentions. We'd not be stuck on emulations of 1960s teletypes to do
the same. Besides, adding more automagic code that works differently
based on the type of output device is going to make debugging shell
scripts even harder than it already is.

Ciao,
Markus
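The "only the last command can know" point rests on how each program
picks its own buffering: at first use, stdio checks whether the stream
is an interactive device, essentially via isatty(3). A program can run
the same check itself; a minimal sketch:

```c
#include <unistd.h>

/* Roughly the test stdio applies when choosing a stream's default
 * buffering mode: interactive device means line buffered, anything
 * else (pipe, regular file) means fully buffered. */
static const char *buffering_default(int fd)
{
    return isatty(fd) ? "terminal: line buffered by default"
                      : "pipe/file: fully buffered by default";
}
```

Running a program that prints `buffering_default(1)` directly versus
behind `| cat` shows the two branches, and also why no process upstream
in a pipeline can see the terminal at the far end.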
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sat, May 28, 2022 at 07:58:40PM +0200, Markus Wichmann wrote:
> > You can use stdbuf(1) to modify that aspect without touching the
> > program source itself.
>
> Had to look up the source for that. I had heard of stdbuf, but I
> always thought that that was impossible. How can one process change a
> process-internal detail of another program? One that is not inherited
> through fork() or exec(), I mean. Well, turns out it is impossible.
> stdbuf sets LD_PRELOAD and execs the command line, and the changing
> of the buffer modes happens in the library.
>
> That means the whole thing only works if:
> - you have the target program linked dynamically
> - you have stdbuf and the target program linked against the same libc
> - the target program doesn't change buffering modes later, anyway
> - the target program does not have elevated privilege.

You know what, thanks for looking it up, I also thought it was using
some kind of fork or ptrace hack. The man page doesn't even mention
this =(

Now, I wonder how it'd be fixed ("it" being how does the read end of
the pipe signal to the write one the kind of buffering it wants) in a
perfect world. An environment variable read by the libc would work but
is kind of ugly. A pipe flag together with a special sh syntax would be
even uglier.

H.
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sat, May 28, 2022 at 08:38:49AM +, Hadrien Lacour wrote:
> On Sat, May 28, 2022 at 03:33:16AM +, Rodrigo Martins wrote:
> > Hello,
> >
> > The problem here is I/O buffering. I suspect it to happen in the C
> > standard library, specifically on the printf function family.

You know, that is the sort of claim that ought to be researched. First
of all, it is stdio that is providing buffering, if any. For reasons I
won't go into, musl's printf functions will provide a temporary buffer
on unbuffered files, but the buffer is then flushed before the
functions return (else the buffer would be invalid).

> > If I recall, the C standard says stdio is line-buffered when the
> > file is an interactive device and lets it be fully buffered
> > otherwise.

Not quite. I looked it up in both POSIX and C, and found that

- stderr is not fully buffered
- stdin and stdout are fully buffered if and only if they are
  determined to not be interactive streams.

This means the standard allows "line buffered" and "unbuffered" for
stderr, and also those two modes for stdin and stdout for interactive
streams. But yes, in practice we usually see stderr be entirely
unbuffered, and stdin and stdout be line buffered on terminals and
fully buffered on everything else.

> You can use stdbuf(1) to modify that aspect without touching the
> program source itself.

Had to look up the source for that. I had heard of stdbuf, but I always
thought that that was impossible. How can one process change a
process-internal detail of another program? One that is not inherited
through fork() or exec(), I mean. Well, turns out it is impossible.
stdbuf sets LD_PRELOAD and execs the command line, and the changing of
the buffer modes happens in the library.

That means the whole thing only works if:
- you have the target program linked dynamically
- you have stdbuf and the target program linked against the same libc
- the target program doesn't change buffering modes later, anyway
- the target program does not have elevated privilege.

Ciao,
Markus
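The mechanism described above fits in a dozen lines: a shared object
whose constructor runs before the target's main() and flips stdout to
line buffering. This is a sketch of the idea, not coreutils' actual
libstdbuf, and it inherits every caveat in the list above:

```c
/* Build:  cc -shared -fPIC linebuf.c -o linebuf.so
 * Use:    LD_PRELOAD=./linebuf.so some_program | less
 *
 * Works only for dynamically linked targets using the same libc, that
 * don't change their buffering later, and that aren't privileged (the
 * dynamic linker ignores LD_PRELOAD for setuid programs). */
#include <stdio.h>

__attribute__((constructor))
static void force_line_buffering(void)
{
    /* Runs before main(), so no output has been written yet and
     * setvbuf() on stdout is still permitted. */
    setvbuf(stdout, NULL, _IOLBF, 0);
}
```

The constructor attribute is a GCC/Clang extension, which is fine here:
the whole LD_PRELOAD trick is already ELF/dynamic-linker specific.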
Re: [dev] ii: how to process out in a pipeline and still page with less
On Sat, May 28, 2022 at 03:33:16AM +, Rodrigo Martins wrote:
> Hello,
>
> The problem here is I/O buffering. I suspect it to happen in the C
> standard library, specifically on the printf function family. If I
> recall, the C standard says stdio is line-buffered when the file is
> an interactive device and lets it be fully buffered otherwise. This
> is likely why you see different behavior with and without less on the
> pipeline.
>
> I don't yet have a clear solution to this problem that doesn't
> involve modifying each program in the pipeline, but I've annexed a C
> source as an example that may be used in place of tr to replace
> single chars. This program is not supposed to buffer any I/O.
>
> I see tee "shall not buffer output". Another possibility is the
> setbuf function, but I'm not sure it can be used without editing each
> program. More investigation is needed.
>
> Rodrigo.

You can use stdbuf(1) to modify that aspect without touching the
program source itself.
Re: [dev] ii: how to process out in a pipeline and still page with less
Hello,

The problem here is I/O buffering. I suspect it to happen in the C
standard library, specifically on the printf function family. If I
recall, the C standard says stdio is line-buffered when the file is an
interactive device and lets it be fully buffered otherwise. This is
likely why you see different behavior with and without less on the
pipeline.

I don't yet have a clear solution to this problem that doesn't involve
modifying each program in the pipeline, but I've annexed a C source as
an example that may be used in place of tr to replace single chars.
This program is not supposed to buffer any I/O.

I see tee "shall not buffer output". Another possibility is the setbuf
function, but I'm not sure it can be used without editing each program.
More investigation is needed.

Rodrigo.
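The annexed source did not survive in the archive, but a program with
the stated property (replacing single characters without any stdio
buffering) can be written directly on read(2)/write(2). This is a guess
at its shape, not the attached file itself:

```c
#include <unistd.h>

/* Replace every 'from' byte with 'to' in buf[0..n). */
static void subst(char *buf, ssize_t n, char from, char to)
{
    for (ssize_t i = 0; i < n; i++)
        if (buf[i] == from)
            buf[i] = to;
}

/* Copy fd 'in' to fd 'out', substituting as we go. read(2) returns as
 * soon as any data is available, and we write it straight back out,
 * so nothing ever sits in a userspace buffer waiting for a watermark:
 * exactly what is needed between "tail -f" and "less". */
static int run_filter(int in, int out, char from, char to)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        subst(buf, n, from, to);
        if (write(out, buf, (size_t)n) != n)
            return 1;
    }
    return n < 0; /* 0 on clean EOF */
}
```

A main() would just be `return run_filter(0, 1, 'a', 'A');`, giving an
unbuffered stand-in for `tr a A`.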
Re: [dev] ii: how to process out in a pipeline and still page with less
"Greg Reagle" wrote: > I have a file named "out" (from ii) that I want to view. Of course, it can > grow while I am viewing it. I can view it with "tail -f out" or "less +F out > ", both of which work. I also want to apply some processing in a pipeline, > something like "tail -f out | tr a A | less" but that does not work. The > less command ignores my keystrokes (unless I hit Ctrl-C, but that kills tail > and tr). The "tr a A" command is arbitrary; you can substitute whatever > processing you want, even just cat. Hello Greg, this is a fundamental limitation of stdin and the tty subsystem. A while ago I had my head banging against a wall trying to write an application that can read keyboard input from the stdin after all the data has been drained (read) from the pipe. Turns out it is impossible to reset the descriptor fd that is used for reading the stdin to the state as if there was no pipe at all. I had to resort to looking over implementation of unix program less to find out how they do the keyboard input such that it can handle both inputs at the same time. What they do is just swap between different special fd's one for keyboard and other one for stdin. However even after taking apart the code from less into a smaller test program I never been able to reproduce the same behavior that of less. If using the poll() function to wait for keyboard input, if the program is piped into the poll(); call becomes nonblocking even if special descriptors are setup using dup2(); calls or changing some flags using fcntl();. Till this day I have no idea how to implement this specific behavior of less and if I be honest it's fucking black magic. So I suspect the reason you can't use less in that scenario you described is exactly because of this black magic that onbody knows about since there is absolutely no documentation on how to make it work. 
Anybody else reading this, if you be kind enough to provide a small sample C program that can read the data from the pipe, ie do echo "hello world" | ./a.out and be able to recover after reading the "hello world", put the stdin back into "blocking" mode so that it starts to wait for keyboard input after that just like it would do if there was no pipe thing going on. Also as a bonus, your stdout should remain unaffected, so you should be able to just write(1, "hello world\n", ...); being displayed as expected. And this must be done using only functions specified in posix/linux kernel, so no fgets, no fgetc or any other input reading function that hides what is actually going on. Use functions like open(), fcntl(), poll(), dup() only. I believe once this mystery is uncovered it would be possible to solve your issue with less, since it would be clear on how to manipulate the stdin descriptor to do virtually anything we want. I also remember reading some neckbeard guy post about how less is not technically a unix program because of how it interfaces with the pipes and stdin, whatever that means. Maybe we are all trying to do things that were not originally envisioned on this interface. It is either you read from pipe or you read from keyboard, having both at the same time - heresy. > This command "tail -f out | tr a A" is functional and has no bugs, but it > doesn't let me use the power of less, which I crave. > > This command "tail out | tr a A | less" is functional and has no bugs, but > it doesn't let me see newly appended lines. > > Can I use the power of the Unix pipeline to do text processing and the > power of less for excellent paging and still be able to see new lines as they > are appended? Why doesn't or can't less continue to monitor stdin from the > pipeline and respond to my keystrokes from the tty? > > I am using Debian 11 in case it matters, with fish. But I am happy to try > other shells. In fact I already have and that doesn't seem to help. 
I have > also tried more, most, nano -, and vi -, instead of less, to no avail. Unsuprising. Best wishes, Kyryl
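For what it's worth, one piece of the "black magic" is documented
behavior: a pager can read its data from stdin but its keystrokes from
/dev/tty, which always names the controlling terminal no matter what
fd 0 is connected to. Below is a sketch of the requested shape using
only open(2)/read(2)/write(2); the /dev/tty approach is an assumption
about how less-like programs work in general, not a walkthrough of
less's source, and it does not explain the poll() anomaly described
above:

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy everything from 'from' to 'to'; returns 0 on clean EOF. */
static int drain(int from, int to)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(from, buf, sizeof buf)) > 0)
        if (write(to, buf, (size_t)n) != n)
            return 1;
    return n < 0;
}

/* Block waiting for one keystroke on the controlling terminal. The fd
 * comes from /dev/tty, so it is independent of what stdin points at.
 * Fails if there is no controlling terminal (daemon, CI job, ...). */
static int wait_for_key(char *c)
{
    int tty = open("/dev/tty", O_RDONLY);
    if (tty < 0)
        return -1;
    ssize_t n = read(tty, c, 1); /* blocks until a key arrives */
    close(tty);
    return n == 1 ? 0 : -1;
}
```

A main() would be `drain(0, 1); char c; wait_for_key(&c);`, run as
`echo "hello world" | ./a.out`: the pipe is drained to stdout, then the
program waits for keyboard input even though fd 0 is at EOF.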
Re: [dev] ii: how to process out in a pipeline and still page with less
On May 27, 2022, 11:43 AM, "Greg Reagle" <l...@speedpost.net> wrote:
> I have a file named "out" (from ii) that I want to view. Of course,
> it can grow while I am viewing it. I can view it with "tail -f out"
> or "less +F out", both of which work. I also want to apply some
> processing in a pipeline, something like "tail -f out | tr a A |
> less" but that does not work. The less command ignores my keystrokes
> (unless I hit Ctrl-C, but that kills tail and tr). The "tr a A"
> command is arbitrary; you can substitute whatever processing you
> want, even just cat.
>
> This command "tail -f out | tr a A" is functional and has no bugs,
> but it doesn't let me use the power of less, which I crave.
>
> This command "tail out | tr a A | less" is functional and has no
> bugs, but it doesn't let me see newly appended lines.
>
> Can I use the power of the Unix pipeline to do text processing and
> the power of less for excellent paging and still be able to see new
> lines as they are appended? Why doesn't or can't less continue to
> monitor stdin from the pipeline and respond to my keystrokes from
> the tty?
>
> I am using Debian 11 in case it matters, with fish. But I am happy
> to try other shells. In fact I already have and that doesn't seem to
> help. I have also tried more, most, nano -, and vi -, instead of
> less, to no avail.

Hi Greg,

Why don't you just save the output to a temporary file after your
processing? Like

```
tail -f out | tr a A > out.post &
less +F out.post
```

If you run something like this in a script, you may want to ensure
everything gets cleaned up when it exits. Another option is to use
vi/vim and have it periodically reload the file.

> Why doesn't or can't less continue to monitor stdin from the
> pipeline and respond to my keystrokes from the tty?

Probably because less doesn't know it should check a tty. Programs like
less often check to see if stdin refers to a tty, and in the pipeline
above, less's stdin isn't a tty. See `echo | tty` vs `tty`. You may be
able to modify less to check whether stdout or stderr refers to a tty
instead.

Arthur
Re: [dev] ii: how to process out in a pipeline and still page with less
On Fri, May 27, 2022 at 02:43:03PM -0400, Greg Reagle wrote:
> I have a file named "out" (from ii) that I want to view. Of course,
> it can grow while I am viewing it. I can view it with "tail -f out"
> or "less +F out", both of which work. I also want to apply some
> processing in a pipeline, something like "tail -f out | tr a A |
> less" but that does not work. The less command ignores my keystrokes
> (unless I hit Ctrl-C, but that kills tail and tr). The "tr a A"
> command is arbitrary; you can substitute whatever processing you
> want, even just cat.
>
> This command "tail -f out | tr a A" is functional and has no bugs,
> but it doesn't let me use the power of less, which I crave.
>
> This command "tail out | tr a A | less" is functional and has no
> bugs, but it doesn't let me see newly appended lines.
>
> Can I use the power of the Unix pipeline to do text processing and
> the power of less for excellent paging and still be able to see new
> lines as they are appended? Why doesn't or can't less continue to
> monitor stdin from the pipeline and respond to my keystrokes from
> the tty?
>
> I am using Debian 11 in case it matters, with fish. But I am happy
> to try other shells. In fact I already have and that doesn't seem to
> help. I have also tried more, most, nano -, and vi -, instead of
> less, to no avail.

I don't know ii at all, but it seems like you're searching for
"less +F", to me.
Re: [dev] ii: how to process out in a pipeline and still page with less
Hello,

On Fri, 27 May 2022 at 20:45, Greg Reagle wrote:
> I have a file named "out" (from ii) that I want to view. Of course,
> it can grow while I am viewing it. I can view it with "tail -f out"
> or "less +F out", both of which work. I also want to apply some
> processing in a pipeline, something like "tail -f out | tr a A |
> less" but that does not work. The less command ignores my keystrokes
> (unless I hit Ctrl-C, but that kills tail and tr). The "tr a A"
> command is arbitrary; you can substitute whatever processing you
> want, even just cat.
>
> This command "tail -f out | tr a A" is functional and has no bugs,
> but it doesn't let me use the power of less, which I crave.
>
> This command "tail out | tr a A | less" is functional and has no
> bugs, but it doesn't let me see newly appended lines.
>
> Can I use the power of the Unix pipeline to do text processing and
> the power of less for excellent paging and still be able to see new
> lines as they are appended? Why doesn't or can't less continue to
> monitor stdin from the pipeline and respond to my keystrokes from
> the tty?
>
> I am using Debian 11 in case it matters, with fish. But I am happy
> to try other shells. In fact I already have and that doesn't seem to
> help. I have also tried more, most, nano -, and vi -, instead of
> less, to no avail.

The simplest solution to avoid the side effect of killing tail and tr
when using ^C to quit "follow mode" in less is to redirect the output
of tr to another file, and use less +F to open that file. Like so:

tail -f out | tr a A >out2 & less +F out2

Some other solutions: https://unix.stackexchange.com/a/659175

Best,
AN