Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-31 Thread Georg Lehner

Hello,

Does nobuf(1) help?

  http://jdebp.uk/Softwares/djbwares/guide/nobuf.html

Note: it tackles exactly the POSIX behaviour of line-buffering output to 
ttys, by providing a tty to the program in the pipeline, but without using 
any shared-object magic.


Have not used it (yet) though.
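
For illustration, the bare idea behind such a tool can be sketched in C: give 
the child a pseudo-terminal as stdout, so its stdio sees an interactive stream 
and line-buffers, and pump the pty back to the real stdout. This is only a 
sketch of the general trick, not nobuf's actual source, and it leaves the 
pty's newline translation unhandled:

```
/* ptyrun.c: sketch of the pty trick, for illustration only (nobuf(1) is
 * the maintained tool).  The child gets a pseudo-terminal as stdout, so
 * its stdio line-buffers as if interactive; the parent copies the pty
 * back to the real stdout.  A real tool would also fix up termios on
 * the slave (by default the pty translates \n to \r\n).
 * Build: cc -o ptyrun ptyrun.c    Use: tail -f out | ptyrun tr a A | less
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]);
        return 1;
    }

    int mfd = posix_openpt(O_RDWR | O_NOCTTY);              /* master side */
    if (mfd < 0 || grantpt(mfd) < 0 || unlockpt(mfd) < 0) {
        perror("pty");
        return 1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                          /* child: stdout becomes the slave pty */
        int sfd = open(ptsname(mfd), O_RDWR);
        if (sfd < 0) { perror("open slave"); _exit(1); }
        dup2(sfd, STDOUT_FILENO);            /* stdio now sees a tty, so it line-buffers */
        close(sfd);
        close(mfd);
        execvp(argv[1], argv + 1);
        perror("execvp");
        _exit(127);
    }

    char buf[4096];                          /* parent: pump pty output to real stdout */
    ssize_t n;
    while ((n = read(mfd, buf, sizeof buf)) > 0)
        if (write(STDOUT_FILENO, buf, (size_t)n) < 0)
            break;
    return 0;
}
```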

Best Regards,

  Georg

On 5/30/22 08:29, Josuah Demangeon wrote:

> Rodrigo Martins wrote:
>
> > What if instead of changing every program we changed the standard
> > library? We could make stdio line buffered by setting an environment
> > variable.
>
> I applaud this idea! Environment variables seem to be the right spot
> for any config a library could need: they are unobtrusive, can be set by the
> program calling it, yet keep each program configurable by default.
>
> Markus Wichmann wrote:
>
> > The problem you run into here is that there is more than one standard
> > library.
>
> The problem was stated here for libc's stdio, but I still like the idea
> for new libraries: a call to getenv (which never fails) inside xyz_init().
>
> What about $DISPLAY, $MALLOC_OPTIONS (OpenBSD), $LIBV4LCONTROL_FLAGS [1],
> or some LIBXYZ_DEFAULT_DEV_FILE=/dev/xyz3?
>
> Insane and breaking some important concept?
> To keep for debugging purposes only?
> Not to use for every kind of configuration?
> Better let the programmer control the entirety of the library?
>
> Although, if a library (or any program really) does not *require* any
> configuration or environment variable and always works without that,
> I like it even better.
>
> [1]: https://github.com/philips/libv4l/blob/cdfd29/libv4lconvert/control/libv4lcontrol.c#L369-L371





Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-29 Thread Josuah Demangeon
Rodrigo Martins wrote:
> What if instead of changing every program we changed the standard
> library? We could make stdio line buffered by setting an environment
> variable.

I applaud this idea! Environment variables seem to be the right spot
for any config a library could need: they are unobtrusive, can be set by the
program calling it, yet keep each program configurable by default.

Markus Wichmann  wrote:
> The problem you run into here is that there is more than one standard
> library.

The problem was stated here for libc's stdio, but I still like the idea
for new libraries: a call to getenv (which never fails) inside xyz_init().
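
A sketch of what that could look like; xyz_init() and LIBXYZ_DEFAULT_DEV_FILE
are the hypothetical names from this thread, not a real library:

```
/* Hypothetical xyz library: one getenv() call in xyz_init() picks the
 * default device file.  LIBXYZ_DEFAULT_DEV_FILE is the made-up name
 * from this thread; callers can still override the value through the
 * normal library API afterwards. */
#include <stdlib.h>

static const char *xyz_dev_file = "/dev/xyz0";            /* built-in default */

void xyz_init(void)
{
    const char *s = getenv("LIBXYZ_DEFAULT_DEV_FILE");    /* NULL if unset, never fails */
    if (s && *s)
        xyz_dev_file = s;
}
```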

What about $DISPLAY, $MALLOC_OPTIONS (OpenBSD), $LIBV4LCONTROL_FLAGS [1],
or some LIBXYZ_DEFAULT_DEV_FILE=/dev/xyz3?

Insane and breaking some important concept?
To keep for debugging purposes only?
Not to use for every kind of configuration?
Better let the programmer control the entirety of the library?

Although, if a library (or any program really) does not *require* any
configuration or environment variable and always works without that,
I like it even better.

[1]: 
https://github.com/philips/libv4l/blob/cdfd29/libv4lconvert/control/libv4lcontrol.c#L369-L371



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-29 Thread Markus Wichmann
On Sun, May 29, 2022 at 10:20:05PM +, Rodrigo Martins wrote:
> It was thus said that the Great Markus Wichmann once stated:
> > And you fundamentally cannot change anything about the userspace of another 
> > program, at least not in UNIX.
>
> When I open file descriptors and exec(3) the new program inherits
> those. Is that not changing the userspace of another process?
>

Program, not process. And no, it is changing the kernelspace of another
program. For userspace, file descriptors are just numbers. They attain
meaning only from the kernel interface.

> It was thus said that the Great Markus Wichmann once stated:
> > Having one special-case program is better than changing all the general 
> > ones, right?
>
> Sure is. Too bad stdbuf(1) uses such a fragile mechanism.
>

Well, I cannot think of anything else they could have done.
Fundamentally, setting environment variables and hoping the target
program will interpret them correctly is about the extent of what an
external filter is capable of, here.

> What if instead of changing every program we changed the standard
> library? We could make stdio line buffered by setting an environment
> variable.
>

The problem you run into here is that there is more than one standard
library, and indeed it is even thinkable that some programming language
may shirk libc entirely. Golang has been trying its damnedest at that
for a long time, but didn't go all the way and still wanted to use
libpthread. Haskell/GHC would be another candidate, as would Pascal.

The only way to roll out a change that would affect all programs at the
same time would be a kernel update, but as discussed, this is a
userspace problem to solve.

Plus, the environment variable idea breaks with programs with elevated
privilege, but that is probably a good thing here.
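
For what it's worth, the usual way to keep such a knob from breaking set-uid
programs is to ignore the environment when privileges are elevated. A sketch,
assuming glibc's secure_getenv() and a made-up variable name (on the BSDs one
would guard a plain getenv() with issetugid() instead):

```
/* Sketch: honour a stdio knob only when privileges are not elevated.
 * secure_getenv() is glibc-specific and returns NULL for set-uid or
 * set-gid processes.  STDIO_LINE_BUFFERED is a made-up variable. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

static void maybe_line_buffer_stdout(void)
{
    const char *s = secure_getenv("STDIO_LINE_BUFFERED");
    if (s && *s)
        setvbuf(stdout, NULL, _IOLBF, 0);   /* must run before the first output */
}
```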

Ciao,
Markus



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-29 Thread Rodrigo Martins
It was thus said that the Great Markus Wichmann once stated:
> And you fundamentally cannot change anything about the userspace of another 
> program, at least not in UNIX.

When I open file descriptors and exec(3), the new program inherits those. Is 
that not changing the userspace of another process?

It was thus said that the Great Markus Wichmann once stated:
> Having one special-case program is better than changing all the general ones, 
> right?

Sure is. Too bad stdbuf(1) uses such a fragile mechanism.

What if instead of changing every program we changed the standard library? We 
could make stdio line buffered by setting an environment variable.

Rodrigo.




Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Hadrien Lacour
On Sat, May 28, 2022 at 08:32:57PM +0200, Markus Wichmann wrote:
> ultimately terminates on the terminal. But who knows if that is the
> case? Pipelines ending in a call to "less" will terminate on the
> terminal, pipelines ending in a call to "nc" will not. So the shell
> can't know, only the last command can know.

The only solution would be to allow buffering to be passed through
exec/posix_spawn, with a way to signal tools not to go through their heuristic
logic. Then the shell could have a syntax to signal the pipeline's buffering mode.

Basically, another POSIX problem that can't be fixed =).
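
Spelled out, it might look roughly like the sketch below; the attribute call is
entirely hypothetical and exists nowhere today, everything else is real:

```
/* Purely hypothetical: what "buffering passed through posix_spawn" could
 * look like.  The commented-out attribute call does not exist in POSIX
 * or in any libc; the rest compiles as-is and spawns normally. */
#include <spawn.h>
#include <sys/types.h>

extern char **environ;

int spawn_line_buffered(pid_t *pid, char *const argv[])
{
    posix_spawnattr_t attr;
    int r;

    posix_spawnattr_init(&attr);
    /* hypothetical: tell the child's stdio to start out line buffered,
     * bypassing its isatty() heuristic:
     *     posix_spawnattr_setstdiobuf(&attr, STDOUT_FILENO, _IOLBF);
     */
    r = posix_spawn(pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    return r;
}
```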


[BLOG POST WARNING]

A better solution would require departing from UNIX/POSIX and its minimalist API
of "input: stdin/argv/signals in, output: stdout/stderr/return code out and FS
for both".

You know, years ago I found myself laughing at stuff like Lisp OSs or Spring as
overly complex academic drivel, but the more I "progress" in computing, the
more I learn of my error and understand UNIX as reactionary: rightly so,
considering Multics and PL/I and the hardware of the time; but the reasons that
made sense then don't now.
Now that I'm infatuated with Common Lisp, functions instead of executables make
perfect sense to me. In that case there would be no pipes and no buffering
problem at all, since memory is shared and you're passing objects/pointers
around instead of copying massive amounts of data through the kernel.



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Markus Wichmann
On Sat, May 28, 2022 at 07:19:24PM +, Rodrigo Martins wrote:
> Hello, Markus,
>
> Thanks for filling in the details. I should do more research next time.
>
> I tried to write a program that does the same as stdbuf(1), but using
> setbuf(3). Unfortunately it seems the buffering mode is reset across
> exec(3), since my program did not work. If it did that would be a
> clean solution.
>

But it cannot possibly happen that way, because the buffering set with
setbuf(3) is solely in userspace. And you fundamentally cannot change
anything about the userspace of another program, at least not in UNIX.

> Does the buffering happen on the input or on the output side? Or is
> that not the right way to look at it? Are these programs doing
> something wrong or is this a limitation of the specifications?
>

There is a lot of buffering that changes according to file type going on
here. I had a program I called "syslogd" (on Windows) that would simply
listen on the syslog port on UDP and print all the packets that
arrived. Running just "syslogd" on its own would print all packets as
they came in, but running "syslogd | tr a A" would print blocks of data
long after the fact, making it useless for my use case.

Why? Because for one, syslogd's output buffering mode had changed to
"fully buffered", now that the output was a pipe rather than a terminal.
tr's input buffering mode was also fully buffered now, but that doesn't
much matter, since the data is usually passed on quickly. It's just that
the data is only actually sent on from syslogd when syslogd's buffer is
full. Cygwin by default defines a BUFSIZ of 1024, so that's the buffer
that has to be filled first.

tr's buffer on input doesn't much matter, because the input buffer is
refilled on underflow, and then filled only as far as is possible in one
go. And tr's output buffer is line buffered in the above application,
making it perfect for the application. No, the problem was the change in
output mode for my own application happening as part of being in the
middle of a pipeline.
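
For completeness, the in-source fix on the writing side is a single setvbuf(3)
call before the first output, whatever the libc; a sketch:

```
/* Sketch: a long-running producer (like the syslogd example above) that
 * emits complete lines promptly even when stdout is a pipe. */
#include <stdio.h>

int main(void)
{
    setvbuf(stdout, NULL, _IOLBF, 0);   /* line buffered regardless of isatty() */

    /* ... main loop: receive a datagram, then e.g. ... */
    printf("received one packet\n");    /* flushed at the newline */
    return 0;
}
```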

> Is modifying each program the best solution we have? Granted it is
> not an invasive change, especially for simple pipeline-processing
> programs, but making such extensions could bring portability issues.
>

Modifying all programs is typically a bad solution. It is what the
systemd people are doing, and most here despise them for that if nothing
else. It just appears there is no simple solution for this problem other
than writing specialized programs. Having one special-case program is
better than changing all the general ones, right?

Ciao,
Markus



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Rodrigo Martins
Hello, Markus,

Thanks for filling in the details. I should do more research next time.

I tried to write a program that does the same as stdbuf(1), but using 
setbuf(3). Unfortunately it seems the buffering mode is reset across exec(3), 
since my program did not work. If it did that would be a clean solution.
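
For the archive, the attempt presumably looked roughly like the sketch below
(my reconstruction, not the actual program); the reply above explains why it
cannot work:

```
/* Reconstruction of a "setvbuf then exec" wrapper.  It cannot work: the
 * buffering mode set here lives in this process's stdio, and execvp()
 * replaces the whole process image; the new program sets up its own
 * FILE structures from scratch. */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]);
        return 1;
    }
    setvbuf(stdout, NULL, _IOLBF, 0);   /* affects only *this* process */
    execvp(argv[1], argv + 1);          /* stdio state does not survive this */
    perror("execvp");
    return 127;
}
```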

Does the buffering happen on the input or on the output side? Or is that not 
the right way to look at it? Are these programs doing something wrong or is 
this a limitation of the specifications?

Is modifying each program the best solution we have? Granted it is not an 
invasive change, especially for simple pipeline-processing programs, but making 
such extensions could bring portability issues.

Rodrigo.




Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Markus Wichmann
On Sat, May 28, 2022 at 06:09:04PM +, Hadrien Lacour wrote:
> Now, I wonder how it'd be fixed ("it" being how does the read end of the pipe
> signal to the write one the kind of buffering it wants) in a perfect world.

The problem ultimately stems from the mistaken idea that buffering is
invisible to the user. Which is true if the pipeline ultimately
terminates in a disk file or some such, but not if the pipeline
ultimately terminates on the terminal. But who knows if that is the
case? Pipelines ending in a call to "less" will terminate on the
terminal, pipelines ending in a call to "nc" will not. So the shell
can't know, only the last command can know.

So to make this work automatically, the last command would have to be
able to somehow inform all commands in the pipeline of its intentions.
Sadly, pipes are unidirectional, and in general it is impossible to
figure out the process on the other side of the pipe.

But even if that was possible, now what? Send a signal to the other side
to please unbuffer your output? That might actually work, but would
require each and every program to make intelligent decisions about how
to handle that signal. More importantly, it would require each and every
UNIX programmer to agree. Both on a signal and the behavior there, and
on the necessity of it all. Frankly, I have little hope for that ever
happening.
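
For the record, the writing side of such a (nonexistent) convention would be
simple enough, with SIGUSR1 standing in for the agreed-upon signal:

```
/* Hypothetical convention: on SIGUSR1 a producer flushes after every
 * write.  The handler only sets a flag (async-signal-safe); the main
 * loop does the flushing. */
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t flush_each_line;

static void on_usr1(int sig) { (void)sig; flush_each_line = 1; }

int main(void)
{
    signal(SIGUSR1, on_usr1);

    for (int i = 0; i < 3; i++) {       /* stand-in for a real main loop */
        printf("line %d\n", i);
        if (flush_each_line)
            fflush(stdout);             /* behave as if unbuffered */
    }
    return 0;
}
```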

In a perfect world, yes, it could be done, but in a perfect world we'd
have brain-computer-interfaces so that the machines understand our
intentions. We'd not be stuck on emulations of 1960s teletypes to do the
same.

Besides, adding more automagic code that works differently based on the
type of output device is going to make debugging shell scripts even
harder than it already is.

Ciao,
Markus



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Hadrien Lacour
On Sat, May 28, 2022 at 07:58:40PM +0200, Markus Wichmann wrote:
> > You can use stdbuf(1) to modify that aspect without touching the
> > program source itself.
> >
>
> Had to look up the source for that. I had heard of stdbuf, but I always
> thought that that was impossible. How can one process change a
> process-internal detail of another program? One that is not inherited
> through fork() or exec(), I mean. Well, turns out it is impossible.
> stdbuf sets LD_PRELOAD and execs the command line, and the changing of
> the buffer modes happens in the library.
>
> That means that whole thing only works if:
> - you have the target program linked dynamically
> - you have stdbuf and the target program linked against the same libc
> - the target program doesn't change buffering modes later, anyway
> - the target program does not have elevated privilege.
>

You know what, thanks for looking it up, I also thought it was using some kind
of fork or ptrace hack. The man page doesn't even mention this =(

Now, I wonder how it'd be fixed ("it" being how does the read end of the pipe
signal to the write one the kind of buffering it wants) in a perfect world.
An environment variable read by the libc would work but is kind of ugly. A pipe
flag together with a special sh syntax would be even uglier. Hmm.



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Markus Wichmann
On Sat, May 28, 2022 at 08:38:49AM +, Hadrien Lacour wrote:
> On Sat, May 28, 2022 at 03:33:16AM +, Rodrigo Martins wrote:
> > Hello,
> >
> > The problem here is I/O buffering. I suspect it to happen in the C
> > standard library, specifically in the printf function family.

You know, that is the sort of claim that ought to be researched. First
of all, it is stdio that is providing buffering, if any. For reasons I
won't go into, musl's printf functions will provide a temporary buffer
on unbuffered files, but the buffer is then flushed before the functions
return (else the buffer would be invalid).

> > If I
> > recall, the C standard says stdio is line-buffered when the file is
> > an interactive device and lets it be fully buffered otherwise.

Not quite. I looked it up in both POSIX and C, and found that
- stderr is not fully buffered
- stdin and stdout are fully buffered if and only if they are
  determined to not be interactive streams.

This means the standard allows "line buffered" and "unbuffered" for
stderr, and also those two modes for stdin and stdout for interactive
streams.

But yes, in practice we usually see stderr be entirely unbuffered, and
stdin and stdout be line buffered on terminals and fully buffered on
everything else.

> You can use stdbuf(1) to modify that aspect without touching the
> program source itself.
>

Had to look up the source for that. I had heard of stdbuf, but I always
thought that that was impossible. How can one process change a
process-internal detail of another program? One that is not inherited
through fork() or exec(), I mean. Well, turns out it is impossible.
stdbuf sets LD_PRELOAD and execs the command line, and the changing of
the buffer modes happens in the library.

That means that whole thing only works if:
- you have the target program linked dynamically
- you have stdbuf and the target program linked against the same libc
- the target program doesn't change buffering modes later, anyway
- the target program does not have elevated privilege.
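
For reference, such a shim is only a few lines. This is a sketch in the spirit
of coreutils' libstdbuf, not its actual source; MYBUF_STDOUT is a made-up
variable name (coreutils uses its own):

```
/* Sketch of an stdbuf-style LD_PRELOAD shim.  The constructor runs
 * before main() in any dynamically linked program it is injected into,
 * before the streams have been used, so setvbuf() is still allowed.
 *   cc -fPIC -shared -o shim.so shim.c
 *   MYBUF_STDOUT=L LD_PRELOAD=./shim.so some_program | less
 * __attribute__((constructor)) is a GCC/Clang extension. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

__attribute__((constructor))
static void adjust_stdio(void)
{
    const char *s = getenv("MYBUF_STDOUT");
    if (!s)
        return;
    if (strcmp(s, "L") == 0)
        setvbuf(stdout, NULL, _IOLBF, 0);   /* line buffered */
    else if (strcmp(s, "0") == 0)
        setvbuf(stdout, NULL, _IONBF, 0);   /* unbuffered */
}
```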

Ciao,
Markus



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Hadrien Lacour
On Sat, May 28, 2022 at 03:33:16AM +, Rodrigo Martins wrote:
> Hello,
>
> The problem here is I/O buffering. I suspect it to happen in the C standard 
> library, specifically in the printf function family. If I recall, the C 
> standard says stdio is line-buffered when the file is an interactive device 
> and lets it be fully buffered otherwise. This is likely why you see 
> different behavior with and without less on the pipeline.
> I don't yet have a clear solution to this problem that doesn't involve 
> modifying each program in the pipeline, but I've annexed a C source as an 
> example that may be used in place of tr to replace single chars. This program 
> is not supposed to buffer any I/O.
> I see tee "shall not buffer output". Another possibility is the setbuf 
> function, but I'm not sure it can be used without editing each program. More 
> investigation is needed.
>
> Rodrigo.

You can use stdbuf(1) to modify that aspect without touching the program source
itself.



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-28 Thread Rodrigo Martins
Hello,

The problem here is I/O buffering. I suspect it to happen in the C standard 
library, specifically in the printf function family. If I recall, the C 
standard says stdio is line-buffered when the file is an interactive device and 
lets it be fully buffered otherwise. This is likely why you see different 
behavior with and without less on the pipeline.
I don't yet have a clear solution to this problem that doesn't involve 
modifying each program in the pipeline, but I've annexed a C source as an 
example that may be used in place of tr to replace single chars. This program 
is not supposed to buffer any I/O.
I see tee "shall not buffer output". Another possibility is the setbuf 
function, but I'm not sure it can be used without editing each program. More 
investigation is needed.

Rodrigo.




Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-27 Thread Kyryl Melekhin
"Greg Reagle"  wrote:

> I have a file named "out" (from ii) that I want to view.  Of course, it can
> grow while I am viewing it.  I can view it with "tail -f out" or "less +F out
> ", both of which work.  I also want to apply some processing in a pipeline,
> something like "tail -f out | tr a A | less" but that does not work.  The
> less command ignores my keystrokes (unless I hit Ctrl-C, but that kills tail
> and tr).  The "tr a A" command is arbitrary; you can substitute whatever
> processing you want, even just cat.

Hello Greg, this is a fundamental limitation of stdin and the tty
subsystem. A while ago I was banging my head against a wall trying to
write an application that could read keyboard input from stdin
after all the data had been drained (read) from the pipe. It turns out
it is impossible to reset the file descriptor used for reading
stdin to the state it would have if there were no pipe at all. I had to
resort to looking over the implementation of the unix program less to
find out how it does keyboard input such that it can handle both inputs
at the same time. What it does is just swap between different special
fds, one for the keyboard and the other for stdin. However, even after
taking the code from less apart into a smaller test program, I was never
able to reproduce the behavior of less. When using the poll() function
to wait for keyboard input, if the program is being piped into, the
poll() call becomes non-blocking even if special descriptors are set up
using dup2() calls or flags are changed using fcntl(). To this day I
have no idea how to implement this specific behavior of less, and if I'm
honest it's fucking black magic. So I suspect the reason you can't use
less in the scenario you described is exactly because of this black
magic that nobody knows about, since there is absolutely no
documentation on how to make it work.

Anybody else reading this: if you would be kind enough to provide a
small sample C program that can read the data from the pipe, i.e.
do echo "hello world" | ./a.out, and be able to recover after reading
the "hello world", putting stdin back into "blocking" mode so that
it starts waiting for keyboard input after that, just like it would
if there were no pipe involved. Also, as a bonus, your stdout
should remain unaffected, so you should be able to just
write(1, "hello world\n", ...) and have it displayed as expected.
And this must be done using only functions specified in POSIX/the Linux
kernel, so no fgets, no fgetc or any other input-reading function that
hides what is actually going on. Use functions like open(), fcntl(),
poll(), dup() only.
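
One answer that comes close, with one honest deviation: once fd 0 is a pipe it
cannot be turned back into the keyboard, so after draining it the sketch opens
/dev/tty and polls that instead, which is essentially what less does. Only
open(), read(), write(), poll() and close() are used:

```
/* Try: echo "hello world" | ./a.out
 * Drains the pipe on fd 0, then waits for keyboard input from the
 * controlling terminal opened via /dev/tty. */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* 1. drain everything the pipe has to offer, copying it to stdout */
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);

    /* 2. stdout is untouched and keeps working */
    write(STDOUT_FILENO, "pipe drained, type a line...\n", 29);

    /* 3. keyboard input comes from the controlling terminal, not fd 0 */
    int tty = open("/dev/tty", O_RDONLY);
    if (tty < 0)
        return 1;                            /* no controlling terminal */

    struct pollfd p = { .fd = tty, .events = POLLIN };
    if (poll(&p, 1, -1) > 0 && (p.revents & POLLIN))
        if (read(tty, buf, sizeof buf) > 0)  /* a whole line: tty is in canonical mode */
            write(STDOUT_FILENO, "got keyboard input\n", 19);

    close(tty);
    return 0;
}
```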

I believe once this mystery is uncovered it would be possible to
solve your issue with less, since it would be clear how
to manipulate the stdin descriptor to do virtually anything we want.

I also remember reading some neckbeard guy's post about how less
is not technically a unix program because of how it interfaces
with pipes and stdin, whatever that means. Maybe we are
all trying to do things that were not originally envisioned
for this interface. Either you read from the pipe or you
read from the keyboard; having both at the same time - heresy.

> This command "tail -f out | tr a A" is functional and has no bugs, but it 
> doesn't let me use the power of less, which I crave.
>
> This command "tail out | tr a A | less" is functional and has no bugs, but 
> it doesn't let me see newly appended lines.
>
> Can I use the power of the Unix pipeline to do text processing and the 
> power of less for excellent paging and still be able to see new lines as they 
> are appended?  Why doesn't or can't less continue to monitor stdin from the 
> pipeline and respond to my keystrokes from the tty?
>
> I am using Debian 11 in case it matters, with fish.  But I am happy to try 
> other shells.  In fact I already have and that doesn't seem to help.  I have 
> also tried more, most, nano -, and vi -, instead of less, to no avail.

Unsurprising.

Best wishes,
Kyryl



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-27 Thread taaparthur
May 27, 2022, 11:43 AM, "Greg Reagle" <l...@speedpost.net> wrote:

> 
> I have a file named "out" (from ii) that I want to view. Of course, it can 
> grow while I am viewing it. I can view it with "tail -f out" or "less +F 
> out", both of which work. I also want to apply some processing in a pipeline, 
> something like "tail -f out | tr a A | less" but that does not work. The less 
> command ignores my keystrokes (unless I hit Ctrl-C, but that kills tail and 
> tr). The "tr a A" command is arbitrary; you can substitute whatever 
> processing you want, even just cat.
> 
> This command "tail -f out | tr a A" is functional and has no bugs, but it 
> doesn't let me use the power of less, which I crave.
> 
> This command "tail out | tr a A | less" is functional and has no bugs, but it 
> doesn't let me see newly appended lines.
> 
> Can I use the power of the Unix pipeline to do text processing and the power 
> of less for excellent paging and still be able to see new lines as they are 
> appended? Why doesn't or can't less continue to monitor stdin from the 
> pipeline and respond to my keystrokes from the tty?
> 
> I am using Debian 11 in case it matters, with fish. But I am happy to try 
> other shells. In fact I already have and that doesn't seem to help. I have 
> also tried more, most, nano -, and vi -, instead of less, to no avail.
>

Hi Greg,

Why don't you just save the output to a temporary file after your processing? 
Like
```
tail -f out | tr a A > out.post &
less +F out.post
```
If you run something like this in a script, you may want to ensure everything 
gets cleaned up when it exits.

Another option is to use vi/vim and have it periodically reload the file. 

>Why doesn't or can't less continue to monitor stdin from the pipeline and 
>respond to my keystrokes from the tty?

Probably because less doesn't know it should check a tty. Programs like less 
often check whether stdin refers to a tty, and in the pipeline above, less's 
stdin isn't a tty. See `echo | tty` vs `tty`. You may be able to modify less to 
check whether stdout/stderr refers to a tty instead.
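
A sketch of that distinction (the generic pattern, not less's actual source;
less itself reads keystrokes from /dev/tty rather than relying on stdin being
a terminal):

```
/* Sketch: how a pager-like tool can tell the cases apart.
 *   ./a.out            -> stdin is a tty
 *   echo hi | ./a.out  -> stdin is a pipe, keyboard still reachable via /dev/tty
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("stdin  is %sa tty\n", isatty(STDIN_FILENO)  ? "" : "not ");
    printf("stdout is %sa tty\n", isatty(STDOUT_FILENO) ? "" : "not ");

    if (!isatty(STDIN_FILENO)) {
        /* data arrives on fd 0; open the terminal separately for keystrokes */
        int kbd = open("/dev/tty", O_RDONLY);
        if (kbd >= 0) {
            printf("keyboard still available via /dev/tty (fd %d)\n", kbd);
            close(kbd);
        }
    }
    return 0;
}
```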

Arthur



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-27 Thread Hadrien Lacour
On Fri, May 27, 2022 at 02:43:03PM -0400, Greg Reagle wrote:
> I have a file named "out" (from ii) that I want to view.  Of course, it can 
> grow while I am viewing it.  I can view it with "tail -f out" or "less +F 
> out", both of which work.  I also want to apply some processing in a 
> pipeline, something like "tail -f out | tr a A | less" but that does not 
> work.  The less command ignores my keystrokes (unless I hit Ctrl-C, but that 
> kills tail and tr).  The "tr a A" command is arbitrary; you can substitute 
> whatever processing you want, even just cat.
>
> This command "tail -f out | tr a A" is functional and has no bugs, but it 
> doesn't let me use the power of less, which I crave.
>
> This command "tail out | tr a A | less" is functional and has no bugs, but it 
> doesn't let me see newly appended lines.
>
> Can I use the power of the Unix pipeline to do text processing and the power 
> of less for excellent paging and still be able to see new lines as they are 
> appended?  Why doesn't or can't less continue to monitor stdin from the 
> pipeline and respond to my keystrokes from the tty?
>
> I am using Debian 11 in case it matters, with fish.  But I am happy to try 
> other shells.  In fact I already have and that doesn't seem to help.  I have 
> also tried more, most, nano -, and vi -, instead of less, to no avail.
>

I don't know ii at all, but it seems like you're searching for "less +F", to me.



Re: [dev] ii: how to process out in a pipeline and still page with less

2022-05-27 Thread Alexandre Niveau
Hello,

On Fri, 27 May 2022 at 20:45, Greg Reagle wrote:
>
> I have a file named "out" (from ii) that I want to view.  Of course, it can 
> grow while I am viewing it.  I can view it with "tail -f out" or "less +F 
> out", both of which work.  I also want to apply some processing in a 
> pipeline, something like "tail -f out | tr a A | less" but that does not 
> work.  The less command ignores my keystrokes (unless I hit Ctrl-C, but that 
> kills tail and tr).  The "tr a A" command is arbitrary; you can substitute 
> whatever processing you want, even just cat.
>
> This command "tail -f out | tr a A" is functional and has no bugs, but it 
> doesn't let me use the power of less, which I crave.
>
> This command "tail out | tr a A | less" is functional and has no bugs, but it 
> doesn't let me see newly appended lines.
>
> Can I use the power of the Unix pipeline to do text processing and the power 
> of less for excellent paging and still be able to see new lines as they are 
> appended?  Why doesn't or can't less continue to monitor stdin from the 
> pipeline and respond to my keystrokes from the tty?
>
> I am using Debian 11 in case it matters, with fish.  But I am happy to try 
> other shells.  In fact I already have and that doesn't seem to help.  I have 
> also tried more, most, nano -, and vi -, instead of less, to no avail.
>

The simplest solution to avoid the side effect of killing tail and tr
when using ^C to quit "follow mode" in less is to redirect the output
of tr to another file, and use less +F to open that file. Like so:
tail -f out | tr a A >out2 & less +F out2

Some other solutions: https://unix.stackexchange.com/a/659175

Best,

AN