Re: are head/tail allowed (required?) to rewind stdin
2018-04-30 16:49:34 +0100, Geoff Clare:
[...]
> Yes, but it clearly shows that this offset is intended to be honoured
> by the next utility to read from stdin, when it says:
>
>     tail -n +2 file
>     (sed -n 1q; cat) < file
>     [...]
>     The second command is equivalent to the first only when the file is
>     seekable.

True, but the "cat" spec says it reads stdin, not that it reads
stdin *from the start of the file*. The line-number addresses for
"sed" are expressed in terms of "input lines", not "nth line of
*files*".

[...]
> > But would you agree that it's not what the text currently says?
> > Should we create a ticket for that?
>
> Yes, it needs a ticket.  It may well affect a lot of utilities, so
> perhaps adding something in XCU 1.4 under STDIN would be the best
> solution.
[...]

Thanks. Though it would certainly help to have a clarification in
XCU 1.4 under STDIN, I don't think the problem is that bad. I'd
say the problem is mostly with utilities that explicitly
reference offsets within files/input.

There's also a problem with dd, whose "seek" description is wrong
(it says the offset should be relative to the start of the file,
while when there's no of=file, the offset should be relative to
the current position on stdout).

Now it's true that there are a lot of cases where utility
descriptions reference "input files" instead of just "input",
which can be misleading/ambiguous when dealing with stdin.

For instance, in:

    {
      head -n 1 > /dev/null # skip header
      join/comm - file2
    } < file1

file1 may not be sorted, as the header would likely break the
sorting, but that's not a problem because we skip the header
before feeding the rest to join. What matters is that the input
join sees is sorted, even if the input file is not sorted itself.

Still, I don't think anyone would infer from the current text
that the behaviour is unspecified because the input files are
not sorted.

--
Stephane
Re: are head/tail allowed (required?) to rewind stdin
Stephane Chazelas wrote, on 30 Apr 2018:
>
> 2018-04-30 15:50:10 +0100, Geoff Clare:
> > Stephane Chazelas wrote, on 30 Apr 2018:
> > >
> > > The head/tail specifications refer to line/byte offsets as
> > > offsets within *files* as opposed to *input*.
> > >
> > > Does it mean that:
> > >
> > > { head -n 1; head -n 1; } < file
> > > { tail -n 1; tail -n 1; } < file
> > >
> > > are required to print the first/last line of "file" twice
> > > (assuming "file" is seekable and is not modified between the two
> > > head/tail invocations)?
> > >
> > > In the case of "head", I can't find any implementation that
> > > does, they all return the first line of their *input* as opposed
> > > to the first line of whatever file may be open on stdin.
> >
> > The intended behaviour of the head example is that the first head
> > writes the first line of "file" and the second head writes the second
> > line of "file".  See XCU 1.4 under INPUT FILES.
>
> Thanks, but that text covers where the utility shall *leave*
> stdin's position *after* it has processed its input, but not
> whether it may change it before reading the input.

Yes, but it clearly shows that this offset is intended to be honoured
by the next utility to read from stdin, when it says:

    tail -n +2 file
    (sed -n 1q; cat) < file
    [...]
    The second command is equivalent to the first only when the file is
    seekable.

> [...]
> > > However, in the case of "tail", for seekable stdin, traditional
> > > implementations used to seek to the end of the file open on
> > > stdin and look backward for the last line from there even if the
> > > initial position of stdin was past the start of that last line
> > > (it could even be past the end of the file).
> >
> > The intention is certainly that when reading from standard input,
> > tail should not write anything that is before the initial offset of
> > standard input.
> [...]
>
> Thanks.
>
> But would you agree that it's not what the text currently says?
> Should we create a ticket for that?

Yes, it needs a ticket.  It may well affect a lot of utilities, so
perhaps adding something in XCU 1.4 under STDIN would be the best
solution.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: are head/tail allowed (required?) to rewind stdin
2018-04-30 15:50:10 +0100, Geoff Clare:
> Stephane Chazelas wrote, on 30 Apr 2018:
> >
> > The head/tail specifications refer to line/byte offsets as
> > offsets within *files* as opposed to *input*.
> >
> > Does it mean that:
> >
> > { head -n 1; head -n 1; } < file
> > { tail -n 1; tail -n 1; } < file
> >
> > are required to print the first/last line of "file" twice
> > (assuming "file" is seekable and is not modified between the two
> > head/tail invocations)?
> >
> > In the case of "head", I can't find any implementation that
> > does, they all return the first line of their *input* as opposed
> > to the first line of whatever file may be open on stdin.
>
> The intended behaviour of the head example is that the first head
> writes the first line of "file" and the second head writes the second
> line of "file".  See XCU 1.4 under INPUT FILES.

Thanks, but that text covers where the utility shall *leave*
stdin's position *after* it has processed its input, but not
whether it may change it before reading the input.

(Note that this started from the unix.stackexchange.com Q&A at
https://unix.stackexchange.com/a/239562 where I already quote
part of the "INPUT FILES" section, but to discuss where head
leaves the position afterwards.)

In

    { tail -n 1; tail -n 1; } < file

outputting the last line twice in some implementations, the
problem is not that the first tail leaves the stdin position at
the start of the last line (it doesn't: it leaves it at the end
of the last line, or possibly even further if it was already
past the end of the file, in the implementations that I consider
correct), but that the second tail then moves the position back
(rewinds) from where the first tail left it (in those
implementations that I consider incorrect).

[...]
> > However, in the case of "tail", for seekable stdin, traditional
> > implementations used to seek to the end of the file open on
> > stdin and look backward for the last line from there even if the
> > initial position of stdin was past the start of that last line
> > (it could even be past the end of the file).
>
> The intention is certainly that when reading from standard input,
> tail should not write anything that is before the initial offset of
> standard input.
[...]

Thanks.

But would you agree that it's not what the text currently says?
Should we create a ticket for that?

--
Stephane
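The two tail behaviours described in this message can be told apart with a short script. This is a rough sketch: the scratch file and its contents are invented for illustration, mktemp is assumed to be available, and the result depends on the tail implementation being tested, so no particular outcome is claimed here.

```shell
# Hypothetical scratch file; name and contents are illustrative only.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Both tails inherit the same open file description, hence the same offset.
tail_out=$({ tail -n 1; tail -n 1; } < "$tmp")

# Count how many times the last line was written.
tail_count=$(printf '%s\n' "$tail_out" | grep -c '^three$')

case $tail_count in
  1) echo "second tail started from the offset left by the first tail" ;;
  2) echo "second tail moved the offset back and printed the last line again" ;;
  *) echo "unexpected output: $tail_out" ;;
esac

rm -f "$tmp"
```

A count of 1 corresponds to the behaviour the message calls correct, and a count of 2 to the rewinding behaviour of the traditional implementations it describes.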
Re: are head/tail allowed (required?) to rewind stdin
Stephane Chazelas wrote, on 30 Apr 2018:
>
> The head/tail specifications refer to line/byte offsets as
> offsets within *files* as opposed to *input*.
>
> Does it mean that:
>
> { head -n 1; head -n 1; } < file
> { tail -n 1; tail -n 1; } < file
>
> are required to print the first/last line of "file" twice
> (assuming "file" is seekable and is not modified between the two
> head/tail invocations)?
>
> In the case of "head", I can't find any implementation that
> does, they all return the first line of their *input* as opposed
> to the first line of whatever file may be open on stdin.

The intended behaviour of the head example is that the first head
writes the first line of "file" and the second head writes the second
line of "file".  See XCU 1.4 under INPUT FILES.

However, I can see that text such as "The first number lines of each
input file shall be copied" for head -n is misleading in this respect.

> However, in the case of "tail", for seekable stdin, traditional
> implementations used to seek to the end of the file open on
> stdin and look backward for the last line from there even if the
> initial position of stdin was past the start of that last line
> (it could even be past the end of the file).

The intention is certainly that when reading from standard input,
tail should not write anything that is before the initial offset of
standard input.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
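The head example from the question can be run against a scratch file to see what a given implementation does. This is a minimal sketch: the file name and contents are made up, and mktemp is assumed to be available. Only the first output line is certain. What the second head writes depends on whether the first one left the offset of the seekable stdin just past the line it copied, which is the intended behaviour stated in the reply.

```shell
# Hypothetical scratch file; name and contents are illustrative only.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Two head invocations reading from the same open file description.
head_out=$({ head -n 1; head -n 1; } < "$tmp")

# Show everything the pair wrote, then pick out the first line.
printf '%s\n' "$head_out"
first=$(printf '%s\n' "$head_out" | sed -n 1p)

rm -f "$tmp"
```

Under the intended behaviour the pair writes the first and then the second line of the file. An implementation that reads ahead without restoring the offset could leave the second head with nothing to read.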