On 03/30/2014 09:40 PM, Denis Excoffier wrote:
> Hello,
>
> head -n -1 -- -
> or equivalently
> head -n -1
> returns immediately (ie does not wait for further stdin) and prints nothing.
>
> I use coreutils 8.22 compiled (with gcc-4.8.2) on top of darwin 13.1.0
> (Mavericks).
>
> However the following seem to work perfectly:
> head -n 1
> head -c -1
> cat | head -n -1
> head -n -1 ---presume-input-pipe
> on cygwin: head -n -1
>
> What is weird on my system is lseek() at the beginning of
> elide_tail_lines_file():
> lseek(fd, 0, SEEK_CUR) returns a (random?) number, something like 6735, 539
> etc.
> lseek(fd, 0, SEEK_END) returns 0
So:
head -n -1 # returns immediately
while:
cat | head -n -1 # waits as expected
It seems we might be using non portable code here. POSIX says:
"The behavior of lseek() on devices which are incapable of seeking is
implementation-defined.
The value of the file offset associated with such a device is undefined."
and also:
"The lseek() function shall fail [with ESPIPE if] the fildes argument is
associated with a pipe, FIFO, or socket"
So tty devices would come outside of this POSIX scope.
Furthermore the FreeBSD lseek man pages states:
"Some devices are incapable of seeking and POSIX does not specify which
devices must support it.
Linux specific restrictions: using lseek on a tty device returns
ESPIPE. Other systems return the number of written characters, using
SEEK_SET to set the counter. Some devices, e.g. /dev/null do not cause
the error ESPIPE, but return a pointer which value is undefined."
Now head(1) isn't the only place we use this logic. In dd we have:
offset = lseek (STDIN_FILENO, 0, SEEK_CUR);
input_seekable = (0 <= offset);
I wonder should be be using something like:
bool seekable (int fd)
{
return ! isatty (fd) && lseek (fd, 0, SEEK_CUR) >= 0;
}
Though this only handles the tty case, and there
could be other devices for which this could be an issue.
So the general question is, is there a way we can robustly
determine if we have a seekable device or not?
Perhaps by using SEEK_SET in combination with SEEK_CUR,
but notice the BSD lseek man page above says that tty devices
support SEEK_SET also :/ Anyway...
Note the original head(1) code to detect seekable input was introduced with:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=61ba51a6
and that was changed recently due to a coverity identified logic issue, to:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=5fdb5082
However that now logically consistent code will return immediately in your case.
I also notice the related `head -c -1` check is more conservative
in that it only uses the more efficient lseek() code for regular files,
which would mean we don't operate as efficiently as we could on a disk device
for example. But that's much better than undefined operation of course.
If we were to do the same for lines then we would also introduce a change
in behavior with devices like /dev/zero. Currently on Linux, this will return
immediately:
head -n -1 /dev/zero
I.E. we currently treat such devices as empty, and return immediately with
success status,
whereas treating as a stream of NULs, would result in memory exhaustion while
buffering
waiting for a complete line. That is probably the more consistent operation at
least.
So the attached uses this more conservative test for the --lines=-N case.
thanks,
Pádraig.
>From 386bdbd2e9b9af1813bc4ac205755c0ce17f5cda Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Mon, 31 Mar 2014 13:16:56 +0100
Subject: [PATCH] head: with --lines=-N process devices more consistently
* src/head.c (elide_tail_lines_file): Restrict lseek() logic
to regular files, as it's not possible to determine if a device
is really seekable or not. On BSD for example, lseek() returns
>= 0 for tty devices, while on Linux 0 will be returned for /dev/zero
which would therefore be ignored without this change.
Fixes http://bugs.gnu.org/17145
Redported by Denis Excoffier
---
NEWS | 4 ++++
src/head.c | 8 +++++---
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/NEWS b/NEWS
index 582f060..b37a21a 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,10 @@ GNU coreutils NEWS -*- outline -*-
now copies all input to stdout. Previously nothing was output in this case.
[bug introduced with the --lines=-N feature in coreutils-5.0.1]
+ head --lines=-N now handles devices more consistently, not ignoring data
+ from virtual devices like /dev/zero, or on BSD systems data from tty devices.
+ [bug introduced with the --lines=-N feature in coreutils-5.0.1]
+
ln -sf now replaces symbolic links whose targets can't exist. Previously
it would display an error, requiring --no-dereference to avoid the issue.
[bug introduced in coreutils-5.3.0]
diff --git a/src/head.c b/src/head.c
index b833af6..d3d8eca 100644
--- a/src/head.c
+++ b/src/head.c
@@ -414,9 +414,9 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0)
static bool
elide_tail_bytes_file (const char *filename, int fd, uintmax_t n_elide)
{
- struct stat stats;
+ struct stat st;
- if (presume_input_pipe || fstat (fd, &stats) || ! S_ISREG (stats.st_mode))
+ if (presume_input_pipe || fstat (fd, &st) || ! S_ISREG (st.st_mode))
{
return elide_tail_bytes_pipe (filename, fd, n_elide);
}
@@ -735,7 +735,9 @@ elide_tail_lines_seekable (const char *pretty_filename, int fd,
static bool
elide_tail_lines_file (const char *filename, int fd, uintmax_t n_elide)
{
- if (!presume_input_pipe)
+ struct stat st;
+
+ if (! presume_input_pipe && fstat (fd, &st) == 0 && S_ISREG (st.st_mode))
{
/* Find the offset, OFF, of the Nth newline from the end,
but not counting the last byte of the file.
--
1.7.7.6