Re: pax -s option and symlink targets

2021-07-04 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-04 07:21:08 +0100, Stephane Chazelas via austin-group-l at The Open Group:
[...]
> Note that the equivalent --transform in the GNU implementation
> of tar lets you specify whether to apply the transformation to
> {sym,hard}links or not with flags.
[...]

Actually, libarchive's bsdtar supports the same rshRSH flags for
its -s option.

That may be where the idea comes from. The s flag was added
earlier in 2008, actually based on NetBSD's pax (added in tar
and pax in 2007, see
https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=35257,
not documented in pax until 2019), but with a reverse meaning:
the flag *suppresses* the substitution on symlink targets!

libarchive switched to GNU style rhsRHS in 2011.

So we do have prior art for pax -s, but unfortunately
incompatible with bsdtar -s or gnutar --transform.

I don't know if NetBSD have switched to libarchive for their tar
yet, but if/when they do, they'd have a problem that the meaning
of the s flag would change.
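
To make the flag semantics concrete, here is a minimal sketch of the
GNU tar side only. It assumes GNU tar is installed as "tar"; the names
old, link, default.tar and no_symlink.tar are made up for illustration.
With the default scope the symlink target is rewritten along with the
member names, while the S flag leaves the target alone.

```shell
# Assumes GNU tar; "old", "link" and the archive names are hypothetical.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir old
echo data > old/file
ln -s old/file link

# Default scope (rsh): member names and link targets are both transformed.
tar -cf default.tar --transform='s,^old,new,' old link

# S flag: do not apply the transformation to symlink targets.
tar -cf no_symlink.tar --transform='s,^old,new,S' old link

# Show how the symlink was stored in each archive.
default_entry=$(tar -tvf default.tar | grep ' link -> ')
no_symlink_entry=$(tar -tvf no_symlink.tar | grep ' link -> ')
printf '%s\n%s\n' "$default_entry" "$no_symlink_entry"
```

bsdtar takes the same letters after the last delimiter of its -s
expression; a NetBSD-style s flag would have to be read the other way
round, which is the incompatibility described above.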

-- 
Stephane



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date: Fri, 2 Jul 2021 14:41:50 +0100
From: "Geoff Clare via austin-group-l at The Open Group"
Message-ID: <20210702134150.GB16587@localhost>

  | I've always assumed that the intention for -c is to answer the
  | question "if I ran this command without -c would the output be 
  | the same as the input?"  So the NetBSD behaviour seems wrong
  | to me.

But:
jinx$ printf '%s\n' a,b a,a 
a,b
a,a
jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
a,b
a,a
jinx$ printf '%s\n' a,b a,a | sort -c -t, -k1,1; echo $?
0

the output (without -c) does match the input, so it seems right to me.
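
That reading of -c can be written down directly as a comparison between
the input and what sort would emit. A minimal sketch, where
would_be_unchanged is a made-up helper name and not part of sort:

```shell
# Hypothetical helper: succeed if sorting FILE with the given options
# would reproduce FILE byte for byte.
would_be_unchanged() {
    file=$1
    shift
    sort "$@" -- "$file" | cmp -s - "$file"
}

sample=$(mktemp)
printf '%s\n' a,b a,a > "$sample"

# An explicit final -k1 makes the whole line the tie-breaker, so this
# input is out of order whatever the implementation's default is.
if would_be_unchanged "$sample" -t, -k1,1 -k1; then
    echo "unchanged"
else
    echo "would be reordered"
fi
```

With only -t, -k1,1 the answer depends on whether the implementation
applies a last-resort comparison, which is the point under discussion.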

Note that on NetBSD, the -s option that has been discussed here exists,
but does nothing, as it is the default.

ie: When -k args are given, there is no fallback to "whole record" matching,
if one wants that, one can easily add a final -k1 option to make that happen:

jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1 -k1
a,a
a,b
jinx$ printf '%s\n' a,b a,a | sort -c -t, -k1,1 -k1
sort: found disorder: a,a

which is the way it should be - if one has taken the trouble to specify
what parts of the record are the keys for sorting (and -u comparisons)
then sort should not be gratuitously adding more - that it used to do so
was widely regarded as a bug (especially given that there was no way to
defeat it, but enabling it is so simple when it is not the default).

Or, if one simply wants the useless posix behaviour, -S requests that

jinx$ printf '%s\n' a,b a,a | sort -S -t, -k1,1
a,a
a,b
jinx$ printf '%s\n' a,b a,a | sort -c -S -t, -k1,1
sort: found disorder: a,a

I doubt that option is much used however, as it is so counter-intuitive,
but if it was wanted all the time

sort() { command sort -S "$@"; }

should achieve that.   Setting POSIXLY_CORRECT in the environ probably
should as well, but doesn't currently I believe (and no-one is complaining
that it doesn't).

kre




Re: sort -c/C and last-resort sorting

2021-07-04 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group:
> Date:Fri, 2 Jul 2021 14:41:50 +0100
> From:"Geoff Clare via austin-group-l at The Open Group" 
> 
> Message-ID:  <20210702134150.GB16587@localhost>
> 
>   | I've always assumed that the intention for -c is to answer the
>   | question "if I ran this command without -c would the output be 
>   | the same as the input?"  So the NetBSD behaviour seems wrong
>   | to me.
> 
> But:
>   jinx$ printf '%s\n' a,b a,a 
>   a,b
>   a,a
>   jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
>   a,b
>   a,a

That would make is non-compliant then.

SUS> When there are multiple key fields, later keys shall be
SUS> compared only after all earlier keys compare equal. Except
SUS> when the -u option is specified, lines that otherwise
SUS> compare equal shall be ordered as if none of the options
SUS> -d, -f, -i, -n, or -k were present (but with -r still in
SUS> effect, if it was specified) and with all bytes in the
SUS> lines significant to the comparison. The order in which
SUS> lines that still compare equal are written is unspecified.

[...]
> ie: When -k args are given, there is no fallback to "whole record" matching,
> if one wants that, one can easily add a final -k1 option to make that happen:
[...]
> which is the way it should be - if one has taken the trouble to specify
> what parts of the record are the keys for sorting (and -u comparisons)
> then sort should not be gratuitously adding more - that it used to do so
> was widely regarded as a bug (especially given that there was no way to
> defeat it, but enabling it is so simple when it is not the default).
[...]

I don't know what the original rationale was, but /one/
rationale could be to guarantee a deterministic and total order,
to make sure that two files with the same lines (though in
different orders) result in the same output when sorted whatever
the sorting specification.

That guarantee is broken in locales that don't have total order
which was the subject of recent changes.

POSIX sort does sort as specified, and in cases where the user
doesn't say (sort key same but line different), makes one of
several possible decisions, in that case a last-resort comparison
of the full line (resorting to a memcmp() comparison when
strcoll() finds them equal, if need be), whilst NetBSD sort keeps
the original order. Note that POSIX doesn't require the sort to be
stable, and leaves unspecified which line is selected by sort
-uk1,1, for instance.
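
The two tie-breaking policies can be put side by side with a short
sketch. It assumes a sort that has the -s (stable) extension, as the GNU
and BSD implementations do; -s is not in the standard as it stands.

```shell
# Two lines whose first field compares equal but which differ overall.
input=$(printf '%s\n' a,b a,a)

# No -s: an implementation following the SUS text breaks the tie on
# -k1,1 by comparing the whole lines.
last_resort=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -k1,1)

# -s (an extension, assumed available): ties keep their input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -t, -k1,1)

printf 'without -s:\n%s\nwith -s:\n%s\n' "$last_resort" "$stable"
```

On a sort that is stable by default, as described for NetBSD, both
commands would be expected to print the lines in input order.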

-- 
Stephane



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group:
[...]
> which is the way it should be - if one has taken the trouble to specify
> what parts of the record are the keys for sorting (and -u comparisons)
> then sort should not be gratuitously adding more - that it used to do so
> was widely regarded as a bug (especially given that there was no way to
> defeat it, but enabling it is so simple when it is not the default).
> 
> Or, if one simply wants the useless posix behaviour, -S requests that
[...]
> should achieve that.   Setting POSIXLY_CORRECT in the environ probably
> should as well, but doesn't currently I believe (and no-one is complaining
> that it doesn't).
[...]

While that view may have some merit, I'm not convinced that it
would be enough to justify deviating from all other
implementations and from the standard.

I'd imagine backward compatibility would have been broken at
some point for that as NetBSD sort used to be the GNU one like
on most BSDs, so I suspect that's a strongly held view there.

That's even more justification for adding -s to the standard
though so people can at least choose to get a stable sort
portably. -S could probably be added as well, but I don't think
it wise to make the default behaviour unspecified.

-- 
Stephane



Re: utilities and write errors

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date: Thu, 1 Jul 2021 11:45:40 +0100
From: "Geoff Clare via austin-group-l at The Open Group"
Message-ID: <20210701104540.GA4023@localhost>

  | Because it is a precondition of this discussion.  I.e. what we are
  | debating is what is the required behaviour if pwd does not write to
  | standard output (because an error occurred).

OK, but that's not what I was getting at - I meant that there were
two variants of the command shown, in neither of them did the
directory name output appear anywhere, but apparently only in one do you
expect an error message to be displayed.   Seems inconsistent to me.
That was why the "how do you know" - the two commands appeared to
have identical results, externally the difference cannot be observed.

  | The point is that pwd must not ignore write errors (when the actual
  | write to the file descriptor is done).

And for the reason above, I'm not sure I agree.   Note here that what I
am disagreeing with is not what pwd ought do, but with the claim that the
standard requires it to do that.

  | We are just using EPIPE as an
  | easy way to (portably) set up an error condition.

Sure, I'm aware of that, I'm the one that suggested that as the
simple (and relatively portable, unlike /dev/full) way of forcing
an error.
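
For readers without the earlier messages, a sketch of the kind of EPIPE
setup meant here. The exact commands used earlier in the thread are not
reproduced in this message, so the pipeline below is an assumption. In
most shells pwd is a builtin, so this exercises the shell's own
implementation, and both the status and any diagnostic vary between
implementations.

```shell
# Run pwd with stdout connected to a pipe whose reader has already
# exited, and report pwd's exit status on fd 3 so that it survives
# the pipeline.
status=$(
    {
        (
            trap '' PIPE    # ignore SIGPIPE so the write fails with EPIPE
            sleep 1         # give the reader time to exit
            pwd
            echo "$?" >&3
        ) | :
    } 3>&1
)
echo "pwd exit status: $status"
```

Any diagnostic goes to standard error; whether one appears, and whether
the status is non-zero, is exactly what is being debated.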

  | The standard says nothing about internal buffering;

It doesn't?   (Line numbers from 202x_D2)

30895 ERRORS
30896 CX   For the conditions under which dprintf( ), fprintf( ), and printf( ) fail and may fail, refer to fputc( )
30897      or fputwc( ).
30898      In addition, all forms of fprintf( ) shall fail if:
30899
30900 CX   [EILSEQ]    A wide-character code that does not correspond to a valid character has been
                       detected.
30901 CX   [EOVERFLOW] The value to be returned is greater than {INT_MAX}.
30902      The dprintf( ) function may fail if:
30903      [EBADF]     The fildes argument is not a valid file descriptor.
30904 CX   The dprintf( ), fprintf( ), and printf( ) functions may fail if:
30905 CX   [ENOMEM]    Insufficient storage space is available.
30906      The snprintf( ) function shall fail if:
30907 CX   [EOVERFLOW] The value of n is greater than {INT_MAX}.


None of that is highly relevant, except perhaps ENOMEM which is definitely
not referring to file storage space, but internal memory for buffer storage,
but we do get referred to fputc, which says...

206 ERRORS
207 The fputc( ) function shall fail if either the stream is unbuffered or the stream's buffer needs to be
208 flushed, and:

Note the "or the stream's buffer needs to be flushed" part.

The "and" includes ENOSPC and EPIPE (amongst others) as possible errors
(ie: the two that have been of concern here, including also EBADF which
was also used in some tests, though they were useless here).

I could quote more places in the standard where it mentioned buffering
(eg: XSH 3.setbuf ...) - the standard certainly knows that output
streams can be buffered (so does everyone else).

We also have (in both XSH 3.fprintf and XSH 3.fputc, with similar wording)

31199 CX   The last data modification and last file status change timestamps of the file shall be marked for
31200      update between the successful execution of fputc( ) and the next successful completion of a call
31201      to fflush( ) or fclose( ) on the same stream or a call to exit( ) or abort( ).

which is also a consequence of buffering, even though not explicitly
stated.   The standard knows all about the consequences of streams being
buffered in the library, and where appropriate, makes specific allowances
for that.   In pwd (or the general requirements for utilities) it says
nothing, from which one must conclude that no special actions are required.


I agree that pwd doesn't say anything about buffering, one way or the
other, but nor does it say that it is required to use write() rather
than printf() or that if printf() is used then setbuf(stdout, NULL);
must accompany it.   Or did I somehow miss that part?

Nor does it, as best I can tell, say that pwd must fflush(stdout) rather
than allowing exit() to do that (or that it even should be aware in any way
at all that output buffering is possible, which is what actually happens).

Which of my "does not say" do you believe is incorrect?   Please quote
the text which refutes my position (obviously I cannot quote text which
is not there).   And none of the "it must mean" BS, either it is explicitly
stated, or it is not.

If your position is that exit() (or an atexit() function supplied by stdio
when it creates a buffered stream) is required to write error messages to
stderr if an error occurs when flushing a stream, and also change the exit
code (how it would know what to change it into I have no idea, not all
utilities are defined to exit(1) on error, some use that for other purposes)
then please reveal where that is required.

  | it just requires pwd to write the directory to file descriptor 1.

Where does it say that?

104869 

Re: utilities and write errors

2021-07-04 Thread Harald van Dijk via austin-group-l at The Open Group

On 04/07/2021 18:32, Robert Elz via austin-group-l at The Open Group wrote:

 Date:Thu, 1 Jul 2021 11:45:40 +0100
 From:"Geoff Clare via austin-group-l at The Open Group" 

 Message-ID:  <20210701104540.GA4023@localhost>

   | it just requires pwd to write the directory to file descriptor 1.

Where does it say that?


I think the definition of "write", in Base Definitions, tries to say so, 
but I also think the definition is wrong.



3.449 Write

To output characters to a file, such as standard output or standard
error. Unless otherwise stated, standard output is the default output
destination for all uses of the term "write''; see the distinction
between display and write in Display.


Note the "to a file", not "to a stream". If an implementation uses stdio 
to achieve a write, the write has only completed once the stream has 
been flushed: until that happens, no file access has been performed.


However, if that interpretation is correct, the rest of the definition 
is wrong, as standard output and standard error are streams, not files, 
so the definition should read something like


> 3.449 Write
>
> To output characters to a file, such as the files associated with
> standard output or standard error. Unless otherwise stated, the file
> associated with standard output is the default output destination for
> all uses of the term "write''; see the distinction between display and
> write in Display.

Alternatively, "write" could be defined in a way such that writing to a 
stream is defined as writing to the file associated with that stream.


Cheers,
Harald van Dijk

P.S.: The fact that the underlying file descriptor of standard output is 
fd 1 rather than some other fd is specified under "stderr, stdin, stdout 
- standard I/O streams", look for "STDOUT_FILENO", but I think that is 
ultimately no longer relevant.




Re: utilities and write errors

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date: Sun, 4 Jul 2021 19:26:25 +0100
From: Harald van Dijk
Message-ID:  <40fc0b0c-364f-8ff8-0613-a76c887d4...@gigawatt.nl>

  | I think the definition of "write", in Base Definitions, tries to say so,

To a degree, yes, but I don't think that actually adds anything to what
we already knew.

  | but I also think the definition is wrong.

Probably, though one could argue whether it is wrong in the way that
you suggest, via your proposed alternate wording, or instead really
means to say "to a stream" but that seems (or seemed) awkward, especially
back when "streams" as a kind of networking interface still existed, and
using "stream" might have been considered misleading.

  | P.S.: The fact that the underlying file descriptor of standard output is 
  | fd 1 rather than some other fd is specified under "stderr, stdin, stdout 
  | - standard I/O streams", look for "STDOUT_FILENO", but I think that is 
  | ultimately no longer relevant.

For the last point, it never really was relevant.   pwd writes to standard
output, whatever that is.

But for interest, what section are you referring to, I don't have a (useful)
PDF grep tool (I have one, somewhere, but every time I use it, I seem to
spend more time trying to figure out what it is telling me than getting any
kind of useful result ... that is, I can probably get to see the line
containing some string (or pattern) but what is needed is the context, and
that I don't seem able to either extract, or get a reference so I can find
it using some other tool.   Never mind...)

kre

ps: we really don't want to get into the rathole that is what it means to
actually "write to a file" - and where the data must be before that is
considered to have succeeded.



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date: Sun, 4 Jul 2021 10:31:06 +0100
From: Stephane Chazelas
Message-ID:  <20210704093106.2ce2cyg77f2nm...@chazelas.org>


  | That would make is non-compliant then.

s/is/it/ ... and yes, so?

  | SUS> When there are multiple key fields, later keys shall be

There was no need to quote that, I'm fully aware of how it is specified,
which I'm also not complaining about - this is a case where that actually
is (or was) the standard, as it is how sort was implemented, from long
long ago (way back before -k was invented and we just had + and -).

It's just stupid.

This is a case where systems need to simply start ignoring the standard
and doing the sane thing, so that the standard can eventually be updated
to also be sane.   That's how evolution happens.

  | I don't know what the original rationale was, but /one/
  | rationale could be to guarantee a deterministic and total order,

No question but that there needs to be a method to achieve that, but
it does not need to be an undefeatable default.  It doesn't need to be
the default at all.  The default should be the most useful behaviour,
which is probably not that (its only real merit is for compat with the
ancient past).

  | to make sure that two files with the same lines (though in
  | different orders) result in the same output when sorted whatever
  | the sorting specification.

Nor is that final qualification needed.   The output order is defined
by the sorting specification, if one wants some particular order, one
must write the spec to achieve that order.  A different sorting specification
both can, and should, result in a different order.

kre



Re: utilities and write errors

2021-07-04 Thread Harald van Dijk via austin-group-l at The Open Group

On 04/07/2021 21:54, Robert Elz wrote:

 Date:Sun, 4 Jul 2021 19:26:25 +0100
 From:Harald van Dijk 
 Message-ID:  <40fc0b0c-364f-8ff8-0613-a76c887d4...@gigawatt.nl>

   | P.S.: The fact that the underlying file descriptor of standard output is
   | fd 1 rather than some other fd is specified under "stderr, stdin, stdout
   | - standard I/O streams", look for "STDOUT_FILENO", but I think that is
   | ultimately no longer relevant.

But for interest, what section are you referring to, I don't have a (useful)
PDF grep tool


I was looking at the HTML version. Aside from its search function, you 
can find it there under System Interfaces (top left frame), then System 
Interfaces (bottom left frame), then stdout (bottom left frame).


Cheers,
Harald van Dijk



Re: utilities and write errors

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date: Mon, 5 Jul 2021 00:41:23 +0100
From: Harald van Dijk
Message-ID:  <88337ff9-c726-c563-4d4f-3fa74d964...@gigawatt.nl>

  | I was looking at the HTML version. Aside from its search function, you 
  | can find it there under System Interfaces (top left frame), then System 
  | Interfaces (bottom left frame), then stdout (bottom left frame).

Thanks, exactly what I needed to know,

System Interfaces (top left frame)    is XSH
System Interfaces (bottom left frame) is XSH.3
and stdout (bottom left frame)        is XSH 3.stdin

(the stdin/stderr/stdout pages are all the same, the one identified is
stdin, probably because it is 0, though stderr would be more logical, as
normally the lexically first is chosen when there are multiple interfaces
in a page).

I never even thought of looking there, I don't regard those things as
being system interfaces I guess, I was looking mostly in XBD (definitions,
headers, etc).

But that's actually a very interesting page:

63796 CX   At program start-up, three streams shall be predefined and already open: stdin (standard input,
63797      for conventional input) for reading, stdout (standard output, for conventional output) for
63798      writing, and stderr (standard error, for diagnostic output) for writing. When opened, stderr shall
63799      not be fully buffered; stdin and stdout shall be fully buffered if and only if the file descriptor
63800      associated with the stream is determined not to be associated with an interactive device.

Once again, buffering exists and is known to exist - and what's more,
the standard actually *requires* stdout to be fully buffered whenever it is
not associated with a terminal (which is no big surprise, as that's how
systems have always worked).

Note that there's nothing in there that even hints that applications are
responsible for flushing the buffers themselves; that is, and always has
been, an automatic part of exit processing (and is largely why _exit()
exists).

kre