On Thu, 2022-01-27 at 12:44 +0000, Geoff Clare via austin-group-l at
The Open Group wrote:
> 
> It seems to me that there are three cases to consider:
> 
> * The command's output is expected to contain byte sequences that
>   might not form valid characters.
>   In this case LC_ALL=C should be used during all handling of the
>   output.

Okay, that's also what I understood by now.

And just for the record if anyone else should ever stumble over that
thread here in the archives... this is what I'd consider *the* solution
which works in any POSIX compatible shell (and there in any scope,
global or function) that handles locales in a sane way... with any
locale/encoding:

result="$(command ; e=$?; printf '.' ; exit "$e")"

#optionally error out if OLD_LC_ALL is already set
unset -v OLD_LC_ALL ; [ "${LC_ALL+is_set}" ] && OLD_LC_ALL="${LC_ALL}"

LC_ALL=C
result="${result%.}"

[ "${OLD_LC_ALL+is_set}" ] && LC_ALL="${OLD_LC_ALL}" || unset -v LC_ALL

It should in principle also work with sentinels other than '.', but
sticking to either '.' or '/' still seems pretty reasonable to me.


Unless anyone disagrees?!
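
For reuse, the same steps could be wrapped in a function. This is only a
minimal sketch: the name capture_output and its _co_* variables are
hypothetical, and it assumes set -e is not in effect. It captures the
output with the '.' sentinel, saves LC_ALL (set or unset), strips the
sentinel in the C locale, restores LC_ALL and passes on the command's
exit status.

```sh
# Hypothetical helper: capture_output VAR CMD [ARG...]
# Runs CMD, stores its complete standard output (trailing newlines
# included) in the variable named VAR and returns CMD's exit status.
# Assumptions: VAR is a valid name not starting with _co_, and
# set -e is not in effect.
capture_output() {
    _co_var=$1
    shift

    # The trailing '.' keeps the command substitution from stripping
    # trailing newlines; the subshell exits with CMD's status.
    _co_rc=0
    _co_out=$("$@"; _co_rc=$?; printf '.'; exit "$_co_rc") || _co_rc=$?

    # Remember whether (and to what) LC_ALL was set.
    unset -v _co_old_lc_all
    [ "${LC_ALL+is_set}" ] && _co_old_lc_all=$LC_ALL

    # In the C locale every byte is one character, so removing the
    # final '.' cannot interact with an incomplete multibyte sequence.
    LC_ALL=C
    _co_out=${_co_out%.}

    # Restore the previous LC_ALL state.
    if [ "${_co_old_lc_all+is_set}" ]; then
        LC_ALL=$_co_old_lc_all
    else
        unset -v LC_ALL
    fi

    # Copy the result into the caller's variable; eval only sees the
    # literal text "VAR=$_co_out", so the output is never re-parsed.
    eval "$_co_var=\$_co_out"
    return "$_co_rc"
}
```

For example, capture_output result printf 'a\n\n' leaves "a" followed
by two newlines in result and returns printf's exit status.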


> * The command's output is expected to contain valid characters,
>   but could be truncated mid-character.
>   In this case, the encoding issue is only one small aspect of the
>   potential consequences of truncation. It is more important to
>   detect and handle any kind of truncation, not just the kind that
>   causes an encoding error.
> 
> * The command's output is expected to contain valid characters,
>   but the concern is that there could be corruption.
>   This is similar to the second case but more extreme, as the
>   consequences of corruption could be many things, including valid
>   (but wrong) characters.  Again, there is no point trying to deal
>   with one very small aspect of those potential consequences in
>   isolation.

Okay, but I guess both points (while valid) are beyond the scope
anyway.



> > 3) Does POSIX define anywhere which values a shell variable is
> >    required to be able to store?
> >    I only found that NUL is excluded, but that alone doesn't mean
> >    that any other byte value is required to work.
> 
> Kind of circular, but POSIX clearly requires that a variable can be
> assigned any value obtained from a command substitution that does not
> include a NUL byte, and specifies utilities that can be used to
> generate arbitrary byte values, therefore a variable can contain any
> sequence of bytes that does not include a NUL byte.

From the replies given by Harald van Dijk, it's probably clear that
it's not 100% clear ;-)
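
Geoff's argument can at least be checked empirically for a given shell.
The sketch below generates every byte value except NUL with printf's %b
octal escapes (\0ddd), stores them in a variable via command
substitution and counts how many bytes survive. It only shows what the
running shell does, not what POSIX requires; the variable names are
illustrative, and it assumes LC_ALL was unset beforehand.

```sh
# Collect byte values 1..255 (NUL cannot be stored) in one variable.
all_bytes=$(
    i=1
    while [ "$i" -le 255 ]; do
        # %b understands \0ddd octal escapes, so each iteration emits
        # exactly one byte with value i.
        printf '%b' "\\0$(printf '%03o' "$i")"
        i=$((i + 1))
    done
    printf '.'    # sentinel, same convention as the snippet above
)
LC_ALL=C          # strip the sentinel bytewise
all_bytes=${all_bytes%.}
unset -v LC_ALL   # simplification: assumes LC_ALL was not set before

# Count the bytes actually stored in the variable (wc -c counts bytes).
count=$(printf '%s' "$all_bytes" | wc -c | tr -d ' ')
printf 'stored %s of 255 non-NUL byte values\n' "$count"
```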


Would you recommend opening tickets asking for clarification of the
following cases that came up?

a) The question from the very first message of this thread, i.e.
whether the last line of output in a command substitution needs a
trailing \n for it to be guaranteed to be included.
(Should be doable with a single sentence in the Command Substitution
section.)

b) What are shell variables actually required to be able to hold: any
byte except NUL, or any valid character in the locale except NUL?

c) Something that might have come up[0], namely whether
setting/unsetting/assigning LANG/LC_* needs to have an immediate effect
on the shell (script or interactive session).
E.g. in LC_ALL=C ; var="${var%.}", is the effect of the LC_ALL
assignment on the following expansion "voluntary" or mandated?
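
A small probe for questions a) and c) might look like the sketch below.
It only reports what the running shell happens to do and says nothing
about what POSIX mandates; the variable names are illustrative, and it
assumes LC_ALL was unset beforehand. For c), the bytes \303\251 encode
U+00E9 in UTF-8 but are two characters in the C locale, so ${#v} should
be 2 if the LC_ALL assignment takes effect immediately.

```sh
# a) Is an unterminated last line kept by command substitution?
v=$(printf 'first\nlast')          # no trailing newline after "last"
case $v in
    *last) a_result=kept ;;
    *)     a_result=lost ;;
esac
echo "a) unterminated last line: $a_result"

# c) Does assigning LC_ALL affect later expansions in the same shell?
v=$(printf '\303\251')
LC_ALL=C
c_len=${#v}
unset -v LC_ALL                    # simplification: LC_ALL assumed unset before
echo "c) \${#v} with LC_ALL=C: $c_len (2 means the assignment took effect)"
```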


Thanks,
Chris.



[0] In each case, see the first paragraph:
https://lists.gnu.org/archive/html/help-bash/2022-01/msg00067.html
https://lists.gnu.org/archive/html/help-bash/2022-01/msg00068.html
https://lists.gnu.org/archive/html/help-bash/2022-01/msg00069.html
https://lists.gnu.org/archive/html/help-bash/2022-01/msg00073.html
