Hey.

On Tue, 2022-04-19 at 01:52 +0100, Harald van Dijk wrote:
> Even I did not to apply this to pattern matching. The
> lexical locale, the locale used for lexing, is only used for lexing,
> i.e. for recognising tokens, not to how those tokens are then
> interpreted later on. If locale comes into play for that, as it does
> in pattern matching, it is the then-current value of LC_CTYPE that
> comes into play, as it does in other shells.

So... how is it intended to work (as per the standard)?

My understanding was that if, during lexing, the shell sees a pattern
'*∈', it would store the binary representation of these characters (as
following from the lexical locale, in which the shell script/input is
in principle expected to be written) for the pattern.

But when the actual pattern matching is done, it would interpret that
binary representation with respect to the current locale (LC_CTYPE).
So if, by then, the binary representation of the script's '*∈' would
mean '*z?' in the current locale, it would use that meaning as the
pattern.
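
To make that concrete, here is a minimal sketch of what I mean (Python,
used only to look at the bytes). UTF-8 as the lexical locale's encoding
and ISO-8859-1 as the then-current LC_CTYPE are purely my assumptions
for illustration; nothing here is taken from the standard:

```python
# Sketch only: UTF-8 stands in for the lexical locale and ISO-8859-1
# for the then-current LC_CTYPE. Both choices are assumptions.
pattern_source = "*∈"

# What lexing would store: the bytes exactly as written in the script.
stored_bytes = pattern_source.encode("utf-8")

# What pattern matching would see if those same bytes were interpreted
# under a different character encoding.
reinterpreted = stored_bytes.decode("iso-8859-1")

print(stored_bytes)         # the '*' byte followed by three more bytes
print(len(pattern_source))  # 2 characters as written
print(len(reinterpreted))   # 4 characters as reinterpreted
```

The '*' keeps its meaning in both encodings, while the three bytes of
'∈' turn into three separate characters, so the pattern would no longer
mean "anything followed by one specific character".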

Does that sound right?


Since '∈' is not a member of the portable character set, it would,
AFAIU, in principle be valid for its bytes to map to `z?` in another
locale. Changing the mapping of '*' would also be possible, but
according to POSIX the results would be unspecified.

("If the encoded values associated with each member of the portable
character set are not invariant across all locales supported by the
implementation, if an application uses any pair of locales where the
character encodings differ, or accesses data from an application using
a locale which has different encodings from the locales used by the
application, the results are unspecified.")


> As for future directions, no opinion on that from me.

That would IMO only make sense if, e.g., there was only one shell, and
not even a well-maintained one, that behaves differently from all
others.

The "future directions" would indicate to possible new implementers
where things may go and what they should do.
Ten years later, one could revisit the topic, and if that one shell
that behaved differently from all others had died in the meantime, and
any possible new ones followed the future directions... one could
standardise it. If not, one could simply leave everything as is and no
one would get into trouble.

Whether such an approach actually works out as intended is of course
not guaranteed.


> I would not think this should be a special case: «${foo%.}» should
> strip a trailing «.» in exactly those cases where the shell considers
> foo to match the pattern «*.». However, I can see value in doing some
> extra tests to verify that this matches what shells do.

Remember that it might not be enough to check whether such a shell
strips correctly in the case
  <non-character-byte(s)><character>
but also in the case where one or more trailing bytes of the first
group and the bytes of the valid character together form a new valid
character.

While this wouldn't be possible if '.' is the character (because of
its special properties)... it can happen with other characters in some
special locales.
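
As a rough illustration of that second case, here is a sketch using
Shift_JIS as an assumed example of such a locale encoding (my choice,
not something from the thread). The byte 0x83 is incomplete on its own
there, but together with a following 'A' it forms a single valid
character:

```python
# Sketch only: Shift_JIS is an assumed example encoding.
ENCODING = "shift_jis"

def is_valid(data: bytes) -> bool:
    """Return True if data decodes cleanly in ENCODING."""
    try:
        data.decode(ENCODING)
        return True
    except UnicodeDecodeError:
        return False

stray = b"\x83"   # not a character on its own
suffix = b"A"     # a valid character on its own
value = stray + suffix

combined = value.decode(ENCODING)

# Byte-wise, the value ends in the suffix. Character-wise it does not,
# because both bytes together decode to one different character.
bytewise_match = value.endswith(suffix)
charwise_match = combined.endswith(suffix.decode(ENCODING))

print(is_valid(stray), is_valid(suffix), is_valid(value))
print(len(combined), bytewise_match, charwise_match)

# '.' cannot be absorbed this way in this encoding: 0x83 followed by
# '.' is not a valid sequence.
print(is_valid(stray + b"."))
```

So a test for something like «${foo%A}» would ideally cover a value
like this one as well, where a byte-wise and a character-wise view of
"ends in A" disagree.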


> Very well, I will post tests and test results as soon I can make the 
> time for it.

Thanks.


FYI: I think the outcome will also affect the current proposal for
#1561:
https://www.austingroupbugs.net/view.php?id=1561#c5795

specifically the part:
    On page 2321 line 74857 section 2.6.2 Parameter Expansion, change:


Thanks,
Chris.
