On 15/04/2022 00:03, Robert Elz via austin-group-l at The Open Group wrote:
     Date:        Thu, 14 Apr 2022 09:42:37 +0100
     From:        "Geoff Clare via austin-group-l at The Open Group" 
<[email protected]>
     Message-ID:  <20220414084237.GA15370@localhost>

   | That is how things are at present. The suggested changes just make it
   | explicit.

Yes, I know, but that's what I am suggesting that we not do in this one case.

   | Do you have an alternative proposal?

Only to the extent of "do nothing".   I am certainly not suggesting that
we attempt to solve the problem.

Hmm, I would.

Except perhaps it might be worth adding something to the Rationale (but
about what, ie: where there, I have no idea) along the lines of:

        It is often unclear whether a string is to be interpreted as
        characters in some locale, or as an arbitrary byte string.
        While it would have been possible to arbitrarily make the various
        cases more explicit, or explicitly unspecified, it was considered
        better, in this version of <however the doc refers to itself> to
        make no changes, as it is believed that much additional work is
        required to make a standards-worthy specification possible.
        This work is beyond the scope of this standard.

The problem I see is that any specification at all of any of this
allows implementors to just say "that is what POSIX requires" and do
nothing at all, where we really need some innovation by someone who
actually understands the issues and how to deal with them in a rational
way, or at least who can come up with some kind of plan, without any
possibility of being considered a non-conformant implementation because
of it.

For the most part(*), those shells that support locales already appear to agree that a single byte that does not form part of a valid multi-byte character is interpreted as a single character that can be matched with *, with ?, and with that byte itself. Shells do not agree on whether such single bytes can be matched with [...], nor, in those shells where they can be, on whether multiple bracket expressions can be used to match the individual bytes of a valid multi-byte character.

The cases with [...] only come up when scripts themselves use patterns that are not valid character strings. They are unlikely to affect existing scripts, and I imagine there is not much harm in leaving those unspecified. The cases with * and ? do come up in existing scripts, but if shells are in agreement, as they appear to be, there is no need to coordinate with shell authors on whether they would be willing to change their implementations: it is possible to change POSIX to describe the shells' current behaviour.
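
As a rough illustration of the cases above, here is a minimal probe script. It is only a sketch: the probe function and its labels are made up for this example, and it assumes it is run by a POSIX-style sh in a UTF-8 locale (for instance with LC_ALL=C.UTF-8, where that locale exists), in which the byte 0xE9 on its own is not a valid character. I am deliberately not listing expected output, since the bracket cases are exactly where shells may differ.

```shell
#!/bin/sh
# Hypothetical probe: reports how the running shell matches bytes that
# are not valid characters. Assumes a UTF-8 locale is in effect.

# 0xE9 on its own is an incomplete multi-byte sequence in UTF-8.
bad=$(printf '\351')

# U+00E9 encoded as UTF-8 is the two bytes 0xC3 0xA9.
e_acute=$(printf '\303\251')
b1=$(printf '\303')
b2=$(printf '\251')

# probe LABEL STRING PATTERN
# Prints whether STRING matches PATTERN in a case statement.
probe() {
    # $3 is left unquoted on purpose so that it is used as a pattern.
    case $2 in
        $3) printf '%s: match\n' "$1" ;;
        *)  printf '%s: no match\n' "$1" ;;
    esac
}

# Invalid byte against *, ? and itself.
probe 'star'     "a${bad}b" 'a*b'
probe 'question' "a${bad}b" 'a?b'
probe 'literal'  "a${bad}b" "a${bad}b"

# Invalid byte inside a bracket expression.
probe 'bracket'  "a${bad}b" "a[${bad}]b"

# Two bracket expressions, one per byte of a valid multi-byte character.
probe 'split-brackets' "$e_acute" "[${b1}][${b2}]"
```

Running the same file under several shells and comparing the output line by line would be one way to see where they agree and where they do not.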

If there is interest in getting this standardised, I can spend some more time on creating some hopefully comprehensive tests for this to confirm in what cases shells agree and disagree, and use that as a basis for proposing wording to cover it.

Cheers,
Harald van Dijk

(*) The notable exception here is yash, which internally processes strings as wide strings and cannot handle any string that cannot be converted to a wide string. This was already said to be contrary to what POSIX intends in other cases.
