Date:        Thu, 26 Sep 2019 11:43:37 +0100
    From:        Geoff Clare <g...@opengroup.org>
    Message-ID:  <20190926104337.GA25231@lt2.masqnet>

  | Good point.  I think that this, and the behaviour I described, are
  | both allowed by the standard.

If they are, they shouldn't be.

Before char classes, equiv classes, and collating elements were
invented, bracket expressions could contain anything (so could
patterns in general).   That makes it hard to add anything new
without potentially invalidating previously valid code.

The solution to that relies upon bracket expressions being sets:
putting an element in the set more than once is legal, but it is
a waste of time and accomplishes nothing.

That's why these new forms are defined only inside bracket expressions,
and all have the property of a duplicated character in their syntax.  That
is, it isn't just that [: :] looks pretty whereas [: ] doesn't; it is the
only way to more or less safely add this new form to patterns.

So, if we have

        [[:alpha]

there is absolutely no question but that this is a bracket expr
that matches one of the 7 listed chars (6 distinct, 'a' being listed twice)
        [ : a l p h a
and is in no way any kind of character class reference, whatever it
looks like its author may have intended, and regardless of what comes
after it.
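
To make that concrete, here is a minimal sketch in Python (not taken from
any real shell or libc) of a scanner that recognises a class reference only
when the complete form [:name:] is present.  The names CLASS_REF and
parse_bracket are made up for this illustration, the restriction of class
names to alphanumerics is an assumption, and negation, ranges, [= =] and
[. .] are deliberately left out.

```python
import re

# Assumption for this sketch: a class reference is "[:" + name + ":]",
# with an alphanumeric name.  Anything else is just ordinary characters.
CLASS_REF = re.compile(r"\[:([A-Za-z0-9]+):\]")


def parse_bracket(pattern):
    """Parse a bracket expression found at the start of pattern.

    Returns (literals, classes, length): the set of ordinary characters,
    the list of class names referenced, and the number of pattern
    characters the bracket expression occupies.
    """
    if not pattern.startswith("["):
        raise ValueError("not a bracket expression")
    literals, classes = set(), []
    i = 1
    while i < len(pattern):
        m = CLASS_REF.match(pattern, i)
        if m:
            # Complete [:name:] form: record the class name.
            classes.append(m.group(1))
            i = m.end()
        elif pattern[i] == "]" and i > 1:
            # A "]" that is not the first element closes the expression.
            return literals, classes, i + 1
        else:
            # Everything else is an element of the set; adding the same
            # character twice changes nothing.
            literals.add(pattern[i])
            i += 1
    raise ValueError("unterminated bracket expression")
```

Under this reading parse_bracket("[[:alpha]") gives the six distinct
characters [ : a l p h and no class at all, whatever follows in the pattern,
while parse_bracket("[[:alpha:]]") gives no ordinary characters and the
single class name alpha.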

If the standard says any different, or implies different, or even allows
different, it is simply wrong.

Now if this kind of "invalid char class" (invalid because the terminating
: is missing) is to not cause the bracket expression to be invalid, it is
absurd to believe that the simpler case of an unknown class name could do
so - simply absurd.

Either the unknown class name means that there is no character class, and
all the text which looks like a character class is really just elements of
the bracket expression, or the unknown class name is treated as probably
being a class in some other locale, which has no members in the current
locale.  Either could be an interpretation which makes sense (though the
latter is more useful, IMO).  Invalidating the bracket expression makes no
sense.
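
The two sensible interpretations can be compared with a small Python
sketch.  Everything here is hypothetical: the KNOWN table stands in for
whatever classes the current locale defines, match() and its "unknown"
parameter are invented for the illustration, and only a single bracket
expression followed by literal characters is handled.

```python
import re

CLASS_REF = re.compile(r"\[:([A-Za-z0-9]+):\]")

# Hypothetical stand-in for the classes the current locale defines.
KNOWN = {"alpha": str.isalpha, "digit": str.isdigit}


def match(pattern, text, unknown="empty"):
    """Match text against one bracket expression plus a literal tail.

    unknown="empty":   an unknown [:name:] is a class with no members.
    unknown="literal": an unknown [:name:] is not a class; its
                       characters are ordinary elements of the set.
    """
    negate = pattern.startswith("[!")
    i = first = 2 if negate else 1
    literals, tests = set(), []
    while i < len(pattern):
        m = CLASS_REF.match(pattern, i)
        if m and (m.group(1) in KNOWN or unknown == "empty"):
            # Known class, or unknown class treated as empty.
            tests.append(KNOWN.get(m.group(1), lambda ch: False))
            i = m.end()
        elif pattern[i] == "]" and i > first:
            # End of the bracket expression; the rest is literal text.
            tail = pattern[i + 1:]
            if len(text) != 1 + len(tail) or text[1:] != tail:
                return False
            ch = text[0]
            hit = ch in literals or any(t(ch) for t in tests)
            return hit != negate
        else:
            literals.add(pattern[i])
            i += 1
    raise ValueError("unterminated bracket expression")
```

With unknown="empty", [x[:bogus:]] matches only "x" and [![:bogus:]]
matches any single character.  With unknown="literal", the bracket
expression in [x[:bogus:]] ends at the first "]", so the pattern matches
one of x [ : b o g u s followed by a literal "]".  Neither reading makes
the bracket expression invalid.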

  | >   | XBD 9.3.5 item 8 says it is unspecified whether [:bogus:] is treated as
  | >   | a character class, treated as a matching list expression, or rejected
  | >   | as an error.
  | > 
  | > Yes, that is unfortunate, it should be specified that an unknown (but
  | > syntactically valid) class name in a character class is simply to be
  | > treated as a class containing no characters,
  |
  | Item 8 isn't about what's between the ':'s in [[:...:]], it's about
  | an RE that contains [:...:] without the outer pair of square brackets.

Sure.  But as I interpreted Harald's question, to which we are attempting
to reply, things that look like char classes, but are not in a bracket
expression, aren't relevant (nor is 9.3.5 item 8).

The question was entirely about [x[:bogus:]] and [![:bogus:]] so perhaps
we should stick to answering that, and avoid deviating into side issues.

  | My point was that ksh93 treats [a"-"b] the same as [a-b] so trying
  | to test something more specific to do with character classes in ksh93
  | is not going to yield any useful information.

Again, sure, and again, not helpful for answering the question asked.
What buggy implementations happen to do is not really interesting.
What we want to know here is what the standard says should be done,
and perhaps also what it should say should be done.

So, is [[:"alpha":]] required to be treated the same as [[:alpha:]] ,
not allowed to be treated the same, explicitly unspecified, or simply
never considered (previously) ?

  | My previous reply was based on XBD 9.3.5 item 4, but I have just spotted
  | that the intro paragraph of 9.3.5 uses the word "may":

After I saw your updated reply on this, which arrived while I was composing
my previous message, I also went and looked at the standard, but I looked
at XCU 2.13.1:
        The pattern bracket expression also shall match a single
        collating element.
So there, in the section specific to the shell, we have a "shall".

Which means

  | So it appears that it is optional whether matching a bracket expression
  | against more than one character is supported.

perhaps not.

Now both of those sections are poorly worded.   In XBD 9.3.5 one might
interpret it as being "may" because not all bracket expressions match
collating elements, so it would be absurd to require them to do so.

That is [abc] matches one of 'a' 'b' or 'c' and no collating elements
at all, and it would be absurd if the language in 9.3.5 required that
a specific set of multi-character collating elements shall be matched.

Or perhaps the "may" there is as you just interpreted it, and means that
matching multi-char collating elements is optional, even when the
bracket expression is
        [[=ch=]]

Who knows?

XCU 2.13.1 is just as badly written, in the opposite direction.  It
(seems to) require every bracket expression to match a collating element.
I doubt that is what it really intends to say though, especially as it
doesn't say which specific single collating element the bracket expression
is required to match.

We all mostly ignore stuff like this, as we (generally) know what it is
trying to say - clearly 2.13.1 only means matching a collating element
when the bracket expression contains a match for one (in which case that
expression says which single collating element is to match), not every
single time - even though that qualification is not stated.
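
As a rough illustration of that reading, here is a Python sketch in which a
bracket expression is modelled as nothing more than the set of collating
elements it lists.  The function consumed() and its multichar flag are
invented for this example, and "ch" being a collating element is an
assumption about some hypothetical locale, not a statement about any
particular one.

```python
def consumed(elements, text, multichar=True):
    """Return how many characters of text the bracket expression consumes.

    elements is the set of collating elements the expression lists, each a
    string of one or more characters.  0 means there is no match.
    multichar=False models an implementation that treats matching of
    multi-character collating elements as optional and does not do it.
    """
    best = 0
    for element in elements:
        if len(element) > 1 and not multichar:
            continue
        # Prefer the longest listed element that the text starts with.
        if text.startswith(element) and len(element) > best:
            best = len(element)
    return best
```

Here consumed({"a", "b", "c"}, "ch") is 1, since [abc] lists no
multi-character element and so can never be required to match one, while
consumed({"ch"}, "ch") is 2 under the "shall" reading and 0 when
multichar=False, the "may" reading.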

But we should clear up all this loose language and make it accurate
(and yes, make the standard even more boring to read.)

I have a new defect report on a similar issue which I have been holding off
submitting until I see the results of the recently resolved issues (which
I have still not read properly).  While unrelated, it is in the same area
as some of the recent work, which is why I held off reporting.  I think one
of the issues I was going to raise has been solved already, but there is
still some poorly written English which caused at least one implementation
to implement the wrong thing.   The defect report should appear soon (it
will be non-controversial and trivial to fix), that is, if all the relevant
current bugs have now been resolved.  This issue has nothing at all to do
with backslashes in patterns, but does relate to quoting.

kre

