    Date:        Thu, 26 Sep 2019 11:43:37 +0100
    From:        Geoff Clare <g...@opengroup.org>
    Message-ID:  <20190926104337.GA25231@lt2.masqnet>
  | Good point. I think that this, and the behaviour I described, are
  | both allowed by the standard.

If they are, they shouldn't be.

Before char classes, equiv classes, and collating elements were invented,
bracket expressions could contain anything (so could patterns in general).
That makes it hard to add anything new without potentially invalidating
previously valid code.

The solution to that relies upon bracket expressions being sets: while it
is legal to put an element in the set more than once, doing so is a waste
of time and accomplishes nothing. That's why these new forms are defined
only inside bracket expressions, and all have the property of a duplicated
character in their syntax. That is, it isn't just that [: :] looks pretty,
whereas [: ] doesn't; it is the only way to more or less safely add this
new form to patterns.

So, if we have [[:alpha] there is absolutely no question but that this is
a bracket expression that matches one of the 7 chars [ : a l p h a and is
in no way any kind of character class reference, whatever its author may
have intended, and regardless of what comes after it. If the standard says
any different, or implies different, or even allows different, it is simply
wrong.

Now if this kind of "invalid char class" (invalid because the terminating
':' is missing) is not to cause the bracket expression to be invalid, it is
absurd to believe that the simpler case of an unknown class name could do
so - simply absurd.

Either the unknown class name means that there is no character class, and
all the text which looks like a character class is really just elements of
the bracket expression, or the unknown class name is treated as probably
being a class in some other locale, one which has no members in the current
locale. Either could be an interpretation which makes sense (though the
latter is more useful, IMO). Invalidating the bracket expression makes no
sense.
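A minimal Python sketch of the reading argued for above, for a single
shell-pattern bracket expression. It is hypothetical throughout: the names
parse_bracket, bracket_matches and CLASSES are invented for illustration,
the classes are ASCII-only stand-ins for the current locale, ranges compare
code points, and equivalence classes and collating symbols are not handled.
Of the two interpretations of an unknown class name, it takes the second
(a class with no members in the current locale). A "[:" with no
terminating ":]" before the next ']' is treated as ordinary set elements,
so [[:alpha] is just a set of characters.

```python
# Hypothetical sketch: one reading of how a shell-pattern bracket expression
# could treat an unterminated "[:" and an unknown class name.
import string

# ASCII-only stand-ins for a few POSIX classes (assumption: C/POSIX locale;
# a real implementation consults the current locale).
CLASSES = {
    "alpha": set(string.ascii_letters),
    "digit": set(string.digits),
    "alnum": set(string.ascii_letters + string.digits),
    "upper": set(string.ascii_uppercase),
    "lower": set(string.ascii_lowercase),
    "space": set(" \t\n\r\f\v"),
}


def parse_bracket(pat):
    """Parse pat, which must start with '['.

    Returns (negated, members, end): members is a list of predicates on one
    character, end is the index just past the closing ']'.  Returns None if
    there is no closing ']'.
    """
    j = 1
    negated = j < len(pat) and pat[j] == "!"
    if negated:
        j += 1
    members = []
    first = True  # a ']' in first position is an ordinary member
    while j < len(pat):
        c = pat[j]
        if c == "]" and not first:
            return negated, members, j + 1
        first = False
        if pat.startswith("[:", j):
            close = pat.find(":]", j + 2)
            # Only "[:name:]" with no ']' inside name is a class reference.
            if close != -1 and "]" not in pat[j + 2:close]:
                name = pat[j + 2:close]
                # Unknown name: a class with no members, never an error.
                cls = CLASSES.get(name, set())
                members.append(lambda ch, cls=cls: ch in cls)
                j = close + 2
                continue
            # Otherwise '[' is just another element of the set.
        if j + 2 < len(pat) and pat[j + 1] == "-" and pat[j + 2] != "]":
            lo, hi = c, pat[j + 2]
            members.append(lambda ch, lo=lo, hi=hi: lo <= ch <= hi)
            j += 3
            continue
        members.append(lambda ch, c=c: ch == c)
        j += 1
    return None


def bracket_matches(pat, ch):
    """True if pat, exactly one bracket expression, matches the character ch."""
    parsed = parse_bracket(pat)
    if parsed is None or parsed[2] != len(pat):
        raise ValueError("pat is not exactly one bracket expression")
    negated, members, _ = parsed
    return any(m(ch) for m in members) != negated
```

Under these assumptions, [x[:bogus:]] matches only x, [![:bogus:]] matches
any single character, and [[:alpha] matches '[', ':' or one of a, l, p, h,
but no other letter.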
  | > | XBD 9.3.5 item 8 says it is unspecified whether [:bogus:] is treated as
  | > | a character class, treated as a matching list expression, or rejected
  | > | as an error.
  | >
  | > Yes, that is unfortunate, it should be specified that an unknown (but
  | > syntactically valid) class name in a character class is simply to be
  | > treated as a class containing no characters,
  |
  | Item 8 isn't about what's between the ':'s in [[:...:]], it's about
  | an RE that contains [:...:] without the outer pair of square brackets.

Sure. But as I interpreted Harald's question, to which we are attempting
to reply, things that look like char classes but are not in a bracket
expression aren't relevant (nor is 9.3.5 item 8). The question was
entirely about [x[:bogus:]] and [![:bogus:]], so perhaps we should stick to
answering that, and avoid deviating into side issues.

  | My point was that ksh93 treats [a"-"b] the same as [a-b] so trying
  | to test something more specific to do with character classes in ksh93
  | is not going to yield any useful information.

Again, sure, and again, not helpful for answering the question asked.
What buggy implementations happen to do is not really interesting. What
we want to know here is what the standard says should be done, and perhaps
also what it should say should be done. So, is [[:"alpha":]] required to
be treated the same as [[:alpha:]], not allowed to be treated the same,
explicitly unspecified, or simply never (previously) considered?

  | My previous reply was based on XBD 9.3.5 item 4, but I have just spotted
  | that the intro paragraph of 9.3.5 uses the word "may":

After I saw your updated reply on this, which arrived while I was composing
my previous message, I also went and looked at the standard, but I looked
at XCU 2.13.1:

	The pattern bracket expression also shall match a single collating
	element.

So there, in the shell-specific section, we have a "shall".
Which means

  | So it appears that it is optional whether matching a bracket expression
  | against more than one character is supported.

perhaps not.

Now both of those sections are poorly worded. In XBD 9.3.5 one might
interpret it as being "may" because not all bracket expressions match
multi-character collating elements, so it would be absurd to require them
to do so. That is, [abc] matches one of 'a', 'b' or 'c' and no
multi-character collating elements at all, and it would be absurd if the
language in 9.3.5 required that a specific set of multi-character collating
elements shall be matched. Or perhaps the "may" there is as you just
interpreted it, and means that matching multi-char collating elements is
optional, even when the bracket expression is [[=ch=]]. Who knows?

XCU 2.13.1 is just as badly written, in the opposite direction. It (seems
to) require every bracket expression to match a collating element. I doubt
that is what it really intends to say though, especially as it doesn't say
which specific single collating element the bracket expression is required
to match. We all mostly ignore stuff like this, as we (generally) know
what it is trying to say - clearly 2.13.1 only means matching a collating
element when the bracket expression contains a match for one (in which case
that expression says which single collating element is to match), not every
single time - even though that qualification is not stated. But we should
clear up all this loose language and make it accurate (and yes, make the
standard even more boring to read).
I have a new defect report on a similar issue which I have been holding off
submitting until I see the results of the recently resolved issues (which
I still have not read properly). I think one of the issues I was going to
raise has been solved already (which is why I held off reporting: while
unrelated, this is in the same area as some of the recent work), but there
is still some poorly written English which caused at least one
implementation to implement the wrong thing. The defect report should
appear soon (it will be non-controversial and trivial to fix), that is, if
all the relevant current bugs have now been resolved. (This issue has
nothing at all to do with backslashes in patterns, but does relate to
quoting.)

kre