Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
--- Larry Wall wrote:
> On Fri, Apr 15, 2005 at 11:28:31AM -0500, Rod Adams wrote:
> : David Wheeler wrote:
> : : But the first person to write [a...] gets what's comin' to 'em.
> :
> : Is that nothing (since '.' lt 'a'), or everything after 'a'?
>
> Might as well make it everything after 'a' for consistency. One could also view the last dot as a special version of the ordinary "any" dot, and read it "a to whatever".
>
> Larry

I think that if we're looking for consistency, the default should be to read it as 'a' and everything after it. If someone wants "a to whatever", they should write it [a..\.], since it's a pretty odd fringe case.
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
--- Larry Wall wrote:
> . . .
> -[a..z] should be allowed/encouraged/required. It greatly improves the readability in my estimation. The only problem with requiring .. is that people *will* write [a-z] out of habit, and we would probably have to outlaw the - form for many years before everyone would get used to the .. form. So maybe we allow - but warn if not backslashed.

In general, I think this is a great idea, but what exactly do you mean by "warn if not backslashed"? That I'd get a warning *any* time I use a dash in a character class? I guess I can live with that.
RE: should we change [^a-z] to -[a..z] instead of -[a-z]?
-----Original Message-----
From: Paul Hodges
Sent: Sunday, April 17, 2005 1:30 PM
To: Larry Wall; perl6-language@perl.org
Subject: Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

> --- Larry Wall wrote:
> > . . .
> > -[a..z] should be allowed/encouraged/required. It greatly improves the readability in my estimation. The only problem with requiring .. is that people *will* write [a-z] out of habit, and we would probably have to outlaw the - form for many years before everyone would get used to the .. form. So maybe we allow - but warn if not backslashed.
>
> In general, I think this is a great idea, but what exactly do you mean by "warn if not backslashed"? That I'd get a warning *any* time I use a dash in a character class? I guess I can live with that.

On the other hand, you can use the canonical Perl 5 trick of having the dash be the first character in the class if you want to use a literal dash.

Joe Gottman
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
David Wheeler skribis 2005-04-14 21:32 (-0700):
> I was going to say that that was inconsistent, but since you never need to repeat a letter in a character class, well, I guess it isn't. But the first person to write [a...] gets what's comin' to 'em.

Given ASCII, [\x20...] would then be everything except control characters. Handy!

By the way, does ...5 mean -Inf..5? ;)

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
----- Original Message -----
From: Aaron Sherman
To: David Wheeler
Cc: Perl6 Language List perl6-language@perl.org
Sent: Friday, April 15, 2005 2:00 PM
Subject: Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

> On Thu, 2005-04-14 at 21:32 -0700, David Wheeler wrote:
> > On Apr 14, 2005, at 7:06 PM, Patrick R. Michaud wrote:
> > > So, [a.z] matches a, ., and z, while [a..z] matches characters a through z inclusive.
> >
> > I was going to say that that was inconsistent, but since you never need to repeat a letter in a character class, well, I guess it isn't. But the first person to write [a...] gets what's comin' to 'em.
>
> A silly question: is there a canonical character set from which we extract these ranges? Are we hard-coding Unicode here, or is there some way for the user to specify the character set for ranges?

<delurk>
Even sillier question: if [a.z] matches a, . and z, and [a...] matches all characters from a inclusive (for some definition of 'all'), how will the range \x21..\x2e be written? [!..\.]? (i.e. with the . escaped?)
</delurk>

brao
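
To make the [!..\.] question concrete, here is a minimal Python sketch of how a class body could be split into tokens if a backslash makes the next character literal. That escape rule is only the guess raised in the question, not a settled design, and `tokenize` with its `("lit", c)` / `("op", "..")` tuples is a hypothetical name and layout chosen for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # ("lit", character) or ("op", "..")

def tokenize(body: str) -> List[Token]:
    """Split the inside of a bracketed class into literals and '..' operators.

    Assumption: a backslash makes the following character a literal,
    so '\\.' is a plain dot and never part of a range operator.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # Escaped character: always a literal.
            tokens.append(("lit", body[i + 1]))
            i += 2
        elif body.startswith("..", i):
            # Two unescaped dots in a row form the range operator.
            tokens.append(("op", ".."))
            i += 2
        else:
            # Anything else, including a single unescaped dot, is a literal.
            tokens.append(("lit", body[i]))
            i += 1
    return tokens

print(tokenize(r"!..\."))
print(tokenize("a.z"))
```

Under this rule [!..\.] reads as one range from '!' to a literal '.', while [a.z] stays three literals.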
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
> <delurk>
> Even sillier question: if [a.z] matches a, . and z, and [a...] matches all characters from a inclusive (for some definition of 'all'), how will the range \x21..\x2e be written? [!..\.]? (i.e. with the . escaped?)
> </delurk>

I was assuming from Larry's mail that [a...] would parse as either:

1) a character class containing the range from 'a' to '.' (what that means is a bit mind-bending for a Friday afternoon)
2) a character class containing 'a' then a range from '.' to... oh, an error

Which way might be ambiguous, but could of course be defined in the grammar.

It hadn't occurred to me that ... for the range to infinity would be allowed or useful here. I suppose it could just mean 'up to the end of the available codepoints'.

I do love the idea of [a..f] type ranges though. It's just what the three dots mean that's got me confused.
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
On 14 Apr, Larry Wall wrote:
: In writing some character class translation, I realized that
:
:     -[a-z]
:
: and its ilk are rather hard to read because of the two hyphens
: that mean different things. We can't use ![a-z] because that's a
: 0-width lookahead. Given that we're trying to get rid of special
: exceptions, and - in character classes is weird, and we already
: use .. for ranges everywhere else, and nobody is going to put a
: repeated character into a character class, I'm wondering if
:
:     -[a..z]
:
: should be allowed/encouraged/required. It greatly improves the
: readability in my estimation. The only problem with requiring .. is
: that people *will* write [a-z] out of habit, and we would probably
: have to outlaw the - form for many years before everyone would get
: used to the .. form. So maybe we allow - but warn if not backslashed.
:
: Larry

I think, if we bear in mind, as has been stressed previously, that many changes to regular expressions have already been introduced and require users to adapt accordingly, it doesn't seem unreasonable to require writing a double dot instead of a hyphen; it also fits the principle of least surprise nicely, in my opinion.

Nevertheless, as mentioned by David, [a...] would be rather confusing, first to people and secondly to the compiler; although, regardless of whether we assume the dot precedes the double dot or vice versa, an expansion would be enforced (which is what I'd expect), perhaps accompanied by a warning.

I agree on a warning upon a non-escaped hyphen.

Steven
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
Aaron Sherman wrote in perl.perl6.language:
> A silly question: is there a canonical character set from which we extract these ranges? Are we hard-coding Unicode here, or is there some way for the user to specify the character set for ranges?

Perl 5 forces [a-z] (or [i-j] for that matter) to be a range of lowercase alphabetic characters, even on EBCDIC platforms (where it's not).
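
The EBCDIC remark can be checked with a short Python sketch. It assumes the `cp037` codec, one EBCDIC code page that ships with Python, as a stand-in for "an EBCDIC platform"; the thread itself names no specific code page. The sketch walks every byte value between 'a' and 'z' in that encoding and counts how many decode to something other than a lowercase letter.

```python
import string

CODEC = "cp037"  # assumption: one EBCDIC code page, picked only for illustration

lo = "a".encode(CODEC)[0]
hi = "z".encode(CODEC)[0]

# Every byte value from 'a' through 'z' in EBCDIC order, decoded back to text.
span = bytes(range(lo, hi + 1)).decode(CODEC)

letters = [c for c in span if c in string.ascii_lowercase]
others = [c for c in span if c not in string.ascii_lowercase]

# A naive native-order range covers more than the 26 letters.
print(len(span), len(letters), len(others))

# 'i' and 'j' are not adjacent in this encoding, which is why [i-j] is
# singled out above.
print("j".encode(CODEC)[0] - "i".encode(CODEC)[0])
```

A range taken in the platform's native byte order therefore picks up non-letters, so treating [a-z] as exactly the lowercase letters has to be special-cased.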
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
On Fri, Apr 15, 2005 at 01:01:58PM -, Rafael Garcia-Suarez wrote:
> Aaron Sherman wrote in perl.perl6.language:
> > A silly question: is there a canonical character set from which we extract these ranges? Are we hard-coding Unicode here, or is there some way for the user to specify the character set for ranges?
>
> Perl 5 forces [a-z] (or [i-j] for that matter) to be a range of lowercase alphabetic characters, even on EBCDIC platforms (where it's not).

At the moment, PGE (the part that implements the rule engine) is deferring such questions to Parrot, and otherwise assuming Unicode. Plus, S02 explicitly indicates that Perl is written in Unicode and has consistent Unicode semantics, so I think that's what we should go with. It's certainly the way the compiler will go, at least initially.

Pm
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
David Wheeler wrote:
> But the first person to write [a...] gets what's comin' to 'em.

Is that nothing (since '.' lt 'a'), or everything after 'a'?

-- Rod Adams
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
On Fri, Apr 15, 2005 at 11:28:31AM -0500, Rod Adams wrote:
: David Wheeler wrote:
: : But the first person to write [a...] gets what's comin' to 'em.
:
: Is that nothing (since '.' lt 'a'), or everything after 'a'?

Might as well make it everything after 'a' for consistency. One could also view the last dot as a special version of the ordinary "any" dot, and read it "a to whatever".

Larry
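
The two readings of [a...] can be compared in a small Python sketch. `closed_range`, `open_range`, and `MAX_CODEPOINT` are hypothetical names. Taking the last Unicode code point as the upper bound is an assumption (nothing here fixes one), and so is counting 'a' itself as part of "everything after 'a'".

```python
MAX_CODEPOINT = 0x10FFFF  # assumption: open ranges stop at the last Unicode code point

def closed_range(lo: str, hi: str) -> range:
    """Code points from lo through hi inclusive; empty when hi sorts before lo."""
    return range(ord(lo), ord(hi) + 1)

def open_range(lo: str, last: int = MAX_CODEPOINT) -> range:
    """Code points from lo (inclusive) through an assumed upper bound."""
    return range(ord(lo), last + 1)

# Reading 1: [a...] is the range 'a'..'.', which is empty because '.' lt 'a'.
nothing = closed_range("a", ".")
print(len(nothing))

# Reading 2: [a...] is 'a' and everything after it.
everything_after = open_range("a")
print(ord("z") in everything_after, ord(".") in everything_after)
```

Only the second reading produces a class that can match anything, which may be part of why it is the more useful default.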
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
At 5:21 PM -0700 4/14/05, Larry Wall wrote:
> In writing some character class translation, I realized that
>
>     -[a-z]
>
> and its ilk are rather hard to read because of the two hyphens that mean different things. We can't use ![a-z] because that's a 0-width lookahead. Given that we're trying to get rid of special exceptions, and - in character classes is weird, and we already use .. for ranges everywhere else, and nobody is going to put a repeated character into a character class, I'm wondering if
>
>     -[a..z]
>
> should be allowed/encouraged/required. It greatly improves the readability in my estimation. The only problem with requiring .. is that people *will* write [a-z] out of habit, and we would probably have to outlaw the - form for many years before everyone would get used to the .. form. So maybe we allow - but warn if not backslashed.
>
> Larry

I don't see why the old syntax has to be supported at all. Lots of other regexp details are already being changed, such as the bounding '<>' and the removal of the leading internal '^', so people already have to edit their regexps. So they can replace the '-' too while they're at it; not very difficult.

Moreover, I often create character classes that have a literal '-' in them, and it would be nice to not have to make that the last character in the class for it to parse properly.

Also, the '..' is easy to learn because it is consistent with other parts of Perl 6. Likewise, the consistency is another plus when demonstrating what is good about Perl to folk who don't use it.

-- Darren Duncan
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
On Thu, Apr 14, 2005 at 05:21:05PM -0700, Larry Wall wrote:
> Given that we're trying to get rid of special exceptions, and - in character classes is weird, and we already use .. for ranges everywhere else, and nobody is going to put a repeated character into a character class, I'm wondering if
>
>     -[a..z]
>
> should be allowed/encouraged/required. It greatly improves the readability in my estimation.

So, [a.z] matches a, ., and z, while [a..z] matches characters a through z inclusive. I think that works for me. I'll implement it that way (and yes, there *are* updates to PGE coming very soon!).

I guess I can't complain too loudly about .. over - for ranges since I was the one who suggested replacing , with .. in quantifiers (e.g., {1..3} instead of {1,3}). Not that I'd be complaining anyway. :-)

> The only problem with requiring .. is that people *will* write [a-z] out of habit, and we would probably have to outlaw the - form for many years before everyone would get used to the .. form. So maybe we allow - but warn if not backslashed.

Just to make sure I have it right, by "allow -" you mean that [a-z] matches a, -, and z and produces a warning about an unescaped '-'?

Pm
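
The rule stated above ([a.z] is three literals, [a..z] is an inclusive range) can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how PGE implements it: `expand_class` is a hypothetical helper, it ignores escapes and negation, and it orders characters by code point.

```python
from typing import Set

def expand_class(body: str) -> Set[str]:
    """Expand the inside of a bracketed class such as 'a..z' or 'a.z'."""
    chars: Set[str] = set()
    i = 0
    while i < len(body):
        lo = body[i]
        # A range is: one character, exactly two dots, then another character.
        if body[i + 1:i + 3] == ".." and i + 3 < len(body):
            hi = body[i + 3]
            chars.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
            i += 4
        else:
            # A single dot (or any other lone character) is just a literal.
            chars.add(lo)
            i += 1
    return chars

print(sorted(expand_class("a.z")))
print(len(expand_class("a..z")))
print(len(expand_class("a..f0..9")))
```

Because a class never needs the same character twice, two adjacent dots are free to act as the range operator without taking anything away from literal '.'.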
Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
On Apr 14, 2005, at 7:06 PM, Patrick R. Michaud wrote:
> So, [a.z] matches a, ., and z, while [a..z] matches characters a through z inclusive.

I was going to say that that was inconsistent, but since you never need to repeat a letter in a character class, well, I guess it isn't. But the first person to write [a...] gets what's comin' to 'em.

Regards,

David

--
David Wheeler
President, Kineticode, Inc.
http://www.kineticode.com/
Kineticode. Setting knowledge in motion.[sm]