Side note: you could get around some of the problems below, but to do so you would have to exhaustively express all of Unicode using the RANGES constant in the Str builtin module. As it stands, it defines ASCII lowercase but not Latin lowercase, presumably because doing so would be a massive pain. Again, I'll point out that using scripts and properties is much easier...
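To illustrate selecting by properties instead of enumerating ranges, here is a minimal sketch in Python (not Perl 6, and nothing to do with the Str module's RANGES). Python's `unicodedata` module does not expose the Script property, so this sketch assumes that a character name beginning with "LATIN SMALL LETTER" is a good-enough stand-in for Script=Latin. `latin_lowercase` is a hypothetical helper, not an existing API.

```python
import sys
import unicodedata


def latin_lowercase():
    """Hypothetical helper: every Latin lowercase letter, chosen by properties.

    Assumption: Python's unicodedata has no Script lookup, so a name that
    starts with "LATIN SMALL LETTER" stands in for Script=Latin here.
    """
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Cheap general-category test first: keep only lowercase letters (Ll).
        if unicodedata.category(ch) != "Ll":
            continue
        # Then the approximate script test, via the character's name.
        if unicodedata.name(ch, "").startswith("LATIN SMALL LETTER"):
            result.append(ch)
    return result


letters = latin_lowercase()
print(len(letters))
```

The exact count depends on the Unicode version bundled with Python, but the selection rule itself never has to list a single range by hand.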
On Tue, Jul 20, 2010 at 10:35 PM, Solomon Foster <colo...@gmail.com> wrote:
>
> Sorry, didn't mean to imply the series operator was perfect. (Though
> it is surprisingly awesome in general, IMO.) Just that the right
> questions would be about the series operator rather than Ranges.
>

So, what's the intention of the range operator, then? Is it just there to offer backward compatibility with Perl 5? Is it a vestige that should be removed so that we can Huffman ... down to ..?

I'm not trying to be difficult here; I just never knew that ... could operate on a single item as its LHS, and if it can, then .. seems to be obsolete and holding some prime operator real estate.

>
> The questions definitely look different that way: for example,
> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
> clearly expressed as
>
> 'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet :(
>

I still contend that this is so frequently desirable that it should have a simpler form, but it's still going to have problems. One example: expressing "Katakana letters" (I use "letters" in the Unicode sense here) is still dicey. There are characters interspersed in the Unicode sequence for Katakana that aren't the same kind of thing at all. Unicode calls them lowercase, but that's not quite right. They're smaller versions of Katakana characters, used more as punctuation or accents than as syllabic glyphs the way the rest of Katakana is. I guess you could write:

ア, イ, ウ, エ, オ, カ ... ヂ, ツ ... モ, ヤ, ユ, ヨ ... ロ, ワ ... ヴ

(add quotes to taste)

But that seems quite a bit more painful than:

ア .. ヴ (or ... if you prefer)

Similar problems exist for many scripts (including parts of Latin; we're just used to the parts that are odd), though Katakana may well be the worst because of the misuse of Ll to indicate a letter when the truth of the matter is far more complicated.

> That suggests to me that the current behavior of 'A' ... 'z' is pretty
> reasonable.
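To make the Katakana point concrete, here is a minimal sketch in Python (not Perl 6; Python has no string range operator, so it walks the codepoints explicitly). It assumes that "SMALL" in a character's Unicode name is a reliable marker for the small forms, and `katakana_letters` is a hypothetical helper, not an existing API.

```python
import unicodedata


def katakana_letters(first="\u30a2", last="\u30f4"):
    """Hypothetical helper: full-size Katakana from `first` to `last` inclusive.

    Walks the codepoints in order (roughly what ア .. ヴ would do) but drops
    anything whose Unicode name marks it as a SMALL form, e.g. ィ or ッ.
    """
    result = []
    for cp in range(ord(first), ord(last) + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if name.startswith("KATAKANA LETTER") and "SMALL" not in name:
            result.append(ch)
    return result


letters = katakana_letters()
print(len(letters))  # 83 codepoints from ア to ヴ, minus the 9 small forms: 74
```

The result is the same set as the hand-written list with its gaps, but the gaps come from a property test rather than from remembering where the small forms sit.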
You still have to make at least some allowances for invalid codepoints, and I think you should avoid ever generating a combining or modifying codepoint in such a sequence (e.g. "Ѻ" ... "Ҋ" in Cyrillic, which contains several combining characters for currency and counting as well as one undefined codepoint).

--
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
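One way to get that behavior for a sequence like "Ѻ" ... "Ҋ" is to skip codepoints by general category. This is a minimal Python sketch under the assumption that "combining or modifying" means the mark categories (Mn, Mc, Me) and "invalid" means unassigned (Cn); `safe_char_sequence` is a hypothetical helper. Which codepoints count as unassigned depends on the Unicode version bundled with Python.

```python
import unicodedata

# General categories to skip: combining marks (Mn, Mc, Me) and unassigned (Cn).
SKIP = {"Mn", "Mc", "Me", "Cn"}


def safe_char_sequence(first, last):
    """Hypothetical helper: codepoints first..last, minus marks and unassigned ones."""
    return [
        chr(cp)
        for cp in range(ord(first), ord(last) + 1)
        if unicodedata.category(chr(cp)) not in SKIP
    ]


seq = safe_char_sequence("\u047a", "\u048a")  # "Ѻ" ... "Ҋ"
print([f"U+{ord(c):04X}" for c in seq])
```

Note that U+0482 (CYRILLIC THOUSANDS SIGN) is a symbol (So), not a mark, so this rule keeps it; excluding it as well would need a further decision about what belongs in a "letter" sequence.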