On Mon, Mar 28, 2005 at 11:53:07AM -0500, Chip Salzenberg wrote:
: According to Larry Wall:
: > On Fri, Mar 25, 2005 at 07:38:10PM -0000, Chip Salzenberg wrote:
: > : And might I also ask why in Perl 6 (if not Parrot) there seems to be
: > : no type support for strings with known encodings which are not subsets
: > : of Unicode?
: >
: > Well, because the main point of Unicode is that there *are* no encodings
: > that cannot be considered subsets of Unicode.
:
: Certainly the Unicode standard makes such a claim about itself.  There
: are people who remain unpersuaded by Unicode's advertising.  I conclude
: that they will find Perl 6 somewhat disappointing.
If it turns out to be a Real Problem, we'll fix it.  Right now I think
it's a Fake Problem, and we have more important things to worry about.
Most of the carping about Unicode is with regard to CJK unifications
that can't be represented in any one existing character set anyway.
Unicode has at least done pretty well with the round-trip guarantee
for any single existing character set.

There are certainly localization issues with regard to default input
and output transformations, and things like changing the default
collation order from Unicodian to SJISian or Big5ian or whatever.  But
those are good things to make explicit in any event, and that's what
the language-dependent level is for.  And people who are trying to
write programs across language boundaries are already basically
screwed over by their national character sets.  You can't even go back
and forth between Japanese and English without getting all fouled up
between ¥ and \.  Unicode distinguishes them, so it's a distinction
that Perl 6 *always makes*.

That being said, there's no reason in the current design that a string
that is viewed at the language level as, say, French couldn't actually
be encoded in Morse code or some such.  It's *only* the abstract
semantics at the current Unicode level that are required to be Unicode
semantics by default.  And it's as lazy as we care to make it--when
you do s/foo/bar/ on a string, it's not required to convert the string
from any particular encoding to any other.  It only has to have the
same abstract result *as if* you'd translated it to Unicode and then
back to whatever the internal form is.

Even if you don't want to emulate Unicode in the API, there are
options.  For some problems it'd be more efficient to translate
lazily, and for others it's more efficient to just translate
everything once on input and once on output.  (It also tends to be a
little cleaner to isolate "lossy" translations to one spot in the
program.
By the round-trip nature of Unicode, most of the lossy translations
would be on output.)

But anyway, a bit about my own psychology.  I grew up as a preacher's
kid in a fundamentalist setting, and I heard a lot of arguments of the
form, "I'm not offended by this, but I'm afraid someone else might be
offended, so you shouldn't do it."  I eventually learned to discount
such arguments to preserve my own sanity, so saying "someone might be
disappointed" is not quite sufficient to motivate me to action.  Plus
there are a lot of people out there who are never happy unless they
have something to be unhappy about.  If I thought I could design a
language that would never disappoint anyone, I'd be a lot stupider
than I already think I am, I think.

All that being said, you can do whatever you like with Parrot, and if
you give it a decent enough API, someone will link it into Perl 6.  :-)

Larry
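The "as if" semantics and the round-trip/lossy-on-output split above can be sketched concretely.  Here is a minimal illustration in Python (the `LazyStr` class and all its names are invented for this sketch; nothing here reflects Perl 6's or Parrot's actual string internals):

```python
# Hypothetical sketch: a string that keeps its raw bytes plus their
# encoding and decodes lazily.  An operation like s/foo/bar/ need only
# produce the same abstract result "as if" the text were translated to
# Unicode and back to the internal form.
class LazyStr:
    def __init__(self, raw: bytes, encoding: str):
        self.raw = raw
        self.encoding = encoding
        self._text = None            # decoded form, filled in on demand

    def _decode(self) -> str:
        if self._text is None:
            self._text = self.raw.decode(self.encoding)
        return self._text

    def subst(self, old: str, new: str) -> "LazyStr":
        # Semantically: decode to Unicode, substitute, re-encode.
        # An implementation is free to skip the decode entirely when
        # the match can be done on the raw bytes, as long as the
        # abstract result is the same.
        text = self._decode().replace(old, new)
        return LazyStr(text.encode(self.encoding), self.encoding)

s = LazyStr("foo et bar".encode("latin-1"), "latin-1")
assert s.subst("foo", "baz").raw == b"baz et bar"

# The round trip within a single character set is lossless...
assert b"caf\xe9".decode("latin-1").encode("latin-1") == b"caf\xe9"
# ...so lossy translations show up on output, when the target
# character set lacks a character:
assert "café".encode("ascii", errors="replace") == b"caf?"

# And Unicode distinguishes ¥ (U+00A5) from \ (U+005C), so that
# distinction is always available at the abstract level:
assert (ord("¥"), ord("\\")) == (0xA5, 0x5C)
```

Whether the translation happens lazily per-operation or eagerly at the I/O boundary is then purely an efficiency choice; the abstract result is identical either way.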