Re: new sigil
> Luke Palmer wrote:
>
>> limited access to system settings.
>> And in those kinds of corporate environments, you're not going to be
>> working with any code but code written in-house. Which means that
>> nobody is going to be using Latin-1, and everyone will be using the
>> ASCII synonyms. What's the problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.

Right. We rely on Perl libraries from CPAN and elsewhere. You have to make sure that the code you are looking at has been transferred via utf-8 aware systems only. It is not enough for us to decide to use the ASCII synonyms ourselves. We have to be sure that every module that happens to use Unicode sigils/ops is installed without passing through any legacy system.

An explanation of the situation in Japan follows. Those who are not interested in Japan can skip it. This problem seems to be unique to Japan.

It has already been a year since the yen sign became the zip operator. This is not meant to kick off a discussion; it is just me whining. :P

Ancient ISO-646 allowed national variants, which substitute local symbols for certain ASCII characters. Currency signs were the first candidates for this.

http://en.wikipedia.org/wiki/ISO_646

This legacy convention is still alive in Japan in the JIS/ShiftJIS encodings. I hope Unicode supersedes them and the "backslash-yen" confusion disappears, but the transition is not happening quickly enough.

The problem does not lie in writing code but in carrying files around.

- You cannot tell whether a text file is in US-ASCII, utf8, or ShiftJIS when all the code points are below 0x7f. It is too late by the time you receive a code snippet from your colleague by mail.

- If we convert a yen sign from Latin-1 (0xa5) to Unicode (utf8=c2a5), and then to "the default coding system, which is believed to be ASCII but is actually ShiftJIS", it becomes 0x5c. There is no way to tell whether the byte was originally a backslash or a yen sign. Grepping for yen signs doesn't help because, by the time you run grep, they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard library, we face a big question: "Give up either Perl6 or Windows. Which do we abandon?" And I suppose the answer would be "We have a lot of substitutes for Perl6: Ruby, Perl5, etc."

In Japan, the yen sign is a synonym for the backslash. We wish to retain this legacy. The zip operator is far less important than the regex escape, the string escape, and the take-reference operator.

--
Kaoru Maeda [EMAIL PROTECTED]
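The conversion chain described above (Latin-1 0xa5, then utf8 c2a5, then a legacy system that emits 0x5c) can be sketched in Python. This is only a minimal model under stated assumptions: `to_legacy_bytes` and the sample strings are hypothetical names made up for illustration, and the function imitates just the yen/backslash collision, not a real ShiftJIS codec.

```python
def to_legacy_bytes(text: str) -> bytes:
    """Hypothetical model of a 'believed to be ASCII, actually ShiftJIS' system.

    Only the collision discussed in this thread is modelled:
    U+00A5 YEN SIGN is written to the same byte (0x5c) as the backslash.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0xA5:
            out.append(0x5C)      # yen sign lands on the backslash slot
        elif cp < 0x80:
            out.append(cp)        # plain ASCII passes through unchanged
        else:
            raise ValueError(f"character U+{cp:04X} is outside this toy model")
    return bytes(out)


# Step 1: a yen sign stored as Latin-1 becomes two bytes in UTF-8.
latin1_yen = b"\xa5"
utf8_yen = latin1_yen.decode("latin-1").encode("utf-8")
print(utf8_yen.hex())             # c2a5

# Step 2: after the legacy hop, a zip expression and a reference
# expression are byte-for-byte identical.
zip_expr = "@a \u00a5 @b"         # yen sign used as an operator
ref_expr = "@a \\ @b"             # backslash used as an operator
print(to_legacy_bytes(zip_expr) == to_legacy_bytes(ref_expr))

# Step 3: searching the converted bytes for a yen sign finds nothing.
print(b"\xc2\xa5" in to_legacy_bytes(zip_expr))
```

Once the two inputs produce the same bytes, no later tool (grep included) can recover which character the author typed.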
Re: new sigil
Luke Palmer wrote:
>> limited access to system settings.
>> And in those kinds of corporate environments, you're not going to be
>> working with any code but code written in-house. Which means that
>> nobody is going to be using Latin-1, and everyone will be using the
>> ASCII synonyms. What's the problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.

Right. We rely on Perl libraries from CPAN and elsewhere. You have to make sure that the code you are looking at has been transferred via utf-8 aware systems only. It is not enough for us to decide to use the ASCII synonyms ourselves. We have to be sure that every module that happens to use Unicode sigils/ops is installed without passing through any legacy system.

An explanation of the situation in Japan follows. Those who are not interested in Japan can skip it. This problem seems to be unique to Japan.

(It has already been a year since the yen sign became the zip operator. This is not meant to start an argument; it is just me whining. :P)

The problem does not lie in writing code but in carrying files around.

- You cannot tell whether a text file is in US-ASCII, utf8, or ShiftJIS when all the code points are below 0x7f. It is too late by the time you receive a code snippet from your colleague by mail.

- If we convert a yen sign from Latin-1 (0xa5) to Unicode (utf8=c2a5), and then to "the default coding system, which is believed to be ASCII but is actually ShiftJIS", it becomes 0x5c. There is no way to tell whether the byte was originally a backslash or a yen sign. Grepping for yen signs doesn't help because, by the time you run grep, they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard library, Japanese users would face a big question: "Give up either Perl6 or Windows. Which do we need?" And I suppose the answer would be "We have a lot of substitutes for Perl6: Ruby, Perl5, etc."

In <[EMAIL PROTECTED]> Larry wrote:
> (Of course, we'll leave out the little problem that half the people
> in Japan would read it as a backslash wannabe...that's not really
> a problem since a zipper would only be used where an operator is
> expected, and backslash is illegal there (so far).)

It is not the people who read a yen sign as a backslash, but the legacy systems. We might define the backslash as a synonym for the zip op, but that is too risky. "Yen as zip" carries the same magnitude of risk in Japan.

--
Kaoru Maeda [EMAIL PROTECTED]
Re: new sigil
Darren Duncan wrote:
> In this case, I support the use of any international currency symbol for use as Perl sigils and/or operators as appropriate. Eg, we already use $ (dollar; unicode=0024; utf8=24) and ¥ (yen; unicode=00A5; utf8=C2A5), and I suggest that the next best one to exploit is € (euro; unicode=20AC; utf8=E282AC), and the next best is £ (pound; unicode=00A3; utf8=C2A3). In my experience, the ¢ (cent; unicode=00A2; utf8=C2A2) is no harder to type than either of those.

I haven't read this list for quite a long time, but do we already have the yen sign as a sigil?

In Japan, backslashes and yen signs have been badly confused for over two decades. The code point 0x5c is a backslash in ASCII, but it is the yen sign in JISX0201. When I open an ASCII Perl program in Notepad on my Japanese Windows, it shows all the backslashes as yen signs. Japanese Perl books sometimes say: "If you cannot find a backslash on your keyboard, use the yen sign".

Thus we usually think yen = ascii 005c. My eyes are trained to treat a backslash and a yen sign as the same thing in program code, and my fingers are trained to hit the yen key when my brain thinks of a backslash. It has already become a reflex :P

Yes, I know. Careful configuration of your editor should allow you to distinguish ASCII 0x5c from JISX0201 0x5c. But in Japan, only a very keen coding-system/character-set wizard can do that.

Don't you have a similar confusion with the pound sign in the British version of ISO-646?

> the next best is £ (pound; unicode=00A3; utf8=C2A3)

Isn't that 0x23 in the UK? I imagine that someday all the comment lines will cause syntax errors in the UK...

Sorry if this is an issue that has already been discussed and solved.

--
Kaoru Maeda [EMAIL PROTECTED]
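To make the display side of this concrete, here is a small Python sketch of how the same 7-bit bytes look under different ISO-646 national variants. The table `ISO646_OVERRIDES` and the function `render` are hypothetical names, and the table lists only the two positions mentioned in this message (0x5c in the Japanese variant, 0x23 in the British one); the real variants replace a few more positions.

```python
# Partial override tables: only the positions discussed in this thread.
ISO646_OVERRIDES = {
    "US": {},                     # plain ASCII, nothing replaced
    "JP": {0x5C: "\u00a5"},       # JISX0201: 0x5c is shown as the yen sign
    "GB": {0x23: "\u00a3"},       # British variant: 0x23 is shown as the pound sign
}


def render(data: bytes, variant: str) -> str:
    """Show how 7-bit bytes would be displayed under one national variant."""
    overrides = ISO646_OVERRIDES[variant]
    return "".join(overrides.get(byte, chr(byte)) for byte in data)


source = b"my $ref = \\@list;  # take a reference"

for variant in ("US", "JP", "GB"):
    print(variant, render(source, variant))
```

The bytes never change; only the glyph chosen for 0x5c or 0x23 does, which is why a reader cannot tell from the screen alone which character the author meant.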
Re: RFC 230 (v2) Replace C<format> built-in with pragmatically-induced C<format> function
Some East Asian characters, such as those used in Japanese and Korean, are usually laid out as if each character occupied 2 columns.

Jperl has a patched format built-in that gives Japanese characters special treatment:

- 2-byte characters occupy 2 columns
  * this assumption is not strictly correct, but good enough for practical use.
- don't split a 2-byte character in the middle
  * pad a space if necessary
  * ellipses might be changed to "... " instead of "..."
- text is breakable before or after a 2-byte character, regardless of $FORMAT_LINE_BREAK_CHARACTERS.
  * possible breakpoints between 1-byte characters are the same as in the original Perl.

Japanese has another formatting rule: certain punctuation characters cannot appear at the beginning or at the end of a line (depending on their meanings). This rule is not implemented in Jperl. Most text-formatting programs, such as web browsers, do not implement these "disabling rules" either. Commercial word processors, DTP packages, and the Mule editor (multi-lingual Emacs) do.

I have two ideas:

- A user-specifiable break sub (as in Text::Autosplit) that looks after all of the above. Expensive at runtime.
- A small lookup table that maps a character to its width. A user-specified table may provide proportional formatting.

Japanese users would set the lookup table so that Kanji characters are twice as wide as ASCII; this would produce the same output as current Jperl. Those who need the disabling rules would use their own break function for better output; others would leave it at the default for speed.

Hmm... Text::Autosplit::replace has a hard-coded split-on-whitespace loop. I'm not sure, but this may leave "disabled at end-of-line" characters at the end of a line.

--
Kaoru "Mad Player" MAEDA [EMAIL PROTECTED]
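The second idea (a character-to-width lookup table) can be sketched roughly as follows, in Python rather than Perl. This is not Jperl's implementation or any proposed `format` API: `default_width` and `fit_columns` are hypothetical names, and using the Unicode East Asian Width property as the default is an assumption made for this sketch. The user-supplied table overrides the default, and the cut never splits a wide character, padding with spaces instead.

```python
import unicodedata
from typing import Dict, Optional


def default_width(ch: str) -> int:
    """Assumed default: 2 columns for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def fit_columns(text: str, columns: int,
                table: Optional[Dict[str, int]] = None) -> str:
    """Cut text to exactly `columns` columns without splitting a wide character.

    `table` is the user-specified lookup table (character -> width);
    characters missing from it fall back to default_width().
    """
    table = table or {}
    kept = []
    used = 0
    for ch in text:
        width = table.get(ch, default_width(ch))
        if used + width > columns:
            break                 # the next character would not fit whole
        kept.append(ch)
        used += width
    # Pad with spaces when a wide character had to be left out.
    return "".join(kept) + " " * (columns - used)


kanji = "\u65e5\u672c\u8a9e"      # three Kanji characters
print(repr(fit_columns(kanji, 5)))
print(repr(fit_columns("abcdefg", 5)))
```

A proportional layout would only need a different table; the disabling rules would still need a separate break function, as the message says.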