Re: new sigil

2005-10-23 Thread Kaoru Maeda

> Luke Palmer wrote:
>
>> limited access to system settings.
>> And in those kinds of corporate environments, you're not going to be
>> working with any code but code written in-house.  Which means that
>> nobody is going to be using Latin-1, and everyone will be using the
>> ASCII synonyms.  What's the problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.

Right.  We rely on Perl libraries from CPAN and elsewhere.
You have to make sure that the code you are looking at has been
transferred via utf-8 aware systems only.
It is not enough for us to decide to use the ASCII synonyms ourselves.
We have to be sure that every module that happens to
use Unicode sigils/ops is installed without passing through
legacy systems.

An explanation of the situation in Japan follows.  Those who are not
interested in Japan can skip it.  This problem seems to be unique
to Japan.  It is already a year since the yen sign became the zip operator.
This is not meant to start a discussion; it is just me whining. :P

The ancient ISO-646 allowed national variants, which replace certain
ASCII characters with local symbols.  Currency signs were the first
candidates for this.
http://en.wikipedia.org/wiki/ISO_646
This legacy convention is still alive in Japan in the JIS/ShiftJIS encodings.
I hope Unicode supersedes them and the "backslash-yen" confusion
disappears, but the migration is not quick enough.

The problem lies not in writing code but in carrying files around.
  - You cannot tell whether a text file is in US-ASCII, utf8,
    or ShiftJIS when all the code points are 0x7f or below.  It is too
    late by the time you receive a code snippet from your colleague by mail.
  - If we convert a yen sign from Latin-1 (0xa5) to Unicode
    (utf8=c2a5), and then to "the default coding system,
    which is believed to be ASCII but is actually
    ShiftJIS", it becomes 0x5c.  There is no way to tell
    whether the byte was originally a backslash or a yen sign.
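
The second point can be sketched in Python.  The converter below is a
hypothetical stand-in for a legacy "default coding system" whose 7-bit
set is JIS X 0201 Roman; the function name and the mapping table are
assumptions made for illustration, not the behaviour of any particular
tool.

```python
# Hypothetical model of a legacy converter whose 7-bit set is JIS X 0201
# Roman: U+00A5 (yen) and U+005C (backslash) both end up as byte 0x5c.
JISX0201_ROMAN = {0x00A5: 0x5C, 0x203E: 0x7E}

def to_legacy_bytes(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in JISX0201_ROMAN:
            out.append(JISX0201_ROMAN[cp])
        elif cp < 0x80:
            out.append(cp)
        else:
            raise ValueError(f"unmappable character U+{cp:04X}")
    return bytes(out)

latin1_yen = b"\xa5"                        # yen sign in Latin-1
as_unicode = latin1_yen.decode("latin-1")   # U+00A5
utf8_yen = as_unicode.encode("utf-8")       # two bytes in utf8

zip_expr = to_legacy_bytes("@a \u00a5 @b")  # yen used as an operator
ref_expr = to_legacy_bytes("@a \\ @b")      # a real backslash

print(utf8_yen.hex())         # c2a5
print(zip_expr == ref_expr)   # True: the two are now indistinguishable
print(b"\xa5" in zip_expr)    # False: a search for the yen byte finds nothing
```

Once both characters share byte 0x5c, nothing in the file records which
one the author typed.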

Grepping for yen signs doesn't help because by the time you
run grep, they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard library,
we face a big question: "Give up either Perl6 or Windows.  Which do we abandon?"
And I suppose the answer would be "We have a lot of substitutes for Perl6:
Ruby, Perl5, etc."

In Japan, the yen sign is a synonym for backslash.  We wish to retain this legacy.
The zip operator is far less important than the regex escape, the string escape, and
the take-reference operator.

--
Kaoru Maeda
[EMAIL PROTECTED]


Re: new sigil

2005-10-23 Thread maeda
Luke Palmer wrote:
>> limited access to system settings.
>> And in those kinds of corporate environments, you're not going to be
>> working with any code but code written in-house.  Which means that
>> nobody is going to be using Latin-1, and everyone will be using the
>> ASCII synonyms.  What's the problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.

Right.  We rely on Perl libraries from CPAN and elsewhere.  You
have to make sure that the code you are looking at has been transferred
via utf-8 aware systems only.  It is not enough for us to decide to
use the ASCII synonyms ourselves.  We have to be sure that every
module that happens to use Unicode sigils/ops is installed
without passing through legacy systems.

An explanation of the situation in Japan follows.  Those who are not
interested in Japan can skip it.  This problem seems to be unique
to Japan.

(It is already a year since the yen sign became the zip operator.
This is not meant to start an argument; it is just me whining. :P)

The problem lies not in writing code but in carrying files around.
   - You cannot tell whether a text file is in US-ASCII, utf8,
     or ShiftJIS when all the code points are 0x7f or below.  It
     is too late by the time you receive a code snippet from your
     colleague by mail.
   - If we convert a yen sign from Latin-1 (0xa5) to Unicode
     (utf8=c2a5), and then to "the default coding system, which is
     believed to be ASCII but is actually ShiftJIS", it becomes
     0x5c.  There is no way to tell whether the byte was
     originally a backslash or a yen sign.
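
The first point can be checked with a few lines of Python.  This is only
a sketch: the JIS X 0201 Roman view is a hand-written substitution that I
assume for illustration, not a real codec, and the sample line is made up.

```python
# A file whose bytes are all below 0x80 decodes without error under each
# of these labels, so the bytes alone cannot reveal which one was intended.
data = b'print "total\\n";'   # contains one 0x5c byte

as_ascii = data.decode("ascii")
as_utf8 = data.decode("utf-8")

# Hand-written JIS X 0201 Roman view (an assumption for illustration):
# identical to ASCII except that 0x5c is the yen sign and 0x7e the overline.
as_jis_roman = as_ascii.translate({0x5C: "\u00a5", 0x7E: "\u203e"})

print(max(data) < 0x80)        # True
print(as_ascii == as_utf8)     # True
print(as_ascii)                # print "total\n";
print(ascii(as_jis_roman))     # 'print "total\xa5n";'
```

The same bytes are a string escape for one reader and a yen sign for
another, and no header in the file says which reading is right.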

Grepping for yen signs doesn't help because by the time you run
grep, they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard
library, Japanese users would face a big question: "Give up either
Perl6 or Windows.  Which do we need?"  And I suppose the answer
would be "We have a lot of substitutes for Perl6: Ruby, Perl5,
etc."

In <[EMAIL PROTECTED]> Larry wrote:
> (Of course, we'll leave out the little problem that half the people
> in Japan would read it as a backslash wannabe...that's not really
> a problem since a zipper would only be used where an operator is
> expected, and backslash is illegal there (so far).)

It is not the people who read a yen as a backslash, but the
legacy systems.  We might define backslash as a synonym for the
zip op, but it's too risky.  "Yen as zip" has the same magnitude
of risk in Japan.

-- 
Kaoru Maeda
[EMAIL PROTECTED]


Re: new sigil

2005-10-21 Thread Kaoru Maeda

Darren Duncan wrote:

> In this case, I support the use of any international currency symbol
> for use as Perl sigils and/or operators as appropriate.  Eg, we
> already use $ (dollar; unicode=0024; utf8=24) and ¥ (yen;
> unicode=00A5; utf8=C2A5), and I suggest that the next best one to
> exploit is € (euro; unicode=20AC; utf8=E282AC), and the next best is £
> (pound; unicode=00A3; utf8=C2A3).  In my experience, the ¢ (cent;
> unicode=00A3; utf8=C2A3) is no harder to type than either of those.


I haven't read this list for quite a long time, but do we already have
the yen sign as a sigil?
In Japan, there has been great confusion between backslashes and yen
signs for over two decades.
The code point 0x5c is a backslash in ASCII, but it is the yen sign in
JISX0201.
When I display an ASCII Perl program in Notepad on my Japanese Windows, it
shows all the backslashes as yen signs.

Japanese Perl books sometimes say:
 "If you cannot find a backslash on your keyboard, use the yen sign".
Thus we usually think yen = ASCII 0x5c;
my eyes are trained to treat a backslash and a yen sign as the same thing
in program code, and my fingers are trained to hit the yen key when my
brain thinks of a backslash.
It has already become a reflex for me :P


Yes, I know.  Careful configuration of your editor should allow you to 
distinguish ASCII 0x5c from JISX0201 0x5c.
But in Japan, only a very keen coding-system/character-set wizard can do 
that.


Don't you have similar confusion with the pound sign in the British
version of ISO-646?

> the next best is £ (pound; unicode=00A3; utf8=C2A3)
Isn't that 0x23 in the UK?  I imagine that someday all the comment lines
will cause syntax errors in the UK...
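
To make the comparison concrete, here is a small Python sketch of how
the same 7-bit bytes display under different ISO-646 national variants.
The table is partial and the names are mine; it is an illustration, not
a copy of any standard.

```python
# Positions where two ISO 646 national variants differ from US-ASCII.
# The table is deliberately partial and only illustrative.
ISO646_VARIANTS = {
    "US": {},
    "JP": {0x5C: "\u00a5", 0x7E: "\u203e"},   # yen sign, overline
    "GB": {0x23: "\u00a3"},                   # pound sign
}

def render(data: bytes, variant: str) -> str:
    """Show how 7-bit bytes would be displayed under a given variant."""
    table = ISO646_VARIANTS[variant]
    return "".join(table.get(b, chr(b)) for b in data)

line = b"# see \\n"
for variant in ("US", "JP", "GB"):
    print(variant, ascii(render(line, variant)))
# US '# see \\n'
# JP '# see \xa5n'
# GB '\xa3 see \\n'
```

Under the British variant the comment marker is shown as a pound sign,
in the same way that the backslash is shown as a yen sign under the
Japanese one.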


Sorry if this issue has already been discussed and solved.

--
Kaoru Maeda
[EMAIL PROTECTED]



Re: RFC 230 (v2) Replace C<format> built-in with pragmatically-induced C<format> function

2000-09-19 Thread maeda

Some of the East Asian characters used in Japanese and Korean are
usually aligned as if they took 2 columns per character.  Jperl has
patched the format built-in so that Japanese characters get
special treatment:
  - 2-byte characters occupy 2 columns
      * this assumption is not strictly correct, but good enough
        for practical use.
  - don't split a 2-byte character in the middle
      * pad with a space if necessary
      * ellipses might be changed to "... " instead of "..."
  - text is breakable before or after a 2-byte character,
    regardless of $FORMAT_LINE_BREAK_CHARACTERS.
      * possible breakpoints between 1-byte characters are the
        same as in the original Perl.
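
The first two rules can be sketched in Python as follows.  This is not
Jperl's implementation: Jperl counts bytes in a legacy encoding, whereas
this sketch assumes Unicode text and uses the East Asian Width property
as a stand-in for "2-byte character".  The function names are mine.

```python
import unicodedata

def char_width(ch: str) -> int:
    # Wide (W) and fullwidth (F) characters take two columns.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def fit(text: str, columns: int) -> str:
    """Cut text to exactly `columns` columns without splitting a wide
    character; pad with a space when a wide character would straddle
    the boundary."""
    out, used = [], 0
    for ch in text:
        w = char_width(ch)
        if used + w > columns:
            break
        out.append(ch)
        used += w
    return "".join(out) + " " * (columns - used)

field = fit("ab\u6f22\u5b57", 5)   # "ab" plus two kanji, into 5 columns
print(len(field))                  # 4: a, b, one kanji, one padding space
```

The second kanji would need columns 5 and 6, so it is dropped whole and
the field is padded with one space instead.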

Japanese has another formatting rule: certain punctuation characters
cannot appear at the beginning or at the end of a line (depending on
their meanings).  This rule is not implemented in Jperl.  Most
text-formatting programs, such as web browsers, do not implement these
"disabling rules" either.  Commercial word-processing/DTP packages and the
Mule editor (multi-lingual Emacs) do.

I have two ideas:
  - A user-specifiable break sub (as in Text::Autosplit) looks
    after all of the above.  Expensive at runtime.
  - A small lookup table which maps a character to its width.
    A user-specified table may provide proportional formatting.
    Japanese users would set the lookup table so that Kanji characters
    are twice as wide as ASCII; this would produce the same output
    as the current Jperl.  Those who need the disabling rules would use
    their own break function for better output; others would leave
    it at the default for speed.
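
A rough Python sketch of how the two ideas could fit together.  The
hook names (width_of, may_break), the table contents and the sample text
are all hypothetical, and it assumes no single character is wider than
the line.

```python
def wrap(text, columns, width_of, may_break=lambda left, right: True):
    """Greedy line filling driven by a per-character width lookup and an
    optional break predicate (both are hypothetical hooks)."""
    lines, line = [], ""
    for ch in text:
        candidate = line + ch
        if sum(width_of(c) for c in candidate) <= columns:
            line = candidate
            continue
        # candidate overflows: back up to the last permitted breakpoint
        cut = len(line)
        while cut > 1 and not may_break(candidate[cut - 1], candidate[cut]):
            cut -= 1
        lines.append(candidate[:cut])
        line = candidate[cut:]
    if line:
        lines.append(line)
    return lines

text = "日本語、テスト。"

# Small lookup table: listed characters are 2 columns wide, others 1.
WIDTH_TABLE = {ch: 2 for ch in text}

def width_of(ch):
    return WIDTH_TABLE.get(ch, 1)

# A user-supplied "disabling rule": these marks may not start a line.
NO_LINE_START = set("、。")

def may_break(left, right):
    return right not in NO_LINE_START

fast = wrap(text, 6, width_of)              # table only
strict = wrap(text, 6, width_of, may_break) # table plus break rule
# fast   == ["日本語", "、テス", "ト。"]
# strict == ["日本", "語、テ", "スト。"]
```

With the table alone the second line starts with a comma; with the
break function it does not, at the cost of one extra call per candidate
breakpoint.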

Hmm... Text::Autosplit::replace has a hard-coded
split-on-whitespace loop.  I'm not sure, but this may leave
"disabled at end-of-line" characters at eol.

 
---  Avatar      Md+   d/ HH \.   Md+
   Kaoru "Mad Player" MAEDA  75t 145km/h   AFC50  O \#oo#/ "  LG+ LG+
   [EMAIL PROTECTED]HeatSink 13   LRM10   .=X~~X=.   LRM10
---  Armor 19.5t Md+  _|__|_  Md+