Re: [MacPerl] Re: problem with Japanese text

2003-03-27 Thread Dan Kogai
On Friday, Mar 28, 2003, at 11:37 Asia/Tokyo, Joel Rees wrote:
> Not sure if my comments are relevant, just feeling inclined to expose my
> ignorance --

And here is mine.

> Japanese is one of those languages that has relatively few specifically
> plural forms. To get the pluralizations right in Japanese, the program
> would have to consult a dictionary.

More precisely, Japanese has no plural form in the sense of
Indo-European languages.  Japanese totally lacks subject-verb agreement,
so you don't have to delete the "es" in "does" when you change the
subject from "s/he" to "they".

On the other hand, counting can be tricky even for natives.  The very
name of a number changes depending on what you count.  When you count
people it goes hito-ri, futa-ri, san-nin, but when you count objects it
goes hito-tsu, futa-tsu (or ik-ko, ni-ko), and the list goes on (I
think this "number-object agreement" came from Chinese).
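
For a text generator, the "number-object agreement" described above amounts
to a lookup keyed on both the number and the kind of thing counted.  A
minimal sketch in Perl, assuming romanized output and covering only the
forms quoted above; the class names (person, object_tsu, object_ko) and the
count_form() helper are hypothetical, made up for illustration:

```perl
use strict;
use warnings;

# Counted forms, limited to the examples given in the message above.
# The class names are hypothetical labels, not linguistic terms.
my %counted = (
    person     => { 1 => 'hito-ri',  2 => 'futa-ri', 3 => 'san-nin' },
    object_tsu => { 1 => 'hito-tsu', 2 => 'futa-tsu' },
    object_ko  => { 1 => 'ik-ko',    2 => 'ni-ko' },
);

# Return the counted form for $n items of class $class,
# or undef when the table has no entry for that number.
sub count_form {
    my ($class, $n) = @_;
    my $forms = $counted{$class}
        or die "unknown counter class '$class'\n";
    return $forms->{$n};
}

print count_form('person', 3), "\n";
print count_form('object_ko', 2), "\n";
```

A real generator would need a far larger table (and a rule for which
counter goes with which noun), which is the dictionary problem Joel
mentions.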

But when the number is not an issue, you can totally forget whether a
subject is singular or plural.

> Pluralization could probably be ignored for this purpose for Japanese,
> but, if the purpose is to produce text that the technically un-inclined
> can parse reasonably effortlessly, there are all sorts of other context
> related issues, most of which would require not just vocabulary
> dictionaries, but idiom dictionaries as well. And your locale machinery would
> have to include some sensitivity to dialect issues and social status
> issues, to make the generated text natural and non-offending.

I feel Japanese is a hard language to compose in because of that, but it
also makes Japanese easier to read, because Japanese text tends to convey
not only "what is said" but also "in what situation, and by what kind of
person, it is said".  In English the first-person singular pronoun is
nothing but "I", no matter how old or young you are or whether you are a
boy or a girl (or a computer).  But in Japanese it can be "Watashi" or
"Boku" or "Ore" or "Maro" or "Warawa" or "Sessha" or "Jibun" or "Ware";
even English "me" can be used.

Maybe to compensate for this complexity, Japanese grammar seems much
simpler:  no subject-verb agreement, very few irregular verbs.  It
is far easier to compose grammatically correct Japanese.  It gets
darn hard once you aim for social and political correctness.

> Japanese is becoming more egalitarian, more homogenized, and less
> colorful, so those who work on such things are aiming at a moving
> target.

Less colorful I am not sure about, because while the newer, simpler,
and more boring expressions are pervasive, the old and more complex
expressions hardly die.  So I believe the total richness of Japanese is
increasing, but ironically the "per-capita" richness might not be.  I
believe this phenomenon is not unique to Japanese; it could be even more
prevalent in English.  If you don't believe me just compare the Two
Bushes in the White House :)

> Thinking about the recognizer side, did anyone mention that Japanese
> text does not use word delimiters? Space has a somewhat different
> meaning for Japanese.

Japanese tokenization is anything but a trivial issue.  In Japanese the
very notion of "a word" is often moot.  Nevertheless, we do have
tokenizers good enough to implement input methods and search engines.  Of
course they are not perfect, but the Japanese are very frank about the
lack of perfection.  After all, we don't even have a "de jure standard
Japanese" to compare against.

Dan the Man with Too Many Languages to Deal with



Re: [MacPerl] Re: problem with Japanese text

2003-03-27 Thread Joel Rees
Not sure if my comments are relevant, just feeling inclined to expose my
ignorance --

> Character set difficulties are still a real problem, but so is dynamic
> text.  Damian Conway's paper
> 
>"An Algorithmic Approach to English Pluralization"
>http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html
> 
> contains some fairly complicated tools for generating dynamically-
> pluralized English.  Now generalize that tool set for multiple
> languages and/or more complex variations.  Right.

Japanese is one of those languages that has relatively few specifically
plural forms. To get the pluralizations right in Japanese, the program
would have to consult a dictionary.

> In my current work, I am generating user-specific explanations for the
> permission and ownership information in (roughly) an "ls -al" listing.
> That is, the user gets three paragraphs, saying (a) what the effect of
> these permissions is on the user, (b) how this was derived, and (c)
> what the item's permissions are, as a whole. 

I see the reason for the interest in automatic pluralization there.

Pluralization could probably be ignored for this purpose for Japanese,
but, if the purpose is to produce text that the technically un-inclined
can parse reasonably effortlessly, there are all sorts of other context
related issues, most of which would require not just vocabulary
dictionaries, but idiom dictionaries as well. And your locale machinery would
have to include some sensitivity to dialect issues and social status
issues, to make the generated text natural and non-offending.

Japanese is becoming more egalitarian, more homogenized, and less
colorful, so those who work on such things are aiming at a moving target.

Thinking about the recognizer side, did anyone mention that Japanese
text does not use word delimiters? Space has a somewhat different
meaning for Japanese.

-- 
Joel Rees <[EMAIL PROTECTED]>



Re: [MacPerl] Re: problem with Japanese text

2003-03-27 Thread Rich Morin
Character set difficulties are still a real problem, but so is dynamic
text.  Damian Conway's paper
  "An Algorithmic Approach to English Pluralization"
  http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html
contains some fairly complicated tools for generating dynamically-
pluralized English.  Now generalize that tool set for multiple
languages and/or more complex variations.  Right.
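
To give a feel for the problem, here is a deliberately naive suffix-rule
pluralizer.  It is not taken from Conway's paper, whose algorithm handles
far more cases; the plural() name, the rule set, and the tiny irregular
table are assumptions for illustration only:

```perl
use strict;
use warnings;

# A few irregular nouns; a real table would be much longer.
my %irregular = ( child => 'children', man => 'men', mouse => 'mice' );

# Return $noun inflected for $count, using a handful of English rules.
sub plural {
    my ($noun, $count) = @_;
    return $noun if $count == 1;
    return $irregular{$noun} if exists $irregular{$noun};
    return $noun . 'es' if $noun =~ /(?:s|x|ch|sh)$/;    # box -> boxes
    my $copy = $noun;
    return $copy if $copy =~ s/([^aeiou])y$/${1}ies/;    # directory -> directories
    return $noun . 's';                                  # file -> files
}

for my $n (1, 3) {
    print "$n ", plural('directory', $n), ", $n ", plural('file', $n), "\n";
}
```

Even this much is English-only: every rule above would have to be thrown
away and rewritten for another language.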
In my current work, I am generating user-specific explanations for the
permission and ownership information in (roughly) an "ls -al" listing.
That is, the user gets three paragraphs, saying (a) what the effect of
these permissions is on the user, (b) how this was derived, and (c)
what the item's permissions are, as a whole.  For example:
  permission bits  mode  owner  group  type       name
  ---------------  ----  -----  -----  ---------  -----
  rwx  rwx  r-xt   1775  root   wheel  directory  Users

  /Users
  ------
  You have read, write, and execute (rwx) permissions for this
  directory.  This allows you to inspect, change (e.g., add to), and
  access its contents.  Because the sticky bit (t) is set, you may not
  remove other users' files.

  The user id for this node does not match your effective user id, but
  the group id matches one of your effective group ids.  Consequently,
  your access is controlled by the node's "group" permissions (rwx, as
  shown in the second field of the "permission bits" column).

  The node's owner (root) has read, write, and execute permissions
  (rwx).  Members of group "wheel" have read, write, and execute
  permissions (rwx).  Other users have read and execute permissions
  (r-x).  The sticky bit (t in the third field) is set; files in this
  directory may only be removed or renamed by a user if the user has
  write permission for the directory and the user is the owner of the
  file, the owner of the directory, or the super-user.  See sticky(8)
  for more information.
As I was writing generation code for the text above, the prospect of
modifying the code for multiple languages crossed my mind.  I quickly
decided, however, that this was unlikely to be my problem.  Even if I
weren't firmly monolingual, the process of generalizing this code is
going to be quite language-specific (and doing it automagically is
AI-complete :-).
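
As a rough illustration of the kind of generation code involved (not
Rich's actual code, which is not shown in this thread), the sketch below
turns a numeric mode into sentences like those in paragraph (c) above.
The names join_list(), describe_bits(), and describe_mode() are
hypothetical, and the English word order is hard-wired throughout, which
is exactly what makes such code hard to generalize:

```perl
use strict;
use warnings;

# Join words as English prose: "a", "a and b", "a, b, and c".
sub join_list {
    my @items = @_;
    return ''                        if @items == 0;
    return $items[0]                 if @items == 1;
    return "$items[0] and $items[1]" if @items == 2;
    my $last = pop @items;
    return join(', ', @items) . ", and $last";
}

# Describe one rwx triplet (a value from 0 to 7) in words.
sub describe_bits {
    my ($bits) = @_;
    my @names;
    push @names, 'read'    if $bits & 4;
    push @names, 'write'   if $bits & 2;
    push @names, 'execute' if $bits & 1;
    return 'no permissions' unless @names;
    my $noun = @names == 1 ? 'permission' : 'permissions';
    return join_list(@names) . " $noun";
}

# Describe owner, group, and other permissions for a numeric mode.
sub describe_mode {
    my ($mode, $owner, $group) = @_;
    my @sentences = (
        "The node's owner ($owner) has " . describe_bits(($mode >> 6) & 7) . '.',
        qq{Members of group "$group" have } . describe_bits(($mode >> 3) & 7) . '.',
        'Other users have ' . describe_bits($mode & 7) . '.',
    );
    push @sentences, 'The sticky bit is set.' if $mode & 01000;
    return join('  ', @sentences);
}

print describe_mode(01775, 'root', 'wheel'), "\n";
```

Note that even this small example needs both list punctuation and a
singular/plural choice ("permission" vs. "permissions").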
-r
--
email: [EMAIL PROTECTED]; phone: +1 650-873-7841
http://www.cfcl.com/rdm    - my home page, resume, etc.
http://www.cfcl.com/Meta   - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc     - Prime Time Freeware's Darwin Collection


Re: [MacPerl] Re: problem with Japanese text

2003-03-27 Thread Robin
On Thursday, March 27, 2003, at 01:31  am, Chris Nandor wrote:

> [EMAIL PROTECTED] (Robin) wrote:
>> MacPerl per se historically has not been aware of locale outside of
>> ASCII-defined ones (not sure about the latest version).
> Is there a reason for MacJPerl when MacPerl 5.8.x is released?

While the 5.8 perl interpreter has built-in Unicode support, displaying,
editing, or even using a Perl script containing Japanese characters on
OS 9 (even with the Japanese Language Kit) is no small task (try a simple
regex substitution and you'll see what I mean). OS X makes it potentially
easier, but most software is lagging far behind the promise, still coming
from a monolingual mindset rather than a multilingual one, and all that
this entails.
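
One well-known way a "simple regex substitution" goes wrong, assumed here
to be the sort of thing meant above: in Shift_JIS, the second byte of
some kanji is 0x5C, the same byte as an ASCII backslash.  The sketch
below uses the Encode module that ships with perl 5.8; the variable names
are illustrative:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+8868 is the two bytes 0x95 0x5C in Shift_JIS.
my $sjis = "\x95\x5C";

# Byte semantics: the substitution hits the kanji's trailing byte.
(my $broken = $sjis) =~ s{\\}{/}g;

# Character semantics: decode first, substitute, then re-encode.
my $text = decode('shiftjis', $sjis);
(my $fixed = $text) =~ s{\\}{/}g;
my $out = encode('shiftjis', $fixed);

printf "bytes in:     %s\n", unpack('H*', $sjis);
printf "byte-wise:    %s\n", unpack('H*', $broken);
printf "decode first: %s\n", unpack('H*', $out);
```

The byte-wise result is no longer valid text for that character, while
the decoded version leaves it alone because it contains no backslash
character at all.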

Anyway, for anyone interested in more info about the history and
development of Japanese text encodings, here's a link to one of the
best pages I have found so far on the web:
http://tronweb.super-nova.co.jp/characcodehist.html

Robin



Re: Another shell/GUI cooperation script

2003-03-27 Thread Chris Nandor
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Ken Williams) wrote:

> Chris, would it be fun/possible to convert all that applescript to perl?

Yep.  Note that currently you cannot "tell" an object in Mac::Glue, but no 
big deal; that is just syntactic sugar.  You'll see that apart from that, it 
is basically a line-by-line translation of syntax, using the same words but 
in Perl.

   use Mac::Glue ':all';

   my $mail = new Mac::Glue 'Mail';

   # make a new outgoing message, then set "visible" of it to true
   my $msg = $mail->make( new => 'outgoing message' );
   $mail->set( $mail->prop( visible => $msg ), to => gTrue() );

   # attach a file after the last character of the message content
   my $last_char = $mail->obj( character => gLast(), of => content => $msg );
   $mail->make(
      new => 'attachment',
      at  => location( after => $last_char ),
      with_properties => {
         file_name => "/etc/motd"
      }
   );

   # bring Mail to the front
   $mail->activate;

The most difficult thing is to know what it means, for example, to tell 
newMessage to set visible to true.  In this case, it means the same as set 
visible of newMessage to true, where visible is a property, so we 
prop(visible => $msg), to => gTrue().  Yay fun.

See similar fun with the location insertion record for "after last character 
of content of message" (the extra "of" in the perl is because content is a 
property and not an object ... at some point I want to get rid of that extra 
verbiage, which should be possible).

See the Mac::Glue POD for more details.

Of course, this currently works only in MacPerl (but the script above does, 
in fact, work from MacPerl in Classic to control Mail.app).  But as noted 
before, I am working to finish porting it all to work under Mac OS X, and 
most of the pieces are in place; I just need to find the time to finish it 
all up.  Before summer.  :)

-- 
Chris Nandor                      [EMAIL PROTECTED]    http://pudge.net/
Open Source Development Network   [EMAIL PROTECTED]    http://osdn.com/