Re: [MacPerl] Re: problem with Japanese text
On Friday, Mar 28, 2003, at 11:37 Asia/Tokyo, Joel Rees wrote:

> Not sure if my comments are relevant, just feeling inclined to expose my ignorance --

And here is mine.

> Japanese is one of those languages that has relatively few specifically plural forms. To get the pluralizations right in Japanese, the program would have to consult a dictionary.

More precisely, Japanese has no plural forms in the Indo-European sense. Japanese completely lacks subject-verb agreement, so you don't have to delete the "es" in "does" when you change the subject from "s/he" to "they".

On the other hand, counting can be tricky even for native speakers. The very names of the numbers change depending on what you count. When you count people it goes hito-ri, futa-ri, san-nin, but when you count objects it goes hito-tsu, futa-tsu (or ik-ko, ni-ko), and the list goes on (I think this "number-object agreement" came from Chinese). But when the number is not an issue, you can ignore entirely whether a subject is singular or plural.

> Pluralization could probably be ignored for this purpose for Japanese, but, if the purpose is to produce text that the technically un-inclined can parse reasonably effortlessly, there are all sorts of other context-related issues, most of which would require not just vocabulary dictionaries, but idiom dictionaries as well. And your locale machinery would have to include some sensitivity to dialect issues and social status issues, to make the generated text natural and non-offending.

I feel Japanese is a hard language to compose because of that, but that also makes Japanese easier to read, because Japanese tends to convey not only "what is said" but also "in what situation, and by what kind of person". In English the first-person singular pronoun is nothing but "I", no matter how old or young you are or whether you are a boy or a girl (or a computer). But in Japanese it can be "Watashi" or "Boku" or "Ore" or "Maro" or "Warawa" or "Sessha" or "Jibun" or "Ware"; even English "me" can be used.

Maybe to compensate for this complexity, Japanese grammar seems much simpler: no subject-verb agreement, very few irregular verbs. It is far easier to compose grammatically correct Japanese. It gets darn hard once you aim for social and political correctness.

> Japanese is becoming more egalitarian, more homogenized, and less colorful, so those who work on such things are aiming at a moving target.

I am not sure about "less colorful", because even as the newer, simpler, and more boring expressions become pervasive, the old and more complex expressions hardly die. So in total Japanese is getting richer. Well, I believe the total richness of Japanese is increasing, but ironically the "per-capita" richness might not be. But I believe this phenomenon is not unique to Japanese; it could be even more prevalent in English. If you don't believe me, just compare the Two Bushes in the White House :)

> Thinking about the recognizer side, did anyone mention that Japanese text does not use word delimiters? Space has a somewhat different meaning for Japanese.

Japanese tokenization is anything but a trivial issue. In Japanese the very notion of "a word" is often moot. Nevertheless, we do have good enough tokenizers to implement input methods and search engines. Of course they are not perfect, but the Japanese are very frank about the lack of perfection. After all, we don't even have a "de jure standard Japanese" to compare against.

Dan the Man with Too Many Languages to Deal with
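Dan's point about "number-object agreement" means a text generator cannot simply print a number followed by a noun; it has to pick the counted form from a lookup keyed by the kind of thing being counted. Below is a minimal Perl sketch of that lookup. The table, its category names (`person`, `thing`), and the `counted` helper are hypothetical illustrations that cover only one through three, not part of any existing module.

```perl
use strict;
use warnings;

# Hypothetical counter table (romanized): the word for a count depends on
# the category of what is being counted.  Only 1-3 are filled in here.
my %COUNTED = (
    person => { 1 => 'hitori',  2 => 'futari',  3 => 'sannin' },
    thing  => { 1 => 'hitotsu', 2 => 'futatsu', 3 => 'mittsu' },
);

# Return the counted form for $n items of $category, or die if the
# table has no entry for that combination.
sub counted {
    my ($n, $category) = @_;
    my $table = $COUNTED{$category}
        or die "no counter table for category '$category'\n";
    my $word = $table->{$n}
        or die "no entry for $n in category '$category'\n";
    return $word;
}

print counted(2, 'person'), ' / ', counted(2, 'thing'), "\n";
```

The alternative forms Dan mentions (ik-ko, ni-ko) would simply be another category in the same table; a real generator would need many more categories and numbers, which is why the lookup has to be dictionary-driven.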
Re: [MacPerl] Re: problem with Japanese text
Not sure if my comments are relevant, just feeling inclined to expose my ignorance --

> Character set difficulties are still a real problem, but so is dynamic text. Damian Conway's paper
>
>   "An Algorithmic Approach to English Pluralization"
>   http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html
>
> contains some fairly complicated tools for generating dynamically pluralized English. Now generalize that tool set for multiple languages and/or more complex variations. Right.

Japanese is one of those languages that has relatively few specifically plural forms. To get the pluralizations right in Japanese, the program would have to consult a dictionary.

> In my current work, I am generating user-specific explanations for the permission and ownership information in (roughly) an "ls -al" listing. That is, the user gets three paragraphs, saying (a) what the effect of these permissions is on the user, (b) how this was derived, and (c) what the item's permissions are, as a whole.

I see the reason for the interest in automatic pluralization there. Pluralization could probably be ignored for this purpose for Japanese, but, if the purpose is to produce text that the technically un-inclined can parse reasonably effortlessly, there are all sorts of other context-related issues, most of which would require not just vocabulary dictionaries, but idiom dictionaries as well. And your locale machinery would have to include some sensitivity to dialect issues and social status issues, to make the generated text natural and non-offending.

Japanese is becoming more egalitarian, more homogenized, and less colorful, so those who work on such things are aiming at a moving target.

Thinking about the recognizer side, did anyone mention that Japanese text does not use word delimiters? Space has a somewhat different meaning for Japanese.

-- 
Joel Rees <[EMAIL PROTECTED]>
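Joel's question about word delimiters can be made concrete. Since Japanese is written without spaces, the crudest possible segmenter cuts the text wherever the writing system changes (kanji, hiragana, katakana, Latin letters, digits). The Perl sketch below assumes Perl 5.8 or later with Unicode script properties; `script_runs` is a hypothetical helper, and, as Dan notes in his reply, real input methods and search engines rely on far better tokenizers than this.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Split text into runs of a single script.  This is only a heuristic:
# it finds changes of writing system, not real word boundaries.
sub script_runs {
    my ($text) = @_;
    return $text =~ /(\p{Han}+|\p{Hiragana}+|\p{Katakana}+|\p{Latin}+|\p{Nd}+|\S)/g;
}

# "nihongo no tekisuto" (Japanese text): kanji, hiragana, katakana.
my $ok = "\x{65E5}\x{672C}\x{8A9E}\x{306E}\x{30C6}\x{30AD}\x{30B9}\x{30C8}";

# "Toukyouto ni sumu" (to live in Tokyo): the verb "sumu" is written with
# a kanji stem and a hiragana ending, so the heuristic cuts it in two.
my $bad = "\x{6771}\x{4EAC}\x{90FD}\x{306B}\x{4F4F}\x{3080}";

print join(' | ', script_runs($ok)),  "\n";   # 3 runs
print join(' | ', script_runs($bad)), "\n";   # 4 runs; the verb is split
```

The second string shows why a script-change heuristic is not a tokenizer: inflected words routinely mix scripts, so deciding where a word ends needs a dictionary.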
Re: [MacPerl] Re: problem with Japanese text
Character set difficulties are still a real problem, but so is dynamic text. Damian Conway's paper

  "An Algorithmic Approach to English Pluralization"
  http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html

contains some fairly complicated tools for generating dynamically pluralized English. Now generalize that tool set for multiple languages and/or more complex variations. Right.

In my current work, I am generating user-specific explanations for the permission and ownership information in (roughly) an "ls -al" listing. That is, the user gets three paragraphs, saying (a) what the effect of these permissions is on the user, (b) how this was derived, and (c) what the item's permissions are, as a whole. For example:

  permission bits  mode  owner  group  type       name
  ---------------  ----  -----  -----  ---------  -----
  rwx rwx r-xt     1775  root   wheel  directory  Users

  /Users

  You have read, write, and execute (rwx) permissions for this directory. This allows you to inspect, change (e.g., add to), and access its contents. Because the sticky bit (t) is set, you may not remove other users' files.

  The user ID for this node does not match your effective user ID, but the group ID matches one of your effective group IDs. Consequently, your access is controlled by the node's "group" permissions (rwx, as shown in the second field of the "permission bits" column).

  The node's owner (root) has read, write, and execute permissions (rwx). Members of group "wheel" have read, write, and execute permissions (rwx). Other users have read and execute permissions (r-x). The sticky bit (t in the third field) is set; files in this directory may only be removed or renamed by a user if the user has write permission for the directory and the user is the owner of the file, the owner of the directory, or the super-user. See sticky(8) for more information.

As I was writing generation code for the text above, the prospect of modifying the code for multiple languages crossed my mind. I quickly decided, however, that this was unlikely to be my problem. Even if I weren't firmly monolingual, the process of generalizing this code is going to be quite language-specific (and doing it automagically is AI-complete :-).

-r
-- 
email: [EMAIL PROTECTED]; phone: +1 650-873-7841
http://www.cfcl.com/rdm    - my home page, resume, etc.
http://www.cfcl.com/Meta   - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc     - Prime Time Freeware's Darwin Collection
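Two ingredients of Rich's generator are mechanical enough to sketch: turning the octal mode into the "rwx rwx r-xt" fields, and deciding which class of permission bits governs the reader. The Perl below is a minimal illustration under stated assumptions. The function names are hypothetical, setuid/setgid bits and the super-user override are ignored, the sticky bit is appended to the third field to match the example listing, and the English phrasing is hard-wired, which is exactly the part Rich expects to be language-specific.

```perl
use strict;
use warnings;

# Render the permission bits as three space-separated rwx fields.  The
# sticky bit (01000) is appended to the third field, following the
# example listing (ls itself overwrites the x instead).  Setuid/setgid
# are ignored in this sketch.
sub perm_fields {
    my ($mode) = @_;
    my @fields;
    for my $shift (6, 3, 0) {
        my $bits = ($mode >> $shift) & 7;
        push @fields, ($bits & 4 ? 'r' : '-')
                    . ($bits & 2 ? 'w' : '-')
                    . ($bits & 1 ? 'x' : '-');
    }
    $fields[2] .= 't' if $mode & 01000;
    return join ' ', @fields;
}

# Decide which permission class applies: owner, group, or other.
# (Hypothetical helper; the super-user override is not modelled.)
sub governing_class {
    my ($node_uid, $node_gid, $euid, @egids) = @_;
    return 'owner' if $euid == $node_uid;
    return 'group' if grep { $_ == $node_gid } @egids;
    return 'other';
}

# Spell out one rwx field as an English list with a serial comma.
sub describe_field {
    my ($field) = @_;
    my @names;
    push @names, 'read'    if substr($field, 0, 1) eq 'r';
    push @names, 'write'   if substr($field, 1, 1) eq 'w';
    push @names, 'execute' if substr($field, 2, 1) eq 'x';
    return 'no'                      if @names == 0;
    return $names[0]                 if @names == 1;
    return "$names[0] and $names[1]" if @names == 2;
    return "$names[0], $names[1], and $names[2]";
}

# The example from the listing: mode 1775, with hypothetical IDs for a
# user whose supplementary groups include the node's group.
my $fields = perm_fields(oct '1775');
my (undef, $group_bits) = split ' ', $fields;
print "$fields\n";
print 'Access governed by: ', governing_class(0, 0, 501, 20, 0), "\n";
print qq{Members of group "wheel" have }, describe_field($group_bits),
      " permissions ($group_bits).\n";
```

Even the small `describe_field` step (serial comma, "and" between two items, "no" for none) is an English-only rule, which illustrates why generalizing the generator would be so language-specific.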
Re: [MacPerl] Re: problem with Japanese text
On Thursday, March 27, 2003, at 01:31 am, Chris Nandor wrote:

> [EMAIL PROTECTED] (Robin) wrote:
>
>> MacPerl per se historically has not been aware of locale outside of ASCII-defined ones (not sure about the latest version).
>
> Is there a reason for MacJPerl when MacPerl 5.8.x is released?

While the 5.8 perl interpreter has built-in Unicode support, displaying, editing, or even using a Perl script containing Japanese characters on OS 9 (even with the Japanese Language Kit) is no small task (try a simple regex substitution and you'll see what I mean). OS X makes it potentially easier, but most software is lagging far behind the promise, still arriving with a monolingual mindset rather than a multilingual one, with all that this entails.

Anyway, for anyone interested in more information about the history and development of Japanese text encodings, here's a link to one of the best pages I have found so far on the web:

http://tronweb.super-nova.co.jp/characcodehist.html

Robin
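Robin's "try a simple regex substitution" has a classic concrete case behind it. In Shift_JIS the second byte of a two-byte character can fall in the ASCII range: KATAKANA LETTER SO (U+30BD) is encoded as the bytes 0x83 0x5C, and 0x5C is the ASCII backslash. The sketch below assumes Perl 5.8 or later with the core `Encode` module and its `shiftjis` encoding; it shows a byte-oriented regex finding a backslash that a character-oriented one correctly does not.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+30BD KATAKANA LETTER SO encodes in Shift_JIS as 0x83 0x5C,
# and 0x5C is also the ASCII backslash.
my $so    = "\x{30BD}";
my $bytes = encode('shiftjis', $so);

# Byte-oriented view: the regex sees a "backslash" that isn't really there.
my $byte_hit = ($bytes =~ /\\/) ? 1 : 0;

# A naive substitution on the raw bytes silently corrupts the character.
(my $mangled = $bytes) =~ s{\\}{/}g;

# Character-oriented view: decode first, then match.
my $chars    = decode('shiftjis', $bytes);
my $char_hit = ($chars =~ /\\/) ? 1 : 0;

printf "bytes=%d chars=%d backslash-in-bytes=%d backslash-in-chars=%d mangled=%s\n",
    length($bytes), length($chars), $byte_hit, $char_hit,
    ($mangled eq $bytes ? 'no' : 'yes');
```

This prints `bytes=2 chars=1 backslash-in-bytes=1 backslash-in-chars=0 mangled=yes`. Decoding to characters before matching is what the 5.8 Unicode support makes possible; editors and tools that still work on raw bytes are where the trouble Robin describes comes from.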
Re: Another shell/GUI cooperation script
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Ken Williams) wrote:

> Chris, would it be fun/possible to convert all that applescript to perl?

Yep. Note that currently you cannot "tell" an object in Mac::Glue, but no big deal; that is just syntactic sugar. You'll see that apart from that, it is basically a line-by-line translation of syntax, using the same words but in Perl.

    use Mac::Glue ':all';

    my $mail = Mac::Glue->new('Mail');

    # make new outgoing message
    my $msg = $mail->make( new => 'outgoing message' );

    # set visible of newMessage to true (visible is a property)
    $mail->set( $mail->prop( visible => $msg ), to => gTrue() );

    # last character of content of message; the extra "of" is needed
    # because content is a property rather than an object
    my $last_char = $mail->obj( character => gLast(), of => content => $msg );

    # make new attachment after that character, with file name /etc/motd
    $mail->make(
        new             => 'attachment',
        at              => location( after => $last_char ),
        with_properties => { file_name => "/etc/motd" },
    );

    $mail->activate;

The most difficult thing is to know what it means, for example, to "tell newMessage to set visible to true". In this case, it means the same as "set visible of newMessage to true", where visible is a property, so we call set() on prop(visible => $msg) with to => gTrue(). Yay fun. See similar fun with the location insertion record for "after last character of content of message" (the extra "of" in the Perl is because content is a property and not an object ... at some point I want to get rid of that extra verbiage, which should be possible). See the Mac::Glue POD for more details.

Of course, this currently works only in MacPerl (but the script above does, in fact, work from MacPerl in Classic to control Mail.app). But as noted before, I am working to finish porting it all to work under Mac OS X; most of the pieces are in place, I just need to find the time to finish it all up. Before summer. :)

-- 
Chris Nandor                      [EMAIL PROTECTED]    http://pudge.net/
Open Source Development Network   [EMAIL PROTECTED]    http://osdn.com/