Re: [OT] Japanese (correction about Mail)

2003-07-07 Thread Joel Rees
 I took another look at some garbled spam I seem to be picking up
 regularly, which I had mistakenly assumed to be from a Korean source,
 and it looks like Apple's mail app in 10.2.4 is _not_ handling 7-bit JIS
 correctly. More later.

rant
Crud. I have some resume pages that I _know_ are shift JIS (looked at
the byte values with hexdump), but the Metrowerks CodeWarrior 5 editor
won't play fair with them. Apparently, the pages I'm working with are
ones which I pulled back off the web (stripping the resource fork) and
edited with something that did not leave a traditional resource fork,
but whatever the Mac OS X file system is using instead of a resource
fork.

Haven't had time to pin things down, but Mac OS X's Text Edit utility
and the OS are playing strange guessing games on me, and the result is
that, even with Classic booted and CW 5 set to the Osaka font, what it's
showing me is as if the text were Mac (Latin) 8 bit.

Pages which do not have the discontinuity of going to the web and back
again are showing just fine. And Text Edit, when I save, adds another 8K
to the file, just because I changed the encoding on the save. Four
changes and 900 bytes of text is now 40K.

Whoever convinced the architects for the Mac OS X utilities and OS to
insist on converting the encoding when I just want to change the
interpretation, I'd like to have a little heart-to-heart with.
Converting should NOT be under the Format menu in the editor in the
dev tools. If Save As is not enough, we need another menu. What's
under Format should only change the interpretation of the byte values.

I'll post an update on the interactions in a day or two, unless someone
throws a red flag about topicality. Maybe I'll be a bit more rational,
too.
/rant

-- 
Joel, muttering Japanese expletives while wandering over to the Mac OS X
bug reporting pages, if they are still there.



Re: [OT] Japanese (correction about Mail)

2003-06-23 Thread Joel Rees
I took another look at some garbled spam I seem to be picking up
regularly, which I had mistakenly assumed to be from a Korean source,
and it looks like Apple's mail app in 10.2.4 is _not_ handling 7-bit JIS
correctly. More later.

But, while I was checking that, I checked the following:

 I managed to find the kanji I asked the person about with the character palette
 description she gave, but it was or could be described otherwise as: Unicode
 5782, JIS(X0213) 1-31-66, Shift JIS(X0208) 9082: 

tarasu/tareru (hang down)

 and it was mojibake'ed as
 ($BEZ(B)

That kind of looks like seven-bit JIS. The $B is a piece of a control
sequence when mixing 7-bit JIS with 7-bit ANSI.

EZ is the 7-bit JIS for tsuchi (earth, dirt). And BE is 7-bit JIS for
the da in datou (valid). Nope. Something else happened to that.
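The arithmetic behind those code values can be checked mechanically. Below is a minimal sketch in C, assuming plain JIS X 0208 row/cell (kuten) numbering with both values in 1..94. The function names kuten_to_jis and kuten_to_sjis are made up for illustration and do not come from any library.

```c
#include <assert.h>

/* Pack kuten (row and cell, each 1..94) into the two 7-bit JIS bytes,
 * returned as 0xHHLL. */
unsigned kuten_to_jis(unsigned ku, unsigned ten)
{
    assert(ku >= 1 && ku <= 94 && ten >= 1 && ten <= 94);
    return ((ku + 0x20u) << 8) | (ten + 0x20u);
}

/* The same kuten pair as a Shift JIS code, returned as 0xHHLL. */
unsigned kuten_to_sjis(unsigned ku, unsigned ten)
{
    unsigned lead, trail;

    assert(ku >= 1 && ku <= 94 && ten >= 1 && ten <= 94);
    lead = (ku - 1) / 2 + 0x81u;    /* two rows share one lead byte */
    if (lead > 0x9f)
        lead += 0x40;               /* jump over 0xa0..0xdf */
    if (ku & 1u) {
        trail = ten + 0x3fu;        /* odd row: lower half, from 0x40 */
        if (trail >= 0x7f)
            ++trail;                /* 0x7f is never used */
    } else {
        trail = ten + 0x9eu;        /* even row: 0x9f..0xfc */
    }
    return (lead << 8) | trail;
}
```

With these, kuten 31-66 comes out as JIS 0x3F62 and Shift JIS 0x9082, which agrees with the palette values quoted above. The bytes E and Z (0x45 0x5A) correspond to kuten 37-58, a different character.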

 Some other codes she sent, and hence probably in the same encoding,
 were $B7V(B and $Bj%(B both for hotaru.

7V is the 7-bit JIS for hotaru (firefly). j% is 7-bit JIS for a more
traditional rendering of hotaru.

Here's the meat of the source of a C tool I wrote to check:
-
for ( i = 0; i < kTermWidth - 1; i += 2 )
{   unsigned long byte1 = (unsigned char) buf[ i ];
    unsigned long byte2 = (unsigned char) buf[ i + 1 ];
    if ( byte1 == '\0' || byte2 == '\0' )   /* end of input */
        break;
    byte1 -= 0x21;  /* kuten */
    byte2 -= 0x21;  /* kuten */
    byte2 += 0x40;
    if ( ( byte1 & 1 ) == 1 )
        byte2 += 94;    /* odd (zero-based) rows use the upper half */
    if ( byte2 > 0x7e )
        ++byte2;        /* skip 0x7f */
    byte1 >>= 1;
    byte1 += 0x81;
    if ( byte1 > 0x9f )
        byte1 += 0x40;  /* skip 0xa0 through 0xdf */
    buf[ i ] = (char) byte1;
    buf[ i + 1 ] = (char) byte2;
}
buf[ kTermWidth ] = '\0';   /* training wheels */
-

(Yeah, C comes more natural to me than perl. Especially for this kind of
stuff. So shoot me.) It's missing the escape sequence and end-of-line
handling, among other things, but may be amusing to those interested in
the relationship between 7-bit JIS and shift-JIS.
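For anyone who wants the missing pieces, here is a rough sketch of the same conversion with escape-sequence and end-of-line handling added. It assumes the input is well-formed ISO-2022-JP using only ESC $ B, ESC $ @, ESC ( B and ESC ( J, and it treats JIS-Roman as plain ASCII. The names iso2022jp_to_sjis, jis_to_sjis and is_jis_byte are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Convert one JIS X 0208 byte pair (each 0x21..0x7e) to Shift JIS. */
static void jis_to_sjis(unsigned char j1, unsigned char j2,
                        unsigned char *s1, unsigned char *s2)
{
    unsigned row = j1 - 0x21u;
    unsigned cell = j2 - 0x21u + 0x40u;

    assert(j1 >= 0x21 && j1 <= 0x7e && j2 >= 0x21 && j2 <= 0x7e);
    if (row & 1u)
        cell += 94;             /* odd rows use the upper half */
    if (cell > 0x7e)
        ++cell;                 /* skip 0x7f */
    row = (row >> 1) + 0x81u;
    if (row > 0x9f)
        row += 0x40;            /* skip 0xa0..0xdf */
    *s1 = (unsigned char)row;
    *s2 = (unsigned char)cell;
}

static int is_jis_byte(unsigned char c)
{
    return c >= 0x21 && c <= 0x7e;
}

/* Convert NUL-terminated ISO-2022-JP text to Shift JIS.
 * Returns the number of bytes written (not counting the NUL),
 * or (size_t)-1 if out is too small. */
size_t iso2022jp_to_sjis(const char *in, char *out, size_t outsize)
{
    int kanji = 0;              /* 1 while in two-byte mode */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        unsigned char c = (unsigned char)in[0];

        if (c == 0x1b && in[1] == '$' && (in[2] == 'B' || in[2] == '@')) {
            kanji = 1;
            in += 3;
        } else if (c == 0x1b && in[1] == '(' && (in[2] == 'B' || in[2] == 'J')) {
            kanji = 0;
            in += 3;
        } else if (kanji && is_jis_byte(c) && is_jis_byte((unsigned char)in[1])) {
            unsigned char s1, s2;

            if (n + 2 >= outsize)
                return (size_t)-1;
            jis_to_sjis(c, (unsigned char)in[1], &s1, &s2);
            out[n++] = (char)s1;
            out[n++] = (char)s2;
            in += 2;
        } else {
            if (n + 1 >= outsize)
                return (size_t)-1;
            if (c == '\n')
                kanji = 0;      /* every line starts over in ASCII */
            out[n++] = (char)c;
            in += 1;
        }
    }
    if (n >= outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Feeding it ESC $ B 7 V ESC ( B yields the two bytes 0x8C 0x75. Resetting to ASCII at each newline is only a safety net for text that lost its closing escape. Unknown escape sequences are copied through untouched.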

 
 some other strings are:
 a$EAaD (this is the one I could decode)

Weird. All I can read out of that is kilogram told hits. Or, maybe just
the character hayai (early)?

 $B0T$B0U$B0Gndc

Who's meaningful dark? Or perhaps the saba fish in the crucible?

Anyway, they _look_ sort of like 7-bit JIS, and the two you came up with
for hotaru are, in fact, 7-bit JIS for hotaru.

-- 
Joel Rees, programmer, Kansai Systems Group
Altech Corporation (Alpsgiken), Osaka, Japan
http://www.alpsgiken.co.jp



Re: [OT] Japanese

2003-06-23 Thread Joel Rees
 On Wednesday, Jun 18, 2003, at 20:01 Asia/Tokyo, Joel Rees wrote:
 
  (I'm still trying to decipher what they've done with the file system,
  and still trying to figure out how to get the terminal app to show the
  Japanese names for files. My brother in law has a book that shows a way
  that is supposed to even get it to show shift-JIS file names correctly
  in the terminal app, but I haven't got it to work on my iBook yet.)
 
 See Japanese in OS X 10.2 Terminal (in Japanese)
 http://member.nifty.ne.jp/poseidon/osx2t.html

Good information there. Thanks. I'd tried using escapes to type in file
names, but I did something wrong.

 File names in OS X are encoded in UTF-8 decomposition form. I.e.  
 U+30C0: KATAKANA LETTER DA is represented as 0xE382BF (U+30BF) followed  
 by 0xE38299 (U+3099).

Decomposed. ta+dakuten. Okay, knowing Apple is de-composing the kana
will be useful.
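The byte values quoted above can be reproduced with a small UTF-8 encoder. This is a sketch only; utf8_encode is a name made up for the example, and it says nothing about how the file system itself performs the decomposition.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written, or 0 for values that are not
 * valid scalar values (surrogates, or anything above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    if (cp <= 0x10ffff) {
        out[0] = (unsigned char)(0xf0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;
}
```

U+30BF encodes as E3 82 BF and U+3099 as E3 82 99, six bytes in all, while the precomposed U+30C0 would be the three bytes E3 83 80. A byte-for-byte comparison of a composed name against a decomposed one therefore fails even though both display as the same kana.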

 UTF-8 aware tcsh is available as
 ftp://ftp.tba.org.tohoku.ac.jp/pub/tcsh-6.12-bin.tgz
 
 If you need install instructions in English, please see my posting to  
 another list.
 http://listserv.dartmouth.edu/scripts/ 

 wa.exe?A2=ind0306L=nisusT=0F=S=P=40940
 
  (Of course, this is all off topic unless somebody wants to come up with
  some perl code for trying to undo garbled file names.)
 
 It's not a perl script, but nkf -- Network Kanji code conversion Filter  
 -- is able to guess Japanese encodings and to restore broken JIS-Kanji.  

Well, I knew nkf was good for conversions, but I've never tried using it
to fix really broken text. ;)

Actually, I was thinking more in terms of some code snippets that could
be useful in recovering text that had suffered serious damage in a
broken conversion pipeline.
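One such snippet, as a heuristic sketch only: it assumes the sole damage is that the ESC bytes (0x1b) were dropped, so the text still contains the printable halves "$B", "$@", "(B" and "(J" of the ISO-2022-JP escapes. The name restore_jis_escapes is invented here. A literal "$B" in ordinary ASCII text would be misread as an escape, which is the price of guessing.

```c
#include <assert.h>
#include <stddef.h>

/* Put ESC back in front of "$B"/"$@" (seen in ASCII mode) and
 * "(B"/"(J" (seen on a pair boundary in two-byte mode).
 * Returns the number of bytes written (not counting the NUL),
 * or (size_t)-1 if out is too small. */
size_t restore_jis_escapes(const char *in, char *out, size_t outsize)
{
    int kanji = 0;      /* 1 while we believe we are in two-byte mode */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        unsigned char c = (unsigned char)in[0];
        int starts_kanji = !kanji && c == '$' && (in[1] == 'B' || in[1] == '@');
        int starts_ascii = kanji && c == '(' && (in[1] == 'B' || in[1] == 'J');
        size_t take, i;

        if (starts_kanji || starts_ascii) {
            /* Do not double up if this escape survived intact. */
            if (!(n > 0 && out[n - 1] == 0x1b)) {
                if (n + 1 >= outsize)
                    return (size_t)-1;
                out[n++] = 0x1b;
            }
            kanji = starts_kanji;
            take = 2;
        } else if (kanji && c >= 0x21 && c <= 0x7e && in[1] != '\0') {
            take = 2;   /* keep two-byte characters aligned */
        } else {
            if (c == '\n')
                kanji = 0;
            take = 1;
        }
        if (n + take >= outsize)
            return (size_t)-1;
        for (i = 0; i < take; ++i)
            out[n++] = in[i];
        in += take;
    }
    if (n >= outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Run on the eight characters ($BEZ(B) it produces ten bytes, with ESC inserted before $B and before (B. Whether the bytes between the escapes were also damaged is a separate question that this sketch cannot answer.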

 The garbled text in the original poster's message does not seem to be  
 Japanese though.

It looks like it didn't survive e-mailing. It looks like 7bit JIS, but
converting down doesn't produce much that makes sense, as you say. The
conversion that worked for hotaru is bad news, because it tends to
indicate some serious non-deterministic behavior in the broken pipe. 

 nkf is available as a part of jx package -- Japanese  
 aware Unix tools for OS X.
 http://www.fan.gr.jp/~sakai/jx.html
 
 BTW there are free text editors which autodetect Japanese encodings  
 properly in most cases.

 CocoaEditorJ (seems to be discontinued)
 http://cocoedit.hp.infoseek.co.jp/
 http://cocoedit.hp.infoseek.co.jp/CocoaEditorJ.dmg (binary)
 http://cocoedit.hp.infoseek.co.jp/CocoaEditorJSource.dmg (source)
 
 KEdit (syntax colouring for perl, php, html)
 http://www.drycarbon.com/macosx/kedit/
 http://www.drycarbon.com/macosx/archive/kedit010-20030619.zip (binary  
 and source)
 
 Dunno if they would help you in making money though ;-)

Heh.

-- 
Joel Rees, programmer, Kansai Systems Group
Altech Corporation (Alpsgiken), Osaka, Japan
http://www.alpsgiken.co.jp



Re: [OT] Japanese

2003-06-21 Thread Kino
On Wednesday, Jun 18, 2003, at 20:01 Asia/Tokyo, Joel Rees wrote:

 (I'm still trying to decipher what they've done with the file system,
 and still trying to figure out how to get the terminal app to show the
 Japanese names for files. My brother in law has a book that shows a way
 that is supposed to even get it to show shift-JIS file names correctly
 in the terminal app, but I haven't got it to work on my iBook yet.)

See Japanese in OS X 10.2 Terminal (in Japanese)
http://member.nifty.ne.jp/poseidon/osx2t.html

File names in OS X are encoded in UTF-8 decomposition form. I.e.
U+30C0: KATAKANA LETTER DA is represented as 0xE382BF (U+30BF) followed
by 0xE38299 (U+3099).

UTF-8 aware tcsh is available as
ftp://ftp.tba.org.tohoku.ac.jp/pub/tcsh-6.12-bin.tgz

If you need install instructions in English, please see my posting to
another list.
http://listserv.dartmouth.edu/scripts/wa.exe?A2=ind0306&L=nisus&T=0&F=&S=&P=40940

 (Of course, this is all off topic unless somebody wants to come up with
 some perl code for trying to undo garbled file names.)

It's not a perl script, but nkf -- Network Kanji code conversion Filter
-- is able to guess Japanese encodings and to restore broken JIS-Kanji.
The garbled text in the original poster's message does not seem to be
Japanese though. nkf is available as a part of jx package -- Japanese
aware Unix tools for OS X.
http://www.fan.gr.jp/~sakai/jx.html

BTW there are free text editors which autodetect Japanese encodings
properly in most cases.

CocoaEditorJ (seems to be discontinued)
http://cocoedit.hp.infoseek.co.jp/
http://cocoedit.hp.infoseek.co.jp/CocoaEditorJ.dmg (binary)
http://cocoedit.hp.infoseek.co.jp/CocoaEditorJSource.dmg (source)

KEdit (syntax colouring for perl, php, html)
http://www.drycarbon.com/macosx/kedit/
http://www.drycarbon.com/macosx/archive/kedit010-20030619.zip (binary
and source)

Dunno if they would help you in making money though ;-)

Kino



Re: [OT] Japanese

2003-06-17 Thread Joel Rees
 Editors I use a lot.
 
 Jedit, Java editor.

I've got to try that some time.

   www.jedit.org
   It is extremely good at setting default encodings, changing file 
 encoding (batch mode, too) on the fly, et cetera.
 
 Mi, great text editor from Japanese author
   http://www.asahi-net.or.jp/~gf6d-kmym/en/mimi/index.html

Mac OS X. Looks interesting.

   http://www.asahi-net.or.jp/~gf6d-kmym/mimi/index.html
 
 VIM...well, not great at Japanese.  But a lovely editor.  Just had to 
 add it here.  Works great in X11 on OS X, too! ;)

Use it in freeBSD, trying to get it set up for openBSD, having trouble
with Wnn and onew, because the groff (ja-groff) port is not where the
obsd port seems to think it should be. While I'm definitely glad to have
it for fBSD and oBSD, I have trouble motivating myself to use it on Mac
OS X. I'm a little spoiled, perhaps. (Hmm. Might be interesting to try
MIFES under emulation on obsd/fbsd. It isn't free, of course. Come to
think of it, I should get Java up under emulation and try Jedit first.)

I haven't tried the editor in the latest Metrowerks Codewarrior, but
it's always been natural for me. No character set tools, however. The
color-coding would get out of sync in shift-JIS strings (for reasons
that those who work regularly with shift-JIS know and appreciate). 

BBEdit is what spoiled me on the Mac, by the way.

But none of that addresses the OP's garbled string, which I was thinking
looked like euc-JIS transmogrified into some visible hexadecimal display
form --

deg.TMGDBdeg.$D9$BBdeg.$D9deg.$E9deg.$D9-$DA$F5

Where have I seen that before? It just doesn't make any sense at all as
any JIS in a visible hexadecimal form. Maybe it's just raw, untouched,
straight JIS, with no escapes.

-- 
Joel Rees [EMAIL PROTECTED]



Re: [OT] Japanese

2003-06-17 Thread Joel Rees
 deg.TMGDBdeg.$D9$BBdeg.$D9deg.$E9deg.$D9-$DA$F5
 
 Where have I seen that before? It just doesn't make any sense at all as
 any JIS in a visible hexadecimal form. Maybe it's just raw, untouched,
 straight JIS, with no escapes.

Nope. Not even straight JIS with the escapes being munged to periods. 
(Tried opening it with an editor that understands straight JIS.)

-- 
Joel Rees [EMAIL PROTECTED]



Re: [OT] Japanese

2003-06-17 Thread David Cantrell
On Tuesday, June 17, 2003 16:09 +0900 Joel Rees [EMAIL PROTECTED]
wrote:

  Jedit, Java editor.

 I've got to try that some time.

Very heavy, very slow, even on my G4-867 with a gig of RAM.  It is claimed to
support folding in code but I could never get it to work.  I'll take BBEdit
any day, at least until ActiveState port Komodo.

--
David Cantrell


Re: [OT] Japanese

2003-06-17 Thread Nicholas G. Thornton
Well well, it looks like I got a lot of replies over the night...

There were two places I've gotten the mojibake text: one was from an email (I
was asking how to make a specific kanji I couldn't find), the second was from
file names a Japanese friend of mine gave me (mp3). As far as typing in
Japanese, I can do that fine (even if I can only seem to do it in the Finder and
in TextEdit).

I tried copying them over to an HTML page and tried the three different Japanese
encodings Camino offers (ISO-2022-JP, Shift JIS, EUC) as well as Unicode (UTF-8).
Those helped with one of the five or so strings (a short one), and one of the
strings was rendered two very different ways with two of the encodings (and only
partially for either of them). None of the other strings did anything other than
sometimes switch from the two-byte format to ?.  Incidentally, if you want
some of the strings to mess around with, here they are...

I managed to find the kanji I asked the person about with the character palette
description she gave, but it was or could be described otherwise as: Unicode
5782, JIS(X0213) 1-31-66, Shift JIS(X0208) 9082: and it was mojibake'ed as
($BEZ(B). Some other codes she sent, and hence probably in the same encoding,
were $B7V(B and $Bj%(B, both for hotaru.

some other strings are:
a$EAaD (this is the one I could decode)
$B0T$B0U$B0Gndc
--GDB$BBc-$DA$F5 (this is the one rendered differently)

 But...it really depends on how you qualify "My next question is where
 there's a good online FAQ site for doing Japanese on OSX or finding OSX
 programs that accept Japanese (Unicode?)."

I was mainly wondering two things: one if there's a good general
troubleshooting/doing japanese on OSX page (I couldn't find any such thing on
Apple's site) or just troubleshooting/doing in general, and two if there was
some place that lists/downloads OSX apps that accept Japanese input. For the
latter I think Ward might have helped with suggesting stuff to do from the
system preferences.

Thanks a bundle already,
~wren


Re: [OT] Japanese

2003-06-17 Thread Nicholas G. Thornton
Oh, something important I just noticed about this discussion. It seems my email
(or the listserv?) is further garbling some of the mojibake I send out. The last
three examples should be...

a-grave, i-acute, a-circ/hat, capital-delta

infinity, T, infinity, U, infinity, G, unequal, curved-open-double-quote,
unequal, n-tilde, unequal, d, unequal, c

(em?)dash, D, capital-delta, esset, unequal, a-superscript, unequal, cent,
unequal, (em?)dash, forward-slash, and something that looks like a dot-less
lowercase i

~wren


Re: [OT] Japanese

2003-06-17 Thread Joel Rees
(Replying to myself again, but just for the record, ...)

  VIM...well, not great at Japanese.  But a lovely editor.  Just had to 
  add it here.  Works great in X11 on OS X, too! ;)
 
 Use it in freeBSD, trying to get it set up for openBSD,

jvim. It would not make sense to use vim with Wnn and onew.

With freeBSD, jvim/Wnn does fairly well, behaves like I would expect vi
to behave with Japanese. It does flake out at times, however. I'll know
more about how they work on openBSD today, I suppose.

One of these days, I'll have to talk my employer into letting me use a
Mac at work.

-- 
Joel Rees [EMAIL PROTECTED]



[OT] Japanese

2003-06-16 Thread Nicholas G. Thornton
I ask only because it came up here before and I can find nothing online about
it...

I'm trying to do some things in Japanese on my OSX box; unfortunately my
Japanese isn't terribly good, so any help info on my computer is minimally
helpful. My biggest question is how to get garbled text like
deg.TMGDBdeg.$D9$BBdeg.$D9deg.$E9deg.$D9-$DA$F5 back into the form of
kanji/kana. My next question is where there's a good online FAQ site for doing
Japanese on OSX or finding OSX programs that accept Japanese (Unicode?).

~wren


Re: [OT] Japanese

2003-06-16 Thread Joel Rees
 I ask only because it came up here before and I can find nothing online about
 it...
 
 I'm trying to do some things in Japanese on my OSX box; unfortunately my
 Japanese isn't terribly good, so any help info on my computer is minimally
 helpful. My biggest question is how to get garbled text like
 deg.TMGDBdeg.$D9$BBdeg.$D9deg.$E9deg.$D9-$DA$F5 back into the form of
 kanji/kana. My next question is where there's a good online FAQ site for doing
 Japanese on OSX or finding OSX programs that accept Japanese (Unicode?).

I couldn't even guess where to tell you to start, except for google.
(Google is a great tool.)

The text there looks kind of like it might be some ASCII visible
encoding of euc-JIS, but I am not going to take time to test the
hypothesis. Where did you get it? If it's from a web browser, you can
try selecting a different encoding. (OmniWeb 4.1 requires you change the
default in preferences and then re-load the page.)

I'm thinking there are good tools to work with that kind of text in Perl
5.8 (also the jcode module in previous versions), but there are people
around here better qualified than I to tell you how. 

There were some good pages mentioned here several months ago, so you
might search the archives. Or search the web for things like CJK, euc
encoding, shift-JIS, tron characters, etc.

-- 
Joel Rees [EMAIL PROTECTED]



Re: [OT] Japanese

2003-06-16 Thread David Ackerman
On Monday, June 16, 2003, at 11:39  PM, Joel Rees wrote:
 My next question is where there's a good online FAQ site for doing
 Japanese on OSX or finding OSX programs that accept Japanese
 (Unicode?).

I type in Japanese in textedit, mail and project builder without
difficulty. Have you set the language and the script in the
international System preference?  I do have a little cgi that displays
random vocabulary here at http://therockquarry.com. Not exactly sure
what you are trying to do.



Re: [OT] Japanese

2003-06-16 Thread Ward W. Vuillemot
Japanese and Mac OS X...awesome.  Yes; I am biased.

Can you tell us how you got the garbled text?  It looks like, if you
switch the encoding as suggested, you might get things into
something resembling Japanese.

As for Mac OS X supporting Japanese.  I presume you have checked the 
system preferences.  You can set it so it opens applications with 
Japanese localization, if available.  While this might be useful to you 
at this moment, you can also add Japanese IME (input) to any 
application within the system preferences.

I use Mail from Apple without any issues.  Occasionally someone's email
is garbled (mojibake being the 'technical' term for this as it relates
to Japanese) when someone sets the text as single-byte...but this
typically happens with MS clients such as Outlook (I use Japanese
regional settings at work...and Outlook just botches things too much
for my liking).  Japanese characters are two bytes each, so parsing
one byte at a time just corrupts everything and makes 2x the characters
out of the message.  Anyway.
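To make the "2x the characters" point concrete, here is a small sketch that counts characters in a byte string two ways. It assumes the text is Shift JIS, with lead bytes in 0x81..0x9F and 0xE0..0xFC; other two-byte encodings need a different lead-byte test. The names sjis_is_lead and sjis_char_count are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* True if c can start a two-byte Shift JIS character. */
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Count characters, not bytes, in a NUL-terminated Shift JIS string. */
size_t sjis_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        if (sjis_is_lead((unsigned char)*s) && s[1] != '\0')
            s += 2;     /* lead byte plus trail byte: one character */
        else
            s += 1;     /* ASCII or half-width kana: one byte */
        ++n;
    }
    return n;
}
```

For the four bytes 8C 75 E5 A3, a byte-at-a-time reader such as strlen reports 4 while sjis_char_count reports 2, which is exactly the doubling a single-byte mail client produces.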

Perl's pack() and unpack() are good ways to keep things nicely
packed into a format that supports two bytes.  Hopefully 5.8.0
will/does support Unicode fully.  This is what I have done with Perl
web-apps in the past.

But...it really depends on how you qualify "My next question is where
there's a good online FAQ site for doing
Japanese on OSX or finding OSX programs that accept Japanese
(Unicode?)."

Cheers,
Ward


Re: [OT] Japanese

2003-06-16 Thread Joel Rees
 I type in Japanese in textedit, mail and project builder without 
 difficulty.

I find I've had to fuss a bit with Project Builder's editor. Most of the
docs, source, etc., I see are encoded in shift-JIS, but I haven't yet
found a way to tell the PB editor what encoding to assume when opening a
file. I have been able to figure out that it seems to pick up the byte
order mark or something for UTF-16 (which surprised me). And it seems to
require UTF-8 for compiling in C. 
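For reference, sniffing a byte order mark takes only a few comparisons. The sketch below is illustrative and makes no claim about what Project Builder actually does. The enum and the function name sniff_bom are invented, and UTF-32 marks are deliberately ignored, so a UTF-32LE file would be reported as UTF-16LE.

```c
#include <assert.h>
#include <stddef.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE };

/* Look at the first bytes of a file and report which byte order mark,
 * if any, they form. Files without a mark (Shift JIS, EUC, plain
 * UTF-8, and so on) all come back as BOM_NONE. */
enum bom_kind sniff_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xef && buf[1] == 0xbb && buf[2] == 0xbf)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xfe && buf[1] == 0xff)
        return BOM_UTF16_BE;
    if (len >= 2 && buf[0] == 0xff && buf[1] == 0xfe)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

A Shift JIS file never starts with a mark, so an editor that relies on this check alone has nothing to go on and must guess or ask.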

(Maybe I should download the new version of the dev tools. I don't have
the room on my vintage iBook's 5.6G drive, but I think Apple said the
bulk of the documentation didn't have to be on the boot volume.)

But I can tell Text Edit what encoding to assume and then save as UTF-8
or UTF-16, so there's a work-around.

Unfortunately, that's not going to help the OP, near as I can tell. Do
you recognize his deg.TMGDBdeg.$D9$BBdeg.$D9deg.$E9deg.$D9-$DA$F5?

 Have you set the language and the script in the 
 international System preference? 

I like to have several users, each with different language and script
settings. (Family account set to Japanese, of course.) I also find it
convenient to have multiple scripts selectable, and the precedence
defaults are pretty handy, as well.

-- 
Joel Rees [EMAIL PROTECTED]



Re: [OT] Japanese

2003-06-16 Thread Ward W. Vuillemot
 But I can tell Text Edit what encoding to assume and then save as UTF-8
 or UTF-16, so there's a work-around.

 Unfortunately, that's not going to help the OP, near as I can tell. Do
 you recognize his deg.TMGDBdeg.$D9$BBdeg.$D9deg.$E9deg.$D9-$DA$F5?

Editors I use a lot.

Jedit, Java editor.
   www.jedit.org
   It is extremely good at setting default encodings, changing file
encoding (batch mode, too) on the fly, et cetera.

Mi, great text editor from Japanese author
http://www.asahi-net.or.jp/~gf6d-kmym/en/mimi/index.html
http://www.asahi-net.or.jp/~gf6d-kmym/mimi/index.html

VIM...well, not great at Japanese.  But a lovely editor.  Just had to
add it here.  Works great in X11 on OS X, too! ;)