Re: A product compatibility question

2001-10-18 Thread Berthold Frommann

Hello,

> Incorrect. Again, they are *not* separate languages, but two
> orthographic renditions of the same *written* language.
... yet there are a few differences in the vocabulary which actually require
entirely different characters - and I don't just mean the traditional and
the simplified version of a particular character. Take e.g. the word for
"bicycle".
But after all, it's AFAIK just a list - not too long - of words which has to
be replaced when doing a conversion, quite regularly.

As John Jenkins already pointed out on May 5,
> Partial data to interconvert between simplified and traditional
> characters is available through the Unihan database.  However, the
> problem is not a simple one, as there are frequently multiple
> traditional forms that correspond to a single simplified form.
> Moreover, the vocabulary used in the PRC with simplified characters
> differs on occasion from the vocabulary used in Taiwan and elsewhere
> for traditional ones (e.g., the names of the chemical elements, until
> recently the word for "computer").  It really isn't possible to
> convert between simplified and traditional characters without doing a
> lexical analysis.

There are some solutions around, AFAIR it's also possible in current
versions of MS Office.
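
[Illustration of the "multiple traditional forms" problem John describes. This is a minimal sketch, not any product's method: the table below is a tiny hand-typed sample, and the names S2T_CANDIDATES, naive_s2t and ambiguous_chars are hypothetical. Real data would have to be loaded from the variant fields of the Unihan database.]

```python
# Hand-typed sample of simplified -> traditional candidates (assumption:
# illustrative only; a real table would be built from Unihan variant data).
S2T_CANDIDATES = {
    "发": ["發", "髮"],  # "to emit / develop" vs. "hair"
    "后": ["後", "后"],  # "after" vs. "empress"
    "头": ["頭"],
    "车": ["車"],
    "电": ["電"],
}


def naive_s2t(text):
    """Character-by-character conversion that always takes the first candidate."""
    return "".join(S2T_CANDIDATES.get(ch, [ch])[0] for ch in text)


def ambiguous_chars(text):
    """Return the characters in text that have more than one traditional form."""
    return [ch for ch in text if len(S2T_CANDIDATES.get(ch, [ch])) > 1]


if __name__ == "__main__":
    word = "头发"  # "hair"; the expected traditional form is 頭髮
    print(naive_s2t(word))        # the naive pass picks 發 for 发
    print(ambiguous_chars(word))  # the characters that need a lexical decision
```

[A table lookup alone cannot tell which candidate is right; it can only flag where a decision is needed.]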


Regards,
Berthold


Japanese Studies, Free University Berlin





RE: A product compatibility question

2001-10-18 Thread Hietaniemi Jarkko (NRC/Boston)

: > Incorrect. Again, they are *not* separate languages, but two
: > orthographic renditions of the same *written* language.
: ... yet there are a few differences in the vocabulary which actually require
: entirely different characters - and I don't just mean the traditional and
: the simplified version of a particular character. Take e.g. the word for
: "bicycle".
: But after all, it's AFAIK just a list - not too long - of words which has to
: be replaced when doing a conversion, quite regularly.

You mean differences like "lift" vs "elevator", "muesli" vs "granola",
"pavement" vs "sidewalk", etc?  I do know that US and UK are two countries
separated by one language :-)





RE: A product compatibility question

2001-10-18 Thread Ayers, Mike


> From: Hietaniemi Jarkko (NRC/Boston) 
> [mailto:[EMAIL PROTECTED]] 
> 
> : > Incorrect. Again, they are *not* separate languages, but two
> : > orthographic renditions of the same *written* language.
> : ... yet there are a few differences in the vocabulary which actually require
> : entirely different characters - and I don't just mean the traditional and
> : the simplified version of a particular character. Take e.g. the word for
> : "bicycle".
> : But after all, it's AFAIK just a list - not too long - of words which has to
> : be replaced when doing a conversion, quite regularly.
> 
> You mean differences like "lift" vs "elevator", "muesli" vs "granola",
> "pavement" vs "sidewalk", etc?  I do know that US and UK are two countries
> separated by one language :-)

That's close.  However, to complete the example, we need some
changes to the alphabet.  Let's say, for the sake of discussion (or
argument, if you like a good argument), that the Brits decide that that
German double-s thingy looks pretty cool after all and adopt it, using it
wherever two "s"'s (esses?) occur together.  Now you have two differences
between British and American English - one difference is the words they
choose, and the other is the way they write words[1].

To an outside observer (i.e. one who does not speak English), it
would be very difficult to distinguish between the words which are spoken
differently (or, for that matter, the different idioms), and the words which
are spelled differently.

This is essentially the problem that we are facing with Chinese.
Chinese is basically common in writing.  There are differences, but none
that truly impede communication[2].  So we have Chinese dialects, in which
different words[3] may be used for a concept, and Chinese writing systems,
in which different glyphs may be used for a character.  Since writing
systems tend to remain constant within a dialect, it can be difficult to
distinguish these differences if one does not speak Chinese.


HTH,

/|/|ike



[1] - Yes, I'm aware of that color/colour thing, but thought it to be not
strong enough an example.

[2] - The exception appears to be Cantonese and its dialects.  While
linguists slice Chinese dialects every which way, Chinese speakers seem to
distinguish only between Mandarin and Cantonese.  Apparently there are
enough differences between the two, and similarities within the two, to
prevent communication even in writing, although limited communication is
possible.

[3] - "Words" here means "character or set of characters which embody a
single concept".




RE: A product compatibility question

2001-10-18 Thread Kenneth Whistler

Mike,

> Chinese is basically common in writing.  There are differences, but none
> that truly impede communication[2].  

> 
> [2] - The exception appears to be Cantonese and its dialects.  While
> linguists slice Chinese dialects every which way, Chinese speakers seem to
> distinguish only between Mandarin and Cantonese.  Apparently there are
> enough differences between the two, and similarities within the two, to
> prevent communication even in writing, although limited communication is
> possible.

I agree with almost everything you said, but I would shade this
one a little differently.

Chinese speakers, even Cantonese Chinese speakers, are aware of
and distinguish lots of different "dialects" of spoken Chinese. So a
Cantonese speaker will be aware of regional variants of the Yue
(~Cantonese) language, including many dialects that they find
difficult or impossible to understand. They are also aware that
other "dialects" (actually languages) of Chinese, such as
Minnan (~Taiwanese), Wu (~Shanghainese), and yes, Mandarin, are
even *more* distinct, and also find them difficult or impossible
to understand. (Except that nearly everybody understands Mandarin
to some extent, since a particular variety of it is the standard 
national language.)

When it comes to the *written* language, however, Cantonese is
in a somewhat unique position, since Cantonese writers seem
to have a fairly long tradition of writing *Cantonese*
explicitly, including the use of many additional characters
(most traditional, but some invented explicitly for Cantonese)
not used in standard (Mandarin-derived) written Chinese. If
you look hard enough, you can find such traditions for other
Chinese languages. For example, in Fujian province or Taiwan,
it is possible to find dictionaries of Minnan dialect
characters, and documents explicitly written in Minnanhua,
which Mandarin speakers would find very quaint or difficult
to understand in written form, because of the different
character usage and grammatical constructions that occasionally
occur. [Example: In Taiwanese, there is a very common sentence
ending, ...sibo, which means something like "...isn't it?"
and the like. Written out, it is U+662F U+5426, Mandarin shi4fou3.
Mandarin does use shi4fou3, but it is a kind of hifalutin'
classical form, and never in the position Taiwanese does,
where Mandarin would instead use something like dui4budui4.]

But such written dialect traditions other than
Cantonese are very much sub rosa, unlike the mainstream
Cantonese literature.  

--Ken




RE: A product compatibility question

2001-10-18 Thread Ayers, Mike


> From: Kenneth Whistler [mailto:[EMAIL PROTECTED]] 
> Sent: Thursday, October 18, 2001 12:55 PM

> I agree with almost everything you said, but I would shade this
> one a little differently.

[lots of interesting stuff snipped]

Note that my post was slanted pretty much exclusively towards the
written form.  I don't speak (or more importantly, listen) nearly well
enough to comment on the spoken form except as an outside observer.

It is interesting to note that Chinese entertainment stores in my
area can be separated pretty cleanly into three categories - Mainland,
Taiwanese, and Hong Kongnese[1].  The mainlanders (and any Singaporeans
about, I would imagine) can read simplified subtitles when the dialect
confuses them, the Taiwanese read traditional subtitles, and the Hong
Kongnese read Cantonese and/or English (the English fluency rate in Hong
Kong being very high).  The latest Jackie Chan film will be available in all
three stores, each with its preferred subtitling.

With music, however, it is quite different.  The same album will be
available in all three stores.  Many artists will, however, print Cantonese
and Mandarin versions of the same album, or will have Mandarin and Cantonese
releases which overlap heavily.  Some artists even put Mandarin and
Cantonese versions of songs on the same album.  The CDs are clearly
separated into four categories in every store I've been in - Mandarin male,
Mandarin female, Cantonese male, and Cantonese female.  The preferences so
obvious in video entertainment do not appear to influence the selection, and
the latest Jacky Cheung record will be available in both Mandarin and
Cantonese.

I'm going to do some more investigation into Chinese mutual
intelligibility, but that'll take a while...


/|/|ike



[1] - How to refer to the people of Hong Kong is a problem I have never seen
solved.  I have heard they themselves use the term "Hongkies", but, being an
American, I cannot use this term, as it sounds too much like another, very
impolite, word.




RE: A product compatibility question

2001-10-09 Thread Suzanne M. Topping

> If it does not, 
> can someone kindly suggest a publishing application solution 
> that does 
> support this capability? (i.e., Pagemaker, Quark, etc.)

I'm not sure about FrameMaker, but I just ran a test in which I
successfully inserted Japanese, Simplified Chinese, and Traditional
Chinese text boxes into Microsoft Publisher 2000.

Probably not the solution you were looking for however...

Suzanne Topping
BizWonk Inc.
[EMAIL PROTECTED]




RE: A product compatibility question

2001-10-09 Thread David_Possin



I gave up trying anything other than MS Office products when I have to use many different fonts with Unicode - Publisher 2000 works great for creating printable (.pub) and browser-viewable (html) documents, plus I can make presentations in PowerPoint 2000 - all from one original document set, mixing any languages, all on international Win2000. Other software companies have a lot to learn before they can offer that support; I tried them all.

The purpose is to create multilingual catalogs with graphics etc. with the same look and feel online and offline in a global scenario.

David Possin
International QA Engineer (i18n & l10n)
i2 Technologies








T-->S, S-->T conversions (was RE: A product compatibility question)

2001-10-20 Thread Edward Cherlin

Berthold Frommann wrote
> Sent: Thu, October 18, 2001 9:15 AM
> To: [EMAIL PROTECTED]
> Subject: Re: A product compatibility question
>
>
> Hello,
>
> > Incorrect. Again, they are *not* separate languages, but two
> > orthographic renditions of the same *written* language.
> ... yet there are a few differences in the vocabulary which
> actually require
> entirely different characters - and I don't just mean the
> traditional and
> the simplified version of a particular character. Take e.g.
> the word for "bicycle".
> But after all, it's AFAIK just a list - not too long - of
> words which has to
> be replaced when doing a conversion, quite regularly.

Jack Halpern of the CJK Dictionary Institute explains these issues at
http://www.cjk.org/cjk/c2c/c2centry.htm. Traditional to Simplified and
Simplified to Traditional conversions are more complicated than simple
dictionary lookup. In particular, the conversion is often
context-dependent. His company offers "Chinese to Chinese" conversion
software, among other products.

> As John Jenkins already pointed out on May 5,
> > Partial data to interconvert between simplified and traditional
> > characters is available through the Unihan database.  However, the
> > problem is not a simple one, as there are frequently multiple
> > traditional forms that correspond to a single simplified form.
> > Moreover, the vocabulary used in the PRC with simplified characters
> > differs on occasion from the vocabulary used in Taiwan and elsewhere
> > for traditional ones (e.g., the names of the chemical elements, until
> > recently the word for "computer").  It really isn't possible to
> > convert between simplified and traditional characters without doing a
> > lexical analysis.

And, as it turns out, contextual analysis also.

> There are some solutions around, AFAIR it's also possible in current
> versions of MS Office.

That's just character conversion. It doesn't even handle vocabulary
differences, much less context.
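
[Illustration of the difference between character conversion and vocabulary-aware conversion. This is a rough sketch under stated assumptions, not the method used by MS Office or by any product mentioned in this thread: both tables are tiny hand-typed samples, the names PHRASES_S2T, CHARS_S2T and convert_s2t are hypothetical, and the Taiwan-side terms are one common usage, not the only one. It tries the longest known phrase first and falls back to single characters.]

```python
# Word-level replacements (assumption: small illustrative sample).
PHRASES_S2T = {
    "自行车": "腳踏車",  # "bicycle": a different word, not just different glyphs
    "计算机": "電腦",    # "computer"
    "头发": "頭髮",      # "hair": settles the 发 -> 發/髮 choice for this word
}

# Character-level fallback (assumption: small illustrative sample).
CHARS_S2T = {"发": "發", "头": "頭", "车": "車", "骑": "騎"}

MAX_PHRASE_LEN = max(len(p) for p in PHRASES_S2T)


def convert_s2t(text):
    """Greedy longest-match phrase replacement, then per-character fallback."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase that fits at position i, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES_S2T:
                out.append(PHRASES_S2T[chunk])
                i += n
                break
        else:
            # No phrase matched: convert the single character, or keep it.
            ch = text[i]
            out.append(CHARS_S2T.get(ch, ch))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert_s2t("我骑自行车"))  # "I ride a bicycle"
    print(convert_s2t("头发"))
```

[Greedy matching covers vocabulary differences only. It does no word segmentation and no contextual analysis, so it can still match a "phrase" that straddles two unrelated words.]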

> Regards,
> Berthold
> Japanese Studies, Free University Berlin

Edward Cherlin
Generalist
"A knot! Oh, do let me help to undo it."
Alice in Wonderland






Re: T-->S, S-->T conversions (was RE: A product compatibility question)

2001-10-21 Thread Tom Emerson

Edward Cherlin writes:
> Jack Halpern of the CJK Dictionary Institute explains these issues at
> http://www.cjk.org/cjk/c2c/c2centry.htm. Traditional to Simplified and
> Simplified to Traditional conversions are more complicated than simple
> dictionary lookup. In particular, the conversion is often
> context-dependent. His company offers "Chinese to Chinese" conversion
> software, among other products.

To be more specific, the software was developed by Basis Technology in
coordination with Jack's organization, which supplied the data.

-tree

-- 
Tom Emerson  Basis Technology Corp.
Sr. Computational Linguist http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"




Re: T-->S, S-->T conversions (was RE: A product compatibility question)

2001-10-21 Thread Jack Halpern

Greetings

In message "Re: T-->S, S-->T conversions (was RE: A product compatibility question)",
Tom Emerson wrote...
 >Edward Cherlin writes:
 >> Jack Halpern of the CJK Dictionary Institute explains these issues at
 >> http://www.cjk.org/cjk/c2c/c2centry.htm. Traditional to Simplified and
 >> Simplified to Traditional conversions are more complicated than simple
 >> dictionary lookup. In particular, the conversion is often
 >> context-dependent. His company offers "Chinese to Chinese" conversion
 >> software, among other products.
 >
 >To be more specific, the software was developed by Basis Technology in
 >coordination with Jack's organization, which supplied the data.
 
Yes, that is correct. Actually, we do not develop products (software 
applications) as Edward mentions, but specialize in compiling large-scale CJK lexical 
databases.
 


Regards, Jack Halpern
   President, The CJK Dictionary Institute, Inc. 
http://www.cjk.org Phone: +81-48-473-3508