This is still not working out very well... let me explain more about what I'm doing, and maybe it will ring a bell for someone.

I'm working with a site that stores its content in big5 and runs it through a conversion program to create a gb2312 version for those who prefer the simplified characters. I know these are the charsets being used; I've seen the config files for the converter. Unfortunately the converter was written by a Chinese company with no English info available, does not appear in Google, and is no longer supported even by the original authors. So basically I have to write my own program to do what it does, without any info on how it does it.

I'm currently working with a snippet of text from the site, but the eventual idea is to have the converter run under a separate web server and have it grab the page from the big5 site, convert it, and send it out to the browser. This is how the existing translator works, as far as I can tell.

Regardless of whether I'm reading the snippet from a text file or getting an entire page via ns_http, I have to set the encoding to utf-8 in order to read the data properly. It does not display properly if I call it big5. This is odd, but not terribly so; the database and source AOLserver are both configured to use utf-8, so this is at least consistent.

The only conversion that works with the Java program is to go from utf-8 to utf-8s, which it calls simplified utf-8. Google tells me that this is a bastardized format of sorts, proposed by Oracle and not widely accepted. Unfortunately it is, so far, the only one that works. Data comes in as utf-8, gets converted to utf-8s, and goes out through AOLserver configured to use utf-8. All is well.

The problem is, Tcl doesn't support utf-8s, and as far as I can tell there is no other format that will work. This leaves me stuck with the Java program, and I have serious concerns about the performance of any sort of exec, let alone one that involves writing files.

Any suggestions?

thanks,

janine

On Sep 5, 2007, at 6:08 AM, Dossy Shiobara wrote:

On 2007.09.05, Janine Sisk <[EMAIL PROTECTED]> wrote:
I'm working with strings encoded in big5 and gb2312 (traditional and
simplified Chinese, respectively).  I'm exec'ing out to a Java
program that translates from one to the other.  [...]

Is that Java program doing anything else to the data?  If you're just
using Java to transcode Tcl strings, you're really hurting yourself for
no reason:

    set big5string [encoding convertto big5 $gb2312string]

    set gb2312string [encoding convertto gb2312 $big5string]

Tcl's encoding support is probably one of its strengths as a scripting
language.
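
For reference, here is a minimal sketch of how the two [encoding] subcommands fit together when the data is held as raw bytes rather than as an already-decoded Tcl string. The proc name `transcode` and the sample text are assumptions made for illustration. Note that this only changes the byte encoding; it does not map traditional characters to their simplified forms, so a character that gb2312 does not include has no representation there.

```tcl
# Hypothetical helper: re-encode raw bytes from one charset to another.
# [encoding convertfrom] decodes bytes into a Tcl (Unicode) string;
# [encoding convertto] encodes a Tcl string back into bytes.
proc transcode {from to bytes} {
    set str [encoding convertfrom $from $bytes]
    return [encoding convertto $to $str]
}

# Sample input: U+4E2D U+6587, two characters present in both charsets.
set big5bytes   [encoding convertto big5 "\u4e2d\u6587"]
set gb2312bytes [transcode big5 gb2312 $big5bytes]
```

Calling [encoding convertto] directly, as in the one-liners above, is enough only when the variable already holds a decoded string instead of bytes in the source charset.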

I can't, for example, grab the return value of the command directly
from the exec; if I do, it's mangled.

I don't think you can tell [exec] what encoding the I/O will be.
Perhaps you could/should see if there's a TIP for [exec -encoding $name
$command] already ...
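
One possible workaround that needs no new [exec] option is to open the command as a pipeline with [open] and set the encoding on the resulting channel with [fconfigure]. The sketch below assumes the child process writes its result to stdout; the proc name `read_cmd` and the Java class name in the usage comment are hypothetical.

```tcl
# Hypothetical helper: run a command and decode its stdout
# using the named encoding instead of the system default.
proc read_cmd {enc args} {
    # "|" followed by a list opens a command pipeline for reading.
    set chan [open |$args r]
    fconfigure $chan -encoding $enc
    set data [read $chan]
    close $chan
    return $data
}

# Usage (assumed program and file names):
#   set page [read_cmd big5 java Big5ToGb input.txt]
```

This avoids writing the output to a temporary file, though it still pays the cost of starting a process for every call.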

-- Dossy

--
Dossy Shiobara              | [EMAIL PROTECTED] | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
    folly -- then you can let go and quickly move on." (p. 70)


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.


