OK, just to help out others with the pain I have suffered over this issue.

Both Apache and Java take pains to support internationalization. This is the root of the problem.

When my Java program wrote its output file, it picked up the default character set from the environment Apache was running in. On both of the Apache servers I checked, that was an ANSI character set, even though everything else in the operating system was running UTF-8.

This is why my program worked fine when I ran it from the command prompt: the command prompt was running under UTF-8. Under Apache, though, my special Unicode characters were translated incorrectly, because the delimiter only comes out as the right bytes when the output is encoded as UTF-8.
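For anyone who wants to check this on their own server, here is a rough sketch of the kind of test that would have saved me some time. It prints the default character set the JVM picked up and shows the bytes the delimiter (U+00A1) turns into under a few encodings. The class and method names are made up for illustration, and I am assuming the delimiter is the single character U+00A1. Note that newer JDKs (18 and later) default to UTF-8 regardless of the environment, so the difference may only show up on older ones.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of my original program.
public class CharsetCheck {

    // Encode the text with the given charset and return the bytes as lowercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String delimiter = "\u00A1"; // assumed delimiter: inverted exclamation mark

        // What the JVM inherited from the process that launched it.
        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("default bytes:   " + hex(delimiter, Charset.defaultCharset()));

        // The same character under three explicit encodings, for comparison.
        System.out.println("US-ASCII:        " + hex(delimiter, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-1:      " + hex(delimiter, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:           " + hex(delimiter, StandardCharsets.UTF_8));
    }
}
```

Run it once from the command prompt and once through the CGI script. If the "default" lines differ between the two runs, the environment is what is changing the output.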

After a week of research, I finally figured out that I needed to use the OutputStreamWriter class and pass it "UTF-8" as the character set for that file. Everything is now working fine.
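In case it helps, the fix looks roughly like the sketch below. This is not my actual program: the class name, method name, and field layout are invented for illustration, and I am assuming the delimiter is U+00A1. The only part that matters is passing "UTF-8" to OutputStreamWriter instead of relying on the default.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical example class, for illustration only.
public class Utf8DelimitedWriter {

    // Assumed delimiter: U+00A1, which UTF-8 encodes as the bytes 0xC2 0xA1.
    static final String DELIMITER = "\u00A1";

    // Write one delimited record, always encoding as UTF-8
    // no matter what default charset the JVM inherited.
    static void writeRecord(File file, String... fields) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), "UTF-8"))) {
            out.write(String.join(DELIMITER, fields));
            out.write("\n");
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "output.txt");
        writeRecord(file, "first", "second", "third");
    }
}
```

A plain FileWriter(file) would have used the default character set on the JDK I was running, which is exactly what went wrong under Apache.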

Hope this helps someone else. Sorry for being slightly off-topic, but it seemed like good information anyway.
--- Begin Message ---
OK, not exactly Perl, but this was the closest list I could find.

I am running a Perl CGI script that launches a Java program. This Java program writes output files that are delimited using what I believe to be a Unicode character. In most editors it looks like an upside-down question mark, which I believe is correct. In some editors, it shows as a degree symbol. This character is represented by the hex pair 0xc2a1. Here is the character: '¡'.

Now here is the problem. When I test my Java program, everything is great. When I test the Perl script that launches the Java program, all is still well. When I run my Perl script through CGI, though, each occurrence of the above character is replaced with ??. I cannot understand why CGI is interfering with file output from my program. The output is not going through any display; the program writes this file directly. Anyone have any ideas?

Also, if anyone can suggest a better list, I'd appreciate that too.

Thanks.


--- End Message ---
