Everyone,

There was a very interesting question about character encoding in the
user group.  It appears as though Jorn and I have narrowed the issue
down to the console code page being unable to display the characters;
however, encoding may also affect the input when < and > redirection is
used to feed stdin to, or capture stdout from, the parsers.
The default encoding on Windows is an ANSI code page (which one depends
on the system locale) ... I'm not too sure what it is on the Mac or
Linux platforms; however, the default today may not hold tomorrow.  So,
we may need to propose a way of wrapping the System.in and System.out
streams to handle the proper encoding / decoding.

What I'm proposing is having a general input / output class that wraps
the System.in / System.out streams to handle the proper character
encoding.  Unfortunately, this means we may want to add a -encoding
parameter to the parsers / tokenizers / etc. so the caller can choose
the encoding used for the I/O.
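
To make the proposal a bit more concrete, here is a minimal sketch of
what such a wrapper might look like.  The class name EncodedStdio and
its two methods are made up for illustration; nothing like this exists
in the code base yet, and the real design may well end up different.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper (name and API are assumptions, not existing code).
final class EncodedStdio {

    private EncodedStdio() {}

    // Decodes the bytes read from `in` using the given charset.
    static BufferedReader reader(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    // Encodes text written to the returned stream using the given charset.
    // The `true` argument enables autoflush on println.
    static PrintStream writer(OutputStream out, Charset charset) {
        try {
            return new PrintStream(out, true, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Not expected: the name comes from an already resolved Charset.
            throw new IllegalStateException(e);
        }
    }

    // Example: copy stdin to stdout, decoding and re-encoding each line.
    // The first argument, if present, is taken as the charset name.
    public static void main(String[] args) throws IOException {
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "UTF-8");
        BufferedReader in = reader(System.in, cs);
        PrintStream out = writer(System.out, cs);
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line);
        }
    }
}
```

Taking plain InputStream / OutputStream arguments rather than
hard-coding System.in / System.out keeps the helper testable with
in-memory streams.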

There is a simple way to handle the output using the method below:
---
http://www.velocityreviews.com/forums/t137667-changing-system-out-encoding.html

<quote>
PrintStream out = new PrintStream(System.out, true, "ISO-8859-1");
out.println("\u00E0\u00E1\u00E2\u00E9\u00EA\u00EB" );
</quote>
---

Of course, we would have to use the proper encoding; this was just an
example from the post.
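
One way to pick the proper encoding would be to resolve a user-supplied
name and fall back to the platform default when none is given.  The
sketch below assumes the option is spelled "-encoding <name>"; the
class EncodingOption and its resolve method are hypothetical and only
meant to show the idea.

```java
import java.nio.charset.Charset;

// Hypothetical helper: resolves an optional "-encoding <name>" argument.
final class EncodingOption {

    private EncodingOption() {}

    // Returns the charset named after "-encoding", or the platform
    // default when the option is absent.
    static Charset resolve(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-encoding".equals(args[i])) {
                String name = args[i + 1];
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    throw new IllegalArgumentException(
                            "Unknown encoding: " + name, e);
                }
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Using encoding: " + resolve(args).name());
    }
}
```

Failing early on an unknown name seems safer than silently falling back
to the default, since a wrong encoding would only show up later as
garbled text.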

The other link is informational, a general guide to character encoding
in Java:
---
http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html
---

I don't see us having a major problem now, but we may need to either
look for other methods or risk losing added support for, say, Arabic,
Chinese, or Japanese.

Any comments, suggestions, or rants welcome.

James
