I ran an experiment and found that the situation is much worse than just
losing one frequent Russian letter.

I created a UTF-8 file containing both Russian text and a single German
letter Ä (A umlaut). Camel was unable to read the German letter and replaced
it with a question mark, just because the native charset of my Windows dev
machine happens to be windows-1251.
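Here's a minimal Java sketch of that loss, independent of Camel: windows-1251 has no code for Ä, so `String.getBytes` silently substitutes '?', while the Cyrillic letters round-trip unharmed. The class and method names are made up for illustration; string literals are written as Unicode escapes so the source stays plain ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class Win1251Loss {
    static final Charset WIN_1251 = Charset.forName("windows-1251");

    // Encodes text to windows-1251 and decodes it back. String.getBytes(Charset)
    // does not fail on unmappable characters: it writes the encoder's
    // replacement byte ('?') instead.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(WIN_1251);
        return new String(bytes, WIN_1251);
    }

    // Reports whether every character of text can be represented in windows-1251.
    static boolean representable(String text) {
        CharsetEncoder encoder = WIN_1251.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic "Privet"
        String german = "\u00c4";                                 // A umlaut
        System.out.println(roundTrip(russian).equals(russian)); // true: Cyrillic survives
        System.out.println(roundTrip(german));                  // "?"
        System.out.println(representable(german));             // false
    }
}
```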

I don't think it is acceptable:

1) to ever flatten Unicode strings to a single-byte character set;

2) for the behaviour of server-side code to depend on the host operating
system settings (that makes it non-portable).
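The portability problem comes from APIs that fall back to the JVM's default charset when none is given. A sketch using only plain JDK calls (the class and method names are hypothetical) showing that the same UTF-8 bytes decode correctly with an explicit charset but not with windows-1251, the native charset of the machine described above:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDependence {
    // Decodes bytes the way charset-less APIs do: with whatever charset the
    // host reports as its default. The result therefore varies by machine.
    static String decodeWithPlatformDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    // Decodes bytes with an explicitly chosen charset: same result on every host.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u00c4 \u0414\u0430"; // A umlaut, space, Cyrillic "Da"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("default charset here: " + Charset.defaultCharset());
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));          // true
        System.out.println(decode(utf8, Charset.forName("windows-1251")).equals(text)); // false
        // Equal to one of the two lines above, depending on the host setting.
        System.out.println(decodeWithPlatformDefault(utf8).equals(text));
    }
}
```

On JDK 18 and later the default charset is UTF-8 (JEP 400), so the last line only prints `false` on older JDKs, or with `-Dfile.encoding=COMPAT` on a windows-1251 host.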

May I file a Jira bug report?

Here's my route:

        <dataFormats>
            <json id="jack" library="Jackson" prettyPrint="true"/>
        </dataFormats>

        <route>
            <from uri="file:///C:/tries/collApp/exchange/in?fileName=registerSampleUtf.csv&amp;charset=UTF-8"/>
            <log message="file: ${body.class.name} ${body}" loggingLevel="WARN"/>
            <unmarshal>
                <csv delimiter=";" useMaps="true"/>
            </unmarshal>
            <log message="unmarshalled: ${body.class.name} ${body}" loggingLevel="WARN"/>
            <marshal ref="jack"/>
            <log message="marshalled: ${body}" loggingLevel="WARN"/>
            <to uri="file:///C:/tries/collApp/exchange/out?fileName=out.json"/>
        </route>

At the first "log", only the German letter has been replaced with a question
mark.

At the second, all the Russian letters have been replaced with question marks
as well.
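One chain of conversions that would produce exactly these two symptoms, offered as a guess rather than a confirmed reading of the IOConverter code: the correctly decoded text is written back to bytes in the platform charset (windows-1251, which loses Ä), and those bytes are later decoded as UTF-8 again because the exchange still says charset=UTF-8. A Java sketch of that assumed round trip (class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SuspectedRoundTrip {
    static final Charset WIN_1251 = Charset.forName("windows-1251");

    // Step 1 (assumed): text that was read correctly as UTF-8 is turned back
    // into bytes with the platform default charset, here windows-1251.
    // A umlaut becomes '?', the Cyrillic letters survive.
    static byte[] toDefaultCharsetBytes(String text) {
        return text.getBytes(WIN_1251);
    }

    // Step 2 (assumed): those bytes are decoded as UTF-8 again. Windows-1251
    // Cyrillic bytes (0xC0..0xFF) do not form valid UTF-8 sequences here, so
    // the decoder substitutes U+FFFD for them.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442 \u00c4"; // Cyrillic word, space, A umlaut
        byte[] step1 = toDefaultCharsetBytes(original);
        // Like the first log: Cyrillic intact, A umlaut shown as '?'.
        System.out.println(new String(step1, WIN_1251));
        String step2 = decodeAsUtf8(step1);
        // Like the second log: no Cyrillic letter is left. U+FFFD itself is not
        // in windows-1251, so writing it out in that charset yields '?'.
        System.out.println(new String(step2.getBytes(WIN_1251), WIN_1251));
    }
}
```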

The resulting JSON can't be displayed correctly in any of the world's
encodings; it doesn't even show the question marks.

Shall I provide a test CSV file here? (warning: it contains Russian letters)



--
View this message in context: 
http://camel.465427.n5.nabble.com/A-possible-bug-in-IOConverter-with-Win-1251-charset-tp5778665p5778666.html
Sent from the Camel Development mailing list archive at Nabble.com.
