What happens is that your default charset is win-1251 while the file is UTF-8.

The file is read correctly according to the charset argument passed to the 
toInputStream method; however, the charset subsequently used to parse and send 
the stream is the platform default charset, not the one you specified.
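
To illustrate the effect outside of Camel, here is a minimal sketch using only the JDK. It is not Camel's actual code path; the class and method names (CharsetRoundTrip, roundTrip) are made up for the example. It decodes UTF-8 bytes correctly and then pushes the text through a single-byte charset, which is roughly what happens when the platform default is used downstream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Decode the bytes with the file's charset, then re-encode and decode
    // them with a second charset, standing in for the platform default.
    static String roundTrip(byte[] fileBytes, Charset fileCharset, Charset platformCharset) {
        String text = new String(fileBytes, fileCharset);
        // Characters the target charset cannot represent are replaced here.
        byte[] reencoded = text.getBytes(platformCharset);
        return new String(reencoded, platformCharset);
    }

    public static void main(String[] args) {
        // One Cyrillic letter (U+0416) and one German a-umlaut (U+00E4).
        String original = "\u0416\u00e4";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        Charset cp1251 = Charset.forName("windows-1251");

        String viaCp1251 = roundTrip(utf8, StandardCharsets.UTF_8, cp1251);
        String viaUtf8 = roundTrip(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Compare against the original instead of printing the text itself,
        // since console output depends on the console encoding.
        System.out.println("survives windows-1251: " + original.equals(viaCp1251));
        System.out.println("survives UTF-8: " + original.equals(viaUtf8));
    }
}
```

With windows-1251 as the second charset the Cyrillic letter survives and the umlaut comes back as a question mark, which matches what you see at the first log step.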

The immediate workaround for you is to set the default charset explicitly when 
launching the JVM: -Dfile.encoding=UTF-8
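
To confirm that the flag took effect, you could print what the JVM reports as its default charset. This is only a sketch; the class name DefaultCharsetCheck and the describe method are made up for the example, and only standard JDK calls are used.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Report the charset the JVM uses when none is given explicitly,
    // together with the raw value of the file.encoding system property.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it once with and once without -Dfile.encoding=UTF-8 on the same machine to compare the two results.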

I would recommend you go ahead, file a bug and add a simple test case in 
IOConverterTest around line 83.

> On Mar 5, 2016, at 11:05 PM, fedd <feddkr...@hotmail.com> wrote:
> 
> I made an experiment and saw that the situation is much worse than just
> losing one frequent Russian letter.
> 
> I made a UTF-8 file with both Russian text and one German A Umlaut letter,
> and Camel was unable to read a German letter replacing it with a question
> mark, just because my windows dev machine native charset happened to be
> win-1251.
> 
> I don't really think it's okay
> 
> 1) to ever flatten Unicode strings to a single byte character set;
> 
> 2) when the behaviour of the server side code depends on the host operating
> system settings (becomes not portable)
> 
> May I file a Jira bug report?
> 
> Here's my route:
> 
>        <dataFormats>
>            <json id="jack" library="Jackson" prettyPrint="true"/>
>        </dataFormats>        
> 
>        <route>
> 
>            <from
> uri="file:///C:/tries/collApp/exchange/in?fileName=registerSampleUtf.csv&amp;charset=UTF-8"/>
>            <log message="file: ${body.class.name} ${body}"
> loggingLevel="WARN"/>
>            <unmarshal>
>                <csv delimiter=";"  useMaps="true" />
>            </unmarshal>            
>            <log message="unmarshalled: ${body.class.name} ${body}"
> loggingLevel="WARN"/>
>            <marshal ref="jack"/>
>            <log message="marshalled: ${body}" loggingLevel="WARN"/>
>            <to
> uri="file:///C:/tries/collApp/exchange/out?fileName=out.json"/>          
>        </route>
> 
> At the first "log" only a German letter is replaced with the question mark.
> 
> At the second, all Russian letters are replaced with the question marks.
> 
> The resulting JSON can't even display the question marks when read in any of
> the world's encodings.
> 
> Shall I provide a test CSV file here? (warning: it contains Russian letters)
> 
> 
> 
> --
> View this message in context: 
> http://camel.465427.n5.nabble.com/A-possible-bug-in-IOConverter-with-Win-1251-charset-tp5778665p5778666.html
> Sent from the Camel Development mailing list archive at Nabble.com.
