[ 
https://issues.apache.org/jira/browse/IGNITE-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736499#comment-16736499
 ] 

Ilya Kasnacheev commented on IGNITE-10732:
------------------------------------------

[~dpavlov] Even if the encoding is consistent between nodes, it might lead to data 
corruption when Unicode strings are encoded to an 8-bit encoding and then 
decoded. Some characters will turn into ?'s because they are not representable in 
the given 8-bit charset. Therefore, we should keep the warning. Users may still 
run with such an encoding, but they need to be aware of the risk. We already have 
quite a few warnings which are printed even with the default configuration.
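
To illustrate the lossy round trip, here is a minimal standalone sketch (not Ignite code). The class and method names are made up for the example, and ISO-8859-1 stands in for whatever 8-bit charset a node might use:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    /** Encodes the string with the given charset and decodes the bytes back with the same charset. */
    public static String roundTrip(String s, Charset cs) {
        // String.getBytes(Charset) silently replaces unmappable characters
        // with the charset's replacement byte, which is '?' for ISO-8859-1.
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Cyrillic text written with Unicode escapes so that the source
        // file encoding does not affect the example.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8))); // true

        // ISO-8859-1 has no Cyrillic letters, so every character is lost.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1)); // ??????
    }
}
```

Note that the node doing this sees no exception, which is why a warning at startup seems worth keeping.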

Maybe we should also check for real inconsistency in the cluster and reject nodes 
whose file.encoding is not consistent with that of the existing ones. I think this 
will demand another ticket.
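
A rough sketch of what such a comparison could look like, in plain Java. The class, the method name and the idea of passing encoding names around as strings are all assumptions for illustration; how a joining node would actually report its file.encoding to the cluster is not covered here:

```java
import java.nio.charset.Charset;

public class EncodingConsistency {
    /**
     * Hypothetical helper: returns true if both names resolve to the same
     * charset. Names are resolved through Charset.forName so that case
     * differences and aliases of one charset are treated as equal.
     */
    public static boolean sameEncoding(String localName, String remoteName) {
        try {
            return Charset.forName(localName).equals(Charset.forName(remoteName));
        } catch (IllegalArgumentException e) {
            // Null, illegal or unsupported charset name: treat as inconsistent.
            return false;
        }
    }

    public static void main(String[] args) {
        String local = Charset.defaultCharset().name();
        // "UTF-8" stands in for the encoding reported by an existing node.
        String remote = "UTF-8";

        if (!sameEncoding(local, remote))
            System.err.println("Encoding mismatch: local=" + local + ", cluster=" + remote);
    }
}
```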

> Incorrect file.encoding leads to inconsistent SqlFieldsQuery results between 
> nodes
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-10732
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10732
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>    Affects Versions: 2.4
>            Reporter: Ilya Kasnacheev
>            Assignee: Ilya Kasnacheev
>            Priority: Critical
>              Labels: windows
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When doing 
> {code}
> cache.query(new SqlFieldsQuery("SELECT _key FROM Cache"))
> {code}
> resulting Unicode values may be different when coming from a Windows or a Linux 
> node.
> Linux nodes will mostly use UTF-8 but Windows nodes will use the local CpNNNN 
> encoding to encode query results, as bizarre as it may sound.
> Windows < - > Windows and Linux < - > Linux will get correct results but 
> Windows < - > Linux will get broken strings.
> Note that if the cluster has Windows and Linux nodes and the cache is REPLICATED, 
> results will be different for subsequent queries!
> There is a workaround for this: set -Dfile.encoding=UTF-8 JVM arg on Windows.
> There is probably an underlying problem in H2 but since non-UTF-8 
> file.encoding is dangerous (it affects String.getBytes()) I think we should 
> peg it to UTF-8.
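
The mismatch described above can be reproduced without a cluster. The following is a minimal sketch in plain Java, not the Ignite or H2 code path: the class and method names are invented for the example, and ISO-8859-1 stands in for a local Windows code page. It also shows a startup-style check for the default charset, which is what -Dfile.encoding=UTF-8 affects.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    /** Returns true if the given name resolves to UTF-8 (case-insensitive). */
    public static boolean isUtf8(String charsetName) {
        try {
            return StandardCharsets.UTF_8.equals(Charset.forName(charsetName));
        } catch (IllegalArgumentException e) {
            // Null, illegal or unsupported charset name.
            return false;
        }
    }

    /** Encodes with one charset and decodes with another, as two nodes with different defaults would. */
    public static String transcode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // String.getBytes() without arguments uses this charset.
        Charset dflt = Charset.defaultCharset();

        if (!isUtf8(dflt.name()))
            System.err.println("Default charset is " + dflt + ", consider -Dfile.encoding=UTF-8");

        // U+00E9 (e with acute) is two bytes in UTF-8: 0xC3 0xA9. Decoding
        // those bytes as ISO-8859-1 yields two characters instead of one.
        String broken = transcode("\u00e9", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(broken.length()); // 2

        // Passing the charset explicitly avoids depending on file.encoding.
        byte[] safe = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(safe.length); // 2
    }
}
```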



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
