Can we just detect the encoding at the cache level, or at least at the
column level? That way, if the encoding does not match, we throw an
exception immediately.
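
To make the idea concrete, here is a rough sketch of such a fail-fast check
in plain Java. It is not Ignite API: the class CacheEncodingGuard and its
check() method are hypothetical names, and it assumes each node can report
the charset it marshals strings with for a given cache.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Ignite: remembers the first encoding
// seen for each cache and rejects any later, different one.
final class CacheEncodingGuard {
    private final Map<String, Charset> encodings = new ConcurrentHashMap<>();

    // Throws immediately if nodeEncoding differs from the encoding
    // already recorded for cacheName.
    void check(String cacheName, Charset nodeEncoding) {
        Charset expected = encodings.putIfAbsent(cacheName, nodeEncoding);
        if (expected != null && !expected.equals(nodeEncoding)) {
            throw new IllegalStateException("Cache '" + cacheName + "' uses "
                + expected + " but this node is configured with " + nodeEncoding);
        }
    }

    public static void main(String[] args) {
        CacheEncodingGuard guard = new CacheEncodingGuard();
        guard.check("myCache", StandardCharsets.UTF_8);
        try {
            guard.check("myCache", Charset.forName("windows-1251"));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In a real cluster the recorded encoding would have to live somewhere shared,
for example in the cache configuration, rather than in a local map.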

Will it work?

D.

On Tue, Sep 5, 2017 at 9:16 AM, Andrey Kuznetsov <stku...@gmail.com> wrote:

> Hi Igniters!
>
> I ran into a couple of issues caused by different binary string encoding
> settings on different cluster nodes.
>
> Suppose the cluster has two nodes. Node0 uses win-1251 to marshal strings
> with BinaryMarshaller, and Node1 uses the default utf-8 encoding. Let's
> create a replicated cache and put an entry on Node0:
>
> node0.cache("myCache").put("k", "v");
>
> Then
>
> node1.cache("myCache").get("k")
>
> returns null.
>
> Let me describe the cause. First, the string key arrives at Node1 as the
> binary payload of a DHT update request, encoded in win-1251. This
> representation stays in the offheap area of Node1. Then GetTask arrives
> with the same key as a plain (Serializable) Java object; BinaryMarshaller
> encodes the key using utf-8 (the Node1 setting). Finally, the B+Tree lookup
> fails for this binary key because the encodings differ.
>
> When the key is just a string, this can be fixed by fully decoding binary
> strings on B+Tree lookups. But when the key is an arbitrary object with
> strings inside, this approach is too expensive.
>
> The second issue relates to lossy string encodings. A mixed-encoding
> cluster does not guarantee string data integrity when a "lossless" node
> goes down for a while.
>
> Any ideas on addressing these issues?
>
> --
> Best regards,
>   Andrey Kuznetsov.
>
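
For reference, both effects described in the quoted message can be
reproduced with plain JDK charsets. This is only a sketch: it compares raw
charset output and does not model the actual BinaryMarshaller layout. The
class and method names are made up for illustration, and it assumes the
JVM supports the windows-1251 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodingMismatchDemo {
    // Assumes the running JVM provides windows-1251.
    static final Charset WIN_1251 = Charset.forName("windows-1251");

    // True if the string has the same byte representation in both charsets.
    static boolean sameBytes(String s, Charset a, Charset b) {
        return Arrays.equals(s.getBytes(a), s.getBytes(b));
    }

    // Encodes and decodes with one charset; characters the charset cannot
    // represent are replaced during encoding, so the result may differ.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Cyrillic key, written with escapes so the source stays ASCII.
        String key = "\u043a\u043b\u044e\u0447";
        // Issue 1: the two encodings yield different bytes for the same
        // key, so a byte-wise comparison cannot match them.
        System.out.println("same bytes: "
            + sameBytes(key, WIN_1251, StandardCharsets.UTF_8));

        // Issue 2: a string with characters outside windows-1251 does not
        // survive a round trip through that encoding.
        String value = "\u65e5\u672c";
        System.out.println("win-1251 round trip intact: "
            + value.equals(roundTrip(value, WIN_1251)));
        System.out.println("utf-8 round trip intact: "
            + value.equals(roundTrip(value, StandardCharsets.UTF_8)));
    }
}
```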
