[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409254#comment-15409254
 ] 

ASF GitHub Bot commented on IGNITE-3140:


Github user isapego closed the pull request at:

https://github.com/apache/ignite/pull/829


> C++: UTF-16 surrogate symbols are not serialized properly
> -
>
> Key: IGNITE-3140
> URL: https://issues.apache.org/jira/browse/IGNITE-3140
> Project: Ignite
>  Issue Type: Bug
>  Components: platforms
>Affects Versions: 1.5.0.final
>Reporter: Denis Magda
>Assignee: Vladimir Ozerov
> Fix For: 1.7
>
>
> There is an issue with serialization of surrogate symbols with 
> {{BinaryMarshaller}}. On the Java side, the String serialization logic was 
> improved to support all the cases. Refer to IGNITE-3098.
> The C++ serialization logic has to be updated as well. Please refer to the 
> algorithm located in the ignite-3098 branch in the following places:
> - {{BinaryUtils.strToUtf8Bytes}} - serialization
> - {{BinaryUtils.utf8BytesToStr}} - deserialization
> - {{IgniteSystemProperties.IGNITE_BINARY_MARSHALLER_USE_STRING_SERIALIZATION_VER_2}}
>   controls which version of serialization logic to use (old or new).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-06-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348191#comment-15348191
 ] 

ASF GitHub Bot commented on IGNITE-3140:


GitHub user isapego opened a pull request:

https://github.com/apache/ignite/pull/829

IGNITE-3140: Added tests for string format validity.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/isapego/ignite ignite-3140

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/829.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #829


commit 9ca608aa8cf91d546dfa08dd7f560d42047b1c3d
Author: isapego 
Date:   2016-06-23T16:53:32Z

IGNITE-3140: Added tests for string format validity.

commit 41e8d703d45b532c0169f523672fe7ab5291bb87
Author: isapego 
Date:   2016-06-23T17:00:55Z

IGNTIE-3140: Minor fix for test.

commit f70ef622cb05159da99786697a533260bae57929
Author: isapego 
Date:   2016-06-24T12:14:25Z

IGNITE-3140: Fix for JVM reloading.




> C++: UTF-16 surrogate symbols are not serialized properly
> -
>
> Key: IGNITE-3140
> URL: https://issues.apache.org/jira/browse/IGNITE-3140
> Project: Ignite
>  Issue Type: Bug
>  Components: platforms
>Affects Versions: 1.5.0.final
>Reporter: Denis Magda
>Assignee: Igor Sapego
> Fix For: 1.7





[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286750#comment-15286750
 ] 

ASF GitHub Bot commented on IGNITE-3140:


Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/723




[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-17 Thread Igor Sapego (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286603#comment-15286603
 ] 

Igor Sapego commented on IGNITE-3140:
-

OK, it seems that {{BinaryUtils.utf8BytesToStr}} does not convert 4-byte UTF-8 
sequences to UTF-16 surrogate pairs. I believe we should implement that on the 
Java side. Apart from that, everything seems to be fine from the C++ point of view.
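
For illustration only, the missing conversion could look roughly like the sketch below. This is not the Ignite code; the class and method names ({{FourByteUtf8}}, {{decodeFourBytes}}) are hypothetical, and the sketch handles a single 4-byte sequence only.

```java
final class FourByteUtf8 {
    /**
     * Decodes one 4-byte UTF-8 sequence starting at {@code off} into a
     * UTF-16 surrogate pair. Hypothetical helper, not part of Ignite.
     */
    static char[] decodeFourBytes(byte[] b, int off) {
        // A 4-byte sequence starts with a lead byte of the form 11110xxx.
        if (b.length - off < 4 || (b[off] & 0xF8) != 0xF0)
            throw new IllegalArgumentException("Not a 4-byte UTF-8 sequence");

        // The lead byte carries the top 3 bits of the code point.
        int cp = (b[off] & 0x07) << 18;

        // Each continuation byte (10xxxxxx) carries 6 more bits.
        for (int i = 1; i < 4; i++) {
            if ((b[off + i] & 0xC0) != 0x80)
                throw new IllegalArgumentException("Bad continuation byte");

            cp |= (b[off + i] & 0x3F) << (6 * (3 - i));
        }

        // 4-byte sequences must encode supplementary code points only.
        if (cp < 0x10000 || cp > 0x10FFFF)
            throw new IllegalArgumentException("Code point out of range");

        // Character.toChars splits a supplementary code point into a
        // high and a low surrogate.
        return Character.toChars(cp);
    }
}
```

With the bytes {{F0 90 90 B7}} as input, this yields the pair {{0xD801, 0xDC37}} used elsewhere in this discussion.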



[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-17 Thread Igor Sapego (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286443#comment-15286443
 ] 

Igor Sapego commented on IGNITE-3140:
-

OK, I get it. It seems that our new method {{BinaryUtils.utf8BytesToStr}} does 
not actually support all valid UTF-8 strings. I'll add a related test and see 
how C++ deals with that.



[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-17 Thread Igor Sapego (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286334#comment-15286334
 ] 

Igor Sapego commented on IGNITE-3140:
-

Denis,

According to [Wikipedia|https://en.wikipedia.org/wiki/UTF-8#Description], code 
points between {{U+0800}} and {{U+FFFF}} are serialized using 3 bytes in UTF-8, 
so everything seems to conform to the specification in our case. Some 
implementations may consider these code points themselves invalid, but the 
encoding is still valid.

The C++ standard does not specify a string encoding and does not include 
functions for working with encodings, so there is no such thing as 
serialization in the encoding sense on the C++ side. Whatever you put into a 
C++ string stays usable, because in C++ a string is just a sequence of 
characters of a specified size. So I simply can't serialize a UTF-16 string on 
the C++ side unless I write the serialization algorithm myself or use a 
third-party implementation.
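
As a rough illustration of the ranges in that Wikipedia table, the byte count per code point can be written as a small helper. This is only a sketch; {{Utf8Length}} and {{utf8Length}} are hypothetical names, not part of Ignite.

```java
final class Utf8Length {
    /**
     * Returns how many bytes the UTF-8 encoding scheme uses for a code point.
     * Hypothetical helper for illustration. Note that the surrogate range
     * U+D800..U+DFFF falls inside the 3-byte range; whether such code points
     * are accepted at all is up to the particular implementation.
     */
    static int utf8Length(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF)
            throw new IllegalArgumentException("Not a Unicode code point");

        if (codePoint <= 0x7F)
            return 1; // U+0000..U+007F

        if (codePoint <= 0x7FF)
            return 2; // U+0080..U+07FF

        if (codePoint <= 0xFFFF)
            return 3; // U+0800..U+FFFF

        return 4;     // U+10000..U+10FFFF
    }
}
```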

> C++: UTF-16 surrogate symbols are not serialized properly
> -
>
> Key: IGNITE-3140
> URL: https://issues.apache.org/jira/browse/IGNITE-3140
> Project: Ignite
>  Issue Type: Bug
>  Components: platforms
>Affects Versions: 1.5.0.final
>Reporter: Denis Magda
>Assignee: Vladimir Ozerov
> Fix For: 1.6





[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-16 Thread Denis Magda (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286028#comment-15286028
 ] 

Denis Magda commented on IGNITE-3140:
-

Igor,

The new serialization algorithm on the Java side serializes every symbol 
greater than 0x07FF in 3 bytes.
This means that if a String contains a valid surrogate pair such as 
{{0xD801, 0xDC37}}, the new algorithm will use 6 bytes to encode it, while 
standard UTF-8 encoders/decoders use only 4 bytes. The C++ side won't be able 
to properly deserialize {{0xD801, 0xDC37}} because it will be encoded in 6 
bytes.

Try to serialize this String on the C++ side. It should be encoded in 4 bytes, 
while the new Java algorithm encodes it in 6 bytes.
{noformat}
String str = new String(new char[] {0xD801, 0xDC37});
{noformat}
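
To make the two byte counts concrete, here is a minimal sketch. It assumes the new algorithm behaves as described above, writing every UTF-16 char above 0x07FF as 3 bytes on its own; {{encodePerChar}} is a hypothetical stand-in written for this comment, not the actual {{BinaryUtils}} code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SurrogateEncodingDemo {
    /**
     * Encodes each UTF-16 char independently: 1 byte below 0x80, 2 bytes
     * below 0x800, 3 bytes otherwise. Surrogates are NOT combined, so a
     * surrogate pair takes 3 + 3 = 6 bytes. Hypothetical stand-in only.
     */
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);

            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }

        return out.toByteArray();
    }

    public static void main(String[] args) {
        String str = new String(new char[] {0xD801, 0xDC37});

        // The standard JDK encoder combines the pair into one code point.
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 4

        // Per-char encoding writes each surrogate separately.
        System.out.println(encodePerChar(str).length); // 6
    }
}
```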





[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-16 Thread Igor Sapego (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285021#comment-15285021
 ] 

Igor Sapego commented on IGNITE-3140:
-

Denis,

C++ uses UTF-8 encoding right now, and as long as Java and .NET nodes write 
strings in UTF-8 we are not going to have any problems with deserialization. 
On the C++ side we just copy the received string bytes without any complex 
processing and use them as a string. As long as the data is valid UTF-8 (and 
in our Binary protocol it is), everything is going to work just fine.



[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-16 Thread Denis Magda (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284919#comment-15284919
 ] 

Denis Magda commented on IGNITE-3140:
-

Igor,

In a heterogeneous cluster (Java and C++ nodes), the C++ side won't be able to 
properly deserialize surrogate symbols from a String serialized on the Java 
side if the new algorithm (from IGNITE-3098) is used. This is why we should 
rewrite the string serialization logic on the C++ side as well.



[jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly

2016-05-16 Thread Igor Sapego (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284659#comment-15284659
 ] 

Igor Sapego commented on IGNITE-3140:
-

We do not deal with UTF-16 in the C++ code. We expect the user to provide 
strings in valid UTF-8 format.
I added a test with a malformed UTF-8 string where we expect an exception.
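
The C++ test itself is not shown here. As a sketch of the same idea on the Java side (an assumption for illustration, not the test that was added), a strict check can rely on the JDK decoder configured to report malformed input instead of replacing it; {{Utf8Validator}} is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Validator {
    /**
     * Returns true if the bytes decode as UTF-8 without errors.
     * Hypothetical helper: the decoder is told to REPORT problems, so
     * malformed input raises CharacterCodingException instead of being
     * silently replaced with U+FFFD.
     */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));

            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For example, a truncated 3-byte sequence such as {{E2 82}} or a lone continuation byte {{80}} is rejected, while plain ASCII input is accepted.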

> C++: UTF-16 surrogate symbols are not serialized properly
> -
>
> Key: IGNITE-3140
> URL: https://issues.apache.org/jira/browse/IGNITE-3140
> Project: Ignite
>  Issue Type: Bug
>  Components: platforms
>Affects Versions: 1.5.0.final
>Reporter: Denis Magda
>Assignee: Igor Sapego
> Fix For: 1.6


