[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110990#comment-16110990 ] Phillip Cloud commented on ARROW-786:

I'm going to try to get to this.

> [Format] In-memory format for 128-bit Decimals, handling of sign bit
>
> Key: ARROW-786
> URL: https://issues.apache.org/jira/browse/ARROW-786
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Wes McKinney
> Fix For: 0.7.0
>
> cc [~cpcloud]
> We found in ARROW-655 that we needed to add an extra bit for signedness for
> decimals stored as 128-bit values to be able to use the Boost multiprecision
> libraries. This makes Decimal128 not fit completely neatly as a 16-byte fixed
> size binary value, and more of a {{struct fixed_size_binary(16)>}}. What is the current format in the Java
> implementation? We will need to document the memory layout for decimals that
> maximizes compatibility across languages and eventually implement integration
> tests for IPC.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
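To make the size problem in the description concrete, here is a minimal Python sketch (an illustration, not Arrow code; both helper names are hypothetical). It contrasts a Boost-style sign-magnitude layout, where a separate sign flag sits next to a 16-byte unsigned magnitude, with a 16-byte two's-complement layout that embeds the sign bit:

```python
from typing import Tuple


def encode_sign_magnitude(value: int) -> Tuple[bool, bytes]:
    """Hypothetical Boost-style layout: separate sign flag + 16-byte unsigned magnitude."""
    # The magnitude may use all 128 bits, so the sign needs storage of its own.
    return value < 0, abs(value).to_bytes(16, "big")


def encode_twos_complement(value: int) -> bytes:
    """Hypothetical Java/Parquet-style layout: sign embedded in 16 two's-complement bytes."""
    # Raises OverflowError for values outside [-2**127, 2**127 - 1].
    return value.to_bytes(16, "big", signed=True)


largest = 2 ** 128 - 1
negative, magnitude = encode_sign_magnitude(largest)
print(negative, len(magnitude))  # the flag lives outside the 16 magnitude bytes
try:
    encode_twos_complement(largest)
except OverflowError:
    print("does not fit in 16 two's-complement bytes")
```

The sign-magnitude form needs at least one bit beyond the 16 bytes, which is why it does not fit neatly into a fixed_size_binary(16).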
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110206#comment-16110206 ] Wes McKinney commented on ARROW-786:

It doesn't look like this will be resolved in 0.6.0; moving to the next release
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102037#comment-16102037 ] Wes McKinney commented on ARROW-786:

It seems like we might end up having to build a from-scratch implementation of Java's BigDecimal in C++. It might be worth it, but it's also a lot of work. The JDK source code is not ASF-friendly, so we would have to start from scratch, working from a mathematical reference.
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101976#comment-16101976 ] Phillip Cloud commented on ARROW-786:

That should be possible, though because Boost uses 128 bits plus a separate sign bit, going from arrow-cpp to Java won't be possible in every case: the Boost representation's values range over {{+/-0..2 ** 128 - 1}}, whereas the Java implementation's values range over {{-2 ** 127..2 ** 127 - 1}}.

The more I think about this and reread the Boost Multiprecision docs, the more I think we should just implement our own very small wrapper around native types. Boost Multiprecision has optimizations that Arrow doesn't care about, like the one below, which increase implementation complexity at best and hurt performance at worst:

{code}
When used at fixed precision, the size of this type is always one machine word larger than you would expect for an N-bit integer: the extra word stores both the sign, and how many machine words in the integer are actually in use.
{code}

That, plus the complexity of having two signed integer representations, is enough to make me want to try jettisoning Boost Multiprecision.
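The range mismatch described in that comment can be sketched in Python as a checked conversion from a Boost-style (sign, magnitude) pair to a signed 128-bit value. The function and constant names are hypothetical. A negative magnitude of up to 2 ** 127 and a non-negative magnitude of up to 2 ** 127 - 1 convert; anything larger overflows:

```python
INT128_MIN = -(2 ** 127)
INT128_MAX = 2 ** 127 - 1
UINT128_MAX = 2 ** 128 - 1


def sign_magnitude_to_int128(negative: bool, magnitude: int) -> int:
    """Convert a (sign, 128-bit magnitude) pair to a signed 128-bit integer.

    Raises OverflowError when the value is outside the two's-complement range.
    """
    if not 0 <= magnitude <= UINT128_MAX:
        raise ValueError("magnitude must fit in 128 unsigned bits")
    # Sign-magnitude has both +0 and -0; both collapse to 0 here.
    value = -magnitude if negative else magnitude
    if not INT128_MIN <= value <= INT128_MAX:
        raise OverflowError(f"{value} is outside [-2**127, 2**127 - 1]")
    return value


print(sign_magnitude_to_int128(True, 2 ** 127) == INT128_MIN)  # the edge case still fits
```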
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101901#comment-16101901 ] Wes McKinney commented on ARROW-786:

[~cpcloud] is it possible to do bit twiddling to convert between the 16-byte Java/Parquet-compatible representation and the Boost::Multiprecision representation?
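In the direction that always succeeds (Java/Parquet bytes into a Boost-style sign and magnitude), the bit twiddling amounts to reading the embedded sign bit and, for negative values, negating by inverting all 128 bits and adding one. A minimal Python sketch, with a hypothetical function name:

```python
from typing import Tuple

MASK128 = (1 << 128) - 1


def int128_bytes_to_sign_magnitude(data: bytes) -> Tuple[bool, int]:
    """Split 16 big-endian two's-complement bytes into (negative, magnitude)."""
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    bits = int.from_bytes(data, "big")  # raw, unsigned bit pattern
    negative = bool(bits >> 127)  # the embedded sign bit is the top bit
    if negative:
        # Two's-complement negation: invert every bit, add one, keep 128 bits.
        magnitude = (~bits + 1) & MASK128
    else:
        magnitude = bits
    return negative, magnitude
```

Even the most negative value, -2 ** 127, yields a magnitude of 2 ** 127, which still fits in a 128-bit unsigned magnitude.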
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101887#comment-16101887 ] Phillip Cloud commented on ARROW-786:

[~jnadeau] {{__int128_t}} doesn't work on Windows (with VC++), and when I originally wrote the decimal code for arrow-cpp, it was buggy with clang: the symbols required to link in the libc++ code needed to use that type were not exported by clang. See here: https://bugs.llvm.org//show_bug.cgi?id=26156.

We ultimately went with the Boost Multiprecision representation (which is sign-magnitude) because we wanted to reuse existing libraries and get cross-platform support out of the box.

One possible alternative (depending on whether the clang issues have been resolved) is to write our own pared-down version of something like Boost Multiprecision that uses {{__int128_t}} on UNIXes and two {{int64_t}}s on Windows. It wouldn't need any operations at the moment, just the ability to print itself as a decimal number and to convert decimal strings to the underlying type. Even those could be free functions rather than methods on the class.
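The pared-down type proposed in that comment could look roughly like the following Python sketch of the two-word layout: a signed high 64-bit word and an unsigned low 64-bit word. It also includes free functions for printing and parsing decimal strings at a given scale. All names are hypothetical. A real C++ version would do the digit arithmetic on the two words directly rather than relying on Python's arbitrary-precision integers.

```python
from dataclasses import dataclass

UINT64_MASK = (1 << 64) - 1


@dataclass(frozen=True)
class Int128Limbs:
    """Hypothetical 128-bit integer stored as two 64-bit words."""
    high: int  # signed 64-bit word; carries the sign
    low: int   # unsigned 64-bit word

    @classmethod
    def from_int(cls, value: int) -> "Int128Limbs":
        if not -(1 << 127) <= value < (1 << 127):
            raise OverflowError("value does not fit in a signed 128-bit integer")
        # >> on a Python int is an arithmetic shift, so the high word keeps the sign.
        return cls(high=value >> 64, low=value & UINT64_MASK)

    def to_int(self) -> int:
        return (self.high << 64) + self.low


def to_decimal_string(value: Int128Limbs, scale: int) -> str:
    """Print the unscaled integer as a decimal with `scale` fractional digits."""
    if scale < 0:
        raise ValueError("this sketch only handles non-negative scales")
    n = value.to_int()
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if scale == 0:
        return sign + digits
    digits = digits.rjust(scale + 1, "0")  # guarantee at least one integer digit
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"


def from_decimal_string(text: str, scale: int) -> Int128Limbs:
    """Parse text such as "-123.45" into an unscaled integer at `scale`."""
    if scale < 0:
        raise ValueError("this sketch only handles non-negative scales")
    negative = text.startswith("-")
    if text[:1] in ("+", "-"):
        text = text[1:]
    whole, _, frac = text.partition(".")
    if not (whole + frac).isdecimal():
        raise ValueError(f"not a decimal number: {text!r}")
    if len(frac) > scale:
        raise ValueError("more fractional digits than the scale allows")
    unscaled = int(whole + frac.ljust(scale, "0"))
    return Int128Limbs.from_int(-unscaled if negative else unscaled)


price = from_decimal_string("-123.45", scale=2)
print(price, to_decimal_string(price, scale=2))
```

Keeping printing and parsing as free functions, as the comment suggests, leaves the type itself a plain pair of words.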
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101842#comment-16101842 ] Jacques Nadeau commented on ARROW-786:

It would also be good to confirm the memory representation of a <16 x i128> vector using LLVM on x86-64.
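For reference, x86-64 is little-endian. A 128-bit integer there is stored as two's complement, with the low 64-bit word at the lower address. The Python sketch below (hypothetical helper names) builds that byte layout for sixteen values and reads one element back as (low uint64, high int64). It assumes the vector's elements are packed back to back with no padding, which is exactly what the LLVM check would need to confirm.

```python
import struct
from typing import Iterable, Tuple


def pack_int128_le(values: Iterable[int]) -> bytes:
    """Concatenate signed 128-bit values in little-endian two's complement."""
    return b"".join(v.to_bytes(16, "little", signed=True) for v in values)


def read_words(element: bytes) -> Tuple[int, int]:
    """Decode one 16-byte element as (low unsigned word, high signed word)."""
    low, high = struct.unpack("<Qq", element)
    return low, high


buffer = pack_int128_le(range(-8, 8))  # 16 elements, assumed packed back to back
print(len(buffer), read_words(buffer[:16]))
```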
[jira] [Commented] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit
[ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101836#comment-16101836 ] Jacques Nadeau commented on ARROW-786:

The current format of the Java implementation is an embedded sign bit. GCC/Clang/Intel support __int128, which I believe on x86-64 machines is represented with the sign bit embedded (?). I remember talking to [~nongli] about this years ago and (if I recall correctly) we chose the Parquet representation based on his experiments with GCC or Clang/LLVM. (Unfortunately, I'm unable to find the thread.)

The current Java implementation supports a 16-byte wide, sign-bit-embedded, two's-complement, big-endian representation that is the same as the Parquet description here: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L81
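The Parquet-style encoding described in that comment (16 bytes, two's complement, big-endian, sign bit embedded) maps directly onto Python's int.to_bytes and int.from_bytes. Java's BigInteger.toByteArray() returns a minimal-length big-endian two's-complement array. The sketch below also shows how to sign-extend that array up to the fixed 16-byte width. The helper names are hypothetical.

```python
def to_fixed16_be(unscaled: int) -> bytes:
    """Encode an unscaled decimal value as 16 big-endian two's-complement bytes."""
    return unscaled.to_bytes(16, "big", signed=True)


def from_fixed16_be(data: bytes) -> int:
    """Decode 16 big-endian two's-complement bytes back to an integer."""
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    return int.from_bytes(data, "big", signed=True)


def sign_extend_to_16(minimal: bytes) -> bytes:
    """Widen a minimal big-endian two's-complement byte string to 16 bytes."""
    if not 1 <= len(minimal) <= 16:
        raise ValueError("expected between 1 and 16 bytes")
    # Pad with copies of the sign: 0xFF for negative values, 0x00 otherwise.
    fill = b"\xff" if minimal[0] & 0x80 else b"\x00"
    return fill * (16 - len(minimal)) + minimal


# -12345 in minimal form is 0xCF 0xC7, as BigInteger.valueOf(-12345).toByteArray() gives.
encoded = sign_extend_to_16(b"\xcf\xc7")
print(from_fixed16_be(encoded))
```

Because x86-64 stores the same 128-bit value little-endian, converting between this layout and a native 128-bit integer's bytes on that platform is a 16-byte reversal.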