[jira] [Comment Edited] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit

2017-07-26 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101836#comment-16101836
 ] 

Jacques Nadeau edited comment on ARROW-786 at 7/26/17 3:51 PM:
---

The Java implementation currently uses an embedded sign bit. GCC, Clang, and 
the Intel compiler support __int128, which I believe on x86-64 machines is 
represented with the sign bit embedded (?). I remember talking to [~nongli] 
about this years ago and, if I recall correctly, we chose the Parquet 
representation based on his experiments with GCC or Clang/LLVM. (Unfortunately, 
I'm unable to find the thread.)
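
In a two's-complement 128-bit integer there is no separate sign field: the sign is simply the most significant bit of the 128-bit word. The minimal Python sketch below models how such a value would sit in memory as 16 little-endian bytes, as an __int128 would on a little-endian x86-64 machine. The helper names are hypothetical and for illustration only.

```python
# Minimal sketch (hypothetical helper names): a signed 128-bit value stored as
# 16 little-endian two's-complement bytes, the way an __int128 is laid out on
# a little-endian x86-64 machine.

INT128_MIN = -(1 << 127)
INT128_MAX = (1 << 127) - 1


def to_le_bytes(value: int) -> bytes:
    """Encode a signed 128-bit value as 16 little-endian two's-complement bytes."""
    if not INT128_MIN <= value <= INT128_MAX:
        raise OverflowError("value does not fit in 128 bits")
    return value.to_bytes(16, byteorder="little", signed=True)


def sign_bit(raw: bytes) -> int:
    """Return the embedded sign bit: the top bit of the most significant byte,
    which comes last in little-endian order."""
    return raw[15] >> 7
```

For example, `to_le_bytes(-1)` is sixteen `0xff` bytes and its sign bit is 1; no extra storage is needed to record the sign.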

The current Java implementation supports a 16-byte-wide, sign-bit-embedded, 
two's-complement big-endian representation, the same as the Parquet 
description here: 

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L81
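
The Parquet layout linked above stores a decimal as its unscaled integer (the value times 10^scale) in big-endian two's complement. A minimal Python sketch of encoding and decoding such a 16-byte value follows; the function names and the use of `decimal.Decimal` with an explicit `scale` argument are assumptions for illustration, not the Java implementation's API.

```python
from decimal import Decimal, localcontext


def encode_decimal128_be(value: Decimal, scale: int) -> bytes:
    """Encode `value` as its unscaled integer in 16 big-endian two's-complement bytes."""
    if not value.is_finite():
        raise ValueError("NaN and infinity have no decimal128 encoding")
    with localcontext() as ctx:
        ctx.prec = 50  # enough digits that 38-digit values are not rounded
        unscaled = value.scaleb(scale)  # e.g. -123.45 with scale 2 -> -12345
        if unscaled != unscaled.to_integral_value():
            raise ValueError("value has more fractional digits than `scale` allows")
    # to_bytes raises OverflowError if the value does not fit in 16 bytes.
    return int(unscaled).to_bytes(16, byteorder="big", signed=True)


def decode_decimal128_be(raw: bytes, scale: int) -> Decimal:
    """Inverse of encode_decimal128_be."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    with localcontext() as ctx:
        ctx.prec = 50
        return Decimal(unscaled).scaleb(-scale)
```

Because the byte order is big-endian, the sign bit is the top bit of the first byte, so a negative value such as -123.45 at scale 2 starts with `0xff`.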



> [Format] In-memory format for 128-bit Decimals, handling of sign bit
>
> Key: ARROW-786
> URL: https://issues.apache.org/jira/browse/ARROW-786
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Wes McKinney
> Fix For: 0.6.0
>
>
> cc [~cpcloud]
> We found in ARROW-655 that we needed to add an extra bit for signedness for 
> decimals stored as 128-bit values to be able to use the Boost multiprecision 
> libraries. This means Decimal128 does not fit neatly into a 16-byte fixed-size 
> binary value; it is more like a struct of a sign bit and a 
> {{fixed_size_binary(16)}}. What is the current format in the Java 
> implementation? We will need to document the memory layout for decimals that 
> maximizes compatibility across languages and eventually implement integration 
> tests for IPC. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit

2017-07-26 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101887#comment-16101887
 ] 

Phillip Cloud edited comment on ARROW-786 at 7/26/17 4:33 PM:
--

[~jnadeau] {{__int128_t}} doesn't work on Windows (with Visual C++), and when I 
originally wrote the decimal code for arrow-cpp, it was buggy with clang: the 
symbols required to link in the libc++ code needed to use that type were not 
exported by clang. See here: https://bugs.llvm.org//show_bug.cgi?id=26156.

We ultimately went with the Boost.Multiprecision representation (which is 
sign-magnitude) because we wanted to reuse existing libraries and get 
cross-platform support out of the box.
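
A sign-magnitude layout keeps the sign in a separate flag next to an unsigned 128-bit magnitude. Since the magnitude alone can use all 128 bits, the sign needs one bit beyond those, which is the extra bit the issue description mentions. The Python sketch below models that layout and the conversion to the 16-byte two's-complement form; the helper names and the little-endian byte order of the magnitude are assumptions, and it does not reproduce Boost.Multiprecision's actual internals.

```python
from typing import Tuple

MAGNITUDE_MAX = (1 << 128) - 1  # the magnitude alone fills all 128 bits


def to_sign_magnitude(value: int) -> Tuple[bool, bytes]:
    """Split value into (is_negative, 16-byte unsigned little-endian magnitude)."""
    magnitude = abs(value)
    if magnitude > MAGNITUDE_MAX:
        raise OverflowError("magnitude does not fit in 128 bits")
    return value < 0, magnitude.to_bytes(16, byteorder="little", signed=False)


def from_sign_magnitude(is_negative: bool, magnitude: bytes) -> int:
    """Rebuild the integer from the separate sign flag and magnitude."""
    value = int.from_bytes(magnitude, byteorder="little", signed=False)
    return -value if is_negative else value


def sign_magnitude_to_twos_complement(is_negative: bool, magnitude: bytes) -> bytes:
    """Convert to 16 little-endian two's-complement bytes. Raises OverflowError
    for values outside [-2**127, 2**127 - 1], i.e. those that need the 129th bit."""
    value = from_sign_magnitude(is_negative, magnitude)
    return value.to_bytes(16, byteorder="little", signed=True)
```

The two layouts agree on every value in the two's-complement range, so converting at an IPC boundary is lossless there; only the extreme magnitudes that sign-magnitude can express fail to convert.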

One possible alternative (depending on whether the clang issues have been 
resolved) is to write our own pared-down version of something like 
Boost.Multiprecision that uses {{__int128_t}} on UNIXes and two {{int64_t}}s on 
Windows. It wouldn't need any arithmetic operations for now, just the ability 
to print itself as a decimal number and to convert decimal strings to the 
underlying type. Even those could be free functions rather than methods on the 
class.
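
The pared-down type described above could be as small as two 64-bit words plus the two string conversions. The sketch below is written in Python for brevity (a real version would be C++ with {{__int128_t}} or two {{int64_t}} members); the class name, method names, the choice of two's complement, and the signed-upper/unsigned-lower word split are all assumptions for illustration.

```python
import re
from dataclasses import dataclass

_MASK64 = (1 << 64) - 1
# Optional sign, integer digits, optional fractional digits (e.g. "-123.45").
_DECIMAL_RE = re.compile(r"([+-]?)(\d*)(?:\.(\d*))?")


@dataclass(frozen=True)
class Int128:
    """Hypothetical two's-complement 128-bit integer held as two 64-bit words,
    as a build without __int128_t might store it."""

    high: int  # signed 64-bit upper word; its top bit is the sign bit
    low: int   # unsigned 64-bit lower word

    @classmethod
    def from_int(cls, value: int) -> "Int128":
        if not -(1 << 127) <= value < (1 << 127):
            raise OverflowError("value does not fit in 128 bits")
        # Python's >> is an arithmetic shift, so the sign carries into `high`.
        return cls(high=value >> 64, low=value & _MASK64)

    def to_int(self) -> int:
        return (self.high << 64) | self.low

    def to_decimal_string(self, scale: int) -> str:
        """Print the unscaled value as a decimal number with `scale` fractional digits."""
        value = self.to_int()
        sign = "-" if value < 0 else ""
        digits = str(abs(value)).rjust(scale + 1, "0")
        if scale == 0:
            return sign + digits
        return f"{sign}{digits[:-scale]}.{digits[-scale:]}"

    @classmethod
    def from_decimal_string(cls, text: str, scale: int) -> "Int128":
        """Parse a decimal string such as "-123.45" into its unscaled value."""
        match = _DECIMAL_RE.fullmatch(text)
        if match is None or not (match.group(2) or match.group(3)):
            raise ValueError(f"not a decimal number: {text!r}")
        sign, integer, fraction = match.group(1), match.group(2), match.group(3) or ""
        if len(fraction) > scale:
            raise ValueError("more fractional digits than `scale` allows")
        unscaled = int((integer or "0") + fraction.ljust(scale, "0"))
        return cls.from_int(-unscaled if sign == "-" else unscaled)
```

For example, `Int128.from_decimal_string("-123.45", 2)` holds the unscaled value -12345 and prints back as `-123.45`. Keeping the conversions as the only operations matches the minimal scope suggested above.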





[jira] [Comment Edited] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit

2017-08-02 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110990#comment-16110990
 ] 

Phillip Cloud edited comment on ARROW-786 at 8/2/17 2:21 PM:
-

I'm going to try to start working on this this weekend. It should be ready for 
the next release.


was (Author: cpcloud):
I'm going to try to get to this 

> [Format] In-memory format for 128-bit Decimals, handling of sign bit
>
> Key: ARROW-786
> URL: https://issues.apache.org/jira/browse/ARROW-786
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Wes McKinney
> Fix For: 0.7.0
>
>
> cc [~cpcloud]
> We found in ARROW-655 that we needed to add an extra bit for signedness for 
> decimals stored as 128-bit values to be able to use the Boost multiprecision 
> libraries. This means Decimal128 does not fit neatly into a 16-byte fixed-size 
> binary value; it is more like a struct of a sign bit and a 
> {{fixed_size_binary(16)}}. What is the current format in the Java 
> implementation? We will need to document the memory layout for decimals that 
> maximizes compatibility across languages and eventually implement integration 
> tests for IPC. 


