[ 
https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101836#comment-16101836
 ] 

Jacques Nadeau commented on ARROW-786:
--------------------------------------

The Java implementation currently stores decimals with an embedded sign bit. 
GCC/Clang/Intel support __int128, which I believe is also represented with 
the sign bit embedded on x86-64 machines (?). I remember talking to [~nongli] 
about this years ago and (if I recall correctly), we chose the Parquet 
representation based on his experiments with GCC or Clang/LLVM. (Unfortunately, 
I'm unable to find the thread.)

The current Java implementation supports a 16-byte wide, sign-bit embedded 
two's-complement big-endian representation that is the same as the Parquet 
description here: 

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L81
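
To make the layout described above concrete, here is a minimal sketch in plain 
Java of a 16-byte, sign-embedded, two's-complement, big-endian encoding of a 
decimal's unscaled value. This is not the Arrow Java code; the class and method 
names (Decimal128Bytes, toFixed16, fromFixed16) are hypothetical and only 
java.math.BigInteger from the JDK is assumed.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of Arrow or Parquet).
final class Decimal128Bytes {
    static final int WIDTH = 16; // 128 bits

    // Encode an unscaled decimal value as 16 big-endian two's-complement bytes.
    static byte[] toFixed16(BigInteger unscaled) {
        // toByteArray() returns the minimal big-endian two's-complement form.
        byte[] minimal = unscaled.toByteArray();
        if (minimal.length > WIDTH) {
            throw new ArithmeticException("value does not fit in 128 bits");
        }
        byte[] out = new byte[WIDTH];
        // Sign-extend: pad the high-order bytes with 0xFF for negatives, 0x00 otherwise.
        byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, WIDTH - minimal.length, pad);
        System.arraycopy(minimal, 0, out, WIDTH - minimal.length, minimal.length);
        return out;
    }

    // Decode 16 big-endian two's-complement bytes back to the unscaled value.
    static BigInteger fromFixed16(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        // BigInteger(byte[]) interprets the array as big-endian two's complement.
        return new BigInteger(bytes);
    }

    public static void main(String[] args) {
        BigInteger value = new BigInteger("-12345"); // e.g. -123.45 at scale 2
        byte[] encoded = toFixed16(value);
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
        System.out.println(fromFixed16(encoded));
    }
}
```

With the sign carried in the top bit of the first byte, the value fits exactly 
in a 16-byte fixed-size binary and needs no separate sign field.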

> [Format] In-memory format for 128-bit Decimals, handling of sign bit
> --------------------------------------------------------------------
>
>                 Key: ARROW-786
>                 URL: https://issues.apache.org/jira/browse/ARROW-786
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Wes McKinney
>             Fix For: 0.6.0
>
>
> cc [~cpcloud]
> We found in ARROW-655 that we needed to add an extra bit for signedness for 
> decimals stored as 128-bit values to be able to use the Boost multiprecision 
> libraries. This makes Decimal128 not fit completely neatly as a 16-byte fixed 
> size binary value, and more of a {{struct<sign_bitmap: boolean, data: 
> fixed_size_binary(16)>}}. What is the current format in the Java 
> implementation? We will need to document the memory layout for decimals that 
> maximizes compatibility across languages and eventually implement integration 
> tests for IPC. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
