[ 
https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839178#comment-13839178
 ] 

Eric Hanson commented on HIVE-5762:
-----------------------------------

The new fixed precision/scale decimal type DECIMAL(p, s) has a maximum 
precision and scale of 38. A signed value of up to 38 decimal digits fits in a 
signed 128-bit int (2 longs): 2^127-1 is about 1.70141E+38, which is larger 
than 10^38-1. In a column, every number must have the same precision and scale, 
so those can be factored out of the individual data elements and into the 
column vector or the VectorExpression operator itself. 
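
As a quick sanity check of the range claim, here is a minimal sketch using 
java.math.BigInteger. The class and method names (DecimalRangeCheck, 
fitsInSigned128) are hypothetical and exist only for this illustration; they 
are not part of Hive.

```java
import java.math.BigInteger;

// Hypothetical helper, for illustration only: checks which unscaled decimal
// values fit in a signed 128-bit integer.
final class DecimalRangeCheck {
    // Largest unscaled magnitude with the given number of decimal digits: 10^digits - 1.
    static BigInteger maxUnscaled(int digits) {
        return BigInteger.TEN.pow(digits).subtract(BigInteger.ONE);
    }

    // BigInteger.bitLength() excludes the sign bit, so a signed 128-bit
    // two's-complement int can hold any value whose bitLength() is <= 127.
    static boolean fitsInSigned128(BigInteger v) {
        return v.bitLength() <= 127;
    }

    public static void main(String[] args) {
        System.out.println(fitsInSigned128(maxUnscaled(38)));          // 38 digits, positive
        System.out.println(fitsInSigned128(maxUnscaled(38).negate())); // 38 digits, negative
        System.out.println(fitsInSigned128(maxUnscaled(39)));          // 39 digits
    }
}
```

Under this check, 38 is the largest digit count for which every value fits; 
a 39-digit value can need more than 127 magnitude bits.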

So I'm thinking that a new DecimalColumnVector type could be created that 
contains 2 arrays of long. 

class DecimalColumnVector extends ColumnVector {
  long[] vectorLow;  // low order 64 bits of 128 bit int
  long[] vectorHigh; // high order 64 bits of 128 bit int
  int precision;     // shared by every value in the column
  int scale;         // shared by every value in the column
}

Arithmetic and comparisons can then be made fast by using standard arithmetic 
and comparisons on long as building blocks. Exactly how to implement the 
arithmetic and comparison operations needs more thought.
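
To make the building-block idea concrete, here is one possible sketch of a 
signed 128-bit comparison and addition over (high, low) pairs of long. It is 
not a settled design. The class name Int128Sketch and its methods are 
hypothetical, it assumes Java 8 or later for Long.compareUnsigned, and it 
assumes both operands already have the same scale. Overflow past 128 bits 
simply wraps and is not checked against the declared precision.

```java
// Hypothetical sketch, for illustration only: 128-bit two's-complement values
// are stored as a signed high word and a low word treated as unsigned.
final class Int128Sketch {
    // Returns a negative, zero, or positive int, like Long.compare.
    static int compare(long aHigh, long aLow, long bHigh, long bLow) {
        if (aHigh != bHigh) {
            return Long.compare(aHigh, bHigh);   // high word carries the sign
        }
        return Long.compareUnsigned(aLow, bLow); // low word is plain magnitude bits
    }

    // Adds two values and returns {high, low}. Wraps on 128-bit overflow.
    static long[] add(long aHigh, long aLow, long bHigh, long bLow) {
        long low = aLow + bLow;
        // An unsigned carry out of the low word occurred if the sum wrapped below aLow.
        long carry = Long.compareUnsigned(low, aLow) < 0 ? 1L : 0L;
        long high = aHigh + bHigh + carry;
        return new long[] { high, low };
    }

    // Column-at-a-time addition over the first n entries of two-array vectors.
    static void addColumns(long[] aHigh, long[] aLow, long[] bHigh, long[] bLow,
                           long[] outHigh, long[] outLow, int n) {
        for (int i = 0; i < n; i++) {
            long low = aLow[i] + bLow[i];
            long carry = Long.compareUnsigned(low, aLow[i]) < 0 ? 1L : 0L;
            outHigh[i] = aHigh[i] + bHigh[i] + carry;
            outLow[i] = low;
        }
    }
}
```

Addition and comparison need only a few long operations per element. 
Multiplication, division, and rescaling between different scales are harder 
and are not covered by this sketch.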


> Implement vectorized support for the DECIMAL data type
> ------------------------------------------------------
>
>                 Key: HIVE-5762
>                 URL: https://issues.apache.org/jira/browse/HIVE-5762
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> Add support to allow queries referencing DECIMAL columns and expression 
> results to run efficiently in vectorized mode.  Include unit tests and 
> end-to-end tests. 
> Before starting or at least going very far, please write design specification 
> (a new section for the design spec attached to HIVE-4160) for how support for 
> the different DECIMAL types should work in vectorized mode, and the roadmap, 
> and have it reviewed. 
> It may be feasible to re-use LongColumnVector and related VectorExpression 
> classes for fixed-point decimal in certain data ranges. That should be at 
> least considered to get faster performance and save code. For unlimited 
> precision DECIMAL, a new column vector subtype may be needed, or a 
> BytesColumnVector could be re-used.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
