[https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843601#comment-13843601]
Eric Hanson commented on HIVE-5762:
-----------------------------------
I'm thinking about using the basic structure below for a decimal column vector
for limited-precision decimals. A utility package of static functions could then
implement decimal arithmetic on individual values. This should be much faster
than code that relies on java.math.BigDecimal, both because it is less general
and because it reduces object allocation (new()) and garbage collection.
{code}
package org.apache.hadoop.hive.ql.exec.vector;

public class DecimalColumnVector extends ColumnVector {
  public int precision; // precision of all elements in the vector (max 38)
  public int scale;     // scale of all elements in the vector (max 38)

  public static final int WORDS_PER_VALUE = 4;

  /**
   * Logically a vector of 128-bit unsigned integers stored "little-endian":
   * for a value v, v[0] is the least significant 32-bit word. Each of the
   * four 32-bit words is treated as unsigned. However, the high-order bit
   * of the highest word (word 3) must be 0, so every magnitude fits in
   * 127 bits. The sign of each value is kept separately in sign[].
   */
  public int[][] vector;

  public byte[] sign; // -1 if negative, 0 if zero, 1 if positive

  public DecimalColumnVector() {
    super(VectorizedRowBatch.DEFAULT_SIZE);
    final int len = VectorizedRowBatch.DEFAULT_SIZE;
    vector = new int[len][];
    for (int i = 0; i < len; i++) {
      vector[i] = new int[WORDS_PER_VALUE]; // one 4-word magnitude per row
    }
    sign = new byte[len];
  }

  // ...
}
{code}
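A minimal sketch of one such static function, assuming the 4-word little-endian layout described for the `vector` field. The class `UnsignedInt128Ops` and its methods are hypothetical, not existing Hive API. It adds two magnitudes word by word, carrying through a 64-bit `long` and masking each word with `0xFFFFFFFFL` so the `int` words are read as unsigned; the result goes into a caller-supplied array, so nothing is allocated per value. The BigInteger conversions are only there for testing and debugging.

{code}
import java.math.BigInteger;

/** Hypothetical helpers for 128-bit magnitudes stored as 4 little-endian 32-bit words. */
final class UnsignedInt128Ops {
  static final int WORDS_PER_VALUE = 4;
  private static final long MASK = 0xFFFFFFFFL; // reinterpret an int word as unsigned

  private UnsignedInt128Ops() {}

  /**
   * result = a + b (magnitudes only; signs are handled elsewhere).
   * Returns false if the sum no longer fits in 127 bits, i.e. there is a
   * carry out of word 3 or the reserved high-order bit of word 3 is set.
   */
  static boolean add(int[] a, int[] b, int[] result) {
    long carry = 0;
    for (int i = 0; i < WORDS_PER_VALUE; i++) {
      long sum = (a[i] & MASK) + (b[i] & MASK) + carry;
      result[i] = (int) sum;  // keep the low 32 bits
      carry = sum >>> 32;     // 0 or 1
    }
    return carry == 0 && result[3] >= 0;
  }

  /** Test/debug helper: converts a magnitude in [0, 2^127) to 4 words. */
  static int[] fromBigInteger(BigInteger v) {
    if (v.signum() < 0 || v.bitLength() > 127) {
      throw new IllegalArgumentException("magnitude must be in [0, 2^127)");
    }
    int[] w = new int[WORDS_PER_VALUE];
    for (int i = 0; i < WORDS_PER_VALUE; i++) {
      w[i] = v.shiftRight(32 * i).intValue(); // intValue() keeps the low 32 bits
    }
    return w;
  }

  /** Test/debug helper: converts 4 words back to a non-negative BigInteger. */
  static BigInteger toBigInteger(int[] w) {
    BigInteger v = BigInteger.ZERO;
    for (int i = WORDS_PER_VALUE - 1; i >= 0; i--) {
      v = v.shiftLeft(32).or(BigInteger.valueOf(w[i] & MASK));
    }
    return v;
  }
}
{code}

Reserving the top bit costs nothing at precision 38: 10^38 - 1 < 2^127 (about 1.7 x 10^38), so every 38-digit unscaled value fits. Subtraction would follow the same word-by-word pattern with a borrow instead of a carry.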
> Implement vectorized support for the DECIMAL data type
> ------------------------------------------------------
>
> Key: HIVE-5762
> URL: https://issues.apache.org/jira/browse/HIVE-5762
> Project: Hive
> Issue Type: Sub-task
> Reporter: Eric Hanson
>
> Add support to allow queries referencing DECIMAL columns and expression
> results to run efficiently in vectorized mode. Include unit tests and
> end-to-end tests.
> Before starting, or at least before going very far, please write a design
> specification (a new section of the design spec attached to HIVE-4160)
> describing how the different DECIMAL types should be supported in vectorized
> mode, along with the roadmap, and have it reviewed.
> It may be feasible to reuse LongColumnVector and the related VectorExpression
> classes for fixed-point decimals in certain data ranges. That should at least
> be considered, both for faster performance and to reduce the amount of new
> code. For unlimited-precision DECIMAL, a new column vector subtype may be
> needed, or BytesColumnVector could be reused.
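To illustrate the fixed-point idea, assume a decimal whose precision is at most 18 digits. Its unscaled integer then satisfies |v| <= 10^18 - 1 < 2^63 - 1 (about 9.22 x 10^18), so it fits in a Java `long`, and a column of such values could live in a `long[]` like the `vector` array of LongColumnVector. The sketch below uses plain `long[]` arrays rather than the Hive class, and the class and method names are hypothetical. With equal scales, addition is ordinary long addition; with different scales, one operand is first multiplied by a power of ten. `Math.multiplyExact` and `Math.addExact` throw `ArithmeticException` on overflow instead of silently wrapping.

{code}
/** Hypothetical fixed-point helpers: decimals with precision <= 18 stored as scaled longs. */
final class ScaledLongDecimal {
  /** 10^0 .. 10^18; every entry fits in a long. */
  private static final long[] POW10 = new long[19];

  static {
    POW10[0] = 1L;
    for (int i = 1; i < POW10.length; i++) {
      POW10[i] = POW10[i - 1] * 10L;
    }
  }

  private ScaledLongDecimal() {}

  /** Rescales an unscaled value from fromScale up to toScale (0 <= toScale - fromScale <= 18). */
  static long upscale(long unscaled, int fromScale, int toScale) {
    return Math.multiplyExact(unscaled, POW10[toScale - fromScale]);
  }

  /**
   * out[i] = a[i] + b[i] for the first n rows, where a has scale aScale and
   * b has scale bScale. Returns the result scale, max(aScale, bScale).
   * Throws ArithmeticException if any rescaled value or sum overflows a long.
   */
  static int add(long[] a, int aScale, long[] b, int bScale, long[] out, int n) {
    int scale = Math.max(aScale, bScale);
    for (int i = 0; i < n; i++) {
      long x = upscale(a[i], aScale, scale);
      long y = upscale(b[i], bScale, scale);
      out[i] = Math.addExact(x, y);
    }
    return scale;
  }
}
{code}

For example, 123.45 and -0.50 at scale 2 are stored as 12345 and -50; adding 0.1 and 0.3 at scale 1 (stored as 1 and 3) gives 12355 and -20 at scale 2, i.e. 123.55 and -0.20.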