[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-17433:
--------------------------------
    Status: Patch Available  (was: In Progress)

> Vectorization: Support Decimal64 in Hive Query Engine
> -----------------------------------------------------
>
>                 Key: HIVE-17433
>                 URL: https://issues.apache.org/jira/browse/HIVE-17433
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch,
> HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch,
> HIVE-17433.08.patch
>
>
> Provide partial support for Decimal64 within Hive. By partial I mean that
> our current decimal has a large surface area of features (rounding, multiply,
> divide, remainder, power, big precision, and many more), but only a small
> number of cases have been identified as performance hotspots.
> Those are small decimals with precision <= 18, which fit within a 64-bit
> long; we are calling these Decimal64. Just as we optimize row-mode
> execution engine hotspots by selectively adding new vectorization code, we
> can treat the current decimal as the full-featured one and add Decimal64
> optimizations where query benchmarks really show that they help.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimals in Hive for the vectorized text
> input format and uses some new Decimal64 vectorized classes for comparison,
> addition, and later perhaps a few GroupBy aggregations like sum, avg, min,
> max.
> The patch also supports a new annotation that can mark a
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64). So
> other formats such as ORC, PARQUET, etc. can be done in separate work in
> later JIRAs so that they participate in the Decimal64 performance
> optimization.
> The idea is that when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of
> DecimalColumnVector. When an input format sees that a Decimal64ColumnVector
> is being used, it can fill that column vector with decimal64 longs
> instead of the HiveDecimalWritable objects of DecimalColumnVector.
> There will be a Hive environment variable
> hive.vectorized.input.format.supports.enabled that has a string list of
> supported features. The default will start as "decimal_64". It can be
> turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY
> key, value
> Will have a vectorized explain plan looking like:
> ...
>             Filter Operator
>               Filter Vectorization:
>                   className: VectorFilterOperator
>                   native: true
>                   predicateExpression: FilterDecimal64ColLessDecimal64Scalar(col 2, val 20000000)(children: Decimal64ColSubtractDecimal64Scalar(col 0, val 10000000, outputDecimal64AbsMax 99999999999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>               predicate: ((key - 100) < 200) (type: boolean)
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
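
The scalar values in the explain plan above (val 10000000, val 20000000, outputDecimal64AbsMax 99999999999) can be read as decimals stored as scaled 64-bit longs. The following is a minimal sketch of that reading, not code from the patch: the class Decimal64Sketch and all of its methods are hypothetical names introduced here, and it assumes that every value in a column shares one scale (5 in the plan) and that outputDecimal64AbsMax is the largest magnitude representable at the result precision (11 in the plan).

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Hypothetical illustration only; this is not a class from the HIVE-17433 patch.
final class Decimal64Sketch {
    // Any decimal with precision <= 18 has an unscaled value below 10^18,
    // which always fits in a signed 64-bit long.
    static final int MAX_PRECISION = 18;

    // Converts a decimal literal to its unscaled long at the given scale.
    // setScale(int) throws ArithmeticException if rounding would be needed.
    static long toDecimal64(String literal, int scale) {
        BigDecimal scaled = new BigDecimal(literal).setScale(scale);
        if (scaled.precision() > MAX_PRECISION) {
            throw new ArithmeticException("precision exceeds " + MAX_PRECISION);
        }
        return scaled.unscaledValue().longValueExact();
    }

    // Largest unscaled magnitude for a given precision: 10^precision - 1.
    static long absMax(int precision) {
        if (precision < 1 || precision > MAX_PRECISION) {
            throw new IllegalArgumentException("precision must be in 1..18");
        }
        long pow = 1;
        for (int i = 0; i < precision; i++) {
            pow *= 10;
        }
        return pow - 1;
    }

    // Column minus scalar; both operands must already use the same scale.
    static long[] subtractScalar(long[] col, long scalar) {
        long[] out = new long[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = col[i] - scalar;
        }
        return out;
    }

    // Column less-than scalar; both operands must already use the same scale.
    static boolean[] lessThanScalar(long[] col, long scalar) {
        boolean[] out = new boolean[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = col[i] < scalar;
        }
        return out;
    }

    public static void main(String[] args) {
        int scale = 5;
        long[] key = {
            toDecimal64("250.5", scale),
            toDecimal64("-17", scale),
            toDecimal64("300", scale)
        };
        // Evaluates (key - 100) < 200 using only long arithmetic.
        long[] diff = subtractScalar(key, toDecimal64("100", scale));
        boolean[] keep = lessThanScalar(diff, toDecimal64("200", scale));
        System.out.println(Arrays.toString(keep)); // prints [true, true, false]
    }
}
```

Under these assumptions, 100 and 200 at scale 5 become 10000000 and 20000000, and the maximum magnitude for precision 11 is 99999999999, which agrees with the numbers shown in the plan. The subtraction and comparison then run as plain long operations per row, with no per-value decimal objects.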