[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842001#comment-13842001 ]
Jason Dere commented on HIVE-5356:
----------------------------------
One big effect of changing int / int => decimal is the performance impact,
since decimal arithmetic is considerably slower than double arithmetic. I ran
a test similar to the division unit tests, calling
GenericUDFOPDivide.evaluate() in a loop with both double args and decimal
args. On my laptop, the loop with decimal division was over 50x slower than
the loop with double division.
Time in ms, 10M iterations:
double: 260
decimal: 13993
The loop I ran for double is below; I had a similar function for decimal:
{code:java}
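import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPDivide;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public static long testDivideDouble(double a, double b, int iterations)
    throws HiveException {
  GenericUDFOPDivide udf = new GenericUDFOPDivide();
  DoubleWritable left = new DoubleWritable(a);
  DoubleWritable right = new DoubleWritable(b);
  ObjectInspector[] inputOIs = {
    PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
    PrimitiveObjectInspectorFactory.writableDoubleObjectInspector
  };
  DeferredObject[] args = {
    new DeferredJavaObject(left),
    new DeferredJavaObject(right),
  };
  // initialize() must run before evaluate(); it also returns the output inspector.
  PrimitiveObjectInspector oi = (PrimitiveObjectInspector)
      udf.initialize(inputOIs);
  // Result of the most recent evaluate() call.
  DoubleWritable doubleResult = null;
  long start = System.currentTimeMillis();
  for (int idx = 0; idx < iterations; ++idx) {
    doubleResult = (DoubleWritable) udf.evaluate(args);
  }
  long end = System.currentTimeMillis();
  return end - start;
}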
{code}
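For a rough feel of the same gap without a Hive build, a minimal standalone sketch follows. It is not the test above: it assumes java.math.BigDecimal with MathContext.DECIMAL128 as a stand-in for Hive's decimal division, and the DivideTiming class and its method names are hypothetical. It also skips warm-up and JIT controls, so the absolute numbers will not match the ones reported here.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical standalone timing harness; not part of Hive.
class DivideTiming {
    // Results are stored here so the divisions stay observable to the JIT.
    static volatile double doubleSink;
    static volatile BigDecimal decimalSink;

    // Runs `iterations` double divisions and returns the elapsed time in ms.
    static long timeDouble(double a, double b, int iterations) {
        double acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; ++i) {
            acc += (a + i) / b;
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        doubleSink = acc;
        return elapsedMs;
    }

    // Runs `iterations` BigDecimal divisions and returns the elapsed time in ms.
    // Assumption: DECIMAL128 (34 digits) approximates the decimal work Hive does.
    static long timeDecimal(BigDecimal a, BigDecimal b, int iterations) {
        BigDecimal last = BigDecimal.ZERO;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; ++i) {
            last = a.divide(b, MathContext.DECIMAL128);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        decimalSink = last;
        return elapsedMs;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("double:  " + timeDouble(1.0, 3.0, n) + " ms");
        System.out.println("decimal: "
            + timeDecimal(BigDecimal.ONE, BigDecimal.valueOf(3), n) + " ms");
    }
}
```

For numbers worth quoting, a harness such as JMH would be a better choice than hand-rolled timing loops like these.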
> Move arithmetic UDFs to generic UDF implementations
> ---------------------------------------------------
>
> Key: HIVE-5356
> URL: https://issues.apache.org/jira/browse/HIVE-5356
> Project: Hive
> Issue Type: Task
> Components: UDF
> Affects Versions: 0.11.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch,
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch,
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch,
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are
> implemented as old-style UDFs, and Java reflection is used to determine the
> return type TypeInfos/ObjectInspectors, based on the return type of the
> evaluate() method chosen for the expression. This works fine for types that
> don't have type params.
> Hive's decimal type participates in these operations just like int or double.
> Unlike double or int, however, decimal has precision and scale, which
> cannot be determined by just looking at the return type (decimal) of the UDF
> evaluate() method, even though the operands have certain precision/scale.
> A bare "decimal" without precision/scale defaults to (10, 0) as its type
> params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be
> implemented as GenericUDFs, which return an ObjectInspector from the
> initialize() method. The object inspectors returned can carry type params,
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in the non-generic way,
> if the return type of the chosen evaluate() method is decimal, the return type
> actually has (10,0) as precision/scale, which might not be desirable. This
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit
> the scope of review. The remaining ones will be covered under HIVE-5706.
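To make the type-param point in the description concrete, here is a small standalone sketch of deriving a result precision/scale from the operand types, which is the kind of work an initialize() method can do and a reflected Java return type cannot. Everything in it is an assumption for illustration: the DecimalTypeSketch class, the addResultType helper, the cap of 38 digits, and the addition rule itself (a common SQL convention, not necessarily the exact rule Hive applies).

```java
import java.math.BigDecimal;

// Hypothetical illustration; not Hive code.
class DecimalTypeSketch {
    // Assumed maximum precision for the sketch.
    static final int MAX_PRECISION = 38;

    // Returns {precision, scale} for decimal(p1,s1) + decimal(p2,s2), using
    // scale = max(s1, s2) and one extra integer digit for a possible carry.
    static int[] addResultType(int p1, int s1, int p2, int s2) {
        int scale = Math.max(s1, s2);
        int integerDigits = Math.max(p1 - s1, p2 - s2);
        int precision = Math.min(integerDigits + scale + 1, MAX_PRECISION);
        return new int[] {precision, scale};
    }

    public static void main(String[] args) {
        // decimal(5,2) + decimal(7,4)
        int[] t = addResultType(5, 2, 7, 4);
        System.out.println("derived type: decimal(" + t[0] + "," + t[1] + ")");

        // The largest values of each operand type need exactly that much room.
        BigDecimal sum = new BigDecimal("999.99").add(new BigDecimal("999.9999"));
        System.out.println(sum + " has precision " + sum.precision()
            + " and scale " + sum.scale());
    }
}
```

With the default (10, 0) type params, the same sum would lose all four fractional digits, which is the undesirable outcome the description refers to.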