[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842001#comment-13842001 ]
Jason Dere commented on HIVE-5356:
----------------------------------

One significant consequence of changing int / int to return decimal is the performance impact, since decimal arithmetic is considerably slower than double arithmetic.

I ran a test similar to the division unit tests, calling GenericUDFOPDivide.evaluate() in a loop with both double args and Decimal args. On my laptop, the loop with decimal division was over 50x slower than the loop with double division.

Time in ms, 10M iterations:
  double:  260
  decimal: 13993

The loop I ran for double is below; I had a similar function for decimal:

{code:java}
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPDivide;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public static long testDivideDouble(double a, double b, int iterations) throws HiveException {
  GenericUDFOPDivide udf = new GenericUDFOPDivide();
  DoubleWritable left = new DoubleWritable(a);
  DoubleWritable right = new DoubleWritable(b);
  ObjectInspector[] inputOIs = {
    PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
    PrimitiveObjectInspectorFactory.writableDoubleObjectInspector
  };
  DeferredObject[] args = {
    new DeferredJavaObject(left),
    new DeferredJavaObject(right),
  };
  // initialize() must be called once before evaluate(); its return value is not needed here.
  udf.initialize(inputOIs);
  // Holds the last result so that the evaluate() call is not dead code.
  DoubleWritable doubleResult = null;
  long start = System.currentTimeMillis();
  for (int idx = 0; idx < iterations; ++idx) {
    doubleResult = (DoubleWritable) udf.evaluate(args);
  }
  long end = System.currentTimeMillis();
  return end - start;
}
{code}

> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch,
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch,
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch,
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are
> implemented as old-style UDFs, and Java reflection is used to determine the
> return type TypeInfos/ObjectInspectors, based on the return type of the
> evaluate() method chosen for the expression. This works fine for types that
> don't have type params.
> The Hive decimal type participates in these operations just like int or double.
> Unlike double or int, however, decimal has precision and scale, which
> cannot be determined by just looking at the return type (decimal) of the UDF
> evaluate() method, even though the operands have certain precision/scale.
> With the default of "decimal" without precision/scale, (10, 0) will be
> the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be
> implemented as GenericUDFs, which allow returning an ObjectInspector from the
> initialize() method. The object inspectors returned can carry type params,
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in a non-generic way, if
> the return type of the chosen evaluate() method is decimal, the return type
> actually has (10,0) as precision/scale, which might not be desirable. This
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit
> the scope of review. The remaining ones will be covered under HIVE-5706.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
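
To make the precision/scale point in the issue description concrete, here is a minimal sketch using plain java.math.BigDecimal rather than Hive's decimal classes. The class and method names (DecimalTypeParamsSketch, describe, coerceToDefault) are hypothetical and exist only for illustration. It reports the precision and scale of a value, not Hive's type-level derivation rules, which are not spelled out in this thread. It only shows that the result of an operation carries parameters that depend on the operands, and that forcing the result into the (10, 0) default mentioned in the description drops the fractional part.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalTypeParamsSketch {
    // Scale implied by a bare "decimal" return type, per the issue description: (10, 0).
    static final int DEFAULT_SCALE = 0;

    // Formats the precision and scale actually carried by a value.
    static String describe(BigDecimal v) {
        return "decimal(" + v.precision() + "," + v.scale() + ")";
    }

    // Illustrative assumption: squeezes a value into scale 0, rounding half up.
    static BigDecimal coerceToDefault(BigDecimal v) {
        return v.setScale(DEFAULT_SCALE, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("12.34");  // precision 4, scale 2
        BigDecimal b = new BigDecimal("5.678");  // precision 4, scale 3
        BigDecimal product = a.multiply(b);      // exact product: 70.06652

        System.out.println(describe(product));        // decimal(7,5)
        System.out.println(coerceToDefault(product)); // 70
    }
}
```

The Java return type of multiply() is just BigDecimal in both cases; the (7,5) is only known once the operands are, which mirrors why the description moves return-type determination into initialize().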