[ 
https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842001#comment-13842001
 ] 

Jason Dere commented on HIVE-5356:
----------------------------------

One big effect of changing int / int => decimal is on performance, since
decimal arithmetic is considerably slower than double arithmetic. I ran a test
similar to the division unit tests, calling GenericUDFOPDivide.evaluate() in a
loop, first with double args and then with decimal args. On my laptop, the loop
with decimal division was over 50x slower than the loop with double division.

Time in ms, 10M iterations:
double: 260
decimal: 13993

The loop I ran for double is below; I had a similar function for decimal:
{code:java}
  public static long testDivideDouble(double a, double b, int iterations)
      throws HiveException {
    GenericUDFOPDivide udf = new GenericUDFOPDivide();

    DoubleWritable left = new DoubleWritable(a);
    DoubleWritable right = new DoubleWritable(b);
    ObjectInspector[] inputOIs = {
        PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
        PrimitiveObjectInspectorFactory.writableDoubleObjectInspector
    };
    DeferredObject[] args = {
        new DeferredJavaObject(left),
        new DeferredJavaObject(right),
    };

    PrimitiveObjectInspector oi =
        (PrimitiveObjectInspector) udf.initialize(inputOIs);

    // Result of the most recent evaluate() call.
    DoubleWritable doubleResult = null;

    long start = System.currentTimeMillis();
    for (int idx = 0; idx < iterations; ++idx) {
      doubleResult = (DoubleWritable) udf.evaluate(args);
    }
    long end = System.currentTimeMillis();
    return end - start;
  }
{code}
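
For anyone who wants a rough feel for the gap without a Hive build, here is a minimal standalone sketch that times plain double division against java.math.BigDecimal division. It is not the test I ran: the class name DivideTiming, the 38-digit MathContext, and the iteration count are assumptions made for illustration, and it skips the UDF and ObjectInspector overhead entirely, so the ratio will differ from the numbers above.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DivideTiming {
    // Results are stored in static fields so the loops have an observable effect.
    static double doubleSink;
    static BigDecimal decimalSink;

    // Assumption: 38 significant digits, chosen only for illustration.
    static final MathContext MC = new MathContext(38);

    /** Runs `iterations` double divisions and returns the elapsed milliseconds. */
    static long timeDouble(double a, double b, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; ++i) {
            doubleSink = a / b;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Runs `iterations` BigDecimal divisions and returns the elapsed milliseconds. */
    static long timeDecimal(BigDecimal a, BigDecimal b, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; ++i) {
            decimalSink = a.divide(b, MC);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        System.out.println("double:  " + timeDouble(1.0, 3.0, iterations) + " ms");
        System.out.println("decimal: "
            + timeDecimal(new BigDecimal("1"), new BigDecimal("3"), iterations) + " ms");
    }
}
```

This is a naive timing loop with no warm-up, and the JIT may hoist the loop-invariant double division, so treat the output as an order-of-magnitude hint only.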

> Move arithmetic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are
> implemented as old-style UDFs, and Java reflection is used to determine the
> return type TypeInfos/ObjectInspectors, based on the return type of the
> evaluate() method chosen for the expression. This works fine for types that
> don't have type params.
> The Hive decimal type participates in these operations just like int or
> double. Unlike double or int, however, decimal has a precision and scale,
> which cannot be determined just by looking at the return type (decimal) of
> the UDF evaluate() method, even though the operands have a known
> precision/scale. With plain "decimal" and no explicit precision/scale, the
> type params default to (10, 0). This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in a non-generic way,
> if the return type of the chosen evaluate() method is decimal, the return
> type actually has (10,0) as precision/scale, which might not be desirable.
> This needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.
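
To illustrate the reflection limitation described in the issue text above, here is a small standalone sketch using java.math.BigDecimal as a stand-in for the Hive decimal type. The class ReturnTypeDemo and its evaluate() method are hypothetical and are not Hive code; the sketch only shows that a method's declared return type carries no precision/scale, while actual results do, and what forcing scale 0 does to a fractional value.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ReturnTypeDemo {
    // Hypothetical stand-in for an old-style UDF evaluate() method.
    public static BigDecimal evaluate(BigDecimal a, BigDecimal b) {
        return a.add(b);
    }

    /** Returns the only type information reflection can see: the declared class. */
    static Class<?> reflectedReturnType() {
        try {
            Method m = ReturnTypeDemo.class.getMethod(
                "evaluate", BigDecimal.class, BigDecimal.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reflection yields the class only, with no precision or scale attached.
        System.out.println(reflectedReturnType().getName());

        // The actual precision/scale depends on the operands of each call.
        BigDecimal r1 = evaluate(new BigDecimal("1.25"), new BigDecimal("2.5"));
        BigDecimal r2 = evaluate(new BigDecimal("12345.678"), new BigDecimal("0.0001"));
        System.out.println(r1 + " precision=" + r1.precision() + " scale=" + r1.scale());
        System.out.println(r2 + " precision=" + r2.precision() + " scale=" + r2.scale());

        // Forcing scale 0, as a (10, 0) result type would, drops the fraction.
        System.out.println(r1.setScale(0, RoundingMode.HALF_UP));
    }
}
```

A GenericUDF avoids this by computing the result type from the argument ObjectInspectors in initialize(), so the precision/scale can follow the operands instead of falling back to a fixed default.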



