Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18295 )

Change subject: IMPALA-11162: Support GenericUDFs for Hive
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG@7
PS3, Line 7: Support GenericUDFs for Hive
> Do you plan to merge this commit as it is, or this is a WIP change and you
The limitations are too big to fix right now.  I'd like to merge it as/is.


http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG@24
PS3, Line 24: Functions are not extracted from the jar file. The first 
generation
            : of Hive UDFs allowed this because the method prototypes are
            : explicitly defined and can be determined at funciton creation 
time. For
            : GenericUDFs, the return types are determined based on the 
parameters
            : passed in when running a query.
> Should we extract signatures for generic UDFs? My understanding is that all
My understanding of the current implementation is that a Function object is 
created which has its return type and parameter types defined as a global 
object within catalogd.

The problem I am running into here is that it is impossible to create parameter 
types because of the permutations of all parameters.  For instance (and I 
realize that it wouldn't be used, but I'm just using an example here): the "+" 
UDF in Hive supports a Long and an Int as parameters.  The number of 
permutations we would have with Longs, Ints, SmallInts, TinyInts, etc... would 
be too many to register at "create function" time.

Of course, we can eventually resolve this limitation, but that would be for a 
future commit.


http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG@30
PS3, Line 30: For the same reason as above, GenericUDFs cannot be made 
permanent.
            : They will need to be recreated everytime the server is restarted.
            : This is a severe limitation and will be resolved in the near 
future.
> This is not 100% clear to me.
There is logic in CreateFunctionStatementBase:

    if (getBinaryType() == TFunctionBinaryType.JAVA && hasSignature()) {
      fn_.setIsPersistent(false);
    } else {
      fn_.setIsPersistent(true);
    }

It doesn't save functions to the database if there is no signature defined at 
creation time.  There is a "V1" syntax for java functions where parameters are 
used and a "V2" syntax where one passes in the function name and they are 
extracted.  As stated in my above comment, there are too many permutations of 
the V2 function with Generic UDFs for this to make sense. There will have to be 
another code change to support this.


http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java
File 
fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java:

http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@183
PS3, Line 183:       return 
PrimitiveObjectInspectorFactory.writableIntObjectInspector;
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@189
PS3, Line 189:       return 
PrimitiveObjectInspectorFactory.writableDoubleObjectInspector;
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
File 
fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java:

http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@73
PS3, Line 73:   private DeferredObject[] deferredObjects_;
            :   private DeferredObject[] runtimeDeferredObjects_;
> Can you add some comments about the roles of these?
Hopefully it's clear with my comment?

A unit test has been added that works with Hive to confirm the behavior 
mentioned in the comment. I've also tested with a variety of Hive UDFs while 
implementing.


http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java
File fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java:

http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@46
PS3, Line 46: /**
> nit: add extra line
Done



--
To view, visit http://gerrit.cloudera.org:8080/18295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6fd09120db413fade94410c83ebe8ff104013cd
Gerrit-Change-Number: 18295
Gerrit-PatchSet: 3
Gerrit-Owner: Steve Carlin <scar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com>
Gerrit-Comment-Date: Sun, 03 Apr 2022 20:19:04 +0000
Gerrit-HasComments: Yes

Reply via email to