Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18295 )
Change subject: IMPALA-11162: Support GenericUDFs for Hive ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG@7 PS3, Line 7: Support GenericUDFs for Hive > Do you plan to merge this commit as it is, or this is a WIP change and you The limitations are too big to fix right now. I'd like to merge it as/is. http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG@24 PS3, Line 24: Functions are not extracted from the jar file. The first generation : of Hive UDFs allowed this because the method prototypes are : explicitly defined and can be determined at funciton creation time. For : GenericUDFs, the return types are determined based on the parameters : passed in when running a query. > Should we extract signatures for generic UDFs? My understanding is that all My understanding of the current implementation is that a Function object is created which has its return type and parameter types defined as a global object within catalogd. The problem I am running into here is that it is impossible to create parameter types because of the permutations of all parameters. For instance (and I realize that it wouldn't be used, but I'm just using an example here): the "+" UDF in Hive supports a Long and an Int as parameters. The number of permutations we would have with Longs, Ints, SmallInts, TinyInts, etc... would be too many to register at "create function" time. Of course, we can eventually resolve this limitation, but that would be for a future commit. http://gerrit.cloudera.org:8080/#/c/18295/3//COMMIT_MSG@30 PS3, Line 30: For the same reason as above, GenericUDFs cannot be made permanent. : They will need to be recreated everytime the server is restarted. : This is a severe limitation and will be resolved in the near future. > This is not 100% clear to me. There is logic in CreateFunctionStatementBase: if (getBinaryType() == TFunctionBinaryType.JAVA && hasSignature()) { fn_.setIsPersistent(false); } else { fn_.setIsPersistent(true); } It doesn't save functions to the database if there is no signature defined at creation time. There is a "V1" syntax for java functions where parameters are used and a "V2" syntax where one passes in the function name and they are extracted. As stated in my above comment, there are too many permutations of the V2 function with Generic UDFs for this to make sense. There will have to be another code change to support this. http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java File fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java: http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@183 PS3, Line 183: return PrimitiveObjectInspectorFactory.writableIntObjectInspector; > nit: indentation Done http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@189 PS3, Line 189: return PrimitiveObjectInspectorFactory.writableDoubleObjectInspector; > nit: indentation Done http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java File fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java: http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@73 PS3, Line 73: private DeferredObject[] deferredObjects_; : private DeferredObject[] runtimeDeferredObjects_; > Can you add some comments about the roles of these? Hopefully it's clear with my comment? A unit test has been added that works with Hive to confirm the behavior mentioned in the comment. I've also tested with a variety of Hive UDFs while implementing. http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java File fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java: http://gerrit.cloudera.org:8080/#/c/18295/3/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@46 PS3, Line 46: /** > nit: add extra line Done -- To view, visit http://gerrit.cloudera.org:8080/18295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie6fd09120db413fade94410c83ebe8ff104013cd Gerrit-Change-Number: 18295 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin <scar...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com> Gerrit-Comment-Date: Sun, 03 Apr 2022 20:19:04 +0000 Gerrit-HasComments: Yes