Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18295 )
Change subject: IMPALA-11162: Support GenericUDFs for Hive ...................................................................... Patch Set 5: (39 comments) Thanks, and again sorry for not coming back earlier. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java File fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java: http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@65 PS4, Line 65: > Removing that comment. That was changed in one of my iterations. Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@128 PS4, Line 128: xception { > Changed to "create" Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@139 PS4, Line 139: throw new CatalogException("Unable to call create UDF instance.", e); : } : } : : private void checkValidFunction() throws CatalogException { : try { > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@154 PS4, Line 154: > It is important for the caller to match the UDF when it is any specific typ It's a good point, thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java File fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java: http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@67 PS4, Line 67: public class HiveUdfExecutorGeneric extends HiveUdfExecutor { > Cut and paste. Removing it. Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@76 PS4, Line 76: e > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@79 PS4, Line 79: runtimeDeferredP > Done Thanks, but don't forget the underscore at the end, 'deferredParameters_'. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java File fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java: http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@79 PS5, Line 79: private DeferredObject[] runtimeDeferredParameters; Also here, don't forget the underscore, 'runtimeDeferredParameters_'. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java File fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java: http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@132 PS4, Line 132: from > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java File fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java: http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@50 PS4, Line 50: Simple Generic UDFs for testing. > Adjusted comment and code. Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@59 PS4, Line 59: HDFS in testda > I'm not sure I understand this. I meant like it computes the sum if the arguments are a number type, concatenates if the arguments are strings etc. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@61 PS4, Line 61: s GenericU > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@85 PS4, Line 85: > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@89 PS4, Line 89: : for (ObjectInspector oi : arguments) { : if (!(oi instanceof PrimitiveObjectInspector)) { : throw new UDFArgumentException("Found an input that is not a primitive."); : } : PrimitiveObjectInspector poi = (PrimitiveObjectIns > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@97 PS4, Line 97: > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@150 PS4, Line 150: > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@159 PS4, Line 159: SignatureString(a > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@276 PS4, Line 276: input : inpu > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@296 PS4, Line 296: r > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@384 PS4, Line 384: > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java File fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java: http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@103 PS5, Line 103: int numArgsToCheck = (argAndRetType_ == PrimitiveCategory.BYTE) ? 1 : 3; I don't think there is any point in keeping 'Byte' separate. I know it only has a one-argument version in TestUdf, but I think it is just because nobody bothered to write any other version, I don't think there was any intention to handle it separately. My opinion is that we should accept any number of arguments for these UDFs. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@145 PS5, Line 145: int numArgsAllowed See L103. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@149 PS5, Line 149: return It should also be "Unsupported argument type", like on L135. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@151 PS5, Line 151: if (inputTypes_.size() > numArgsAllowed) { See L103. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@190 PS5, Line 190: Short Should be 'Byte'. http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@199 PS5, Line 199: Nit: unnecessary line. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java File fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java: http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@475 PS4, Line 475: GenericU > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@485 PS4, Line 485: TestUdf(null, TestGenericUdf.class, createText("ABCD"), createText("ABCD")); : TestUdf(null, TestGenericUdf.class, createDouble(3), > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@506 PS4, Line 506: } If we choose not to limit the number of UDF arguments like I suggested on L103 in TestGenericUdf.java, no need to remove these tests. http://gerrit.cloudera.org:8080/#/c/18295/4/java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java File java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java: http://gerrit.cloudera.org:8080/#/c/18295/4/java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java@30 PS4, Line 30: > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java@46 PS4, Line 46: RuntimeException("Te > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test File testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test: http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@24 PS4, Line 24: generic_identity > Done I think the comment can stay, just we should say "Test GenericUDF functions" or something like that instead. http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@74 PS4, Line 74: : gener > Done Do we actually need this test? TestUdf.java has functions for different string-like types (BytesWritable, String, Text) but TestGenericUdf does not. http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@170 PS4, Line 170: > This test was essentially copied from java-udf.test, so we have a bug there I would prefer option 2 because the semantics of the functions in the TestGenericUdf class should be consistent. On the other hand I don't think there is any strict requirement that these tests should be the same as the older TestUdf tests. Also, option 1 is not bad either, it is not uncommon that our commits contain small miscellaneous fixes besides the main issue. http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@272 PS4, Line 272: AnalysisException: Type TIMESTAMP is not supported for Java UDFs. > Not sure how to create a test like that. Something like ''' create function generic_add(smallint, smallint) returns int location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar' symbol='org.apache.impala.TestGenericUdf'; ---- CATCH AnalysisException: "SOME_ERROR_MESSAGE" ==== ''' http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test File testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test: http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@85 PS5, Line 85: select length(generic_identity("0123456789")), I checked IMPALA-1134 and I think we can still reference it here, but we shouldn't say that each call tests a different type as we don't use different string-like types as arguments in TestGenericUdf. http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test File testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test: http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test@35 PS4, Line 35: boolean > Done Thanks. http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test File testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test: http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test@27 PS5, Line 27: IDENTITY Should be lower-case. http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test@51 PS5, Line 51: hive If we don't use 'hive' in the names of the 'generic_identity' functions and 'generic_throws_exception', 'generic_replace_string', 'generic_import_nearby_classes', we shouldn't use it for the generic add functions either. Or if we do, we should use 'hive' in the other function names as well. -- To view, visit http://gerrit.cloudera.org:8080/18295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie6fd09120db413fade94410c83ebe8ff104013cd Gerrit-Change-Number: 18295 Gerrit-PatchSet: 5 Gerrit-Owner: Steve Carlin <scar...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com> Gerrit-Comment-Date: Wed, 13 Apr 2022 16:55:42 +0000 Gerrit-HasComments: Yes