Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18295 )

Change subject: IMPALA-11162: Support GenericUDFs for Hive
......................................................................


Patch Set 5:

(39 comments)

Thanks, and again sorry for not coming back earlier.

http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java
File 
fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java:

http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@65
PS4, Line 65:
> Removing that comment.  That was changed in one of my iterations.
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@128
PS4, Line 128: xception {
> Changed to "create"
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@139
PS4, Line 139: throw new CatalogException("Unable to call create UDF 
instance.", e);
             :     }
             :   }
             :
             :   private void checkValidFunction() throws CatalogException {
             :     try {
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java@154
PS4, Line 154:
> It is important for the caller to match the UDF when it is any specific typ
It's a good point, thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
File 
fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java:

http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@67
PS4, Line 67: public class HiveUdfExecutorGeneric extends HiveUdfExecutor {
> Cut and paste.  Removing it.
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@76
PS4, Line 76: e
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@79
PS4, Line 79: runtimeDeferredP
> Done
Thanks, but don't forget the underscore at the end, 'deferredParameters_'.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
File 
fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java:

http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java@79
PS5, Line 79:   private DeferredObject[] runtimeDeferredParameters;
Also here, don't forget the underscore, 'runtimeDeferredParameters_'.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
File fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@132
PS4, Line 132: from
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java
File fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java:

http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@50
PS4, Line 50: Simple Generic UDFs for testing.
> Adjusted comment and code.
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@59
PS4, Line 59: HDFS in testda
> I'm not sure I understand this.
I meant like it computes the sum if the arguments are a number type, 
concatenates if the arguments are strings etc.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@61
PS4, Line 61: s GenericU
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@85
PS4, Line 85:
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@89
PS4, Line 89:
            :     for (ObjectInspector oi : arguments) {
            :       if (!(oi instanceof PrimitiveObjectInspector)) {
            :         throw new UDFArgumentException("Found an input that is 
not a primitive.");
            :       }
            :       PrimitiveObjectInspector poi = (PrimitiveObjectIns
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@97
PS4, Line 97:
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@150
PS4, Line 150:
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@159
PS4, Line 159: SignatureString(a
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@276
PS4, Line 276:  input : inpu
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@296
PS4, Line 296:     r
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@384
PS4, Line 384:
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java
File fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java:

http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@103
PS5, Line 103:     int numArgsToCheck = (argAndRetType_ == 
PrimitiveCategory.BYTE) ? 1 : 3;
I don't think there is any point in keeping 'Byte' separate. I know it only has 
a one-argument version in TestUdf, but I think it is just because nobody 
bothered to write any other version, I don't think there was any intention to 
handle it separately.
My opinion is that we should accept any number of arguments for these UDFs.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@145
PS5, Line 145: int numArgsAllowed
See L103.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@149
PS5, Line 149: return
It should also be "Unsupported argument type", like on L135.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@151
PS5, Line 151:     if (inputTypes_.size() > numArgsAllowed) {
See L103.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@190
PS5, Line 190: Short
Should be 'Byte'.


http://gerrit.cloudera.org:8080/#/c/18295/5/fe/src/test/java/org/apache/impala/hive/executor/TestGenericUdf.java@199
PS5, Line 199:
Nit: unnecessary line.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
File fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java:

http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@475
PS4, Line 475: GenericU
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@485
PS4, Line 485:     TestUdf(null, TestGenericUdf.class, createText("ABCD"), 
createText("ABCD"));
             :     TestUdf(null, TestGenericUdf.class, createDouble(3),
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@506
PS4, Line 506:   }
If we choose not to limit the number of UDF arguments like I suggested on L103 
in TestGenericUdf.java, no need to remove these tests.


http://gerrit.cloudera.org:8080/#/c/18295/4/java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java
File 
java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java:

http://gerrit.cloudera.org:8080/#/c/18295/4/java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java@30
PS4, Line 30:
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/java/test-hive-udfs/src/main/java/org/apache/impala/TestGenericUdfException.java@46
PS4, Line 46: RuntimeException("Te
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
File 
testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test:

http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@24
PS4, Line 24: generic_identity
> Done
I think the comment can stay, just we should say "Test GenericUDF functions" or 
something like that instead.


http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@74
PS4, Line 74:
            :        gener
> Done
Do we actually need this test? TestUdf.java has functions for different 
string-like types (BytesWritable, String, Text) but TestGenericUdf does not.


http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@170
PS4, Line 170:
> This test was essentially copied from java-udf.test, so we have a bug there
I would prefer option 2 because the semantics of the functions in the 
TestGenericUdf class should be consistent. On the other hand I don't think 
there is any strict requirement that these tests should be the same as the 
older TestUdf tests.
Also, option 1 is not bad either, it is not uncommon that our commits contain 
small miscellaneous fixes besides the main issue.


http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@272
PS4, Line 272: AnalysisException: Type TIMESTAMP is not supported for Java UDFs.
> Not sure how to create a test like that.
Something like
'''
create function generic_add(smallint, smallint) returns int
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: "SOME_ERROR_MESSAGE"
====
'''


http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
File 
testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test:

http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@85
PS5, Line 85: select length(generic_identity("0123456789")),
I checked IMPALA-1134 and I think we can still reference it here, but we 
shouldn't say that each call tests a different type as we don't use different 
string-like types as arguments in TestGenericUdf.


http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
File 
testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test:

http://gerrit.cloudera.org:8080/#/c/18295/4/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test@35
PS4, Line 35: boolean
> Done
Thanks.


http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
File 
testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test:

http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test@27
PS5, Line 27: IDENTITY
Should be lower-case.


http://gerrit.cloudera.org:8080/#/c/18295/5/testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test@51
PS5, Line 51: hive
If we don't use 'hive' in the names of the 'generic_identity' functions and 
'generic_throws_exception', 'generic_replace_string', 
'generic_import_nearby_classes', we shouldn't use it for the generic add 
functions either. Or if we do, we should use 'hive' in the other function names 
as well.



--
To view, visit http://gerrit.cloudera.org:8080/18295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6fd09120db413fade94410c83ebe8ff104013cd
Gerrit-Change-Number: 18295
Gerrit-PatchSet: 5
Gerrit-Owner: Steve Carlin <scar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Apr 2022 16:55:42 +0000
Gerrit-HasComments: Yes

Reply via email to