Review Request 71561: HIVE-22250
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71561/ ---

Review request for hive, Jesús Camacho Rodríguez, Zoltan Haindrich, and Vineet Garg.

Bugs: HIVE-22250
    https://issues.apache.org/jira/browse/HIVE-22250

Repository: hive-git

Description
---

Describe function does not provide description for rank functions

The `DESCRIBE FUNCTION` command gets the description of a function from the `@Description` annotation's `value` field. If a UDF is annotated with `@WindowFunctionDescription`, Hive prints

```
There is no documentation for function
```

even if the description is present in the `@WindowFunctionDescription` annotation. This patch implements a fallback to get the description text from `@WindowFunctionDescription` if the `@Description` annotation does not exist.

Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/ddl/function/desc/DescFunctionOperation.java 6a94a93ef9
  ql/src/test/queries/clientpositive/desc_function.q d055d9ca03
  ql/src/test/results/clientpositive/desc_function.q.out 1f804bba60

Diff: https://reviews.apache.org/r/71561/diff/1/

Testing
---

Added test cases to `desc_function.q`:

```
DESCRIBE FUNCTION dense_rank;
DESCRIBE FUNCTION EXTENDED dense_rank;
```

Thanks,
Krisztian Kasa
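The fallback the patch describes can be sketched as follows. The annotation types below are simplified stand-ins, not Hive's actual classes (though Hive's real `@WindowFunctionDescription` does nest a `@Description` in its `description` element), and the class and method names are illustrative:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-ins for Hive's @Description and
// @WindowFunctionDescription annotations, simplified for this sketch.
@Retention(RetentionPolicy.RUNTIME)
@interface Description { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@interface WindowFunctionDescription { Description description(); }

public class DescFallbackSketch {
  // A window function carrying its documentation only via
  // @WindowFunctionDescription, like dense_rank before the fix.
  @WindowFunctionDescription(description = @Description("ranks rows without gaps"))
  static class DenseRankLike { }

  /** Use @Description if present, otherwise fall back to @WindowFunctionDescription. */
  static String describe(Class<?> udfClass) {
    Description desc = udfClass.getAnnotation(Description.class);
    if (desc == null) {
      WindowFunctionDescription wfd = udfClass.getAnnotation(WindowFunctionDescription.class);
      if (wfd != null) {
        desc = wfd.description();
      }
    }
    return desc != null ? desc.value() : "There is no documentation for function";
  }
}
```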
[jira] [Created] (HIVE-22275) OperationManager.queryIdOperation does not properly clean up multiple queryIds
Jason Dere created HIVE-22275:

Summary: OperationManager.queryIdOperation does not properly clean up multiple queryIds
Key: HIVE-22275
URL: https://issues.apache.org/jira/browse/HIVE-22275
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Jason Dere
Assignee: Jason Dere

In the case that multiple statements are run by a single Session before being cleaned up, it appears that OperationManager.queryIdOperation is not cleaned up properly. See the log statements below - with the exception of the first "Removed queryId:" log line, the queryId listed during cleanup is the same, when each of these handles should have its own queryId. Looks like only the last queryId executed is being cleaned up. As a result, HS2 can run out of memory as OperationManager.queryIdOperation grows and never cleans these queryIds/Operations up.

{noformat}
2019-09-13T08:37:36,785 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9]
2019-09-13T08:37:38,432 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083736_c49cf3cc-cfe8-48a1-bd22-8b924dfb0396 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9] with tag: null
2019-09-13T08:37:38,469 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb]
2019-09-13T08:37:52,662 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b]
2019-09-13T08:37:56,239 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c]
2019-09-13T08:38:02,551 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=72c79076-9d67-4894-a526-c233fa5450b2]
2019-09-13T08:38:10,558 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=17b30a62-612d-4b70-9ba7-4287d2d9229b]
2019-09-13T08:38:16,930 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=ea97e99d-cc77-470b-b49a-b869c73a4615]
2019-09-13T08:38:20,440 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=a277b789-ebb8-4925-878f-6728d3e8c5fb]
2019-09-13T08:38:26,303 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=9a023ab8-aa80-45db-af88-94790cc83033]
2019-09-13T08:38:30,791 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b697c801-7da0-4544-bcfa-442eb1d3bd77]
2019-09-13T08:39:10,187 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=bda93c8f-0822-4592-a61c-4701720a1a5c]
2019-09-13T08:39:15,471 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb] with tag: null
2019-09-13T08:39:15,507 INFO [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT,
{noformat}
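The bookkeeping bug described above can be illustrated with a minimal sketch (all names hypothetical, not Hive's actual OperationManager code): the map of queryId to operation must be cleaned up using the queryId recorded for that specific operation, not whatever queryId the session most recently ran.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of correct cleanup for a queryId -> operation map:
// each operation remembers the queryId it was registered under, and
// removal is keyed on that id, never on the session's latest queryId.
public class QueryIdCleanupSketch {
  static class Operation {
    final String handle;
    final String queryId; // the id this operation was registered under

    Operation(String handle, String queryId) {
      this.handle = handle;
      this.queryId = queryId;
    }
  }

  private final Map<String, Operation> queryIdOperation = new HashMap<>();

  void addOperation(Operation op) {
    queryIdOperation.put(op.queryId, op);
  }

  // Correct cleanup: remove by the operation's own queryId, so every
  // entry is eventually released and the map cannot grow unboundedly.
  void removeOperation(Operation op) {
    queryIdOperation.remove(op.queryId);
  }

  int size() {
    return queryIdOperation.size();
  }
}
```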
[jira] [Created] (HIVE-22274) Upgrade Calcite version to 1.21.0
Steve Carlin created HIVE-22274:

Summary: Upgrade Calcite version to 1.21.0
Key: HIVE-22274
URL: https://issues.apache.org/jira/browse/HIVE-22274
Project: Hive
Issue Type: Task
Affects Versions: 3.1.2
Reporter: Steve Carlin
Assignee: Steve Carlin

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Review Request 71558: HIVE-21987: Hive is unable to read Parquet int32 annotated with decimal
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71558/#review217991 ---

Ship it!

Ship It!

- Peter Vary


On Sept. 30, 2019, 11:53 a.m., Marta Kuczora wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71558/ ---
>
> (Updated Sept. 30, 2019, 11:53 a.m.)
>
> Review request for hive and Peter Vary.
>
> Bugs: HIVE-21987
>     https://issues.apache.org/jira/browse/HIVE-21987
>
> Repository: hive-git
>
> Description
> ---
>
> Added support to read INT32 Parquet decimals.
>
> Diffs
> ---
>
>   data/files/parquet_int_decimal_1.parquet PRE-CREATION
>   data/files/parquet_int_decimal_2.parquet PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 350ae2d
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java 320ce52
>   ql/src/test/queries/clientpositive/parquet_int_decimal.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_int_decimal.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/type_change_test_fraction.q.out 07cf8fa
>
> Diff: https://reviews.apache.org/r/71558/diff/1/
>
> Testing
> ---
>
> Added new q tests for the use-case.
>
> Thanks,
> Marta Kuczora
[jira] [Created] (HIVE-22273) Access check is failed when a temporary directory is removed
Peter Vary created HIVE-22273:

Summary: Access check is failed when a temporary directory is removed
Key: HIVE-22273
URL: https://issues.apache.org/jira/browse/HIVE-22273
Project: Hive
Issue Type: Bug
Reporter: Peter Vary
Assignee: Peter Vary

The following exception is thrown if a temporary file is deleted during the access checks:

{code}
2019-09-24T16:12:59,611 ERROR [7e491237-1505-4388-afb9-5ec2a688b0dc HiveServer2-HttpHandler-Pool: Thread-11941]: authorizer.RangerHiveAuthorizer (:()) - Error getting permissions for hdfs://HDFS_FOLDER/TABLE_NAME
java.io.FileNotFoundException: File hdfs://HDFS_FOLDER/TABLE_NAME/.hive-staging_hive_2019-09-24_16-12-48_899_7291847300113791212-245/_tmp.-ext-10001 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059) ~[hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131) ~[hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119) ~[hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116) ~[hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126) ~[hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.hive.common.FileUtils.checkIsOwnerOfFileHierarchy(FileUtils.java:561) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.common.FileUtils.checkIsOwnerOfFileHierarchy(FileUtils.java:564) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.common.FileUtils.checkIsOwnerOfFileHierarchy(FileUtils.java:564) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.common.FileUtils$4.run(FileUtils.java:540) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.common.FileUtils$4.run(FileUtils.java:536) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_171]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_171]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
    at org.apache.hadoop.hive.common.FileUtils.isOwnerOfFileHierarchy(FileUtils.java:536) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.common.FileUtils.isOwnerOfFileHierarchy(FileUtils.java:527) ~[hive-common-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(RangerHiveAuthorizer.java:1420) ~[?:?]
    at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:287) ~[?:?]
    at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1336) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1100) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:709) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1816) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1811) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197) ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262) ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:575) ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:561) ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at sun.reflect.GeneratedMethodAccessor148.invoke(Unknown Source) ~[?:?]
{code}
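The hardening this issue calls for can be sketched as a recursive check that treats a concurrently-deleted entry (such as a .hive-staging_* temp file) as "nothing to check" instead of propagating FileNotFoundException. This sketch uses java.nio.file rather than Hadoop's FileSystem API to stay self-contained, and the method name is illustrative, not Hive's:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch: walk a file hierarchy, skipping entries that vanish between
// the directory listing and the per-file check instead of failing.
public class VanishingFileSketch {
  /** Returns true if every entry that still exists under root passes the check. */
  static boolean checkHierarchy(Path root) {
    try (Stream<Path> entries = Files.list(root)) {
      for (Path p : (Iterable<Path>) entries::iterator) {
        try {
          if (Files.isDirectory(p) && !checkHierarchy(p)) {
            return false;
          }
          // the per-file ownership/permission check would go here
        } catch (NoSuchFileException vanished) {
          // the entry was removed between listing and checking: skip it
        }
      }
    } catch (NoSuchFileException vanished) {
      return true; // the whole directory vanished; nothing left to check
    } catch (IOException e) {
      throw new UncheckedIOException(e); // genuine I/O errors still surface
    }
    return true;
  }
}
```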
Review Request 71558: HIVE-21987: Hive is unable to read Parquet int32 annotated with decimal
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71558/ ---

Review request for hive and Peter Vary.

Bugs: HIVE-21987
    https://issues.apache.org/jira/browse/HIVE-21987

Repository: hive-git

Description
---

Added support to read INT32 Parquet decimals.

Diffs
---

  data/files/parquet_int_decimal_1.parquet PRE-CREATION
  data/files/parquet_int_decimal_2.parquet PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 350ae2d
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java 320ce52
  ql/src/test/queries/clientpositive/parquet_int_decimal.q PRE-CREATION
  ql/src/test/results/clientpositive/parquet_int_decimal.q.out PRE-CREATION
  ql/src/test/results/clientpositive/type_change_test_fraction.q.out 07cf8fa

Diff: https://reviews.apache.org/r/71558/diff/1/

Testing
---

Added new q tests for the use-case.

Thanks,
Marta Kuczora
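For context on what reading an INT32 Parquet decimal involves (this is the underlying arithmetic, not Hive's actual ETypeConverter code, and the class and method names are hypothetical): a Parquet int32 column annotated decimal(p, s) stores the unscaled value, and the reader applies the scale from the logical type.

```java
import java.math.BigDecimal;

// Sketch: an int32 physical value annotated decimal(p, s) is the
// unscaled value; the decimal logical type's scale places the point.
public class Int32DecimalSketch {
  static BigDecimal fromInt32(int unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }
}
```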
[jira] [Created] (HIVE-22272) Hive embedded HS2 throws metastore exceptions from MetastoreStatsConnector thread
mahesh kumar behera created HIVE-22272:

Summary: Hive embedded HS2 throws metastore exceptions from MetastoreStatsConnector thread
Key: HIVE-22272
URL: https://issues.apache.org/jira/browse/HIVE-22272
Project: Hive
Issue Type: Bug
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera

The Hive config is not passed to MetastoreStatsConnector. This causes RuntimeStatsLoader to connect to the embedded HMS (even though HMS is configured to be remote) and leads to metastore exceptions, as the metastore db will not be created.
[jira] [Created] (HIVE-22271) Create index on the TBL_COL_PRIVS table for the columns COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE and TBL_ID
Marta Kuczora created HIVE-22271:

Summary: Create index on the TBL_COL_PRIVS table for the columns COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE and TBL_ID
Key: HIVE-22271
URL: https://issues.apache.org/jira/browse/HIVE-22271
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Marta Kuczora

In one of the escalations for HDP-3.1.0 we found that the table privilege checks could be very slow, and that these checks could be sped up by defining an index on the TBL_COL_PRIVS table for the following columns: COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE, TBL_ID.

In the MySQL slow query log, we found that the following query executed slowly:

{noformat}
SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTableColumnPrivilege' AS `NUCLEUS_TYPE`,
       `A0`.`AUTHORIZER`,`A0`.`COLUMN_NAME`,`A0`.`CREATE_TIME`,`A0`.`GRANT_OPTION`,`A0`.`GRANTOR`,
       `A0`.`GRANTOR_TYPE`,`A0`.`PRINCIPAL_NAME`,`A0`.`PRINCIPAL_TYPE`,`A0`.`TBL_COL_PRIV`,`A0`.`TBL_COLUMN_GRANT_ID`
FROM `TBL_COL_PRIVS` `A0`
LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID`
LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID`
WHERE `A0`.`PRINCIPAL_NAME` = 'xxx' AND `A0`.`PRINCIPAL_TYPE` = 'GROUP'
  AND `B0`.`TBL_NAME` = '' AND `C0`.`NAME` = 'xxx' AND `C0`.`CTLG_NAME` = 'xxx'
  AND `A0`.`COLUMN_NAME` = 'xxx'
{noformat}

When we checked the explain plan of this query, we could see that the existing index on the TBL_COL_PRIVS table is not used. The slow query filters on the COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE and TBL_ID columns, and after creating an index on these columns only, we saw a significant performance improvement.
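A schema change along these lines could be sketched as the following MySQL DDL. The index name is hypothetical and the column order is one reasonable choice matching the predicates above, not necessarily what the committed fix used:

```sql
-- Hypothetical index name; column order follows the slow query's
-- predicates and may differ from the final upstream schema change.
CREATE INDEX TBL_COL_PRIVS_PRINCIPAL_IDX
  ON TBL_COL_PRIVS (COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE, TBL_ID);
```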
[jira] [Created] (HIVE-22270) Upgrade commons-io to 2.6
David Lavati created HIVE-22270:

Summary: Upgrade commons-io to 2.6
Key: HIVE-22270
URL: https://issues.apache.org/jira/browse/HIVE-22270
Project: Hive
Issue Type: Improvement
Reporter: David Lavati
Assignee: David Lavati

Hive is currently using commons-io 2.4 and, according to HIVE-21273, a number of issues are present in it which can be resolved by upgrading to 2.6:

- IOUtils copyLarge() and skip() methods are performance hogs (affects 2.3, 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-355?filter=allopenissues
- CharSequenceInputStream#reset() behaves incorrectly in case when buffer size is not dividable by data size (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-356?filter=allopenissues
- [Tailer] InterruptedException while the thead is sleeping is silently ignored (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-357?filter=allopenissues
- IOUtils.contentEquals* methods returns false if input1 == input2; should return true (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-362?filter=allopenissues
- Apache Commons - standard links for documents are failing (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-369?filter=allopenissues
- FileUtils.sizeOfDirectoryAsBigInteger can overflow (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-390?filter=allopenissues
- Regression in FileUtils.readFileToString from 2.0.1 (affects 2.1, 2.2, 2.3, 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-453?filter=allopenissues
- Correct exception message in FileUtils.getFile(File; String...) (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-479?filter=allopenissues
- org.apache.commons.io.FileUtils#waitFor waits too long (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-481?filter=allopenissues
- FilenameUtils should handle embedded null bytes (affects 2.4): https://issues.apache.org/jira/projects/IO/issues/IO-484?filter=allopenissues
- Exceptions are suppressed incorrectly when copying files (affects 2.4, 2.5): https://issues.apache.org/jira/projects/IO/issues/IO-502?filter=allopenissues
Review Request 71555: Incompatible java.util.ArrayList for java 11
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71555/ ---

Review request for hive, Laszlo Bodor, Ashutosh Chauhan, and Prasanth_J.

Bugs: HIVE-22097
    https://issues.apache.org/jira/browse/HIVE-22097

Repository: hive-git

Description
---

The following exception comes when running a query on Java 11:

```
java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
    at org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280)
    at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595)
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587)
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NoSuchFieldException: parentOffset
    at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384)
    ... 29 more
```

The internal structure of ArrayList$SubList changed and our serializer fails. This serializer comes from the kryo-serializers package, where they have already updated the code. This patch does the same.

Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java e4d33e82168

Diff: https://reviews.apache.org/r/71555/diff/1/

Testing
---

Tested on a real cluster with Java 11.

Thanks,
Attila Magyar
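The approach borrowed from kryo-serializers can be sketched as a defensive field lookup: the JDK's ArrayList$SubList lost its "parentOffset" field in newer versions, so the serializer resolves the field from a list of candidate names instead of assuming one JDK layout. Demo below is a hypothetical target class standing in for the JDK internal; reflecting into real java.base internals may additionally require opened modules.

```java
import java.lang.reflect.Field;

// Sketch: resolve a version-dependent private field by trying several
// known names, rather than hard-coding one JDK's internal layout.
public class SubListFieldSketch {
  static class Demo {
    private int offset; // stands in for the version-dependent JDK field
  }

  /** Return the first declared field matching any candidate name, or null. */
  static Field findField(Class<?> cls, String... candidates) {
    for (String name : candidates) {
      try {
        Field f = cls.getDeclaredField(name);
        f.setAccessible(true);
        return f;
      } catch (NoSuchFieldException ignored) {
        // not this JDK's layout; try the next known name
      }
    }
    return null; // caller can then fall back to copying the sublist
  }
}
```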
[jira] [Created] (HIVE-22269) Missing stats in the operator with "hive.optimize.sort.dynamic.partition" (SortedDynPartitionOptimizer) misestimates reducer count
Rajesh Balamohan created HIVE-22269:

Summary: Missing stats in the operator with "hive.optimize.sort.dynamic.partition" (SortedDynPartitionOptimizer) misestimates reducer count
Key: HIVE-22269
URL: https://issues.apache.org/jira/browse/HIVE-22269
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Rajesh Balamohan

{{hive.optimize.sort.dynamic.partition=true}} introduces a new stage to reduce the number of writes in the dynamic partitioning use case. Earlier, {{SortedDynPartitionOptimizer}} added this new operator via {{Optimizer.java}}, and the stats for the newly added operator were populated via {{StatsRulesProcFactory$ReduceSinkStatsRule}}. However, with HIVE-20703 this got changed: the optimization moved to {{TezCompiler}} for a cost-based decision. Though the operator gets added correctly, its stats do not get populated (as it runs after runStatsAnnotation()). This causes the reducer count to be misestimated in the query.

For example, for the following query, Reducer 2 would be estimated as "2" instead of "1009", which causes a huge delay at runtime:

{noformat}
explain
from tpcds_xtext_1000.store_sales ss
insert overwrite table store_sales partition (ss_sold_date_sk)
select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk,
       ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity,
       ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt,
       ss.ss_ext_sales_price, ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax,
       ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk
where ss.ss_sold_date_sk is not null
insert overwrite table store_sales partition (ss_sold_date_sk)
select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk,
       ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity,
       ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt,
       ss.ss_ext_sales_price, ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax,
       ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk
where ss.ss_sold_date_sk is null
distribute by ss.ss_item_sk;
{noformat}