[jira] [Work logged] (HIVE-25330) Make FS calls in CopyUtils retryable
[ https://issues.apache.org/jira/browse/HIVE-25330?focusedWorklogId=642685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642685 ] ASF GitHub Bot logged work on HIVE-25330: - Author: ASF GitHub Bot Created on: 27/Aug/21 03:41 Start Date: 27/Aug/21 03:41 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2516: URL: https://github.com/apache/hive/pull/2516#discussion_r697127775 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -190,11 +247,14 @@ private void doCopyRetry(FileSystem sourceFs, List s // If copy fails, fall through the retry logic LOG.info("file operation failed", e); -if (repeat >= (MAX_IO_RETRY - 1)) { - //no need to wait in the last iteration +if (repeat >= (MAX_IO_RETRY - 1) || failOnExceptions.stream().anyMatch(k -> e.getClass().equals(k)) +|| ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg().equals(e.getMessage())) { + //Don't retry in the following cases: Review comment: pull the comment above the if statement, and rather than matching entries in ``failOnExceptions`` by equality, try something like `isAssignableFrom`, so that child classes of the listed exceptions are also considered. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642685) Time Spent: 40m (was: 0.5h) > Make FS calls in CopyUtils retryable > > > Key: HIVE-25330 > URL: https://issues.apache.org/jira/browse/HIVE-25330 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
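The reviewer's `isAssignableFrom` suggestion can be sketched outside Hive as follows. This is a minimal, self-contained illustration, not the actual CopyUtils code: the class name `RetryCheck` and the contents of the list are made up (the real list names Hadoop FS exceptions such as `PathIOException` and `ChecksumException`).

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch of the review suggestion: treat an exception as non-retryable when
// it IS one of the listed classes OR a subclass of one, via isAssignableFrom.
public class RetryCheck {
    // Illustrative list; stands in for CopyUtils' failOnExceptions.
    static final List<Class<? extends Exception>> FAIL_ON_EXCEPTIONS =
            Arrays.asList(IllegalArgumentException.class, FileNotFoundException.class);

    static boolean isNonRetryable(Exception e) {
        // e.getClass().equals(k) would miss subclasses; isAssignableFrom
        // also catches children of the listed exceptions.
        return FAIL_ON_EXCEPTIONS.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()));
    }

    public static void main(String[] args) {
        System.out.println(isNonRetryable(new FileNotFoundException("missing"))); // exact match: true
        System.out.println(isNonRetryable(new NumberFormatException("bad")));     // subclass of IllegalArgumentException: true
        System.out.println(isNonRetryable(new IOException("transient")));         // unrelated, still retryable: false
    }
}
```

With the equality-based check, `NumberFormatException` would slip through and be retried even though its parent is listed; `isAssignableFrom` closes that gap.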
[jira] [Work logged] (HIVE-25330) Make FS calls in CopyUtils retryable
[ https://issues.apache.org/jira/browse/HIVE-25330?focusedWorklogId=642682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642682 ] ASF GitHub Bot logged work on HIVE-25330: - Author: ASF GitHub Bot Created on: 27/Aug/21 03:38 Start Date: 27/Aug/21 03:38 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2516: URL: https://github.com/apache/hive/pull/2516#discussion_r697128887 ## File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java ## @@ -110,6 +112,40 @@ public void shouldThrowExceptionOnDistcpFailure() throws Exception { copyUtils.doCopy(destination, srcPaths); } + @Test + public void testRetryableFSCalls() throws Exception { +mockStatic(UserGroupInformation.class); +mockStatic(ReplChangeManager.class); + when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class)); +HiveConf conf = mock(HiveConf.class); +conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s"); +FileSystem fs = mock(FileSystem.class); +Path source = mock(Path.class); +Path destination = mock(Path.class); +ContentSummary cs = mock(ContentSummary.class); + +when(ReplChangeManager.checksumFor(source, fs)).thenThrow(new IOException("Failed")).thenReturn("dummy"); +when(fs.exists(same(source))).thenThrow(new IOException("Failed")).thenReturn(true); +when(fs.delete(same(source), anyBoolean())).thenThrow(new IOException("Failed")).thenReturn(true); +when(fs.mkdirs(same(source))).thenThrow(new IOException("Failed")).thenReturn(true); +when(fs.rename(same(source), same(destination))).thenThrow(new IOException("Failed")).thenReturn(true); +when(fs.getContentSummary(same(source))).thenThrow(new IOException("Failed")).thenReturn(cs); + +CopyUtils copyUtils = new CopyUtils(UserGroupInformation.getCurrentUser().getUserName(), conf, fs); +CopyUtils copyUtilsSpy = Mockito.spy(copyUtils); +assertEquals (copyUtilsSpy.exists(fs, source), true); Review comment: change to `assertTrue` 
similarly for the others as well ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -66,6 +67,16 @@ private FileSystem destinationFs; private final int maxParallelCopyTask; + private List> failOnExceptions = Arrays.asList(org.apache.hadoop.fs.PathIOException.class, + org.apache.hadoop.fs.UnsupportedFileSystemException.class, + org.apache.hadoop.fs.InvalidPathException.class, + org.apache.hadoop.fs.InvalidRequestException.class, + org.apache.hadoop.fs.FileAlreadyExistsException.class, + org.apache.hadoop.fs.ChecksumException.class, + org.apache.hadoop.fs.ParentNotDirectoryException.class, + org.apache.hadoop.hdfs.protocol.NSQuotaExceededException.class, Review comment: We can include other quota exceptions also, say the children and grandchildren of `ClusterStorageCapacityExceededException` or directly `ClusterStorageCapacityExceededException` if retryable function can take the parent class and block its children as well. ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -190,11 +247,14 @@ private void doCopyRetry(FileSystem sourceFs, List s // If copy fails, fall through the retry logic LOG.info("file operation failed", e); -if (repeat >= (MAX_IO_RETRY - 1)) { - //no need to wait in the last iteration +if (repeat >= (MAX_IO_RETRY - 1) || failOnExceptions.stream().anyMatch(k -> e.getClass().equals(k)) +|| ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg().equals(e.getMessage())) { + //Don't retry in the following cases: Review comment: pull the comment above the if statement ## File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java ## @@ -110,6 +112,40 @@ public void shouldThrowExceptionOnDistcpFailure() throws Exception { copyUtils.doCopy(destination, srcPaths); } + @Test + public void testRetryableFSCalls() throws Exception { +mockStatic(UserGroupInformation.class); +mockStatic(ReplChangeManager.class); + 
when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class)); +HiveConf conf = mock(HiveConf.class); +conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s"); +FileSystem fs = mock(FileSystem.class); +Path source = mock(Path.class); +Path destination = mock(Path.class); +ContentSummary cs = mock(ContentSummary.class); + +when(ReplChangeManager.checksumFor(source, fs)).thenThrow(new IOException("Failed")).thenReturn("dummy"); +when(fs.exists(same(source))).thenThrow(new IOException("Failed")).thenReturn(true); +
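The `thenThrow(...).thenReturn(...)` stubbing in the test above drives a "fail once, succeed on retry" path through each FS call. A dependency-free sketch of that pattern (the names `RetryOnce`, `FsCall`, and `withRetry` are hypothetical, not the CopyUtils API):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Fail-once-then-succeed retry sketch mirroring the mocked FileSystem calls:
// the first invocation throws IOException("Failed"), the retry returns true.
public class RetryOnce {
    interface FsCall<T> { T run() throws IOException; }

    static <T> T withRetry(FsCall<T> call, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (IOException e) {
                last = e; // transient failure: fall through and retry
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger calls = new AtomicInteger();
        boolean exists = withRetry(() -> {
            if (calls.incrementAndGet() == 1) {
                throw new IOException("Failed"); // first call fails, as stubbed in the test
            }
            return true;
        }, 3);
        System.out.println(exists + " after " + calls.get() + " calls"); // true after 2 calls
    }
}
```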
[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path
[ https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25303: - Description: Under legacy table creation mode (hive.create.as.external.legacy=true), when a database has been created in a specific LOCATION, in a session where that database is USEd, tables created with: {code:java} CREATE TABLE AS SELECT {code} should inherit the HDFS path from the database's location. Instead, Hive tries to write the table data into /warehouse/tablespace/managed/hive// +Design+: In a CTAS query, the data is first written to the target directory (which happens in HS2) and then the table is created (which happens in HMS). So two decisions are being made: i) the target directory location, and ii) how the table should be created (table type, storage descriptor, etc.). When HS2 needs the target location, it makes a create-table dry-run call to HMS (where table translation happens); decisions i) and ii) are made within HMS, which returns the table object. HS2 then uses the location set by HMS for placing the data. was: Under legacy table creation mode (hive.create.as.external.legacy=true), when a database has been created in a specific LOCATION, in a session where that database is USEd, tables created using CREATE TABLE AS SELECT should inherit the HDFS path from the database's location. 
Instead, Hive is trying to write the table data into /warehouse/tablespace/managed/hive// > CTAS hive.create.as.external.legacy tries to place data files in managed WH > path > > > Key: HIVE-25303 > URL: https://issues.apache.org/jira/browse/HIVE-25303 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Standalone Metastore >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Under legacy table creation mode (hive.create.as.external.legacy=true), when > a database has been created in a specific LOCATION, in a session where that > database is Used, tables are created using the following command: > {code:java} > CREATE TABLE AS SELECT {code} > should inherit the HDFS path from the database's location. Instead, Hive is > trying to write the table data into > /warehouse/tablespace/managed/hive// > +Design+: > In the CTAS query, first data is written in the target directory (which > happens in HS2) and then the table is created(This happens in HMS). So here > two decisions are being made i) target directory location ii) how the table > should be created (table type, sd e.t.c). > When HS2 needs a target location that needs to be set, it'll make create > table dry run call to HMS (where table translation happens) and i) and ii) > decisions are made within HMS and returns table object. Then HS2 will use > this location set by HMS for placing the data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
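The two decisions made during HMS table translation (target directory and table type) can be sketched as below. This is purely illustrative: `CtasFlow`, `TableSpec`, and `translateCreateTable` are hypothetical stand-ins for the HMS dry-run call, not real HMS APIs, and the example paths are invented.

```java
// Hypothetical stand-in for the HMS create-table "dry run" translation: HS2
// asks HMS for the table type and target directory before writing CTAS data,
// instead of assuming the managed warehouse path.
public class CtasFlow {
    static class TableSpec {
        final String tableType;
        final String location;
        TableSpec(String tableType, String location) {
            this.tableType = tableType;
            this.location = location;
        }
    }

    // Under hive.create.as.external.legacy=true the table should land under
    // the database's LOCATION; otherwise it goes to the managed warehouse.
    static TableSpec translateCreateTable(boolean legacyExternal, String dbLocation, String table) {
        if (legacyExternal) {
            return new TableSpec("EXTERNAL_TABLE", dbLocation + "/" + table);
        }
        return new TableSpec("MANAGED_TABLE", "/warehouse/tablespace/managed/hive/db1.db/" + table);
    }

    public static void main(String[] args) {
        // HS2 side: obtain the location from the (simulated) HMS decision,
        // then write the SELECT results there and create the table.
        TableSpec spec = translateCreateTable(true, "/data/external/db1", "t1");
        System.out.println(spec.tableType + " -> " + spec.location); // EXTERNAL_TABLE -> /data/external/db1/t1
    }
}
```

The bug described above corresponds to taking the managed-warehouse branch even when the legacy flag is set.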
[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module
[ https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=642455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642455 ] ASF GitHub Bot logged work on HIVE-25317: - Author: ASF GitHub Bot Created on: 26/Aug/21 16:59 Start Date: 26/Aug/21 16:59 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #2459: URL: https://github.com/apache/hive/pull/2459#discussion_r696816642 ## File path: llap-server/pom.xml ## @@ -38,6 +38,7 @@ org.apache.hive hive-exec ${project.version} + core Review comment: Yea I think so since this PR is trying to shade things from the `hive-exec-core` I believe? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642455) Time Spent: 2h 40m (was: 2.5h) > Relocate dependencies in shaded hive-exec module > > > Key: HIVE-25317 > URL: https://issues.apache.org/jira/browse/HIVE-25317 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > When we want to use shaded version of hive-exec (i.e., w/o classifier), more > dependencies conflict with Spark. We need to relocate these dependencies too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25372) [Hive] Advance write ID for remaining DDLs
[ https://issues.apache.org/jira/browse/HIVE-25372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405326#comment-17405326 ] Kishen Das commented on HIVE-25372: --- The rest of the DDLs will be addressed as part of https://issues.apache.org/jira/browse/HIVE-25407 > [Hive] Advance write ID for remaining DDLs > -- > > Key: HIVE-25372 > URL: https://issues.apache.org/jira/browse/HIVE-25372 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > We guarantee data consistency for table metadata, when serving data from the > HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve > from cache or refresh from the backing DB and serve, so we have to ensure we > advance write IDs during all alter table flows. We have to ensure we advance > the write ID for below DDLs. > AlterTableSetOwnerAnalyzer.java > -AlterTableSkewedByAnalyzer.java- > AlterTableSetSerdeAnalyzer.java > AlterTableSetSerdePropsAnalyzer.java > -AlterTableUnsetSerdePropsAnalyzer.java- > AlterTableSetPartitionSpecAnalyzer > AlterTableClusterSortAnalyzer.java > AlterTableIntoBucketsAnalyzer.java > AlterTableConcatenateAnalyzer.java > -AlterTableCompactAnalyzer.java- > AlterTableSetFileFormatAnalyzer.java > -AlterTableSetSkewedLocationAnalyzer.java- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25407) Advance Write ID during ALTER TABLE ( NOT SKEWED, SKEWED BY, SET SKEWED LOCATION, UNSET SERDEPROPERTIES)
[ https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-25407: -- Summary: Advance Write ID during ALTER TABLE ( NOT SKEWED, SKEWED BY, SET SKEWED LOCATION, UNSET SERDEPROPERTIES) (was: Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES) > Advance Write ID during ALTER TABLE ( NOT SKEWED, SKEWED BY, SET SKEWED > LOCATION, UNSET SERDEPROPERTIES) > > > Key: HIVE-25407 > URL: https://issues.apache.org/jira/browse/HIVE-25407 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > Below DDLs should be investigated separately on why the advancing the write > ID is not working for transactional tables, even after adding the logic to > advance the write ID. > * -ALTER TABLE SET PARTITION SPEC- > * ALTER TABLE UNSET SERDEPROPERTIES > * ALTER TABLE NOT SKEWED > * -ALTER TABLE COMPACT- > * ALTER TABLE SKEWED BY > * ALTER TABLE SET SKEWED LOCATION -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25407) Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-25407: -- Summary: Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES (was: Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES) > Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES > --- > > Key: HIVE-25407 > URL: https://issues.apache.org/jira/browse/HIVE-25407 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > Below DDLs should be investigated separately on why the advancing the write > ID is not working for transactional tables, even after adding the logic to > advance the write ID. > * -ALTER TABLE SET PARTITION SPEC- > * ALTER TABLE UNSET SERDEPROPERTIES > * ALTER TABLE NOT SKEWED > * -ALTER TABLE COMPACT- > * ALTER TABLE SKEWED BY > * ALTER TABLE SET SKEWED LOCATION -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25407) Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-25407: -- Summary: Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES (was: [Hive] Investigate why advancing the Write ID not working for some DDLs and fix it, if appropriate) > Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES > --- > > Key: HIVE-25407 > URL: https://issues.apache.org/jira/browse/HIVE-25407 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > Below DDLs should be investigated separately on why the advancing the write > ID is not working for transactional tables, even after adding the logic to > advance the write ID. > * -ALTER TABLE SET PARTITION SPEC- > * ALTER TABLE UNSET SERDEPROPERTIES > * ALTER TABLE NOT SKEWED > * -ALTER TABLE COMPACT- > * ALTER TABLE SKEWED BY > * ALTER TABLE SET SKEWED LOCATION -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25461) Add a test case to ensure Truncate table advances the write ID
[ https://issues.apache.org/jira/browse/HIVE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25461 started by Kishen Das. - > Add a test case to ensure Truncate table advances the write ID > -- > > Key: HIVE-25461 > URL: https://issues.apache.org/jira/browse/HIVE-25461 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25461) Add a test case to ensure Truncate table advances the write ID
[ https://issues.apache.org/jira/browse/HIVE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-25461: - Assignee: Kishen Das > Add a test case to ensure Truncate table advances the write ID > -- > > Key: HIVE-25461 > URL: https://issues.apache.org/jira/browse/HIVE-25461 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25461) Add a test case to ensure Truncate table advances the write ID
[ https://issues.apache.org/jira/browse/HIVE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-25461: -- Summary: Add a test case to ensure Truncate table advances the write ID (was: Truncate table should advance the write ID) > Add a test case to ensure Truncate table advances the write ID > -- > > Key: HIVE-25461 > URL: https://issues.apache.org/jira/browse/HIVE-25461 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25372) [Hive] Advance write ID for remaining DDLs
[ https://issues.apache.org/jira/browse/HIVE-25372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-25372: -- Description: We guarantee data consistency for table metadata, when serving data from the HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve from cache or refresh from the backing DB and serve, so we have to ensure we advance write IDs during all alter table flows. We have to ensure we advance the write ID for below DDLs. AlterTableSetOwnerAnalyzer.java -AlterTableSkewedByAnalyzer.java- AlterTableSetSerdeAnalyzer.java AlterTableSetSerdePropsAnalyzer.java -AlterTableUnsetSerdePropsAnalyzer.java- AlterTableSetPartitionSpecAnalyzer AlterTableClusterSortAnalyzer.java AlterTableIntoBucketsAnalyzer.java AlterTableConcatenateAnalyzer.java -AlterTableCompactAnalyzer.java- AlterTableSetFileFormatAnalyzer.java -AlterTableSetSkewedLocationAnalyzer.java- was: We guarantee data consistency for table metadata, when serving data from the HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve from cache or refresh from the backing DB and serve, so we have to ensure we advance write IDs during all alter table flows. We have to ensure we advance the write ID for below DDLs. 
AlterTableSetOwnerAnalyzer.java AlterTableSkewedByAnalyzer.java AlterTableSetSerdeAnalyzer.java AlterTableSetSerdePropsAnalyzer.java AlterTableUnsetSerdePropsAnalyzer.java AlterTableSetPartitionSpecAnalyzer AlterTableClusterSortAnalyzer.java AlterTableIntoBucketsAnalyzer.java AlterTableConcatenateAnalyzer.java AlterTableCompactAnalyzer.java AlterTableSetFileFormatAnalyzer.java AlterTableSetSkewedLocationAnalyzer.java > [Hive] Advance write ID for remaining DDLs > -- > > Key: HIVE-25372 > URL: https://issues.apache.org/jira/browse/HIVE-25372 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > We guarantee data consistency for table metadata, when serving data from the > HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve > from cache or refresh from the backing DB and serve, so we have to ensure we > advance write IDs during all alter table flows. We have to ensure we advance > the write ID for below DDLs. > AlterTableSetOwnerAnalyzer.java > -AlterTableSkewedByAnalyzer.java- > AlterTableSetSerdeAnalyzer.java > AlterTableSetSerdePropsAnalyzer.java > -AlterTableUnsetSerdePropsAnalyzer.java- > AlterTableSetPartitionSpecAnalyzer > AlterTableClusterSortAnalyzer.java > AlterTableIntoBucketsAnalyzer.java > AlterTableConcatenateAnalyzer.java > -AlterTableCompactAnalyzer.java- > AlterTableSetFileFormatAnalyzer.java > -AlterTableSetSkewedLocationAnalyzer.java- -- This message was sent by Atlassian Jira (v8.3.4#803005)
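The cache rule this issue relies on — serve table metadata from the HMS cache only while the cached write-ID state is still current, refresh from the backing DB otherwise — can be sketched as follows. This is a hedged illustration: `WriteIdCacheCheck` and `serveFromCache` are invented names, not the real HMS cache API.

```java
// Illustrative write-ID check for a metadata cache: a cached entry is valid
// only if no writer has advanced the table's write ID since it was cached.
// If a DDL fails to advance the write ID, a stale entry still looks "valid"
// and gets served -- which is why every ALTER TABLE flow must advance it.
public class WriteIdCacheCheck {
    static boolean serveFromCache(long cachedValidWriteId, long currentTableWriteId) {
        return cachedValidWriteId >= currentTableWriteId;
    }

    public static void main(String[] args) {
        long cached = 41;
        long current = 42; // an ALTER TABLE advanced the write ID
        System.out.println(serveFromCache(cached, current)); // false: refresh from the backing DB
        System.out.println(serveFromCache(42, 42));          // true: cache still consistent
    }
}
```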
[jira] [Commented] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
[ https://issues.apache.org/jira/browse/HIVE-25475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405300#comment-17405300 ] Haymant Mangla commented on HIVE-25475: --- Committed to master. Thanks for the review, [~maheshk114] > TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable > -- > > Key: HIVE-25475 > URL: https://issues.apache.org/jira/browse/HIVE-25475 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/389/ > {code} > 16:19:18 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 141.73 s <<< FAILURE! - in > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios > 16:19:18 [ERROR] > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad > Time elapsed: 122.979 s <<< ERROR! > 16:19:18 org.apache.hadoop.hive.ql.metadata.HiveException > 16:19:18 at > org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032) > 16:19:18 at > org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348) > 16:19:18 at > org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429) > 16:19:18 at > org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) > 16:19:18 at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > 16:19:18 at > org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) > 16:19:18 at > org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) > 16:19:18 at > org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) > 16:19:18 at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153) > 16:19:18 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148) > 16:19:18 at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164) > 16:19:18 at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688) > 16:19:18 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 16:19:18 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 16:19:18 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 16:19:18 at java.lang.reflect.Method.invoke(Method.java:498) > 16:19:18 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 16:19:18 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 16:19:18 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 16:19:18 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 16:19:18 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 16:19:18 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 16:19:18 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > 16:19:18 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 16:19:18 at > 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > 16:19:18 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > 16:19:18 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > 16:19:18 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > 16:19:18 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > 16:19:18 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > 16:19:18 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
[jira] [Resolved] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
[ https://issues.apache.org/jira/browse/HIVE-25475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haymant Mangla resolved HIVE-25475. --- Resolution: Fixed > TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable > -- > > Key: HIVE-25475 > URL: https://issues.apache.org/jira/browse/HIVE-25475 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/389/ > {code} > 16:19:18 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 141.73 s <<< FAILURE! - in > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios > 16:19:18 [ERROR] > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad > Time elapsed: 122.979 s <<< ERROR! > 16:19:18 org.apache.hadoop.hive.ql.metadata.HiveException > 16:19:18 at > org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032) > 16:19:18 at > org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348) > 16:19:18 at > org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429) > 16:19:18 at > org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) > 16:19:18 at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > 16:19:18 at > org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) > 16:19:18 at > org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) > 16:19:18 at > org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) > 16:19:18 at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153) > 16:19:18 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148) > 16:19:18 at > 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164) > 16:19:18 at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663) > 16:19:18 at > org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688) > 16:19:18 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 16:19:18 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 16:19:18 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 16:19:18 at java.lang.reflect.Method.invoke(Method.java:498) > 16:19:18 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 16:19:18 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 16:19:18 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 16:19:18 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 16:19:18 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 16:19:18 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 16:19:18 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > 16:19:18 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 16:19:18 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > 16:19:18 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > 16:19:18 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > 16:19:18 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > 16:19:18 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > 16:19:18 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > 16:19:18 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > 16:19:18 at >
[jira] [Work logged] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
[jira] [Work logged] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
[ https://issues.apache.org/jira/browse/HIVE-25475?focusedWorklogId=642392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642392 ] ASF GitHub Bot logged work on HIVE-25475: - Author: ASF GitHub Bot Created on: 26/Aug/21 15:05 Start Date: 26/Aug/21 15:05 Worklog Time Spent: 10m Work Description: maheshk114 merged pull request #2605: URL: https://github.com/apache/hive/pull/2605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642392) Time Spent: 20m (was: 10m) > TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable > -- > > Key: HIVE-25475 > URL: https://issues.apache.org/jira/browse/HIVE-25475 > Project: Hive > Issue Type: Bug > Reporter: Krisztian Kasa > Assignee: Haymant Mangla > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/389/
{code}
16:19:18 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 141.73 s <<< FAILURE! - in org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios
16:19:18 [ERROR] org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad  Time elapsed: 122.979 s <<< ERROR!
16:19:18 org.apache.hadoop.hive.ql.metadata.HiveException
16:19:18     at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032)
16:19:18     at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348)
16:19:18     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429)
16:19:18     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
16:19:18     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
16:19:18     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
16:19:18     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
16:19:18     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
16:19:18     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
16:19:18     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
16:19:18     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
16:19:18     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
16:19:18     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
16:19:18     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
16:19:18     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
16:19:18     at org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235)
16:19:18     at org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309)
16:19:18     at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359)
16:19:18     at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663)
16:19:18     at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688)
16:19:18     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:19:18     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:19:18     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:19:18     at java.lang.reflect.Method.invoke(Method.java:498)
16:19:18     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
16:19:18     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
16:19:18     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
16:19:18     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
16:19:18     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
16:19:18     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
16:19:18     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
16:19:18     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
[jira] [Work logged] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?focusedWorklogId=642386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642386 ] ASF GitHub Bot logged work on HIVE-25286: - Author: ASF GitHub Bot Created on: 26/Aug/21 14:51 Start Date: 26/Aug/21 14:51 Worklog Time Spent: 10m Work Description: pvary merged pull request #2427: URL: https://github.com/apache/hive/pull/2427 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642386) Time Spent: 1.5h (was: 1h 20m) > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate, since there is no way to ensure that the HMS stats are > updated correctly, and this could cause incorrect query results. > The proposed solution only works for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25286. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~Marton Bod] > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate, since there is no way to ensure that the HMS stats are > updated correctly, and this could cause incorrect query results. > The proposed solution only works for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
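The mechanism behind "set stats to inaccurate" can be sketched independently of the Iceberg integration: Hive records the validity of stored statistics in the COLUMN_STATS_ACCURATE table parameter, and clearing that marker makes the optimizer treat existing stats as estimates. This is a simplified illustration, not the actual patch; the real code goes through Hive's StatsSetupConst helpers and the HMS API.

```java
import java.util.HashMap;
import java.util.Map;

public class InvalidateStats {
    // Table parameter Hive uses to mark stored statistics as trustworthy.
    static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

    // Removing the accuracy marker is enough to downgrade the stats to
    // estimates on the next read; the stats values themselves can stay.
    static void markStatsInaccurate(Map<String, String> tableParams) {
        tableParams.remove(COLUMN_STATS_ACCURATE);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(COLUMN_STATS_ACCURATE, "{\"BASIC_STATS\":\"true\"}");
        markStatsInaccurate(params);
        System.out.println(params.containsKey(COLUMN_STATS_ACCURATE)); // prints false
    }
}
```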
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=642378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642378 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 26/Aug/21 14:43 Start Date: 26/Aug/21 14:43 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r695474352 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java ## @@ -272,30 +272,34 @@ private void prepare(InputInitializerContext initializerContext) throws IOExcept String groupName = null; String vertexName = null; if (inputInitializerContext != null) { - tezCounters = new TezCounters(); Review comment: @abstractdog would you mind reviewing at least the changes to HiveSplitGenerator? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642378) Time Spent: 1.5h (was: 1h 20m) > Delta metrics collection may cause number of tez counters to exceed > tez.counters.max limit > -- > > Key: HIVE-25429 > URL: https://issues.apache.org/jira/browse/HIVE-25429 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > There's a limit to the number of tez counters allowed (tez.counters.max). > Delta metrics collection (i.e.
DeltaFileMetricsReporter) was creating 3 > counters for each partition touched by a given query, which can result in a > huge number of counters. This is unnecessary, because we're only interested > in the n partitions with the most deltas. This change limits the number > of counters created to hive.txn.acid.metrics.max.cache.size*3. > Also, when tez.counters.max is reached, a LimitExceededException is thrown but > isn't caught on the Hive side, causing the query to fail. We should catch > this and skip delta metrics collection in this case. > Also make sure that metrics are only collected if > hive.metastore.acidmetrics.ext.on=true -- This message was sent by Atlassian Jira (v8.3.4#803005)
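The capping strategy described above — report only the n partitions with the most delta files, three counters each — amounts to a bounded top-N selection. Below is a self-contained sketch using a min-heap; the names are hypothetical and this is not Hive's actual DeltaFileMetricsReporter code, just the technique it relies on.

```java
import java.util.*;

public class TopNDeltaPartitions {
    // Keep only the n partitions with the most delta files, so the number of
    // counters created stays bounded (3 counters per reported partition).
    static List<String> topN(Map<String, Integer> deltasPerPartition, int n) {
        // Min-heap ordered by delta count: the root is always the weakest
        // candidate, so exceeding n means evicting the root.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Comparator.comparingInt(Map.Entry::getValue));
        for (Map.Entry<String, Integer> e : deltasPerPartition.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the partition with the fewest deltas
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // drains in ascending delta order
        }
        Collections.reverse(result); // most deltas first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> deltas = new LinkedHashMap<>();
        deltas.put("p=1", 12);
        deltas.put("p=2", 3);
        deltas.put("p=3", 40);
        deltas.put("p=4", 7);
        System.out.println(topN(deltas, 2)); // prints [p=3, p=1]
    }
}
```

The heap keeps memory and counter usage proportional to n rather than to the number of partitions touched, which is the point of the fix.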
[jira] [Assigned] (HIVE-25485) Transform selects of literals under a UNION ALL to inline table scan
[ https://issues.apache.org/jira/browse/HIVE-25485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25485: --- > Transform selects of literals under a UNION ALL to inline table scan > > > Key: HIVE-25485 > URL: https://issues.apache.org/jira/browse/HIVE-25485 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > {code} > select 1 > union all > select 1 > union all > [...] > union all > select 1 > {code} > results in a very big plan, which will have a number of vertices > proportional to the number of UNION ALL branches; hence it could be slow to execute -- This message was sent by Atlassian Jira (v8.3.4#803005)
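The intended transformation can be illustrated with plain SQL — this is a conceptual sketch only, using standard-SQL VALUES syntax; the surface form of Hive's internal rewrite may differ:

```sql
-- Before: one plan vertex per UNION ALL branch, roughly N vertices for N branches
select 1
union all
select 1
union all
select 1;

-- After (conceptually): the literal rows collapse into a single inline table,
-- so the plan needs just one scan regardless of the number of branches
select col1
from (values (1), (1), (1)) as t(col1);
```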
[jira] [Commented] (HIVE-23016) Extract JdbcConnectionParams from Utils Class
[ https://issues.apache.org/jira/browse/HIVE-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405257#comment-17405257 ] Timur Malikin commented on HIVE-23016: -- Hi! I'd like to take this task and make the PR :) > Extract JdbcConnectionParams from Utils Class > - > > Key: HIVE-23016 > URL: https://issues.apache.org/jira/browse/HIVE-23016 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Priority: Minor > Labels: n00b, newbie, noob > > And make it its own class. > https://github.com/apache/hive/blob/4700e210ef7945278c4eb313c9ebd810b0224da1/jdbc/src/java/org/apache/hive/jdbc/Utils.java#L72 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642304 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 26/Aug/21 12:25 Start Date: 26/Aug/21 12:25 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r697127775 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null Review comment: this part has been reworked a bit, created a nested loop and put it into a separate method -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642304) Time Spent: 3h 50m (was: 3h 40m) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at >
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642305 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 26/Aug/21 12:25 Start Date: 26/Aug/21 12:25 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r696577856 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null Review comment: this part has been reworked a bit, created a nested loop and put it into a separate method (collectDataFromParquetPage) + added comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642305) Time Spent: 4h (was: 3h 50m) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 4h > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at >
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642303 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 26/Aug/21 12:24 Start Date: 26/Aug/21 12:24 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r696577326 ## File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q ## @@ -0,0 +1,20 @@ +set hive.mapred.mode=nonstrict; +set hive.vectorized.execution.enabled=true; +set hive.fetch.task.conversion=none; + +DROP TABLE parquet_map_type; + + +CREATE TABLE parquet_map_type ( +id int, +stringMap map Review comment: finally I decided to not block this because of some testcases, I created follow-up tickets: HIVE-25459, HIVE-25484 this patch already solves the customer's issue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642303) Time Spent: 3h 40m (was: 3.5h) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642301 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 26/Aug/21 12:22 Start Date: 26/Aug/21 12:22 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r696576391 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVecto int length2 = cv2.vector.length; if (length1 == length2) { for (int i = 0; i < length1; i++) { +if (cv1.vector[i] == null && cv2.vector[i] == null) { + continue; Review comment: okay, I got in the meantime, simply removing null check here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642301) Time Spent: 3.5h (was: 3h 20m) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642300 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 26/Aug/21 12:19 Start Date: 26/Aug/21 12:19 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r696573989 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVecto int length2 = cv2.vector.length; if (length1 == length2) { for (int i = 0; i < length1; i++) { +if (cv1.vector[i] == null && cv2.vector[i] == null) { + continue; Review comment: I think I got it, it will be much faster for non-null strings:
```
int innerLen1 = cv1.vector[i].length;
int innerLen2 = cv2.vector[i].length;
if (innerLen1 == innerLen2) {
  for (int j = 0; j < innerLen1; j++) {
    if (cv1.vector[i][j] != cv2.vector[i][j]) {
      return false;
    }
  }
} else {
  return false;
}
if (cv1.isNull[i] && cv2.isNull[i]) {
  continue;
}
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 642300) Time Spent: 3h 20m (was: 3h 10m) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at
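The review thread above centers on comparing two vectors of byte arrays whose entries may be null: the null flags have to be consulted before either row is dereferenced. Below is a self-contained sketch of that null-safe comparison, using plain arrays in place of Hive's actual BytesColumnVector fields (the field names `vector`/`isNull` mirror that class, but this is illustrative code, not the patch).

```java
import java.util.Arrays;

public class NullSafeVectorCompare {
    // Compare two vectors of byte[] rows where any row may be null.
    // The isNull flags are checked first, so a null entry is never dereferenced.
    static boolean equalVectors(byte[][] v1, boolean[] isNull1,
                                byte[][] v2, boolean[] isNull2) {
        if (v1.length != v2.length) {
            return false;
        }
        for (int i = 0; i < v1.length; i++) {
            if (isNull1[i] || isNull2[i]) {
                if (isNull1[i] != isNull2[i]) {
                    return false; // null on one side only: not equal
                }
                continue; // both null: equal, skip the byte comparison
            }
            if (!Arrays.equals(v1[i], v2[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[][] a = { "k1".getBytes(), null, "bar".getBytes() };
        boolean[] an = { false, true, false };
        byte[][] b = { "k1".getBytes(), null, "bar".getBytes() };
        boolean[] bn = { false, true, false };
        System.out.println(equalVectors(a, an, b, bn)); // prints true
    }
}
```

Doing the null-flag check up front is what avoids the NullPointerException risk that the snippet discussed in the review would hit when the first row of a pair is null.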
[jira] [Comment Edited] (HIVE-25459) Wrong VectorUDFMapIndexBaseScalar child class is used, leading to ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-25459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405174#comment-17405174 ] László Bodor edited comment on HIVE-25459 at 8/26/21, 11:49 AM: same for decimals too: {code} -- *** DECIMAL *** CREATE TABLE parquet_map_type_decimal ( id int, decimalMap map ) stored as parquet; insert into parquet_map_type_decimal SELECT 1, MAP(1.0, NULL, 2.0, 3.0); insert into parquet_map_type_decimal (id) VALUES (2); select id, decimalMap from parquet_map_type_decimal; select id, decimalMap[1] from parquet_map_type_decimal group by id, decimalMap[1]; -- fails until HIVE-25459 is solved {code} {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:74) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) {code} was (Author: abstractdog): same for decimals too: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:74) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) {code} > Wrong VectorUDFMapIndexBaseScalar child class is used, leading to > ClassCastException > > > Key: HIVE-25459 > URL: https://issues.apache.org/jira/browse/HIVE-25459 > Project: Hive > Issue Type: Bug
>Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > given this query: > {code} > CREATE TABLE map_of_doubles ( > id int, > doubleMap map > ) stored as orc; > insert overwrite table map_of_doubles SELECT 1, MAP(CAST(1.0 as DOUBLE), > null, CAST(2.0 as DOUBLE), CAST(3.0 as DOUBLE)); > select id, doubleMap from map_of_doubles; > select id, doubleMap[1] from map_of_doubles group by id, doubleMap[1]; -- > this fails > {code} > error is: > {code} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector > at > org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:132) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) > ... 23 more > {code} > I found this error while was trying to write q test cases for all data types > in HIVE-23688, so this issue needs to be addressed first > HIVE-23688 is Parquet-specific, this one is not, it can be reproduced for ORC > and Parquet too -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25459) Wrong VectorUDFMapIndexBaseScalar child class is used, leading to ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-25459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405174#comment-17405174 ]

László Bodor commented on HIVE-25459:
--------------------------------------

same for decimals too:
{code}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:74)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
{code}

> Wrong VectorUDFMapIndexBaseScalar child class is used, leading to ClassCastException
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-25459
>                 URL: https://issues.apache.org/jira/browse/HIVE-25459
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> given this query:
> {code}
> CREATE TABLE map_of_doubles (
>   id int,
>   doubleMap map<double,double>
> ) stored as orc;
> insert overwrite table map_of_doubles SELECT 1, MAP(CAST(1.0 as DOUBLE), null, CAST(2.0 as DOUBLE), CAST(3.0 as DOUBLE));
> select id, doubleMap from map_of_doubles;
> select id, doubleMap[1] from map_of_doubles group by id, doubleMap[1]; -- this fails
> {code}
> error is:
> {code}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:132)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>     ... 23 more
> {code}
> I found this error while I was trying to write q test cases for all data types in HIVE-23688, so this issue needs to be addressed first.
> HIVE-23688 is Parquet-specific, this one is not; it can be reproduced for both ORC and Parquet.
[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map (and as key in a map)
[ https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-25484:
--------------------------------
    Description: 
not sure if this is a bug or a feature request, but I haven't been able to test NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value, works
insert into parquet_map_type_string (id) VALUES (2); -- NULL as map, fails
insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- NULL as key, fails
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

  was:
not sure if this is a bug or a feature request, but I haven't been able to test NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value, works
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map, fails
insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- NULL as key, fails
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at
{code}
[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map (and as key in a map)
[ https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-25484:
--------------------------------
    Summary: NULL cannot be inserted as a Parquet map (and as key in a map)  (was: NULL cannot be inserted as a Parquet map)

> NULL cannot be inserted as a Parquet map (and as key in a map)
> --------------------------------------------------------------
>
>                 Key: HIVE-25484
>                 URL: https://issues.apache.org/jira/browse/HIVE-25484
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> not sure if this is a bug or a feature request, but I haven't been able to test NULL as map in the scope of HIVE-23688:
> {code}
> CREATE TABLE parquet_map_type_string (
>   id int,
>   stringMap map<string,string>
> ) stored as parquet;
> insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value, works
> insert into parquet_map_type_string (id) VALUES (2) -- NULL as map, fails
> insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- NULL as key, fails
> {code}
> leads to:
> {code}
> Caused by: java.lang.NullPointerException
>     at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
>     at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
>     at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>     at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
>     at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
>     at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
>     at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
> {code}
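The trace shows the NPE comes from Parquet's `Binary.fromString`, which cannot encode a null string. A minimal, self-contained sketch of the failure mode and a guard (the `encodeUtf8`/`writeMap` helpers are hypothetical stand-ins, not Hive's actual `DataWritableWriter` code):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: encoding a null map key the way
// Binary.fromString would fails with NPE; guarding it first does not.
public class NullMapKeyGuard {

    // Mimics Binary.fromString: throws NullPointerException on null input.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8); // NPE if s == null
    }

    // Guarded variant: skip (or reject with a clear error) null keys instead
    // of letting the writer fail deep inside the encoder.
    static int writeMap(Map<String, String> map) {
        int written = 0;
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (e.getKey() == null) {
                continue; // or: throw new IllegalArgumentException("map keys must not be null")
            }
            encodeUtf8(e.getKey());
            if (e.getValue() != null) { // null values are representable in Parquet maps
                encodeUtf8(e.getValue());
            }
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k2", "v2");
        m.put("k1", null);   // null value: fine
        m.put(null, "k3");   // null key: skipped by the guard
        System.out.println(writeMap(m)); // prints 2
    }
}
```

Whether a null key should be silently dropped or rejected with a semantic error is exactly the "bug or feature request" question the report raises; the sketch only shows where the guard would have to sit.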
[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map
[ https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-25484:
--------------------------------
    Description: 
not sure if this is a bug or a feature request, but I haven't been able to test NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value, works
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map, fails
insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- NULL as key, fails
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

  was:
not sure if this is a bug or a feature request, but I haven't been able to test NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
    at
{code}
[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map
[ https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-25484:
--------------------------------
    Description: 
not sure if this is a bug or a feature request, but I haven't been able to test NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

  was:
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
    at
{code}
[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map
[ https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-25484:
--------------------------------
    Description: 
{code}
CREATE TABLE parquet_map_type_string (
  id int,
  stringMap map<string,string>
) stored as parquet;
insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}
leads to:
{code}
Caused by: java.lang.NullPointerException
    at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
    at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
    at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
    at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

> NULL cannot be inserted as a Parquet map
> ----------------------------------------
>
>                 Key: HIVE-25484
>                 URL: https://issues.apache.org/jira/browse/HIVE-25484
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> {code}
> CREATE TABLE parquet_map_type_string (
>   id int,
>   stringMap map<string,string>
> ) stored as parquet;
> insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- NULL as value
> insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
> {code}
> leads to:
> {code}
> Caused by: java.lang.NullPointerException
>     at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
>     at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:209)
>     at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>     at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
>     at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
>     at
{code}
[jira] [Assigned] (HIVE-25484) NULL cannot be inserted as a Parquet map
[ https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor reassigned HIVE-25484:
-----------------------------------

> NULL cannot be inserted as a Parquet map
> ----------------------------------------
>
>                 Key: HIVE-25484
>                 URL: https://issues.apache.org/jira/browse/HIVE-25484
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
[jira] [Updated] (HIVE-25483) TxnHandler::acquireLock should close the DB conn to avoid connection leaks
[ https://issues.apache.org/jira/browse/HIVE-25483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-25483:
------------------------------------
    Description: 
TxnHandler::acquireLock should close the DB connection on exiting the function.
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5688]
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5726]
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5737-L5740]
If there are any exceptions downstream, this connection isn't closed cleanly. In a corner case, the Hikari connection leak detector reported the following:
{noformat}
2021-08-26 09:19:18,102 WARN com.zaxxer.hikari.pool.ProxyLeakTask: [HikariPool-4 housekeeper]: Connection leak detection triggered for org.postgresql.jdbc.PgConnection@77f76747, stack trace follows
java.lang.Exception: Apparent connection leak detected
    at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3843)
    at org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:5135)
    at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:107)
{noformat}

  was:
TxnHandler::acquireLock should close the DB connection on exiting the function.
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5688]
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5726]
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5737-L5740]

> TxnHandler::acquireLock should close the DB conn to avoid connection leaks
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-25483
>                 URL: https://issues.apache.org/jira/browse/HIVE-25483
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>
> TxnHandler::acquireLock should close the DB connection on exiting the function.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5688]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5726]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5737-L5740]
> If there are any exceptions downstream, this connection isn't closed cleanly. In a corner case, the Hikari connection leak detector reported the following:
> {noformat}
> 2021-08-26 09:19:18,102 WARN com.zaxxer.hikari.pool.ProxyLeakTask: [HikariPool-4 housekeeper]: Connection leak detection triggered for org.postgresql.jdbc.PgConnection@77f76747, stack trace follows
> java.lang.Exception: Apparent connection leak detected
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3843)
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:5135)
>     at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:107)
> {noformat}
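The shape of the leak and the standard remedy (close on every exit path, e.g. via try-with-resources) can be sketched with a self-contained stand-in resource; the real code acquires a java.sql.Connection from a Hikari pool via getDbConn(), so `FakeConn` here is purely illustrative:

```java
// Sketch of a connection leak and its fix, using a stand-in AutoCloseable.
public class ConnCloseSketch {
    static int open = 0; // how many "connections" are currently unclosed

    static class FakeConn implements AutoCloseable {
        FakeConn() { open++; }
        void doWork(boolean fail) {
            if (fail) throw new RuntimeException("downstream failure");
        }
        @Override public void close() { open--; }
    }

    // Leaky shape: an exception thrown after acquisition skips close(),
    // which is what the Hikari leak detector eventually reports.
    static void leaky(boolean fail) {
        FakeConn c = new FakeConn();
        c.doWork(fail);
        c.close(); // never reached when doWork throws
    }

    // Fixed shape: try-with-resources closes the connection on every exit path.
    static void fixed(boolean fail) {
        try (FakeConn c = new FakeConn()) {
            c.doWork(fail);
        }
    }

    public static void main(String[] args) {
        try { leaky(true); } catch (RuntimeException ignored) { }
        System.out.println("after leaky: open=" + open);  // 1: leaked
        open = 0;
        try { fixed(true); } catch (RuntimeException ignored) { }
        System.out.println("after fixed: open=" + open);  // 0: closed despite the exception
    }
}
```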
[jira] [Resolved] (HIVE-24292) hive webUI should support keystoretype by config
[ https://issues.apache.org/jira/browse/HIVE-24292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich resolved HIVE-24292.
-------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

this was merged in October, but the Jira was left open

> hive webUI should support keystoretype by config
> -------------------------------------------------
>
>                 Key: HIVE-24292
>                 URL: https://issues.apache.org/jira/browse/HIVE-24292
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We need a property to pass in the keystore type in the webUI too.
[jira] [Commented] (HIVE-25329) CTAS creates a managed table as non-ACID table
[ https://issues.apache.org/jira/browse/HIVE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405051#comment-17405051 ]

László Bodor commented on HIVE-25329:
--------------------------------------

PR merged to master, thanks [~robbiezhang] for the patch!

> CTAS creates a managed table as non-ACID table
> ----------------------------------------------
>
>                 Key: HIVE-25329
>                 URL: https://issues.apache.org/jira/browse/HIVE-25329
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> According to HIVE-22158, MANAGED tables should be ACID tables only. When we set hive.create.as.external.legacy to true, the query like 'create managed table as select 1' creates a non-ACID table.
[jira] [Resolved] (HIVE-25329) CTAS creates a managed table as non-ACID table
[ https://issues.apache.org/jira/browse/HIVE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor resolved HIVE-25329.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> CTAS creates a managed table as non-ACID table
> ----------------------------------------------
>
>                 Key: HIVE-25329
>                 URL: https://issues.apache.org/jira/browse/HIVE-25329
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> According to HIVE-22158, MANAGED tables should be ACID tables only. When we set hive.create.as.external.legacy to true, the query like 'create managed table as select 1' creates a non-ACID table.
[jira] [Work logged] (HIVE-25329) CTAS creates a managed table as non-ACID table
[ https://issues.apache.org/jira/browse/HIVE-25329?focusedWorklogId=642203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642203 ]

ASF GitHub Bot logged work on HIVE-25329:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Aug/21 08:17
            Start Date: 26/Aug/21 08:17
    Worklog Time Spent: 10m 
      Work Description: abstractdog merged pull request #2477:
URL: https://github.com/apache/hive/pull/2477

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 642203)
    Time Spent: 1h 50m  (was: 1h 40m)

> CTAS creates a managed table as non-ACID table
> ----------------------------------------------
>
>                 Key: HIVE-25329
>                 URL: https://issues.apache.org/jira/browse/HIVE-25329
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> According to HIVE-22158, MANAGED tables should be ACID tables only. When we set hive.create.as.external.legacy to true, the query like 'create managed table as select 1' creates a non-ACID table.
[jira] [Updated] (HIVE-24948) Enhancing performance of OrcInputFormat.getSplits with bucket pruning
[ https://issues.apache.org/jira/browse/HIVE-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Chung updated HIVE-24948:
--------------------------------
    Description:

The summarized flow of generating input splits at the Tez AM (via HiveSplitGenerator.initialize()) is:

# Perform dynamic partition pruning
# Get the list of InputSplit by calling InputFormat.getSplits()
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
# Perform bucket pruning with the list above if possible
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

However, I observed that step 2, building the list of InputSplit, can add significant overhead when the inputs are ORC files in HDFS.

For example, consider an ORC table T partitioned by 'log_date', with each partition bucketed by a column 'q'. There are 240 buckets in each partition and each bucket (ORC file) is, let's say, 100MB. The SQL is:

{noformat}
set hive.tez.bucket.pruning=true;
select count(*) from T where log_date between '2020-01-01' and '2020-06-30' and q = 'foobar';
{noformat}

This means there are 240 * 183 (days) = 43680 ORC files in the input paths, but thanks to bucket pruning only 183 of them need to be processed. In my company's environment, the whole processing time of the SQL was roughly 5 minutes. However, I've checked that it took more than 3 minutes just to build the list of OrcSplit for the 43680 ORC files. The logs with tez.am.log.level=DEBUG showed:

{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] |orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] |orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] |io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: /PERFLOG method=getSplits start=1616602891776 end=1616602891776 duration=199668 from=org.apache.hadoop.hive.ql.io.HiveInputFormat
...
2021-03-25 01:26:03,385 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: DAG completed, dagId=dag_1615862187190_731117_1, dagState=SUCCEEDED
{noformat}

The 43680 - 183 = 43497 InputSplits that consume about 60% of the entire processing time are simply discarded afterwards by step 3, pruneBuckets(). With bucket pruning, building the whole list of ORC input splits is not necessary. Therefore, I suggest the flow become:

# Perform dynamic partition pruning
# Get the list of InputSplit by calling InputFormat.getSplits()
## OrcInputFormat.getSplits() returns the bucket-pruned list if the BitSet from FixedBucketPruningOptimizer exists