[jira] [Work logged] (HIVE-25330) Make FS calls in CopyUtils retryable

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25330?focusedWorklogId=642685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642685
 ]

ASF GitHub Bot logged work on HIVE-25330:
-

Author: ASF GitHub Bot
Created on: 27/Aug/21 03:41
Start Date: 27/Aug/21 03:41
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r697127775



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -190,11 +247,14 @@ private void doCopyRetry(FileSystem sourceFs, 
List s
 // If copy fails, fall through the retry logic
 LOG.info("file operation failed", e);
 
-if (repeat >= (MAX_IO_RETRY - 1)) {
-  //no need to wait in the last iteration
+if (repeat >= (MAX_IO_RETRY - 1) || 
failOnExceptions.stream().anyMatch(k -> e.getClass().equals(k))
+|| 
ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg().equals(e.getMessage())) {
+  //Don't retry in the following cases:

Review comment:
   Pull the comment above the if statement. Also, rather than matching entries 
of ``failOnExceptions`` by exact class equality, check whether you can use 
something like `isAssignableFrom`, so that child classes of the listed 
exceptions are also considered.
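
For illustration, a minimal self-contained sketch of the difference (not the code in the PR; `FAIL_ON` simply stands in for `failOnExceptions`):
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RetryMatchSketch {
  // Hypothetical stand-in for CopyUtils.failOnExceptions.
  private static final List<Class<? extends Exception>> FAIL_ON =
      Arrays.<Class<? extends Exception>>asList(IOException.class);

  // Exact-class matching: misses subclasses of the listed exceptions.
  static boolean matchesByEquals(Exception e) {
    return FAIL_ON.stream().anyMatch(k -> e.getClass().equals(k));
  }

  // isAssignableFrom, as suggested in the review: also matches subclasses.
  static boolean matchesByAssignable(Exception e) {
    return FAIL_ON.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()));
  }

  public static void main(String[] args) {
    Exception child = new FileNotFoundException("subclass of IOException");
    System.out.println(matchesByEquals(child));     // false
    System.out.println(matchesByAssignable(child)); // true
  }
}
{code}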




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642685)
Time Spent: 40m  (was: 0.5h)

> Make FS calls in CopyUtils retryable
> 
>
> Key: HIVE-25330
> URL: https://issues.apache.org/jira/browse/HIVE-25330
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25330) Make FS calls in CopyUtils retryable

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25330?focusedWorklogId=642682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642682
 ]

ASF GitHub Bot logged work on HIVE-25330:
-

Author: ASF GitHub Bot
Created on: 27/Aug/21 03:38
Start Date: 27/Aug/21 03:38
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r697128887



##
File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java
##
@@ -110,6 +112,40 @@ public void shouldThrowExceptionOnDistcpFailure() throws 
Exception {
 copyUtils.doCopy(destination, srcPaths);
   }
 
+  @Test
+  public void testRetryableFSCalls() throws Exception {
+mockStatic(UserGroupInformation.class);
+mockStatic(ReplChangeManager.class);
+
when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class));
+HiveConf conf = mock(HiveConf.class);
+conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s");
+FileSystem fs = mock(FileSystem.class);
+Path source = mock(Path.class);
+Path destination = mock(Path.class);
+ContentSummary cs = mock(ContentSummary.class);
+
+when(ReplChangeManager.checksumFor(source, fs)).thenThrow(new 
IOException("Failed")).thenReturn("dummy");
+when(fs.exists(same(source))).thenThrow(new 
IOException("Failed")).thenReturn(true);
+when(fs.delete(same(source), anyBoolean())).thenThrow(new 
IOException("Failed")).thenReturn(true);
+when(fs.mkdirs(same(source))).thenThrow(new 
IOException("Failed")).thenReturn(true);
+when(fs.rename(same(source), same(destination))).thenThrow(new 
IOException("Failed")).thenReturn(true);
+when(fs.getContentSummary(same(source))).thenThrow(new 
IOException("Failed")).thenReturn(cs);
+
+CopyUtils copyUtils = new 
CopyUtils(UserGroupInformation.getCurrentUser().getUserName(), conf, fs);
+CopyUtils copyUtilsSpy = Mockito.spy(copyUtils);
+assertEquals (copyUtilsSpy.exists(fs, source), true);

Review comment:
   change to `assertTrue` similarly for the others as well
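
A tiny sketch of the suggested style (JUnit 4 assumed, as in the quoted test; the boolean just stands in for `copyUtilsSpy.exists(fs, source)`):
{code:java}
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class AssertStyleSketch {
  @Test
  public void prefersAssertTrueOverAssertEqualsWithTrue() {
    boolean exists = true; // stand-in for copyUtilsSpy.exists(fs, source)
    // Instead of: assertEquals(exists, true);
    assertTrue(exists);
  }
}
{code}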

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -66,6 +67,16 @@
   private FileSystem destinationFs;
   private final int maxParallelCopyTask;
 
  private List<Class<? extends Exception>> failOnExceptions = 
Arrays.asList(org.apache.hadoop.fs.PathIOException.class,
+  org.apache.hadoop.fs.UnsupportedFileSystemException.class,
+  org.apache.hadoop.fs.InvalidPathException.class,
+  org.apache.hadoop.fs.InvalidRequestException.class,
+  org.apache.hadoop.fs.FileAlreadyExistsException.class,
+  org.apache.hadoop.fs.ChecksumException.class,
+  org.apache.hadoop.fs.ParentNotDirectoryException.class,
+  org.apache.hadoop.hdfs.protocol.NSQuotaExceededException.class,

Review comment:
   We can include the other quota exceptions as well, i.e. the children and 
grandchildren of `ClusterStorageCapacityExceededException`, or list 
`ClusterStorageCapacityExceededException` directly if the retry logic can take 
the parent class and block its children as well.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -190,11 +247,14 @@ private void doCopyRetry(FileSystem sourceFs, 
List s
 // If copy fails, fall through the retry logic
 LOG.info("file operation failed", e);
 
-if (repeat >= (MAX_IO_RETRY - 1)) {
-  //no need to wait in the last iteration
+if (repeat >= (MAX_IO_RETRY - 1) || 
failOnExceptions.stream().anyMatch(k -> e.getClass().equals(k))
+|| 
ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg().equals(e.getMessage())) {
+  //Don't retry in the following cases:

Review comment:
   pull the comment above the if statement

##
File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java
##
@@ -110,6 +112,40 @@ public void shouldThrowExceptionOnDistcpFailure() throws 
Exception {
 copyUtils.doCopy(destination, srcPaths);
   }
 
+  @Test
+  public void testRetryableFSCalls() throws Exception {
+mockStatic(UserGroupInformation.class);
+mockStatic(ReplChangeManager.class);
+
when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class));
+HiveConf conf = mock(HiveConf.class);
+conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s");
+FileSystem fs = mock(FileSystem.class);
+Path source = mock(Path.class);
+Path destination = mock(Path.class);
+ContentSummary cs = mock(ContentSummary.class);
+
+when(ReplChangeManager.checksumFor(source, fs)).thenThrow(new 
IOException("Failed")).thenReturn("dummy");
+when(fs.exists(same(source))).thenThrow(new 
IOException("Failed")).thenReturn(true);
+

[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-08-26 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated HIVE-25303:
-
Description: 
Under legacy table creation mode (hive.create.as.external.legacy=true), when a 
database has been created in a specific LOCATION and that database is USEd in 
the session, a table created with the following command:
{code:java}
CREATE TABLE  AS SELECT {code}
should inherit the HDFS path from the database's location. Instead, Hive is 
trying to write the table data into 
/warehouse/tablespace/managed/hive//

+Design+: 
In a CTAS query, the data is first written to the target directory (this happens 
in HS2) and then the table is created (this happens in HMS). So two decisions 
are being made: i) the target directory location and ii) how the table should 
be created (table type, storage descriptor, etc.).
When HS2 needs the target location to be set, it makes a create-table dry-run 
call to HMS (where table translation happens); decisions i) and ii) are made 
within HMS, which returns the table object. HS2 then uses the location set by 
HMS for placing the data.

  was:
Under legacy table creation mode (hive.create.as.external.legacy=true), when a 
database has been created in a specific LOCATION, in a session where that 
database is USEd, tables created using

CREATE TABLE  AS SELECT 

should inherit the HDFS path from the database's location.

Instead, Hive is trying to write the table data into 
/warehouse/tablespace/managed/hive//


> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION and that database is USEd 
> in the session, a table created with the following command:
> {code:java}
> CREATE TABLE  AS SELECT {code}
> should inherit the HDFS path from the database's location. Instead, Hive is 
> trying to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
> In a CTAS query, the data is first written to the target directory (this 
> happens in HS2) and then the table is created (this happens in HMS). So two 
> decisions are being made: i) the target directory location and ii) how the 
> table should be created (table type, storage descriptor, etc.).
> When HS2 needs the target location to be set, it makes a create-table dry-run 
> call to HMS (where table translation happens); decisions i) and ii) are made 
> within HMS, which returns the table object. HS2 then uses the location set by 
> HMS for placing the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=642455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642455
 ]

ASF GitHub Bot logged work on HIVE-25317:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 16:59
Start Date: 26/Aug/21 16:59
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #2459:
URL: https://github.com/apache/hive/pull/2459#discussion_r696816642



##
File path: llap-server/pom.xml
##
@@ -38,6 +38,7 @@
   <groupId>org.apache.hive</groupId>
   <artifactId>hive-exec</artifactId>
   <version>${project.version}</version>
+  <classifier>core</classifier>

Review comment:
   Yea I think so since this PR is trying to shade things from the 
`hive-exec-core` I believe? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642455)
Time Spent: 2h 40m  (was: 2.5h)

> Relocate dependencies in shaded hive-exec module
> 
>
> Key: HIVE-25317
> URL: https://issues.apache.org/jira/browse/HIVE-25317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When we want to use shaded version of hive-exec (i.e., w/o classifier), more 
> dependencies conflict with Spark. We need to relocate these dependencies too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25372) [Hive] Advance write ID for remaining DDLs

2021-08-26 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405326#comment-17405326
 ] 

Kishen Das commented on HIVE-25372:
---

Rest of DDLs will be addressed as part of 
https://issues.apache.org/jira/browse/HIVE-25407 

> [Hive] Advance write ID for remaining DDLs
> --
>
> Key: HIVE-25372
> URL: https://issues.apache.org/jira/browse/HIVE-25372
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We guarantee data consistency for table metadata, when serving data from the 
> HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve 
> from cache or refresh from the backing DB and serve, so we have to ensure we 
> advance write IDs during all alter table flows. We have to ensure we advance 
> the write ID for below DDLs.
> AlterTableSetOwnerAnalyzer.java 
>  -AlterTableSkewedByAnalyzer.java-
>  AlterTableSetSerdeAnalyzer.java
>  AlterTableSetSerdePropsAnalyzer.java
>  -AlterTableUnsetSerdePropsAnalyzer.java-
>  AlterTableSetPartitionSpecAnalyzer
>  AlterTableClusterSortAnalyzer.java
>  AlterTableIntoBucketsAnalyzer.java
>  AlterTableConcatenateAnalyzer.java
>  -AlterTableCompactAnalyzer.java-
>  AlterTableSetFileFormatAnalyzer.java
>  -AlterTableSetSkewedLocationAnalyzer.java-



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25407) Advance Write ID during ALTER TABLE ( NOT SKEWED, SKEWED BY, SET SKEWED LOCATION, UNSET SERDEPROPERTIES)

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-25407:
--
Summary: Advance Write ID during ALTER TABLE ( NOT SKEWED, SKEWED BY, SET 
SKEWED LOCATION, UNSET SERDEPROPERTIES)  (was: Advance Write ID during ALTER 
TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES)

> Advance Write ID during ALTER TABLE ( NOT SKEWED, SKEWED BY, SET SKEWED 
> LOCATION, UNSET SERDEPROPERTIES)
> 
>
> Key: HIVE-25407
> URL: https://issues.apache.org/jira/browse/HIVE-25407
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> The DDLs below should be investigated separately to understand why advancing 
> the write ID is not working for transactional tables, even after adding the 
> logic to advance it. 
>  * -ALTER TABLE SET PARTITION SPEC- 
>  * ALTER TABLE  UNSET SERDEPROPERTIES 
>  * ALTER TABLE NOT SKEWED
>  * -ALTER TABLE COMPACT- 
>  * ALTER TABLE SKEWED BY
>  * ALTER TABLE SET SKEWED LOCATION



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25407) Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-25407:
--
Summary: Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET 
SERDEPROPERTIES  (was: Advance Write ID during ALTER TABLE ( UNSET 
SERDEPROPERTIES)

> Advance Write ID during ALTER TABLE ( NOT SKEWED, UNSET SERDEPROPERTIES
> ---
>
> Key: HIVE-25407
> URL: https://issues.apache.org/jira/browse/HIVE-25407
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> The DDLs below should be investigated separately to understand why advancing 
> the write ID is not working for transactional tables, even after adding the 
> logic to advance it. 
>  * -ALTER TABLE SET PARTITION SPEC- 
>  * ALTER TABLE  UNSET SERDEPROPERTIES 
>  * ALTER TABLE NOT SKEWED
>  * -ALTER TABLE COMPACT- 
>  * ALTER TABLE SKEWED BY
>  * ALTER TABLE SET SKEWED LOCATION



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25407) Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-25407:
--
Summary: Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES  (was: 
[Hive] Investigate why advancing the Write ID not working for some DDLs and fix 
it, if appropriate)

> Advance Write ID during ALTER TABLE ( UNSET SERDEPROPERTIES
> ---
>
> Key: HIVE-25407
> URL: https://issues.apache.org/jira/browse/HIVE-25407
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> The DDLs below should be investigated separately to understand why advancing 
> the write ID is not working for transactional tables, even after adding the 
> logic to advance it. 
>  * -ALTER TABLE SET PARTITION SPEC- 
>  * ALTER TABLE  UNSET SERDEPROPERTIES 
>  * ALTER TABLE NOT SKEWED
>  * -ALTER TABLE COMPACT- 
>  * ALTER TABLE SKEWED BY
>  * ALTER TABLE SET SKEWED LOCATION



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25461) Add a test case to ensure Truncate table advances the write ID

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25461 started by Kishen Das.
-
> Add a test case to ensure Truncate table advances the write ID
> --
>
> Key: HIVE-25461
> URL: https://issues.apache.org/jira/browse/HIVE-25461
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25461) Add a test case to ensure Truncate table advances the write ID

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-25461:
-

Assignee: Kishen Das

> Add a test case to ensure Truncate table advances the write ID
> --
>
> Key: HIVE-25461
> URL: https://issues.apache.org/jira/browse/HIVE-25461
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25461) Add a test case to ensure Truncate table advances the write ID

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-25461:
--
Summary: Add a test case to ensure Truncate table advances the write ID  
(was: Truncate table should advance the write ID)

> Add a test case to ensure Truncate table advances the write ID
> --
>
> Key: HIVE-25461
> URL: https://issues.apache.org/jira/browse/HIVE-25461
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25372) [Hive] Advance write ID for remaining DDLs

2021-08-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-25372:
--
Description: 
We guarantee data consistency for table metadata, when serving data from the 
HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve from 
cache or refresh from the backing DB and serve, so we have to ensure we advance 
write IDs during all alter table flows. We have to ensure we advance the write 
ID for below DDLs.

AlterTableSetOwnerAnalyzer.java 
 -AlterTableSkewedByAnalyzer.java-
 AlterTableSetSerdeAnalyzer.java
 AlterTableSetSerdePropsAnalyzer.java
 -AlterTableUnsetSerdePropsAnalyzer.java-
 AlterTableSetPartitionSpecAnalyzer
 AlterTableClusterSortAnalyzer.java
 AlterTableIntoBucketsAnalyzer.java
 AlterTableConcatenateAnalyzer.java
 -AlterTableCompactAnalyzer.java-
 AlterTableSetFileFormatAnalyzer.java
 -AlterTableSetSkewedLocationAnalyzer.java-

  was:
We guarantee data consistency for table metadata, when serving data from the 
HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve from 
cache or refresh from the backing DB and serve, so we have to ensure we advance 
write IDs during all alter table flows. We have to ensure we advance the write 
ID for below DDLs.

AlterTableSetOwnerAnalyzer.java 
AlterTableSkewedByAnalyzer.java
AlterTableSetSerdeAnalyzer.java
AlterTableSetSerdePropsAnalyzer.java
AlterTableUnsetSerdePropsAnalyzer.java
AlterTableSetPartitionSpecAnalyzer
AlterTableClusterSortAnalyzer.java
AlterTableIntoBucketsAnalyzer.java
AlterTableConcatenateAnalyzer.java
AlterTableCompactAnalyzer.java
AlterTableSetFileFormatAnalyzer.java
AlterTableSetSkewedLocationAnalyzer.java


> [Hive] Advance write ID for remaining DDLs
> --
>
> Key: HIVE-25372
> URL: https://issues.apache.org/jira/browse/HIVE-25372
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We guarantee data consistency for table metadata, when serving data from the 
> HMS cache. HMS cache relies on Valid Write IDs to decide whether to serve 
> from cache or refresh from the backing DB and serve, so we have to ensure we 
> advance write IDs during all alter table flows. We have to ensure we advance 
> the write ID for below DDLs.
> AlterTableSetOwnerAnalyzer.java 
>  -AlterTableSkewedByAnalyzer.java-
>  AlterTableSetSerdeAnalyzer.java
>  AlterTableSetSerdePropsAnalyzer.java
>  -AlterTableUnsetSerdePropsAnalyzer.java-
>  AlterTableSetPartitionSpecAnalyzer
>  AlterTableClusterSortAnalyzer.java
>  AlterTableIntoBucketsAnalyzer.java
>  AlterTableConcatenateAnalyzer.java
>  -AlterTableCompactAnalyzer.java-
>  AlterTableSetFileFormatAnalyzer.java
>  -AlterTableSetSkewedLocationAnalyzer.java-



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable

2021-08-26 Thread Haymant Mangla (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405300#comment-17405300
 ] 

Haymant Mangla commented on HIVE-25475:
---

Committed to master.

Thanks for the review, [~maheshk114]

> TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
> --
>
> Key: HIVE-25475
> URL: https://issues.apache.org/jira/browse/HIVE-25475
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/389/
> {code}
> 16:19:18  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 141.73 s <<< FAILURE! - in 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios
> 16:19:18  [ERROR] 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad
>   Time elapsed: 122.979 s  <<< ERROR!
> 16:19:18  org.apache.hadoop.hive.ql.metadata.HiveException
> 16:19:18  at 
> org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
> 16:19:18  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688)
> 16:19:18  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 16:19:18  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 16:19:18  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 16:19:18  at java.lang.reflect.Method.invoke(Method.java:498)
> 16:19:18  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 16:19:18  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 16:19:18  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 16:19:18  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 16:19:18  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 16:19:18  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 16:19:18  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 16:19:18  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 16:19:18  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 16:19:18  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 16:19:18  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 16:19:18  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 16:19:18  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 16:19:18  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 16:19:18  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)

[jira] [Resolved] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable

2021-08-26 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla resolved HIVE-25475.
---
Resolution: Fixed

> TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
> --
>
> Key: HIVE-25475
> URL: https://issues.apache.org/jira/browse/HIVE-25475
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/389/
> {code}
> 16:19:18  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 141.73 s <<< FAILURE! - in 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios
> 16:19:18  [ERROR] 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad
>   Time elapsed: 122.979 s  <<< ERROR!
> 16:19:18  org.apache.hadoop.hive.ql.metadata.HiveException
> 16:19:18  at 
> org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
> 16:19:18  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688)
> 16:19:18  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 16:19:18  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 16:19:18  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 16:19:18  at java.lang.reflect.Method.invoke(Method.java:498)
> 16:19:18  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 16:19:18  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 16:19:18  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 16:19:18  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 16:19:18  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 16:19:18  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 16:19:18  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 16:19:18  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 16:19:18  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 16:19:18  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 16:19:18  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 16:19:18  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 16:19:18  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 16:19:18  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 16:19:18  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 16:19:18  at 
> 

[jira] [Work logged] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25475?focusedWorklogId=642392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642392
 ]

ASF GitHub Bot logged work on HIVE-25475:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 15:05
Start Date: 26/Aug/21 15:05
Worklog Time Spent: 10m 
  Work Description: maheshk114 merged pull request #2605:
URL: https://github.com/apache/hive/pull/2605


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642392)
Time Spent: 20m  (was: 10m)

> TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
> --
>
> Key: HIVE-25475
> URL: https://issues.apache.org/jira/browse/HIVE-25475
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/389/
> {code}
> 16:19:18  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 141.73 s <<< FAILURE! - in 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios
> 16:19:18  [ERROR] 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad
>   Time elapsed: 122.979 s  <<< ERROR!
> 16:19:18  org.apache.hadoop.hive.ql.metadata.HiveException
> 16:19:18  at 
> org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
> 16:19:18  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
> 16:19:18  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663)
> 16:19:18  at 
> org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688)
> 16:19:18  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 16:19:18  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 16:19:18  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 16:19:18  at java.lang.reflect.Method.invoke(Method.java:498)
> 16:19:18  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 16:19:18  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 16:19:18  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 16:19:18  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 16:19:18  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 16:19:18  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 16:19:18  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 16:19:18  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 

[jira] [Work logged] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25286?focusedWorklogId=642386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642386
 ]

ASF GitHub Bot logged work on HIVE-25286:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 14:51
Start Date: 26/Aug/21 14:51
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2427:
URL: https://github.com/apache/hive/pull/2427


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642386)
Time Spent: 1.5h  (was: 1h 20m)

> Set stats to inaccurate when an Iceberg table is modified outside Hive
> --
>
> Key: HIVE-25286
> URL: https://issues.apache.org/jira/browse/HIVE-25286
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When an Iceberg table is modified outside of Hive, the stats should be set to 
> inaccurate, since there is no way to ensure that the HMS stats are updated 
> correctly, and stale stats could cause incorrect query results.
> The proposed solution only works for HiveCatalog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive

2021-08-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25286.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~Marton Bod]

> Set stats to inaccurate when an Iceberg table is modified outside Hive
> --
>
> Key: HIVE-25286
> URL: https://issues.apache.org/jira/browse/HIVE-25286
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When an Iceberg table is modified outside of Hive, the stats should be set to 
> inaccurate, since there is no way to ensure that the HMS stats are updated 
> correctly, and stale stats could cause incorrect query results.
> The proposed solution only works for HiveCatalog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=642378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642378
 ]

ASF GitHub Bot logged work on HIVE-25429:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 14:43
Start Date: 26/Aug/21 14:43
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2563:
URL: https://github.com/apache/hive/pull/2563#discussion_r695474352



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##
@@ -272,30 +272,34 @@ private void prepare(InputInitializerContext 
initializerContext) throws IOExcept
 String groupName = null;
 String vertexName = null;
 if (inputInitializerContext != null) {
-  tezCounters = new TezCounters();

Review comment:
   @abstractdog  would you mind reviewing at least the changes to 
HiveSplitGenerator?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642378)
Time Spent: 1.5h  (was: 1h 20m)

> Delta metrics collection may cause number of tez counters to exceed 
> tez.counters.max limit
> --
>
> Key: HIVE-25429
> URL: https://issues.apache.org/jira/browse/HIVE-25429
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There's a limit to the number of Tez counters allowed (tez.counters.max). 
> Delta metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 
> counters for each partition touched by a given query, which can result in a 
> huge number of counters. This is unnecessary because we're only interested in 
> the n partitions with the most deltas, so this change limits the number of 
> counters created to hive.txn.acid.metrics.max.cache.size*3.
> Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
> isn't caught on the Hive side, which causes the query to fail. We should catch 
> it and skip delta metrics collection in this case.
> Also, make sure that metrics are only collected if 
> hive.metastore.acidmetrics.ext.on=true.
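
A rough sketch of the catch-and-skip behaviour described above (not the committed Hive change; collectDeltaMetrics is a hypothetical stand-in, and the RuntimeException stands in for the counter-limit exception the description mentions):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DeltaMetricsGuardSketch {
  private static final Logger LOG = LoggerFactory.getLogger(DeltaMetricsGuardSketch.class);

  // Hypothetical helper: registering one counter per partition could blow past
  // the configured counter limit and throw at runtime.
  static void collectDeltaMetrics() {
    throw new RuntimeException("too many counters: limit exceeded");
  }

  public static void main(String[] args) {
    try {
      collectDeltaMetrics();
    } catch (RuntimeException e) {
      // Skip delta metrics collection instead of failing the whole query.
      LOG.warn("Counter limit reached, skipping delta file metrics", e);
    }
  }
}
{code}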



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25485) Transform selects of literals under a UNION ALL to inline table scan

2021-08-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25485:
---


> Transform selects of literals under a UNION ALL to inline table scan
> 
>
> Key: HIVE-25485
> URL: https://issues.apache.org/jira/browse/HIVE-25485
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> {code}
> select 1
> union all
> select 1
> union all
> [...]
> union all
> select 1
> {code}
> results in a very big plan, which will have vertices proportional to the 
> number of UNION ALL branches; hence it can be slow to execute.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23016) Extract JdbcConnectionParams from Utils Class

2021-08-26 Thread Timur Malikin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405257#comment-17405257
 ] 

Timur Malikin commented on HIVE-23016:
--

Hi! I'd like to take this task and make the PR :)

> Extract JdbcConnectionParams from Utils Class
> -
>
> Key: HIVE-23016
> URL: https://issues.apache.org/jira/browse/HIVE-23016
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>  Labels: n00b, newbie, noob
>
> And make it its own class.
> https://github.com/apache/hive/blob/4700e210ef7945278c4eb313c9ebd810b0224da1/jdbc/src/java/org/apache/hive/jdbc/Utils.java#L72



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642304
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:25
Start Date: 26/Aug/21 12:25
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577856



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null

Review comment:
   this part has been reworked a bit, created a nested loop and put it into 
a separate method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642304)
Time Spent: 3h 50m  (was: 3h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642305
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:25
Start Date: 26/Aug/21 12:25
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577856



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null

Review comment:
   this part has been reworked a bit, created a nested loop and put it into 
a separate method (collectDataFromParquetPage) + added comments 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642305)
Time Spent: 4h  (was: 3h 50m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642303
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:24
Start Date: 26/Aug/21 12:24
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577326



##
File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q
##
@@ -0,0 +1,20 @@
+set hive.mapred.mode=nonstrict;
+set hive.vectorized.execution.enabled=true;
+set hive.fetch.task.conversion=none;
+
+DROP TABLE parquet_map_type;
+
+
+CREATE TABLE parquet_map_type (
+id int,
+stringMap map

Review comment:
   Finally, I decided not to block this on some of the test cases; I created 
follow-up tickets: HIVE-25459, HIVE-25484.
   This patch already solves the customer's issue.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642303)
Time Spent: 3h 40m  (was: 3.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642301
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:22
Start Date: 26/Aug/21 12:22
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696576391



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector 
cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+if (cv1.vector[i] == null && cv2.vector[i] == null) {
+  continue;

Review comment:
   okay, I got it in the meantime, simply removing the null check here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642301)
Time Spent: 3.5h  (was: 3h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642300
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:19
Start Date: 26/Aug/21 12:19
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696573989



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector 
cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+if (cv1.vector[i] == null && cv2.vector[i] == null) {
+  continue;

Review comment:
   I think I got it, it will be much faster for non-null strings:
   ```
   int innerLen1 = cv1.vector[i].length;
   int innerLen2 = cv2.vector[i].length;
   if (innerLen1 == innerLen2) {
 for (int j = 0; j < innerLen1; j++) {
   if (cv1.vector[i][j] != cv2.vector[i][j]) {
 return false;
   }
 }
   } else {
 return false;
   }
   
   if (cv1.isNull[i] && cv2.isNull[i]) {
 continue;
   }
   ```
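   
   One caveat with the snippet above: the isNull flags have to be consulted before touching cv1.vector[i] / cv2.vector[i], otherwise a null entry is dereferenced. A minimal, self-contained sketch of a null-safe per-entry comparison (class and method names are illustrative only, not taken from the PR):
   ```java
   import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
   
   // Illustrative helper, not the actual patch: compare entry i of two
   // BytesColumnVector instances while honouring the isNull convention.
   final class BytesColumnVectorCompareSketch {
     static boolean sameEntry(BytesColumnVector cv1, BytesColumnVector cv2, int i) {
       if (cv1.isNull[i] || cv2.isNull[i]) {
         // equal only when both entries are null; never read vector[i] here
         return cv1.isNull[i] && cv2.isNull[i];
       }
       if (cv1.length[i] != cv2.length[i]) {
         return false;
       }
       for (int j = 0; j < cv1.length[i]; j++) {
         if (cv1.vector[i][cv1.start[i] + j] != cv2.vector[i][cv2.start[i] + j]) {
           return false;
         }
       }
       return true;
     }
   }
   ```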




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642300)
Time Spent: 3h 20m  (was: 3h 10m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at 

[jira] [Comment Edited] (HIVE-25459) Wrong VectorUDFMapIndexBaseScalar child class is used, leading to ClassCastException

2021-08-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405174#comment-17405174
 ] 

László Bodor edited comment on HIVE-25459 at 8/26/21, 11:49 AM:


same for decimals too:
{code}
-- *** DECIMAL ***
CREATE TABLE parquet_map_type_decimal (
id int,
decimalMap map
) stored as parquet;

insert into parquet_map_type_decimal SELECT 1, MAP(1.0, NULL, 2.0, 3.0);
insert into parquet_map_type_decimal (id) VALUES (2);

select id, decimalMap from parquet_map_type_decimal;
select id, decimalMap[1] from parquet_map_type_decimal group by id, 
decimalMap[1]; -- fails until HIVE-25459 is solved
{code}
{code}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:74)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
{code}


was (Author: abstractdog):
same for decimals too:
{code}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:74)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
{code}

> Wrong VectorUDFMapIndexBaseScalar child class is used, leading to 
> ClassCastException
> 
>
> Key: HIVE-25459
> URL: https://issues.apache.org/jira/browse/HIVE-25459
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> given this query:
> {code}
> CREATE TABLE map_of_doubles (
> id int,
> doubleMap map
> ) stored as orc;
> insert overwrite table map_of_doubles SELECT 1, MAP(CAST(1.0 as DOUBLE), 
> null, CAST(2.0 as DOUBLE), CAST(3.0 as DOUBLE));
> select id, doubleMap from map_of_doubles;
> select id, doubleMap[1] from map_of_doubles group by id, doubleMap[1]; -- 
> this fails
> {code}
> error is:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   ... 23 more
> {code}
> I found this error while I was trying to write q test cases for all data types 
> in HIVE-23688, so this issue needs to be addressed first
> HIVE-23688 is Parquet-specific, but this one is not: it can be reproduced for both ORC 
> and Parquet
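
For context, a rough sketch of the failing pattern (simplified, not the actual Hive source): VectorUDFMapIndexLongScalar casts the map's key vector to LongColumnVector, so when that child class gets selected for a map whose keys are doubles or decimals (as in doubleMap[1] / decimalMap[1] above), the cast fails exactly as in the stack traces.
{code:java}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.MapColumnVector;

// Simplified illustration only; the real logic lives in VectorUDFMapIndexLongScalar.
// The cast below is the one that blows up when the map's keys are stored in a
// DoubleColumnVector or DecimalColumnVector rather than a LongColumnVector.
final class MapIndexCastSketch {
  static int findKey(MapColumnVector map, int row, long key) {
    LongColumnVector keys = (LongColumnVector) map.keys;   // ClassCastException here
    int offset = (int) map.offsets[row];
    int count = (int) map.lengths[row];
    for (int i = offset; i < offset + count; i++) {
      if (keys.vector[i] == key) {
        return i;        // index of the matching entry in map.values
      }
    }
    return -1;           // not found; the caller would mark the output row null
  }
}
{code}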



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25459) Wrong VectorUDFMapIndexBaseScalar child class is used, leading to ClassCastException

2021-08-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405174#comment-17405174
 ] 

László Bodor commented on HIVE-25459:
-

same for decimals too:
{code}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:74)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
{code}

> Wrong VectorUDFMapIndexBaseScalar child class is used, leading to 
> ClassCastException
> 
>
> Key: HIVE-25459
> URL: https://issues.apache.org/jira/browse/HIVE-25459
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> given this query:
> {code}
> CREATE TABLE map_of_doubles (
> id int,
> doubleMap map
> ) stored as orc;
> insert overwrite table map_of_doubles SELECT 1, MAP(CAST(1.0 as DOUBLE), 
> null, CAST(2.0 as DOUBLE), CAST(3.0 as DOUBLE));
> select id, doubleMap from map_of_doubles;
> select id, doubleMap[1] from map_of_doubles group by id, doubleMap[1]; -- 
> this fails
> {code}
> error is:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexLongScalar.findScalarInMap(VectorUDFMapIndexLongScalar.java:67)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   ... 23 more
> {code}
> I found this error while I was trying to write q test cases for all data types 
> in HIVE-23688, so this issue needs to be addressed first
> HIVE-23688 is Parquet-specific, but this one is not: it can be reproduced for both ORC 
> and Parquet



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map (and as key in a map)

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25484:

Description: 
not sure if this is a bug or a feature request, but I haven't been able to test 
NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value, works
insert into parquet_map_type_string (id) VALUES (2); -- NULL as map, fails
insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- 
NULL as key, fails
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}
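
The NPE itself is visible at the top of the trace: org.apache.parquet.io.api.Binary.fromString rejects null, so any write path that reaches it with a NULL map key fails. A tiny illustration of the constraint (the helper below is hypothetical, not Hive code):
{code:java}
import org.apache.parquet.io.api.Binary;

// Hypothetical guard, for illustration only: Parquet map keys must be non-null,
// so a writer either has to skip/reject such entries or fail with a clear error
// instead of the NullPointerException above.
final class NullMapKeySketch {
  static Binary keyToBinary(String key) {
    if (key == null) {
      throw new IllegalArgumentException("Parquet map keys must not be NULL");
    }
    return Binary.fromString(key);  // throws NPE if handed null directly
  }
}
{code}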

  was:
not sure if this is a bug or a feature request, but I haven't been able to test 
NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value, works
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map, fails
insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- 
NULL as key, fails
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 

[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map (and as key in a map)

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25484:

Summary: NULL cannot be inserted as a Parquet map (and as key in a map)  
(was: NULL cannot be inserted as a Parquet map)

> NULL cannot be inserted as a Parquet map (and as key in a map)
> --
>
> Key: HIVE-25484
> URL: https://issues.apache.org/jira/browse/HIVE-25484
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> not sure if this is a bug or a feature request, but I haven't been able to 
> test NULL as map in the scope of HIVE-23688:
> {code}
> CREATE TABLE parquet_map_type_string (
> id int,
> stringMap map
> ) stored as parquet;
> insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
> NULL as value, works
> insert into parquet_map_type_string (id) VALUES (2) -- NULL as map, fails
> insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- 
> NULL as key, fails
> {code}
> leads to:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
>   at 
> org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
>   at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>   at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
>   at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25484:

Description: 
not sure if this is a bug or a feature request, but I haven't been able to test 
NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value, works
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map, fails
insert into parquet_map_type_string SELECT 3, MAP(null, 'k3', 'k4', 'v4'); -- 
NULL as key, fails
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

  was:
not sure if this is a bug or a feature request, but I haven't been able to test 
NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
at 

[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25484:

Description: 
not sure if this is a bug or a feature request, but I haven't been able to test 
NULL as map in the scope of HIVE-23688:
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

  was:
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
at 

[jira] [Updated] (HIVE-25484) NULL cannot be inserted as a Parquet map

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25484:

Description: 
{code}
CREATE TABLE parquet_map_type_string (
id int,
stringMap map
) stored as parquet;

insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
NULL as value
insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
{code}

leads to:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
at 
org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:161)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:174)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
{code}

> NULL cannot be inserted as a Parquet map
> 
>
> Key: HIVE-25484
> URL: https://issues.apache.org/jira/browse/HIVE-25484
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> {code}
> CREATE TABLE parquet_map_type_string (
> id int,
> stringMap map
> ) stored as parquet;
> insert into parquet_map_type_string SELECT 1, MAP('k1', null, 'k2', 'v2'); -- 
> NULL as value
> insert into parquet_map_type_string (id) VALUES (2) -- NULL as map;
> {code}
> leads to:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:218)
>   at 
> org.apache.parquet.io.api.Binary$FromStringBinary.(Binary.java:209)
>   at org.apache.parquet.io.api.Binary.fromString(Binary.java:537)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$StringDataWriter.write(DataWritableWriter.java:474)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MapDataWriter.write(DataWritableWriter.java:354)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>   at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
>   at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
>   at 
> 

[jira] [Assigned] (HIVE-25484) NULL cannot be inserted as a Parquet map

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-25484:
---


> NULL cannot be inserted as a Parquet map
> 
>
> Key: HIVE-25484
> URL: https://issues.apache.org/jira/browse/HIVE-25484
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25483) TxnHandler::acquireLock should close the DB conn to avoid connection leaks

2021-08-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-25483:

Description: 
TxnHandler::acquireLock should close DB connection on exiting the function. 

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5688]

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5726]

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5737-L5740]

 If there are any exceptions downstream, this connection isn't closed cleanly. 
In a corner case, hikari connection leak detector reported the following
{noformat}
2021-08-26 09:19:18,102 WARN  com.zaxxer.hikari.pool.ProxyLeakTask: 
[HikariPool-4 housekeeper]: Connection leak detection triggered for 
org.postgresql.jdbc.PgConnection@77f76747, stack trace follows
java.lang.Exception: Apparent connection leak detected
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3843)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:5135)
 
at 
org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:107) 
{noformat}
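
A sketch of the usual remedy (illustrative only, not the actual TxnHandler change): obtain the connection and release it in a finally block, so every exit path, including exceptions thrown downstream, returns it to the pool.
{code:java}
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Illustrative pattern only; method and field names do not match TxnHandler.
final class AcquireLockSketch {
  private final DataSource dataSource;

  AcquireLockSketch(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  void acquireLock(String key) throws SQLException {
    Connection dbConn = dataSource.getConnection();
    try {
      // ... run the SELECT ... FOR UPDATE / mutex logic here ...
    } finally {
      dbConn.close();  // closed on every path, so Hikari never flags a leak
    }
  }
}
{code}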

  was:
TxnHandler::acquireLock should close DB connection on exiting the function.

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5688]

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5726]

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5737-L5740]

 


> TxnHandler::acquireLock should close the DB conn to avoid connection leaks
> --
>
> Key: HIVE-25483
> URL: https://issues.apache.org/jira/browse/HIVE-25483
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>
> TxnHandler::acquireLock should close DB connection on exiting the function. 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5688]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5726]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L5737-L5740]
>  If there are any exceptions downstream, this connection isn't closed 
> cleanly. In a corner case, hikari connection leak detector reported the 
> following
> {noformat}
> 2021-08-26 09:19:18,102 WARN  com.zaxxer.hikari.pool.ProxyLeakTask: 
> [HikariPool-4 housekeeper]: Connection leak detection triggered for 
> org.postgresql.jdbc.PgConnection@77f76747, stack trace follows
> java.lang.Exception: Apparent connection leak detected
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3843)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:5135)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:107) 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24292) hive webUI should support keystoretype by config

2021-08-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24292.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

this was merged in October, but the Jira was left open

> hive webUI should support keystoretype by config
> 
>
> Key: HIVE-24292
> URL: https://issues.apache.org/jira/browse/HIVE-24292
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We need a property to pass in the keystore type in the web UI too.
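
A rough sketch of what that could look like (the configuration plumbing and names below are invented for illustration; Jetty's SslContextFactory setters are the real API):
{code:java}
import java.security.KeyStore;
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Illustration only: let the web UI honour a configured keystore type instead
// of always assuming the JVM default (typically JKS).
final class WebUiSslSketch {
  static SslContextFactory.Server sslFactory(String path, String password, String configuredType) {
    SslContextFactory.Server factory = new SslContextFactory.Server();
    factory.setKeyStorePath(path);
    factory.setKeyStorePassword(password);
    factory.setKeyStoreType(configuredType != null ? configuredType : KeyStore.getDefaultType());
    return factory;
  }
}
{code}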



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25329) CTAS creates a managed table as non-ACID table

2021-08-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405051#comment-17405051
 ] 

László Bodor commented on HIVE-25329:
-

PR merged to master, thanks [~robbiezhang] for the patch!

> CTAS creates a managed table as non-ACID table
> --
>
> Key: HIVE-25329
> URL: https://issues.apache.org/jira/browse/HIVE-25329
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> According to HIVE-22158,  MANAGED tables should be ACID tables only. When we 
> set hive.create.as.external.legacy to true, the query like 'create managed 
> table as select 1' creates a non-ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25329) CTAS creates a managed table as non-ACID table

2021-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-25329.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> CTAS creates a managed table as non-ACID table
> --
>
> Key: HIVE-25329
> URL: https://issues.apache.org/jira/browse/HIVE-25329
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> According to HIVE-22158,  MANAGED tables should be ACID tables only. When we 
> set hive.create.as.external.legacy to true, the query like 'create managed 
> table as select 1' creates a non-ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25329) CTAS creates a managed table as non-ACID table

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25329?focusedWorklogId=642203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642203
 ]

ASF GitHub Bot logged work on HIVE-25329:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 08:17
Start Date: 26/Aug/21 08:17
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2477:
URL: https://github.com/apache/hive/pull/2477


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642203)
Time Spent: 1h 50m  (was: 1h 40m)

> CTAS creates a managed table as non-ACID table
> --
>
> Key: HIVE-25329
> URL: https://issues.apache.org/jira/browse/HIVE-25329
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> According to HIVE-22158,  MANAGED tables should be ACID tables only. When we 
> set hive.create.as.external.legacy to true, the query like 'create managed 
> table as select 1' creates a non-ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24948) Enhancing performance of OrcInputFormat.getSplits with bucket pruning

2021-08-26 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24948:

Description: 
The summarized flow of generating input splits at the Tez AM (by calling 
HiveSplitGenerator.initialize()) is as follows:
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
 # Perform bucket pruning with the list above if it's possible
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

But I observed that step 2, getting the list of InputSplit, can cause significant 
overhead when the inputs are ORC files in HDFS. 

For example, there is an ORC table T partitioned by 'log_date', and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition, 
and the size of each bucket (ORC file) is, let's say, 100MB.

The SQL is like this.  
{noformat}
set hive.tez.bucket.pruning=true;
select count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar';{noformat}
It means there are 240 * 183(days) = 43680 ORC files in the input paths, but 
thanks to bucket pruning, only 183 files should be processed.

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to make the 
list of OrcSplit for 43680 ORC files. The logs with tez.am.log.level=DEBUG 
showed like below;
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: 
/PERFLOG method=getSplits start=1616602891776 end=1616602891776 
duration=199668 from=org.apache.hadoop.hive.ql.io.HiveInputFormat
...
2021-03-25 01:26:03,385 [INFO] [Dispatcher thread {Central}] 
|app.DAGAppMaster|: DAG completed, dagId=dag_1615862187190_731117_1, 
dagState=SUCCEEDED {noformat}
The 43680 - 183 = 43497 InputSplits, which consume about 60% of the entire processing 
time, are simply discarded by step 3, pruneBuckets().

 

With bucket pruning, I think making the whole list of ORC input splits is not 
necessary.

Therefore, I suggest that the flow would be like this;
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 ## OrcInputFormat.getSplits() returns the bucket-pruned list if BitSet from 
FixedBucketPruningOptimizer exists
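
A rough sketch of the idea (names are invented for illustration; the real change would live inside OrcInputFormat.getSplits()): when the pruned-bucket BitSet from FixedBucketPruningOptimizer is available, files for excluded buckets are skipped before any OrcSplit is created, instead of being built and then thrown away by pruneBuckets().
{code:java}
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustration only. Assumes the bucket id can be derived from the file name
// (e.g. "000183_0" -> 183), which stands in for whatever OrcInputFormat really uses.
final class BucketPruningSketch {
  static List<String> filesToSplit(List<String> orcFiles, BitSet retainedBuckets) {
    List<String> kept = new ArrayList<>();
    for (String file : orcFiles) {
      if (retainedBuckets == null || retainedBuckets.get(bucketIdOf(file))) {
        kept.add(file);   // only these would be turned into OrcSplit objects
      }
    }
    return kept;
  }

  private static int bucketIdOf(String file) {
    String name = file.substring(file.lastIndexOf('/') + 1);
    return Integer.parseInt(name.substring(0, name.indexOf('_')));
  }
}
{code}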

  was:
The summarized flow of generating input splits at Tez AM is like below; (by 
calling HiveSplitGenerator.initialize())
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
 # Perform bucket pruning with the list above if it's possible
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

But I observed that the action 2, getting the list of InputSplit, can make big 
overhead when the inputs are ORC files in HDFS. 

For example, there is a ORC table T partitioned by 'log_date' and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition 
and the size of each bucket(ORC file) is, let's say, 100MB.

The SQL is like this.  
{noformat}
set hive.tez.bucket.pruning=true;
select count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar';{noformat}
It means there are 240 * 183(days) = 43680 ORC files in the input paths, but 
thanks to bucket pruning, only 183 files should be processed.

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to make the 
list of OrcSplit for 43680 ORC files. The logs with tez.am.log.level=DEBUG 
showed like below;
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: 
/PERFLOG method=getSplits start=1616602891776 end=1616602891776 
duration=199668