[jira] [Work logged] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?focusedWorklogId=722620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722620
 ]

ASF GitHub Bot logged work on HIVE-25912:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 07:31
Start Date: 08/Feb/22 07:31
Worklog Time Spent: 10m 
  Work Description: baifachuan opened a new pull request #2987:
URL: https://github.com/apache/hive/pull/2987


   ### What changes were proposed in this pull request?
   
   Modify the HiveMetaStore.create_table_core function to add this check:
   
   ```
   if (!MetaStoreUtils.validateTblStorage(tbl.getSd())) {
     throw new InvalidObjectException(tbl.getTableName()
         + " location must not be root path");
   }
   ```
   If path.getParent() is null, we can be sure the location is the root path.
   
   The validateTblStorage implementation:
   
   ```
   /*
    * Check that the table storage location is not the root path.
    */
   public static boolean validateTblStorage(StorageDescriptor sd) {
     return !(StringUtils.isNotBlank(sd.getLocation())
         && new Path(sd.getLocation()).getParent() == null);
   }
   ```
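   For illustration, the root-path condition can be reproduced with plain JDK classes. This is a hedged sketch: the real check uses org.apache.hadoop.fs.Path, and the class and method names below (RootPathCheck, isRootLocation) are made up for this example.

   ```java
   import java.net.URI;

   public class RootPathCheck {
       // org.apache.hadoop.fs.Path returns null from getParent() at a
       // filesystem root; java.net.URI lets us model the same condition
       // using only the JDK.
       static boolean isRootLocation(String location) {
           String p = URI.create(location).getPath();
           return p == null || p.isEmpty() || p.equals("/");
       }

       public static void main(String[] args) {
           System.out.println(isRootLocation("hdfs://emr-master-1:8020/"));    // true
           System.out.println(isRootLocation("s3a://bucketname/"));            // true
           System.out.println(isRootLocation("s3a://bucketname/warehouse/t")); // false
       }
   }
   ```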
   
   
   ### Why are the changes needed?
   If I create an external table at the root path, the table is created 
successfully, but dropping it throws an NPE, so the table can never be 
dropped.
   
   This is not desirable behavior.
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   mvn test -Dtest=SomeTest -pl common
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 722620)
Remaining Estimate: 82.5h  (was: 82h 40m)
Time Spent: 13.5h  (was: 13h 20m)

> Drop external table at root of s3 bucket throws NPE
> ---
>
> Key: HIVE-25912
> URL: https://issues.apache.org/jira/browse/HIVE-25912
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: Hive version: 3.1.2
>Reporter: Fachuan Bai
>Assignee: Fachuan Bai
>Priority: Major
>  Labels: metastore, pull-request-available
> Attachments: hive bugs.png
>
>   Original Estimate: 96h
>  Time Spent: 13.5h
>  Remaining Estimate: 82.5h
>
> I created an external Hive table using this command:
>  
> {code:java}
> CREATE EXTERNAL TABLE `fcbai`(
> `inv_item_sk` int,
> `inv_warehouse_sk` int,
> `inv_quantity_on_hand` int)
> PARTITIONED BY (
> `inv_date_sk` int) STORED AS ORC
> LOCATION
> 'hdfs://emr-master-1:8020/';
> {code}
>  
> The table was created successfully, but dropping the table throws an NPE:
>  
> {code:java}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.NullPointerException) 
> (state=08S01,code=1){code}
>  
> The same bug can be reproduced on other object storage file systems, such 
> as S3 or TOS:
> {code:java}
> CREATE EXTERNAL TABLE `fcbai`(
> `inv_item_sk` int,
> `inv_warehouse_sk` int,
> `inv_quantity_on_hand` int)
> PARTITIONED BY (
> `inv_date_sk` int) STORED AS ORC
> LOCATION
> 's3a://bucketname/'; // 'tos://bucketname/'{code}
>  
> Looking at the source code, I found this in 
>  common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
> {code:java}
> // check if sticky bit is set on the parent dir
> FileStatus parStatus = fs.getFileStatus(path.getParent());
> if (!shims.hasStickyBit(parStatus.getPermission())) {
>   // no sticky bit, so write permission on parent dir is sufficient
>   // no further checks needed
>   return;
> }{code}
>  
> Because I set the table location to the HDFS root path 
> (hdfs://emr-master-1:8020/), path.getParent() returns null, which causes 
> the NPE.
> I think there are four possible solutions for the bug:
>  # Modify the create table function: if the location is the root dir, fail 
> the CREATE TABLE.
>  # Modify the FileUtils.checkDeletePermission function: if path.getParent() 
> is null, return early so the drop succeeds.
>  # Modify the RangerHiveAuthorizer.checkPrivileges function of the Hive 
> Ranger plugin (in the Ranger repo): if the location is the root dir, fail 
> the CREATE TABLE.
>  # Modify the HDFS Path object so that path.getParent() returns a non-null 
> value when the URI is the root dir.
> I recommend the first or second approach. Any suggestions? Thanks.
>  
>  
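The second of the solutions listed above (return early from the permission check when the path has no parent) could be sketched as follows. This is illustrative only: java.nio.file.Path is used as a stand-in for org.apache.hadoop.fs.Path (both return null from getParent() at the root), and ParentGuard/parentCheckNeeded are made-up names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ParentGuard {
    // Sketch of option 2: skip the parent-dir sticky-bit permission check
    // when the table location has no parent directory at all.
    static boolean parentCheckNeeded(Path path) {
        if (path.getParent() == null) {
            // root location: there is no parent dir whose sticky bit could
            // apply, so the drop can proceed without the extra check
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parentCheckNeeded(Paths.get("/")));            // false
        System.out.println(parentCheckNeeded(Paths.get("/warehouse/t"))); // true
    }
}
```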



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?focusedWorklogId=722619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722619
 ]

ASF GitHub Bot logged work on HIVE-25912:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 07:30
Start Date: 08/Feb/22 07:30
Worklog Time Spent: 10m 
  Work Description: baifachuan removed a comment on pull request #2987:
URL: https://github.com/apache/hive/pull/2987#issuecomment-1032292606


   @steveloughran can you invite me to join the hive slack discuss group? or 




Issue Time Tracking
---

Worklog Id: (was: 722619)
Remaining Estimate: 82h 40m  (was: 82h 50m)
Time Spent: 13h 20m  (was: 13h 10m)

> Drop external table at root of s3 bucket throws NPE
> ---
>
> Key: HIVE-25912
> URL: https://issues.apache.org/jira/browse/HIVE-25912
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: Hive version: 3.1.2
>Reporter: Fachuan Bai
>Assignee: Fachuan Bai
>Priority: Major
>  Labels: metastore, pull-request-available
> Attachments: hive bugs.png
>
>   Original Estimate: 96h
>  Time Spent: 13h 20m
>  Remaining Estimate: 82h 40m
>





[jira] [Work logged] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?focusedWorklogId=722617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722617
 ]

ASF GitHub Bot logged work on HIVE-25912:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 07:29
Start Date: 08/Feb/22 07:29
Worklog Time Spent: 10m 
  Work Description: baifachuan commented on pull request #2987:
URL: https://github.com/apache/hive/pull/2987#issuecomment-1032292606


   @steveloughran can you invite me to join the hive slack discuss group? or 




Issue Time Tracking
---

Worklog Id: (was: 722617)
Remaining Estimate: 83h  (was: 83h 10m)
Time Spent: 13h  (was: 12h 50m)

> Drop external table at root of s3 bucket throws NPE
> ---
>
> Key: HIVE-25912
> URL: https://issues.apache.org/jira/browse/HIVE-25912
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: Hive version: 3.1.2
>Reporter: Fachuan Bai
>Assignee: Fachuan Bai
>Priority: Major
>  Labels: metastore, pull-request-available
> Attachments: hive bugs.png
>
>   Original Estimate: 96h
>  Time Spent: 13h
>  Remaining Estimate: 83h
>





[jira] [Work logged] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?focusedWorklogId=722618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722618
 ]

ASF GitHub Bot logged work on HIVE-25912:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 07:29
Start Date: 08/Feb/22 07:29
Worklog Time Spent: 10m 
  Work Description: baifachuan closed pull request #2987:
URL: https://github.com/apache/hive/pull/2987


   




Issue Time Tracking
---

Worklog Id: (was: 722618)
Remaining Estimate: 82h 50m  (was: 83h)
Time Spent: 13h 10m  (was: 13h)

> Drop external table at root of s3 bucket throws NPE
> ---
>
> Key: HIVE-25912
> URL: https://issues.apache.org/jira/browse/HIVE-25912
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: Hive version: 3.1.2
>Reporter: Fachuan Bai
>Assignee: Fachuan Bai
>Priority: Major
>  Labels: metastore, pull-request-available
> Attachments: hive bugs.png
>
>   Original Estimate: 96h
>  Time Spent: 13h 10m
>  Remaining Estimate: 82h 50m
>





[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=722602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722602
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 06:19
Start Date: 08/Feb/22 06:19
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #2539:
URL: https://github.com/apache/hive/pull/2539


   




Issue Time Tracking
---

Worklog Id: (was: 722602)
Time Spent: 5h  (was: 4h 50m)

> Snapshot support for controlled failover
> 
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> In case the same locations are used for external tables on the source and 
> target, then the snapshots created during replication can be re-used during 
> reverse replication. This patch enables re-using the snapshots  during 
> reverse replication using a configuration.





[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=722601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722601
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 06:19
Start Date: 08/Feb/22 06:19
Worklog Time Spent: 10m 
  Work Description: ArkoSharma closed pull request #2539:
URL: https://github.com/apache/hive/pull/2539


   




Issue Time Tracking
---

Worklog Id: (was: 722601)
Time Spent: 4h 50m  (was: 4h 40m)

> Snapshot support for controlled failover
> 
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In case the same locations are used for external tables on the source and 
> target, then the snapshots created during replication can be re-used during 
> reverse replication. This patch enables re-using the snapshots  during 
> reverse replication using a configuration.





[jira] [Commented] (HIVE-25863) join result is null

2022-02-07 Thread zengxl (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488533#comment-17488533
 ] 

zengxl commented on HIVE-25863:
---

The three attachments contain the corresponding test data.

> join result is null
> ---
>
> Key: HIVE-25863
> URL: https://issues.apache.org/jira/browse/HIVE-25863
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
> Environment: hadoop 3.2.1
> hive 3.1.2
>Reporter: zengxl
>Priority: Blocker
> Attachments: test_partitions_2021_12_21_shuffle_1, 
> test_sds_2021_12_21_shuffle_1_new, test_tbls_2021_12_21_shuffle
>
>
> When I change the number of reducers, the query result changes. This happens 
> with both inner join and left join: partial join results for the 
> {color:#de350b}third table{color} (pdwd.test_sds_2021_12_21_shuffle_1_new) are 
> {color:#de350b}null{color}. When there is only one reducer, the results are 
> all correct.
> With set hive.exec.reducers.bytes.per.reducer=256000 there is only one reducer;
> with set hive.exec.reducers.bytes.per.reducer=2560 there are four reducers.
> Here are my SQL and data:
> {code:java}
> CREATE TABLE pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112(
>   tbl_id bigint COMMENT 'TBL_ID', 
>   tbl_create_time bigint COMMENT 'TBL_CREATE_TIME', 
>   db_id bigint COMMENT 'DB_ID', 
>   tbl_last_access_time bigint COMMENT 'TBL_LAST_ACCESS_TIME', 
>   owner string COMMENT 'OWNER', 
>   retention bigint COMMENT 'RETENTION', 
>   sd_id bigint COMMENT 'SD_ID', 
>   tbl_name string COMMENT 'TBL_NAME', 
>   tbl_type string COMMENT 'TBL_TYPE', 
>   view_expanded_text string COMMENT 'VIEW_EXPANDED_TEXT', 
>   view_original_text string COMMENT 'VIEW_ORIGINAL_TEXT', 
>   is_rewrite_enabled bigint COMMENT 'IS_REWRITE_ENABLED', 
>   tbl_owner_type string COMMENT 'TBL_OWNER_TYPE', 
>   cd_id bigint COMMENT 'CD_ID', 
>   input_format string COMMENT 'INPUT_FORMAT', 
>   is_compressed bigint COMMENT 'IS_COMPRESSED', 
>   is_storedassubdirectories bigint COMMENT 'IS_STOREDASSUBDIRECTORIES', 
>   tbl_or_part_location string COMMENT 'tbl_or_part_location', 
>   num_buckets bigint COMMENT 'NUM_BUCKETS', 
>   output_format string COMMENT 'OUTPUT_FORMAT', 
>   serde_id bigint COMMENT 'SERDE_ID', 
>   part_id bigint COMMENT 'PART_ID', 
>   part_create_time bigint COMMENT 'PART_CREATE_TIME', 
>   part_last_access_time bigint COMMENT 'PART_LAST_ACCESS_TIME', 
>   part_name string COMMENT 'PART_NAME')
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
>   
> CREATE TABLE pdwd.test_partitions_2021_12_21_shuffle_1(
>   part_id bigint, 
>   create_time bigint, 
>   last_access_time bigint, 
>   part_name string, 
>   sd_id bigint, 
>   tbl_id bigint)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
>   
> CREATE TABLE pdwd.test_tbls_2021_12_21_shuffle(
>   tbl_id bigint, 
>   create_time bigint, 
>   db_id bigint, 
>   last_access_time bigint, 
>   owner string, 
>   retention bigint, 
>   sd_id bigint, 
>   tbl_name string, 
>   tbl_type string, 
>   view_expanded_text string, 
>   view_original_text string, 
>   is_rewrite_enabled bigint, 
>   owner_type string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
>   
> CREATE TABLE pdwd.test_sds_2021_12_21_shuffle_1_new(
>   sd_id bigint, 
>   cd_id bigint, 
>   input_format string, 
>   is_compressed bigint, 
>   is_storedassubdirectories bigint, 
>   _c5 string, 
>   num_buckets bigint, 
>   output_format string, 
>   serde_id bigint)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
> set hive.stats.column.autogather=false;
> set hive.exec.reducers.bytes.per.reducer=2560;
> set hive.auto.convert.join=false;  
> insert overwrite table 
> pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112
> select
> a.tbl_id,
> b.create_time as tbl_create_time,
> b.db_id,
> b.last_access_time as tbl_last_access_time,
> b.owner,
> b.retention,
> a.sd_id,
> b.tbl_name,
> b.tbl_type,
> b.view_expanded_text,
> b.view_original_text,
> b.is_rewrite_enabled,
> b.owner_type as tbl_owner_type,
> d.cd_id,
> d.input_format,
> d.is_compressed,
> d.is_storedassubdirectories,
> d.tbl_location,
> d.num_buckets,
> d.output_format,
> d.serde_id,
> a.part_id,
> 

[jira] [Work logged] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread Fachuan Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?focusedWorklogId=722509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722509
 ]

Fachuan Bai logged work on HIVE-25912:
--

Author: Fachuan Bai
Created on: 08/Feb/22 01:40
Start Date: 07/Feb/22 09:40
Worklog Time Spent: 12h 

Issue Time Tracking
---

Worklog Id: (was: 722509)
Remaining Estimate: 83h 10m  (was: 95h 10m)
Time Spent: 12h 50m  (was: 50m)

> Drop external table at root of s3 bucket throws NPE
> ---
>
> Key: HIVE-25912
> URL: https://issues.apache.org/jira/browse/HIVE-25912
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: Hive version: 3.1.2
>Reporter: Fachuan Bai
>Assignee: Fachuan Bai
>Priority: Major
>  Labels: metastore, pull-request-available
> Attachments: hive bugs.png
>
>   Original Estimate: 96h
>  Time Spent: 12h 50m
>  Remaining Estimate: 83h 10m
>





[jira] [Work logged] (HIVE-25514) Alter table with partitions should honor {OWNER} policies from Apache Ranger in the HMS

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25514?focusedWorklogId=722451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722451
 ]

ASF GitHub Bot logged work on HIVE-25514:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 00:16
Start Date: 08/Feb/22 00:16
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2634:
URL: https://github.com/apache/hive/pull/2634#issuecomment-1032076033


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 722451)
Time Spent: 1h 40m  (was: 1.5h)

> Alter table with partitions should honor {OWNER} policies from Apache Ranger 
> in the HMS
> ---
>
> Key: HIVE-25514
> URL: https://issues.apache.org/jira/browse/HIVE-25514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The following commands should honor \{OWNER} policies from Apache Ranger in 
> the HMS.
> {code:java}
> Show partitions table_name;
> alter table foo.table_name partition (country='us') rename to partition 
> (country='canada);
> alter table foo.table_name drop partition (id='canada');{code}
> The examples above are tables with partitions. So the partition APIs in HMS 
> should be modifed to honor \{owner} policies from Apache ranger. 





[jira] [Work logged] (HIVE-25495) Upgrade to JLine3

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25495?focusedWorklogId=722452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722452
 ]

ASF GitHub Bot logged work on HIVE-25495:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 00:16
Start Date: 08/Feb/22 00:16
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2617:
URL: https://github.com/apache/hive/pull/2617#issuecomment-1032076057


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 722452)
Time Spent: 1h 50m  (was: 1h 40m)

> Upgrade to JLine3
> -
>
> Key: HIVE-25495
> URL: https://issues.apache.org/jira/browse/HIVE-25495
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Jline 2 has been discontinued a long while ago.  Hadoop uses JLine3 so Hive 
> should match.





[jira] [Work logged] (HIVE-25918) Invalid stats after multi inserting into the same partition

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25918?focusedWorklogId=722413&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722413
 ]

ASF GitHub Bot logged work on HIVE-25918:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 23:11
Start Date: 07/Feb/22 23:11
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #2991:
URL: https://github.com/apache/hive/pull/2991#discussion_r801136555



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##
@@ -512,6 +513,8 @@ private String toString(Map parameters) {
 if (dpPartSpecs != null) {
   // load the list of DP partitions and return the list of partition 
specs
   list.addAll(dpPartSpecs);
+  // Reload partition metadata because another BasicStatsTask instance 
may have updated the stats.
+  list = db.getPartitionsByNames(table, 
list.stream().map(Partition::getName).collect(Collectors.toList()));

Review comment:
   getPartitionsByNames is a single call, unlike 2000+ getPartition() calls 
(which internally load the table object every time). It will still add to the 
runtime of regular insert queries, depending on the number of partitions.
   
   One option could be to detect multi-insert at semantic-analysis time and 
pass that on to DynamicPartitionCtx; getPartitionsByNames() could then be 
invoked only in those cases.
   






Issue Time Tracking
---

Worklog Id: (was: 722413)
Time Spent: 50m  (was: 40m)

> Invalid stats after multi inserting into the same partition
> ---
>
> Key: HIVE-25918
> URL: https://issues.apache.org/jira/browse/HIVE-25918
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> create table source(p int, key int,value string);
> insert into source(p, key, value) values (101,42,'string42');
> create table stats_part(key int,value string) partitioned by (p int);
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p;
> select count(*) from stats_part;
> {code}
> In this case {{StatsOptimizer}} serves this query from statistics, so the 
> result should be the {{rowNum}} of the partition {{p=101}}. The result is
> {code}
> 1
> {code}
> however it should be
> {code}
> 2
> {code}
> because each insert branch inserts one record.
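The scenario above is a classic lost update: both insert branches compute the new row count from the same stale snapshot of the partition metadata. A minimal plain-Java sketch of the effect (class and key names are illustrative, not Hive's):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: simulates two BasicStatsTask-style writers publishing
// numRows for the same partition, first from stale snapshots, then with a
// reload before the second update.
public class StaleStatsDemo {
    // Both branches snapshot first, then each writes snapshot + 1: final 1.
    static long withoutReload() {
        Map<String, Long> params = new HashMap<>();
        params.put("numRows", 0L);
        long snapshotA = params.get("numRows");
        long snapshotB = params.get("numRows");
        params.put("numRows", snapshotA + 1); // branch A inserts 1 row
        params.put("numRows", snapshotB + 1); // branch B clobbers it
        return params.get("numRows");
    }

    // Branch B re-reads the partition metadata before updating: final 2.
    static long withReload() {
        Map<String, Long> params = new HashMap<>();
        params.put("numRows", 0L);
        long snapshotA = params.get("numRows");
        params.put("numRows", snapshotA + 1);
        long reloaded = params.get("numRows"); // fresh read before updating
        params.put("numRows", reloaded + 1);
        return params.get("numRows");
    }

    public static void main(String[] args) {
        System.out.println(withoutReload()); // 1 (the bug)
        System.out.println(withReload());    // 2 (expected)
    }
}
```

Reloading before the update is what the review thread weighs against the cost of the extra getPartitionsByNames() call.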





[jira] [Commented] (HIVE-20078) Remove ATSHook

2022-02-07 Thread Ramakrishnan Sundaram (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488451#comment-17488451
 ] 

Ramakrishnan Sundaram commented on HIVE-20078:
--

[~ashutoshc] What is the reason ATSHook was removed? 

> Remove ATSHook
> --
>
> Key: HIVE-20078
> URL: https://issues.apache.org/jira/browse/HIVE-20078
> Project: Hive
>  Issue Type: Task
>  Components: Hooks
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20078.2.patch, HIVE-20078.patch
>
>






[jira] [Work logged] (HIVE-25904) ObjectStore's updateTableColumnStatistics is not ThreadSafe

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25904?focusedWorklogId=722363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722363
 ]

ASF GitHub Bot logged work on HIVE-25904:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 22:10
Start Date: 07/Feb/22 22:10
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #2977:
URL: https://github.com/apache/hive/pull/2977#issuecomment-1031985569


   LGTM. +1.




Issue Time Tracking
---

Worklog Id: (was: 722363)
Time Spent: 1h 10m  (was: 1h)

> ObjectStore's updateTableColumnStatistics is not ThreadSafe
> ---
>
> Key: HIVE-25904
> URL: https://issues.apache.org/jira/browse/HIVE-25904
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> [root@igansperger-hive-tgt-3 ~]# cat test.sh
> hive -e 'create database test; create external table test.foo(col1 string);' 
> 2> /dev/null
> hive -e "select count(*) from sys.tab_col_stats where db_name = 'test' and 
> table_name = 'foo'" 2> /dev/null
> export JAVA_HOME=/usr/java/jdk1.8.0_232-cloudera
> export JAVA_OPTS="-Xmx1g"
> export PATH="/root/scala-2.13.8/bin:$JAVA_HOME/bin:$PATH"
> export CONF_DIR=/run/cloudera-scm-agent/process/79-hive_on_tez-HIVESERVER2
> export CDH_HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog/
> export CDH_HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
> CLASSPATH="$CLASSPATH:$CONF_DIR/hadoop-conf"
> CLASSPATH="$CLASSPATH:$CONF_DIR/hive-conf"
> CLASSPATH="$CLASSPATH:$(hadoop classpath)"
> CLASSPATH="$CLASSPATH:$CDH_HIVE_HOME/*"
> CLASSPATH="$CLASSPATH:$CDH_HIVE_HOME/lib/*"
> CLASSPATH="$CLASSPATH:${CDH_HCAT_HOME}/share/webhcat/java-client/hive-webhcat-java-client.jar"
> CLASSPATH="$CLASSPATH:${CDH_HCAT_HOME}/share/hcatalog/hive-hcatalog-core.jar"
> scala -classpath $CLASSPATH <<-EOF
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> import org.apache.hadoop.hive.conf.HiveConf
> import org.apache.hadoop.hive.metastore.api._
> def go() = {
> val conf = new HiveConf()
> val client = new HiveMetaStoreClient(conf)
> val colStatData = new ColumnStatisticsData()
> colStatData.setStringStats(new StringColumnStatsData(3, 3.0, 0, 1))
> val colStatsObj = new ColumnStatisticsObj("col1", "string", colStatData)
> val colStatsObjs = java.util.Arrays.asList(colStatsObj)
> val colStatsDesc = new ColumnStatisticsDesc(true, "test", "foo")
> val colStats = new ColumnStatistics(colStatsDesc, colStatsObjs)
> colStats.setEngine("hive")
> client.updateTableColumnStatistics(colStats)
> println("SUCCESS")
> }
> val t1 = new Thread(() => go())
> val t2 = new Thread(() => go())
> t1.start()
> t2.start()
> t1.join()
> t2.join()
> go()
> EOF
> hive -e "select count(*) from sys.tab_col_stats where db_name = 'test' and 
> table_name = 'foo'" 2> /dev/null
> {code}
> This produces (minus logging):
> {code}
> [root@igansperger-hive-tgt-3 ~]# sh test.sh
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_232).
> Type in expressions for evaluation. Or try :help.
> SUCCESS
> SUCCESS
> org.apache.hadoop.hive.metastore.api.MetaException: Unexpected 2 statistics 
> for 1 columns
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_table_column_statistics_req_result$update_table_column_statistics_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_table_column_statistics_req_result$update_table_column_statistics_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_table_column_statistics_req_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_update_table_column_statistics_req(ThriftHiveMetastore.java:4597)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.update_table_column_statistics_req(ThriftHiveMetastore.java:4584)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:2846)
>   at go(:13)
>   ... 32 elided
> scala>
> scala> :quit
> +--+
> | _c0  |
> +--+
> 
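The "Unexpected 2 statistics for 1 columns" failure above is a check-then-act race: both callers pass the "does a stats row exist?" check before either inserts. A self-contained sketch that forces that interleaving deterministically (names are illustrative, not ObjectStore's):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

// Illustrative only: a non-atomic "insert stats row if absent" run by two
// threads. The barrier guarantees both existence checks happen before either
// insert, so two rows end up stored for one column.
public class CheckThenActDemo {
    static final List<String> statsRows = new CopyOnWriteArrayList<>();
    static final CyclicBarrier bothChecked = new CyclicBarrier(2);

    static void upsert(String col) throws Exception {
        boolean exists = statsRows.contains(col); // check
        bothChecked.await();                      // both threads reach here
        if (!exists) {
            statsRows.add(col);                   // act, on a stale check
        }
    }

    public static void main(String[] args) throws Exception {
        Runnable task = () -> {
            try {
                upsert("col1");
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(statsRows.size()); // 2 rows for 1 column
    }
}
```

The general fix is to make the read-check-write sequence atomic (a lock or a database-level upsert) rather than relying on callers not to overlap.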

[jira] [Work started] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2

2022-02-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25936 started by Stamatis Zampetakis.
--
> ValidWriteIdList & table id are sometimes missing when requesting partitions 
> by name via HS2
> 
>
> Key: HIVE-25936
> URL: https://issues.apache.org/jira/browse/HIVE-25936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> According to HIVE-24743 the table id and {{ValidWriteIdList}} are important 
> for keeping the HMS remote metadata cache consistent. Although HIVE-24743 
> attempted to pass the write id list and table id in every call to HMS, it 
> failed to do so completely. For those partitions not handled in the batch 
> logic, the [metastore 
> call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161]
>  in the {{Hive#getPartitionsByName}} method does not pass the table id and 
> write id list.





[jira] [Assigned] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2

2022-02-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-25936:
--


> ValidWriteIdList & table id are sometimes missing when requesting partitions 
> by name via HS2
> 
>
> Key: HIVE-25936
> URL: https://issues.apache.org/jira/browse/HIVE-25936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> According to HIVE-24743 the table id and {{ValidWriteIdList}} are important 
> for keeping the HMS remote metadata cache consistent. Although HIVE-24743 
> attempted to pass the write id list and table id in every call to HMS, it 
> failed to do so completely. For those partitions not handled in the batch 
> logic, the [metastore 
> call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161]
>  in the {{Hive#getPartitionsByName}} method does not pass the table id and 
> write id list.





[jira] [Updated] (HIVE-25933) when hive has 100k+ tables, HS2 takes a long time to start because it loads all tables as materialized views in a single thread; this is sensitive for online tasks

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25933:
---
Description: 
Modify the class Hive.java, method getAllTableObjects; materialized views are 
not used in Hive 2.
{code:java}
/**
 * Get all tables for the specified database.
 * @param dbName
 * @return List of table names
 * @throws HiveException
 */
public List<Table> getAllTableObjects(String dbName) throws HiveException {
  try {
    List<String> tableNames = getMSC().getAllTables(dbName);
    if (tableNames.size() > 1) {
      tableNames = tableNames.subList(0, 1);
    }
    return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
        new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
          @Override
          public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
            return new Table(table);
          }
        });
  } catch (Exception e) {
    throw new HiveException(e);
  }
} {code}
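The slow-startup complaint above comes from loading every table object on a single thread. One direction, sketched under assumptions (fetchBatch is a stand-in for getMSC().getTableObjectsByName, not Hive's actual code), is to split the table list into batches and fetch them on a small thread pool:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: parallel, batched loading of table objects.
public class ParallelTableLoad {
    // Stand-in for one metastore round trip (getTableObjectsByName).
    static List<String> fetchBatch(List<String> names) {
        return new ArrayList<>(names);
    }

    static List<String> loadAll(List<String> tableNames, int batchSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < tableNames.size(); i += batchSize) {
                // Copy the sublist so each task owns its batch.
                List<String> batch = new ArrayList<>(
                    tableNames.subList(i, Math.min(i + batchSize, tableNames.size())));
                futures.add(pool.submit(() -> fetchBatch(batch)));
            }
            List<String> loaded = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                loaded.addAll(f.get()); // collected in batch order
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("t" + i);
        }
        System.out.println(loadAll(names, 3, 4).size());
    }
}
```

This keeps the per-call batching HMS already supports while removing the single-thread bottleneck; the subList(0, 1) hack in the description simply skips the work instead of speeding it up.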

  was:
Modify the class Hive.java, method getAllTableObjects.
{code:java}
/**
 * Get all tables for the specified database.
 * @param dbName
 * @return List of table names
 * @throws HiveException
 */
public List<Table> getAllTableObjects(String dbName) throws HiveException {
  try {
    List<String> tableNames = getMSC().getAllTables(dbName);
    if (tableNames.size() > 1) {
      tableNames = tableNames.subList(0, 1);
    }
    return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
        new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
          @Override
          public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
            return new Table(table);
          }
        });
  } catch (Exception e) {
    throw new HiveException(e);
  }
} {code}


> when hive has 100k+ tables, HS2 takes a long time to start because it loads 
> all tables as materialized views in a single thread; this is sensitive for 
> online tasks
> ---
>
> Key: HIVE-25933
> URL: https://issues.apache.org/jira/browse/HIVE-25933
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.5, 2.3.7, 2.3.8
>Reporter: lkl
>Priority: Major
> Fix For: 3.1.1, 3.1.2
>
>
> Modify the class Hive.java, method getAllTableObjects; materialized views 
> are not used in Hive 2.
> {code:java}
> /**
>  * Get all tables for the specified database.
>  * @param dbName
>  * @return List of table names
>  * @throws HiveException
>  */
> public List<Table> getAllTableObjects(String dbName) throws HiveException {
>   try {
>     List<String> tableNames = getMSC().getAllTables(dbName);
>     if (tableNames.size() > 1) {
>       tableNames = tableNames.subList(0, 1);
>     }
>     return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
>         new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
>           @Override
>           public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
>             return new Table(table);
>           }
>         });
>   } catch (Exception e) {
>     throw new HiveException(e);
>   }
> } {code}





[jira] [Updated] (HIVE-25933) when hive has 100k+ tables, HS2 takes a long time to start because it loads all tables as materialized views in a single thread; this is sensitive for online tasks

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25933:
---
Description: 
Modify the class Hive.java, method getAllTableObjects.
{code:java}
/**
 * Get all tables for the specified database.
 * @param dbName
 * @return List of table names
 * @throws HiveException
 */
public List<Table> getAllTableObjects(String dbName) throws HiveException {
  try {
    List<String> tableNames = getMSC().getAllTables(dbName);
    if (tableNames.size() > 1) {
      tableNames = tableNames.subList(0, 1);
    }
    return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
        new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
          @Override
          public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
            return new Table(table);
          }
        });
  } catch (Exception e) {
    throw new HiveException(e);
  }
} {code}

  was:
Modify the class Hive.java, method getAllTableObjects.
{code:java}
/**
 * Get all tables for the specified database.
 * @param dbName
 * @return List of table names
 * @throws HiveException
 */
public List<Table> getAllTableObjects(String dbName) throws HiveException {
  try {
    List<String> tableNames = getMSC().getAllTables(dbName);
    if (tableNames.size() > 1) {
      tableNames = tableNames.subList(0, 1);
    }
    return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
        new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
          @Override
          public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
            return new Table(table);
          }
        });
  } catch (Exception e) {
    throw new HiveException(e);
  }
} {code}


> when hive has 100k+ tables, HS2 takes a long time to start because it loads 
> all tables as materialized views in a single thread; this is sensitive for 
> online tasks
> ---
>
> Key: HIVE-25933
> URL: https://issues.apache.org/jira/browse/HIVE-25933
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.5, 2.3.7, 2.3.8
>Reporter: lkl
>Priority: Major
> Fix For: 3.1.1, 3.1.2
>
>
> Modify the class Hive.java, method getAllTableObjects.
> {code:java}
> /**
>  * Get all tables for the specified database.
>  * @param dbName
>  * @return List of table names
>  * @throws HiveException
>  */
> public List<Table> getAllTableObjects(String dbName) throws HiveException {
>   try {
>     List<String> tableNames = getMSC().getAllTables(dbName);
>     if (tableNames.size() > 1) {
>       tableNames = tableNames.subList(0, 1);
>     }
>     return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
>         new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
>           @Override
>           public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
>             return new Table(table);
>           }
>         });
>   } catch (Exception e) {
>     throw new HiveException(e);
>   }
> } {code}





[jira] [Updated] (HIVE-25933) when hive has 100k+ tables, HS2 takes a long time to start because it loads all tables as materialized views in a single thread; this is sensitive for online tasks

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25933:
---
Fix Version/s: 3.1.2
   3.1.1

> when hive has 100k+ tables, HS2 takes a long time to start because it loads 
> all tables as materialized views in a single thread; this is sensitive for 
> online tasks
> ---
>
> Key: HIVE-25933
> URL: https://issues.apache.org/jira/browse/HIVE-25933
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.5, 2.3.7, 2.3.8
>Reporter: lkl
>Priority: Major
> Fix For: 3.1.1, 3.1.2
>
>
> Modify the class Hive.java, method getAllTableObjects.
> {code:java}
> /**
>  * Get all tables for the specified database.
>  * @param dbName
>  * @return List of table names
>  * @throws HiveException
>  */
> public List<Table> getAllTableObjects(String dbName) throws HiveException {
>   try {
>     List<String> tableNames = getMSC().getAllTables(dbName);
>     if (tableNames.size() > 1) {
>       tableNames = tableNames.subList(0, 1);
>     }
>     return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
>         new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
>           @Override
>           public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
>             return new Table(table);
>           }
>         });
>   } catch (Exception e) {
>     throw new HiveException(e);
>   }
> } {code}





[jira] [Updated] (HIVE-25933) when hive has 100k+ tables, HS2 takes a long time to start because it loads all tables as materialized views in a single thread; this is sensitive for online tasks

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25933:
---
Description: 
Modify the class Hive.java, method getAllTableObjects.
{code:java}
/**
 * Get all tables for the specified database.
 * @param dbName
 * @return List of table names
 * @throws HiveException
 */
public List<Table> getAllTableObjects(String dbName) throws HiveException {
  try {
    List<String> tableNames = getMSC().getAllTables(dbName);
    if (tableNames.size() > 1) {
      tableNames = tableNames.subList(0, 1);
    }
    return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
        new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
          @Override
          public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
            return new Table(table);
          }
        });
  } catch (Exception e) {
    throw new HiveException(e);
  }
} {code}

> when hive has 100k+ tables, HS2 takes a long time to start because it loads 
> all tables as materialized views in a single thread; this is sensitive for 
> online tasks
> ---
>
> Key: HIVE-25933
> URL: https://issues.apache.org/jira/browse/HIVE-25933
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.5, 2.3.7, 2.3.8
>Reporter: lkl
>Priority: Major
>
> Modify the class Hive.java, method getAllTableObjects.
> {code:java}
> /**
>  * Get all tables for the specified database.
>  * @param dbName
>  * @return List of table names
>  * @throws HiveException
>  */
> public List<Table> getAllTableObjects(String dbName) throws HiveException {
>   try {
>     List<String> tableNames = getMSC().getAllTables(dbName);
>     if (tableNames.size() > 1) {
>       tableNames = tableNames.subList(0, 1);
>     }
>     return Lists.transform(getMSC().getTableObjectsByName(dbName, tableNames),
>         new com.google.common.base.Function<org.apache.hadoop.hive.metastore.api.Table, Table>() {
>           @Override
>           public Table apply(org.apache.hadoop.hive.metastore.api.Table table) {
>             return new Table(table);
>           }
>         });
>   } catch (Exception e) {
>     throw new HiveException(e);
>   }
> } {code}





[jira] [Work started] (HIVE-25934) Non blocking RENAME PARTITION implementation

2022-02-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25934 started by Denys Kuzmenko.
-
> Non blocking RENAME PARTITION implementation
> 
>
> Key: HIVE-25934
> URL: https://issues.apache.org/jira/browse/HIVE-25934
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Implement RENAME PARTITION in a way that doesn't have to wait for currently 
> running read operations to be finished.





[jira] [Updated] (HIVE-25934) Non blocking RENAME PARTITION implementation

2022-02-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-25934:
--
Description: Implement RENAME PARTITION in a way that doesn't have to wait 
for currently running read operations to be finished.

> Non blocking RENAME PARTITION implementation
> 
>
> Key: HIVE-25934
> URL: https://issues.apache.org/jira/browse/HIVE-25934
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Implement RENAME PARTITION in a way that doesn't have to wait for currently 
> running read operations to be finished.





[jira] [Resolved] (HIVE-24906) Suffix the table location with UUID/txnId

2022-02-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24906.
---
Resolution: Fixed

> Suffix the table location with UUID/txnId
> -
>
> Key: HIVE-24906
> URL: https://issues.apache.org/jira/browse/HIVE-24906
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Suffixing the table location with UUID/txnId during create table can help in 
> deleting the data in an asynchronous fashion.





[jira] [Assigned] (HIVE-25934) Non blocking RENAME PARTITION implementation

2022-02-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-25934:
-

Assignee: Denys Kuzmenko

> Non blocking RENAME PARTITION implementation
> 
>
> Key: HIVE-25934
> URL: https://issues.apache.org/jira/browse/HIVE-25934
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>






[jira] [Commented] (HIVE-24445) Non blocking DROP TABLE implementation

2022-02-07 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488189#comment-17488189
 ] 

Denys Kuzmenko commented on HIVE-24445:
---

Merged to master.

> Non blocking DROP TABLE implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Implement drop table operations in a way that doesn't have to wait for 
> currently running read operations to be finished.





[jira] [Resolved] (HIVE-24445) Non blocking DROP TABLE implementation

2022-02-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24445.
---
Resolution: Fixed

> Non blocking DROP TABLE implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Implement drop table operations in a way that doesn't have to wait for 
> currently running read operations to be finished.





[jira] [Updated] (HIVE-25932) two or more SQL statements like "insert into table test partition (cls=1) select xxx" sometimes fail on rename because both statements create the same file name

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25932:
---
Fix Version/s: 2.3.7

> two or more SQL statements like "insert into table test partition (cls=1) 
> select xxx" sometimes fail on rename because both statements create the same 
> file name
> -
>
> Key: HIVE-25932
> URL: https://issues.apache.org/jira/browse/HIVE-25932
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.7
>Reporter: lkl
>Assignee: lkl
>Priority: Minor
> Fix For: 2.3.7
>
>
> Modify the class *Hive.java*
> {code:java}
>   for (final FileStatus srcStatus : srcs) {
> //  final Path destFile = new Path(destf, 
> srcStatus.getPath().getName());
>   final String name = srcStatus.getPath().getName();
>   final Path destDirPath = srcStatus.getPath();
>   if (null == pool) {
> Path destFile = new Path(destf, name);
> int counter = 1;
> while (!destFs.rename(destDirPath, destFile)) {
>   destFile =  new Path(destf, name + ("_copy_" + counter));
>   LOG.info("kugu log destFile is {}.",destFile.getName());
>   counter++;
> //  throw new IOException("rename for src path: " + 
> srcStatus.getPath() + " to dest:"
> //  + destf + " returned false");
> }
>   } else {
> futures.add(pool.submit(new Callable<Void>() {
>   @Override
>   public Void call() throws Exception {
> SessionState.setCurrentSessionState(parentSession);
> final String group = srcStatus.getGroup();
> Path destFile = new Path(destf, name);
> boolean rename_succ = false;
> int counter = 1;
> while (!rename_succ) {
>   rename_succ = destFs.rename(destDirPath, destFile);
>   if(rename_succ) {
> if (inheritPerms) {
>   HdfsUtils.setFullFileStatus(conf, desiredStatus, 
> group, destFs, destFile, false);
> }
>   }else {
> destFile =  new Path(destf, name + ("_copy_" + 
> counter));
> LOG.info("kugu log destFile is 
> {}.",destFile.getName());
>   }
>   counter++;
> }
> //if(destFs.rename(srcStatus.getPath(), destFile)) {
> //  if (inheritPerms) {
> //HdfsUtils.setFullFileStatus(conf, desiredStatus, 
> group, destFs, destFile, false);
> //  }
> //} else {
> //  throw new IOException("rename for src path: " + 
> srcStatus.getPath() + " to dest path:"
> //  + destFile + " returned false");
> //}
> return null;
>   }
>  {code}





[jira] [Updated] (HIVE-25932) two or more SQL statements like "insert into table test partition (cls=1) select xxx" sometimes fail on rename because both statements create the same file name

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25932:
---
Description: 
Modify the class *Hive.java*
{code:java}
  for (final FileStatus srcStatus : srcs) {

//  final Path destFile = new Path(destf, 
srcStatus.getPath().getName());

  final String name = srcStatus.getPath().getName();
  final Path destDirPath = srcStatus.getPath();

  if (null == pool) {
Path destFile = new Path(destf, name);
int counter = 1;
while (!destFs.rename(destDirPath, destFile)) {
  destFile =  new Path(destf, name + ("_copy_" + counter));
  LOG.info("kugu log destFile is {}.",destFile.getName());
  counter++;
//  throw new IOException("rename for src path: " + 
srcStatus.getPath() + " to dest:"
//  + destf + " returned false");
}
  } else {
futures.add(pool.submit(new Callable<Void>() {
  @Override
  public Void call() throws Exception {
SessionState.setCurrentSessionState(parentSession);
final String group = srcStatus.getGroup();

Path destFile = new Path(destf, name);
boolean rename_succ = false;
int counter = 1;
while (!rename_succ) {
  rename_succ = destFs.rename(destDirPath, destFile);
  if(rename_succ) {
if (inheritPerms) {
  HdfsUtils.setFullFileStatus(conf, desiredStatus, 
group, destFs, destFile, false);
}
  }else {
destFile =  new Path(destf, name + ("_copy_" + 
counter));
LOG.info("kugu log destFile is {}.",destFile.getName());
  }
  counter++;

}
//if(destFs.rename(srcStatus.getPath(), destFile)) {
//  if (inheritPerms) {
//HdfsUtils.setFullFileStatus(conf, desiredStatus, 
group, destFs, destFile, false);
//  }
//} else {
//  throw new IOException("rename for src path: " + 
srcStatus.getPath() + " to dest path:"
//  + destFile + " returned false");
//}
return null;
  }
 {code}
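The retry loop above renames to name_copy_N whenever the destination name is already taken. A standalone sketch of the same idea using java.nio instead of HDFS (FileSystem.rename returns false on collision; here Files.move throwing FileAlreadyExistsException plays that role):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: retry a rename under a "_copy_N" suffix until it lands.
public class CopySuffixRename {
    // Returns the destination actually used: destDir/name, or
    // destDir/name_copy_N when earlier candidates already exist.
    static Path renameWithSuffix(Path src, Path destDir, String name) throws IOException {
        Path dest = destDir.resolve(name);
        int counter = 1;
        while (true) {
            try {
                // Without REPLACE_EXISTING, move fails if dest already exists.
                return Files.move(src, dest);
            } catch (FileAlreadyExistsException taken) {
                dest = destDir.resolve(name + "_copy_" + counter);
                counter++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dest");
        Path a = Files.createTempFile("srcA", ".tmp");
        Path b = Files.createTempFile("srcB", ".tmp");
        // Two writers race for the same file name; the loser gets a suffix.
        System.out.println(renameWithSuffix(a, dir, "000000_0").getFileName());
        System.out.println(renameWithSuffix(b, dir, "000000_0").getFileName());
    }
}
```

The same caveat applies as in the patch above: the unbounded loop is only safe because every retry picks a fresh candidate name.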

  was:
Hive.java
{code:java}
  for (final FileStatus srcStatus : srcs) {

//  final Path destFile = new Path(destf, 
srcStatus.getPath().getName());

  final String name = srcStatus.getPath().getName();
  final Path destDirPath = srcStatus.getPath();

  if (null == pool) {
Path destFile = new Path(destf, name);
int counter = 1;
while (!destFs.rename(destDirPath, destFile)) {
  destFile =  new Path(destf, name + ("_copy_" + counter));
  LOG.info("kugu log destFile is {}.",destFile.getName());
  counter++;
//  throw new IOException("rename for src path: " + 
srcStatus.getPath() + " to dest:"
//  + destf + " returned false");
}
  } else {
futures.add(pool.submit(new Callable<Void>() {
  @Override
  public Void call() throws Exception {
SessionState.setCurrentSessionState(parentSession);
final String group = srcStatus.getGroup();

Path destFile = new Path(destf, name);
boolean rename_succ = false;
int counter = 1;
while (!rename_succ) {
  rename_succ = destFs.rename(destDirPath, destFile);
  if(rename_succ) {
if (inheritPerms) {
  HdfsUtils.setFullFileStatus(conf, desiredStatus, 
group, destFs, destFile, false);
}
  }else {
destFile =  new Path(destf, name + ("_copy_" + 
counter));
LOG.info("kugu log destFile is {}.",destFile.getName());
  }
  counter++;

}
//if(destFs.rename(srcStatus.getPath(), destFile)) {
//  if (inheritPerms) {
//HdfsUtils.setFullFileStatus(conf, desiredStatus, 
group, destFs, destFile, false);
//  }
//} else {
//  throw new IOException("rename for src path: " + 
srcStatus.getPath() + " to dest path:"
//  + destFile + " 

[jira] [Updated] (HIVE-25932) two or more SQL statements like "insert into table test partition (cls=1) select xxx" sometimes fail on rename because both statements create the same file name

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25932:
---
Description: 
Hive.java
{code:java}
for (final FileStatus srcStatus : srcs) {
  final String name = srcStatus.getPath().getName();
  final Path destDirPath = srcStatus.getPath();

  if (null == pool) {
    Path destFile = new Path(destf, name);
    int counter = 1;
    // Instead of throwing IOException("rename ... returned false") on a name
    // collision, retry with a "_copy_N" suffix until the rename succeeds.
    while (!destFs.rename(destDirPath, destFile)) {
      destFile = new Path(destf, name + "_copy_" + counter);
      LOG.info("destFile is {}.", destFile.getName());
      counter++;
    }
  } else {
    futures.add(pool.submit(new Callable<Void>() {
      @Override
      public Void call() throws Exception {
        SessionState.setCurrentSessionState(parentSession);
        final String group = srcStatus.getGroup();

        Path destFile = new Path(destf, name);
        boolean renameSucceeded = false;
        int counter = 1;
        while (!renameSucceeded) {
          renameSucceeded = destFs.rename(destDirPath, destFile);
          if (renameSucceeded) {
            if (inheritPerms) {
              HdfsUtils.setFullFileStatus(conf, desiredStatus, group, destFs, destFile, false);
            }
          } else {
            // Name collision: try the next "_copy_N" suffix.
            destFile = new Path(destf, name + "_copy_" + counter);
            LOG.info("destFile is {}.", destFile.getName());
          }
          counter++;
        }
        return null;
      }
    }));
  }
}
{code}
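The patch's loop relies on Hadoop's `FileSystem.rename` returning `false` when the destination name is taken. A minimal standalone sketch of the same "_copy_N" retry strategy, using `java.nio.file` locally in place of the Hadoop API (the class and method names here are illustrative, not Hive's):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopySuffixRename {
    // On a name collision, retry with a "_copy_N" suffix instead of failing,
    // mirroring the while-loop in the proposed Hive.java change.
    static Path moveWithCopySuffix(Path src, Path destDir, String name) throws IOException {
        Path dest = destDir.resolve(name);
        int counter = 1;
        while (true) {
            try {
                return Files.move(src, dest); // throws if dest already exists
            } catch (FileAlreadyExistsException e) {
                dest = destDir.resolve(name + "_copy_" + counter++);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("renames");
        Files.createFile(dir.resolve("000000_0"));         // pre-existing file
        Path src = Files.createTempFile(dir, "src", ".tmp");
        Path moved = moveWithCopySuffix(src, dir, "000000_0");
        System.out.println(moved.getFileName());           // 000000_0_copy_1
    }
}
```

The loop picks the first free suffix, so concurrent writers of the same base name land on distinct files rather than one of them failing.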

  was:
Hive.java (the same snippet, previously flattened into a single line by the mail archive)

[jira] [Updated] (HIVE-25932) two or more SQL statements like "insert into table test partition (cls=1) select xxx" sometimes fail on rename, because they create the same file name

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25932:
---
Description: 
Hive.java (the same snippet as in the description above, flattened into a single line by the mail archive)

> two or more SQL statements like "insert into table test partition (cls=1) select xxx"
> sometimes fail on rename, because they create the same file name
> -
>
> Key: HIVE-25932
> URL: https://issues.apache.org/jira/browse/HIVE-25932
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.7
>Reporter: lkl
>Assignee: lkl
>Priority: Minor
>
> Hive.java (snippet truncated in the archive; same code as in the description above)

[jira] [Assigned] (HIVE-25932) two or more SQL statements like "insert into table test partition (cls=1) select xxx" sometimes fail on rename, because they create the same file name

2022-02-07 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl reassigned HIVE-25932:
--


> two or more SQL statements like "insert into table test partition (cls=1) select xxx"
> sometimes fail on rename, because they create the same file name
> -
>
> Key: HIVE-25932
> URL: https://issues.apache.org/jira/browse/HIVE-25932
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 2.3.7
>Reporter: lkl
>Assignee: lkl
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25918) Invalid stats after multi inserting into the same partition

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25918?focusedWorklogId=721937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721937
 ]

ASF GitHub Bot logged work on HIVE-25918:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 14:25
Start Date: 07/Feb/22 14:25
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2991:
URL: https://github.com/apache/hive/pull/2991#discussion_r800706374



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##
@@ -512,6 +513,8 @@ private String toString(Map parameters) {
 if (dpPartSpecs != null) {
   // load the list of DP partitions and return the list of partition 
specs
   list.addAll(dpPartSpecs);
+  // Reload partition metadata because another BasicStatsTask instance 
may have updated the stats.
+  list = db.getPartitionsByNames(table, 
list.stream().map(Partition::getName).collect(Collectors.toList()));

Review comment:
   I think these are Thrift API calls from HS2 to HMS, and the number of 
calls depends on the number of partitions and 
`hive.metastore.batch.retrieve.max`.
   There is also some communication from HMS to the backend DB, which is 
likewise done in batches.
   
   I explored another way:
   * In the StatsTask, instead of applying the stats on the Partition objects, 
collect them into delta objects.
   * Implement an HMS call and a backend DB call to update stats based on the 
collected delta objects.
   
   However, this would be a bigger change.
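Batching HMS calls by `hive.metastore.batch.retrieve.max` can be sketched as a plain list-chunking helper (the method name `inBatches` is ours for illustration, not a Hive API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchedFetch {
    // Splits a list into chunks of at most batchSize elements, the same way
    // HMS partition retrieval is bounded by hive.metastore.batch.retrieve.max:
    // each chunk corresponds to one HS2 -> HMS call.
    static <T> List<List<T>> inBatches(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> partitionNames = Arrays.asList("p=1", "p=2", "p=3", "p=4", "p=5");
        // With a batch size of 2, five partitions need three calls (2 + 2 + 1).
        System.out.println(inBatches(partitionNames, 2).size()); // 3
    }
}
```

This is why the number of round trips grows with the partition count divided by the configured batch size, as the comment above notes.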




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721937)
Time Spent: 40m  (was: 0.5h)

> Invalid stats after multi inserting into the same partition
> ---
>
> Key: HIVE-25918
> URL: https://issues.apache.org/jira/browse/HIVE-25918
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> create table source(p int, key int,value string);
> insert into source(p, key, value) values (101,42,'string42');
> create table stats_part(key int,value string) partitioned by (p int);
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p;
> select count(*) from stats_part;
> {code}
> In this case {{StatsOptimizer}} answers the query directly from statistics, so the 
> result should be the {{rowNum}} of partition {{p=101}}. The result is
> {code}
> 1
> {code}
> however it should be
> {code}
> 2
> {code}
> because each of the two insert branches inserts one record.
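The lost update behind the wrong count can be shown deterministically in plain Java, with a hypothetical in-memory stats map standing in for the metastore (none of these names are Hive APIs):

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    // Both stats tasks snapshot the partition metadata before either writes,
    // so the second write silently discards the first one's increment.
    static int mergeWithoutReload() {
        Map<String, Integer> rowNum = new HashMap<>();
        rowNum.put("p=101", 0);
        int snapshotA = rowNum.get("p=101");   // branch A loads stats
        int snapshotB = rowNum.get("p=101");   // branch B loads the same stale copy
        rowNum.put("p=101", snapshotA + 1);    // A publishes rowNum = 1
        rowNum.put("p=101", snapshotB + 1);    // B clobbers it with rowNum = 1
        return rowNum.get("p=101");            // 1, although 2 rows were inserted
    }

    // With a reload before applying each delta, the increments compose.
    static int mergeWithReload() {
        Map<String, Integer> rowNum = new HashMap<>();
        rowNum.put("p=101", 0);
        rowNum.put("p=101", rowNum.get("p=101") + 1); // A: reload 0, publish 1
        rowNum.put("p=101", rowNum.get("p=101") + 1); // B: reload 1, publish 2
        return rowNum.get("p=101");                   // 2
    }

    public static void main(String[] args) {
        System.out.println(mergeWithoutReload()); // 1 (the bug)
        System.out.println(mergeWithReload());    // 2 (the expected count)
    }
}
```

This is the classic read-modify-write race; reloading partition metadata before merging, as the patch does, makes each task apply its delta on top of the other's result.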



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25918) Invalid stats after multi inserting into the same partition

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25918?focusedWorklogId=721913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721913
 ]

ASF GitHub Bot logged work on HIVE-25918:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 13:43
Start Date: 07/Feb/22 13:43
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2991:
URL: https://github.com/apache/hive/pull/2991#discussion_r800666998



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##
@@ -512,6 +513,8 @@ private String toString(Map parameters) {
 if (dpPartSpecs != null) {
   // load the list of DP partitions and return the list of partition 
specs
   list.addAll(dpPartSpecs);
+  // Reload partition metadata because another BasicStatsTask instance 
may have updated the stats.
+  list = db.getPartitionsByNames(table, 
list.stream().map(Partition::getName).collect(Collectors.toList()));

Review comment:
   Truth is that things are a bit different here than before HIVE-15250: we 
are calling `getPartitionsByNames` instead of `getPartition`, so if I 
understand correctly it will be a single DB call rather than many. The perf 
overhead may therefore not be as significant as it was in HIVE-15250.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721913)
Time Spent: 0.5h  (was: 20m)

> Invalid stats after multi inserting into the same partition
> ---
>
> Key: HIVE-25918
> URL: https://issues.apache.org/jira/browse/HIVE-25918
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> create table source(p int, key int,value string);
> insert into source(p, key, value) values (101,42,'string42');
> create table stats_part(key int,value string) partitioned by (p int);
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p;
> select count(*) from stats_part;
> {code}
> In this case {{StatsOptimizer}} answers the query directly from statistics, so the 
> result should be the {{rowNum}} of partition {{p=101}}. The result is
> {code}
> 1
> {code}
> however it should be
> {code}
> 2
> {code}
> because each of the two insert branches inserts one record.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25918) Invalid stats after multi inserting into the same partition

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25918?focusedWorklogId=721909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721909
 ]

ASF GitHub Bot logged work on HIVE-25918:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 13:36
Start Date: 07/Feb/22 13:36
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2991:
URL: https://github.com/apache/hive/pull/2991#discussion_r800659669



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##
@@ -512,6 +513,8 @@ private String toString(Map parameters) {
 if (dpPartSpecs != null) {
   // load the list of DP partitions and return the list of partition 
specs
   list.addAll(dpPartSpecs);
+  // Reload partition metadata because another BasicStatsTask instance 
may have updated the stats.
+  list = db.getPartitionsByNames(table, 
list.stream().map(Partition::getName).collect(Collectors.toList()));

Review comment:
   This change is kind of reverting 
https://issues.apache.org/jira/browse/HIVE-15250. Obviously correctness is more 
important than performance but I am wondering if we explored other ways to fix 
the problem. 
   
   @rbalamohan since you worked on HIVE-15250 you may want to have a look into 
the changes here. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721909)
Time Spent: 20m  (was: 10m)

> Invalid stats after multi inserting into the same partition
> ---
>
> Key: HIVE-25918
> URL: https://issues.apache.org/jira/browse/HIVE-25918
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> create table source(p int, key int,value string);
> insert into source(p, key, value) values (101,42,'string42');
> create table stats_part(key int,value string) partitioned by (p int);
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p;
> select count(*) from stats_part;
> {code}
> In this case {{StatsOptimizer}} answers the query directly from statistics, so the 
> result should be the {{rowNum}} of partition {{p=101}}. The result is
> {code}
> 1
> {code}
> however it should be
> {code}
> 2
> {code}
> because each of the two insert branches inserts one record.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25876) Update log4j2 version to 2.17.1

2022-02-07 Thread Raghavendra Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488090#comment-17488090
 ] 

Raghavendra Singh commented on HIVE-25876:
--

Is this not getting picked up? :(
We are waiting on a stable release that includes this fix.

> Update log4j2 version to 2.17.1
> ---
>
> Key: HIVE-25876
> URL: https://issues.apache.org/jira/browse/HIVE-25876
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.1.2
>Reporter: Anatoly
>Priority: Blocker
>
> Hive version 3.1.2 -> log4j2 -> should upgrade the version to 2.17.1



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25814) Add entry in replication_metrics table for skipped replication iterations.

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25814?focusedWorklogId=721855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721855
 ]

ASF GitHub Bot logged work on HIVE-25814:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 11:35
Start Date: 07/Feb/22 11:35
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2907:
URL: https://github.com/apache/hive/pull/2907#discussion_r800566878



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -156,6 +160,106 @@ public void tearDown() throws Throwable {
 primary.run("drop database if exists " + primaryDbName + "_extra cascade");
   }
 
+  @Test
+  public void testReplicationMetricForSkippedIteration() throws Throwable {
+isMetricsEnabledForTests(true);
+MetricCollector collector = MetricCollector.getInstance();
+WarehouseInstance.Tuple dumpData = primary.run("use " + primaryDbName)
+.run("create table t1 (id int) clustered by(id) into 3 buckets " +
+"stored as orc tblproperties (\"transactional\"=\"true\")")
+.run("insert into t1 values(1)")
+.dump(primaryDbName);
+
+
+ReplicationMetric metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+primary.dump(primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+
+replica.load(replicatedDbName, primaryDbName)
+.run("use " + replicatedDbName)
+.run("show tables")
+.verifyResults(new String[]{"t1"})
+.run("repl status " + replicatedDbName)
+.verifyResult(dumpData.lastReplicationId)
+.run("select id from t1")
+.verifyResults(new String[]{"1"});
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+replica.load(replicatedDbName, primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+isMetricsEnabledForTests(false);
+  }
+
+  @Test
+  public void testReplicationMetricForFailedIteration() throws Throwable {
+isMetricsEnabledForTests(true);
+MetricCollector collector = MetricCollector.getInstance();
+WarehouseInstance.Tuple dumpData = primary.run("use " + primaryDbName)
+.run("create table t1 (id int) clustered by(id) into 3 buckets " +
+"stored as orc tblproperties (\"transactional\"=\"true\")")
+.run("insert into t1 values(1)")
+.dump(primaryDbName);
+
+ReplicationMetric metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+replica.load(replicatedDbName, primaryDbName)
+.run("use " + replicatedDbName)
+.run("show tables")
+.verifyResults(new String[]{"t1"})
+.run("repl status " + replicatedDbName)
+.verifyResult(dumpData.lastReplicationId)
+.run("select id from t1")
+.verifyResults(new String[]{"1"});
+
+Path nonRecoverableFile = new Path(new Path(dumpData.dumpLocation), 
ReplAck.NON_RECOVERABLE_MARKER.toString());
+FileSystem fs = new Path(dumpData.dumpLocation).getFileSystem(conf);
+fs.create(nonRecoverableFile);
+
+primary.dumpFailure(primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+assertEquals(metric.getProgress().getStages().get(0).getErrorLogPath(), 
nonRecoverableFile.toString());
+
+primary.dumpFailure(primaryDbName);
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+assertEquals(metric.getProgress().getStages().get(0).getErrorLogPath(), 
nonRecoverableFile.toString());
+
+fs.delete(nonRecoverableFile, true);
+dumpData = primary.dump(primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+replica.run("ALTER DATABASE " + replicatedDbName +
+" SET DBPROPERTIES('" + ReplConst.REPL_INCOMPATIBLE + "'='true')");
+replica.loadFailure(replicatedDbName, primaryDbName);
+
+nonRecoverableFile = new Path(new Path(dumpData.dumpLocation), 
ReplAck.NON_RECOVERABLE_MARKER.toString());
+assertTrue(fs.exists(nonRecoverableFile));
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.FAILED_ADMIN);
+assertEquals(metric.getProgress().getStages().get(0).getErrorLogPath(), 
nonRecoverableFile.toString());
+
+replica.loadFailure(replicatedDbName, primaryDbName);

[jira] [Work logged] (HIVE-25814) Add entry in replication_metrics table for skipped replication iterations.

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25814?focusedWorklogId=721854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721854
 ]

ASF GitHub Bot logged work on HIVE-25814:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 11:35
Start Date: 07/Feb/22 11:35
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2907:
URL: https://github.com/apache/hive/pull/2907#discussion_r800566878



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -156,6 +160,106 @@ public void tearDown() throws Throwable {
 primary.run("drop database if exists " + primaryDbName + "_extra cascade");
   }
 
+  @Test
+  public void testReplicationMetricForSkippedIteration() throws Throwable {
+isMetricsEnabledForTests(true);
+MetricCollector collector = MetricCollector.getInstance();
+WarehouseInstance.Tuple dumpData = primary.run("use " + primaryDbName)
+.run("create table t1 (id int) clustered by(id) into 3 buckets " +
+"stored as orc tblproperties (\"transactional\"=\"true\")")
+.run("insert into t1 values(1)")
+.dump(primaryDbName);
+
+
+ReplicationMetric metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+primary.dump(primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+
+replica.load(replicatedDbName, primaryDbName)
+.run("use " + replicatedDbName)
+.run("show tables")
+.verifyResults(new String[]{"t1"})
+.run("repl status " + replicatedDbName)
+.verifyResult(dumpData.lastReplicationId)
+.run("select id from t1")
+.verifyResults(new String[]{"1"});
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+replica.load(replicatedDbName, primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+isMetricsEnabledForTests(false);
+  }
+
+  @Test
+  public void testReplicationMetricForFailedIteration() throws Throwable {
+isMetricsEnabledForTests(true);
+MetricCollector collector = MetricCollector.getInstance();
+WarehouseInstance.Tuple dumpData = primary.run("use " + primaryDbName)
+.run("create table t1 (id int) clustered by(id) into 3 buckets " +
+"stored as orc tblproperties (\"transactional\"=\"true\")")
+.run("insert into t1 values(1)")
+.dump(primaryDbName);
+
+ReplicationMetric metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+replica.load(replicatedDbName, primaryDbName)
+.run("use " + replicatedDbName)
+.run("show tables")
+.verifyResults(new String[]{"t1"})
+.run("repl status " + replicatedDbName)
+.verifyResult(dumpData.lastReplicationId)
+.run("select id from t1")
+.verifyResults(new String[]{"1"});
+
+Path nonRecoverableFile = new Path(new Path(dumpData.dumpLocation), 
ReplAck.NON_RECOVERABLE_MARKER.toString());
+FileSystem fs = new Path(dumpData.dumpLocation).getFileSystem(conf);
+fs.create(nonRecoverableFile);
+
+primary.dumpFailure(primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+assertEquals(metric.getProgress().getStages().get(0).getErrorLogPath(), 
nonRecoverableFile.toString());
+
+primary.dumpFailure(primaryDbName);
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SKIPPED);
+assertEquals(metric.getProgress().getStages().get(0).getErrorLogPath(), 
nonRecoverableFile.toString());
+
+fs.delete(nonRecoverableFile, true);
+dumpData = primary.dump(primaryDbName);
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.SUCCESS);
+
+replica.run("ALTER DATABASE " + replicatedDbName +
+" SET DBPROPERTIES('" + ReplConst.REPL_INCOMPATIBLE + "'='true')");
+replica.loadFailure(replicatedDbName, primaryDbName);
+
+nonRecoverableFile = new Path(new Path(dumpData.dumpLocation), 
ReplAck.NON_RECOVERABLE_MARKER.toString());
+assertTrue(fs.exists(nonRecoverableFile));
+
+metric = collector.getMetrics().getLast();
+assertEquals(metric.getProgress().getStatus(), Status.FAILED_ADMIN);
+assertEquals(metric.getProgress().getStages().get(0).getErrorLogPath(), 
nonRecoverableFile.toString());
+
+replica.loadFailure(replicatedDbName, primaryDbName);

[jira] [Work logged] (HIVE-25898) Compaction txn heartbeating after Worker timeout

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25898?focusedWorklogId=721844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721844
 ]

ASF GitHub Bot logged work on HIVE-25898:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 11:04
Start Date: 07/Feb/22 11:04
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on a change in pull request #2981:
URL: https://github.com/apache/hive/pull/2981#discussion_r800461666



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -685,11 +672,16 @@ void open(CompactionInfo ci) throws TException {
 + "}, status {" + res.getState() + "}, reason {" + 
res.getErrorMessage() + "}");
   }
   lockId = res.getLockid();
-
-  heartbeatExecutor = Executors.newSingleThreadScheduledExecutor();
+  heartbeatExecutor = Executors.newSingleThreadScheduledExecutor(
+  CompactorUtil.createThreadFactory(
+  "CompactionTxn Heartbeater - " + txnId, 
Thread.MIN_PRIORITY, true));

Review comment:
   I would keep it as-is; Denys and I think the utility method can be 
reused elsewhere in the future, and the thread naming is also more explicit 
this way.
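What a `createThreadFactory(name, priority, daemon)` utility plausibly looks like, sketched from the call site in the diff (the body below is an assumption, not Hive's actual `CompactorUtil` implementation): it hands the executor explicitly named, daemon, low-priority threads so the heartbeater is identifiable in thread dumps and never blocks JVM exit.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class NamedDaemonFactory {
    // Builds threads with a fixed name, priority, and daemon flag,
    // matching the arguments passed at the Worker.open() call site.
    static ThreadFactory createThreadFactory(String name, int priority, boolean daemon) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setPriority(priority);
            t.setDaemon(daemon);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService heartbeater = Executors.newSingleThreadScheduledExecutor(
                createThreadFactory("CompactionTxn Heartbeater - 42", Thread.MIN_PRIORITY, true));
        heartbeater.schedule(
                () -> System.out.println(Thread.currentThread().getName()),
                0, TimeUnit.MILLISECONDS);
        heartbeater.shutdown();
        heartbeater.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Embedding the transaction id in the thread name is what makes a leaked heartbeater (the bug in this issue) easy to spot in a thread dump.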




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721844)
Time Spent: 3h  (was: 2h 50m)

> Compaction txn heartbeating after Worker timeout
> 
>
> Key: HIVE-25898
> URL: https://issues.apache.org/jira/browse/HIVE-25898
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In some cases, when the compaction transaction is aborted, the heartbeater 
> thread is not shut down and keeps heartbeating.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=721812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721812
 ]

ASF GitHub Bot logged work on HIVE-21100:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 09:55
Start Date: 07/Feb/22 09:55
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2921:
URL: https://github.com/apache/hive/pull/2921#issuecomment-1031274531


   Thanks @hsnusonic for the feedback!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721812)
Time Spent: 2.5h  (was: 2h 20m)

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with Tez engine and the clause 
> UNION ALL is the last step of the query, Hive on Tez will create a 
> subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and 
> moved to the parent directory.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25892) Group HMSHandler's thread locals into a single context

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25892?focusedWorklogId=721806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721806
 ]

ASF GitHub Bot logged work on HIVE-25892:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 09:43
Start Date: 07/Feb/22 09:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2967:
URL: https://github.com/apache/hive/pull/2967#issuecomment-1031263338


   @dengzhhu653: Sorry, I was out of town last week.
   LGTM
   
   @kgyrtkirk, or @nrg4878 would you like to take a look?
   
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721806)
Time Spent: 3h 40m  (was: 3.5h)

> Group HMSHandler's thread locals into a single context
> --
>
> Key: HIVE-25892
> URL: https://issues.apache.org/jira/browse/HIVE-25892
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> There are more than six ThreadLocal variables in HMSHandler, we can group 
> them together into a single context to improve the management of variables 
> and the code readability.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25915) Query based MINOR compaction fails with NPE if the data is loaded into the ACID table

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25915?focusedWorklogId=721805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721805
 ]

ASF GitHub Bot logged work on HIVE-25915:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 09:42
Start Date: 07/Feb/22 09:42
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on a change in pull request #3000:
URL: https://github.com/apache/hive/pull/3000#discussion_r800473637



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -242,6 +242,7 @@ private void scheduleCompactionIfRequired(CompactionInfo 
ci, Table t, Partition
 try {
   ValidWriteIdList validWriteIds = resolveValidWriteIds(t);
   CompactionType type = checkForCompaction(ci, validWriteIds, sd, 
t.getParameters(), runAs);
+  ci.type = type;

Review comment:
   This is required to log the proper type in all cases. If the type is 
changed and the compaction later fails, the original type would be logged 
without this change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721805)
Time Spent: 50m  (was: 40m)

> Query based MINOR compaction fails with NPE if the data is loaded into the 
> ACID table
> -
>
> Key: HIVE-25915
> URL: https://issues.apache.org/jira/browse/HIVE-25915
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  #  Create a table with import:
> {{CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10 
> BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');}}
>  # {{insert into temp_acid values 
> ('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');}}
> {{export table temp_acid to '/tmp/temp_acid';}}
> {{import table imported from '/tmp/temp_acid';}}
>  # Do some inserts:
> {{insert into imported values ('21', 'value21'),('84', 'value84'),('66', 
> 'value66'),('54', 'value54');
> insert into imported values ('22', 'value22'),('34', 'value34'),('35', 
> 'value35');
> insert into imported values ('75', 'value75'),('99', 'value99');}}
>  # {{Run a minor compaction}}
> If the data is loaded or imported into the table the way described above, 
> the rows in the ORC file don't contain the ACID metadata. The query-based 
> MINOR compaction fails on this kind of table, because the FileSinkOperator 
> throws an NPE when it tries to read the bucket metadata from the rows. 
> However, deleting and updating such a table is possible, so the bucketId 
> can somehow be calculated for these rows.
> The non-query-based MINOR compaction works fine on a table like this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25898) Compaction txn heartbeating after Worker timeout

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25898?focusedWorklogId=721795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721795
 ]

ASF GitHub Bot logged work on HIVE-25898:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 09:28
Start Date: 07/Feb/22 09:28
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on a change in pull request #2981:
URL: https://github.com/apache/hive/pull/2981#discussion_r800461666



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -685,11 +672,16 @@ void open(CompactionInfo ci) throws TException {
 + "}, status {" + res.getState() + "}, reason {" + 
res.getErrorMessage() + "}");
   }
   lockId = res.getLockid();
-
-  heartbeatExecutor = Executors.newSingleThreadScheduledExecutor();
+  heartbeatExecutor = Executors.newSingleThreadScheduledExecutor(
+  CompactorUtil.createThreadFactory(
+  "CompactionTxn Heartbeater - " + txnId, 
Thread.MIN_PRIORITY, true));

Review comment:
   I would keep it as-is; Denys and I think the utility method can be 
used elsewhere in the future, and the thread naming will also be more explicit 
this way.
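A minimal sketch of the kind of thread factory this comment refers to; the class name and constructor shape are assumptions for illustration (the actual helper lives in CompactorUtil), not Hive's real API:

```java
import java.util.concurrent.ThreadFactory;

// Sketch of a naming thread factory: the produced thread gets an explicit
// name, priority, and daemon flag, so e.g. a heartbeater thread shows up
// clearly in thread dumps and cannot keep the JVM alive on its own.
public class NamedThreadFactory implements ThreadFactory {
    private final String name;
    private final int priority;
    private final boolean daemon;

    public NamedThreadFactory(String name, int priority, boolean daemon) {
        this.name = name;
        this.priority = priority;
        this.daemon = daemon;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, name);
        t.setPriority(priority);
        t.setDaemon(daemon);
        return t;
    }
}
```

It would be passed to `Executors.newSingleThreadScheduledExecutor(...)` the same way the diff above does, e.g. with a name like `"CompactionTxn Heartbeater - " + txnId`.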




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721795)
Time Spent: 2h 50m  (was: 2h 40m)

> Compaction txn heartbeating after Worker timeout
> 
>
> Key: HIVE-25898
> URL: https://issues.apache.org/jira/browse/HIVE-25898
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In some cases, when the compaction transaction is aborted, the heartbeater 
> thread is not shut down and keeps heartbeating.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24887:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> getDatabase() to call translation code even if client has no capabilities
> -
>
> Key: HIVE-24887
> URL: https://issues.apache.org/jira/browse/HIVE-24887
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We do this for other calls that go through the translation layer. For some 
> reason, the current code only calls it when the client sets the capabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24920:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command 
> (location already exists), but because it's an external table no exception is thrown
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25303:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION and is in use in a 
> session, tables created with the following command:
> {code:java}
> CREATE TABLE  AS SELECT {code}
> should inherit the HDFS path from the database's location. Instead, Hive 
> tries to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
>  In a CTAS query, the data is first written to the target directory (which 
> happens in HS2) and then the table is created (which happens in HMS). Two 
> decisions are made here: i) the target directory location, and ii) how the 
> table should be created (table type, storage descriptor, etc.).
>  When HS2 needs the target location to be set, it makes a create-table 
> dry-run call to HMS (where table translation happens); decisions i) and ii) 
> are made within HMS, which returns the table object. HS2 then uses the 
> location set by HMS for placing the data.
> The patch for this issue addresses the table location being incorrect and 
> the table data being empty in the following cases: 1) when the external 
> legacy config is set, i.e. hive.create.as.external.legacy=true; 2) when the 
> table is created with the transactional property set to false, i.e. 
> TBLPROPERTIES ('transactional'='false')



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25782) Create Table As Select fails for managed ACID tables

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25782:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> Create Table As Select fails for managed ACID tables
> 
>
> Key: HIVE-25782
> URL: https://issues.apache.org/jira/browse/HIVE-25782
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Csaba Juhász
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Attachments: ctas_acid_managed.q
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Create Table As Select fails for managed ACID tables:
> *MetaException(message:Processor has no capabilities, cannot create an ACID 
> table.)*
> HMSHandler.translate_table_dryrun invokes 
> MetastoreDefaultTransformer.transformCreateTable with null 
> processorCapabilities and processorId.
> https://github.com/apache/hive/blob/c7fdd459305f4bf6913dc4bed7e8df8c7bf9e458/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L2251
> {code:java}
> Dec 06 05:32:47 Starting translation for CreateTable for processor null with 
> null on table vectortab10korc
> Dec 06 05:32:47 MetaException(message:Processor has no capabilities, cannot 
> create an ACID table.)
>   at 
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:663)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.translate_table_dryrun(HiveMetaStore.java:2159)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy29.translate_table_dryrun(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$translate_table_dryrun.getResult(ThriftHiveMetastore.java:16981)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$translate_table_dryrun.getResult(ThriftHiveMetastore.java:16965)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> Reproduction ([^ctas_acid_managed.q]):
> {code:java}
> set hive.support.concurrency=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set 
> metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
> create table test stored as orc tblproperties ('transactional'='true') as 
> select from_unixtime(unix_timestamp("0002-01-01 09:57:21", "-MM-dd 
> HH:mm:ss")); {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25630) Transformer fixes

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25630:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> Transformer fixes
> -
>
> Key: HIVE-25630
> URL: https://issues.apache.org/jira/browse/HIVE-25630
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> there are some issues:
> * AlreadyExistsException might be suppressed by the translator
> * uppercase letter usage may cause problems for some clients
> * add a way to suppress location checks for legacy clients



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24954) MetastoreTransformer is disabled during testing

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24954:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> MetastoreTransformer is disabled during testing
> ---
>
> Key: HIVE-24954
> URL: https://issues.apache.org/jira/browse/HIVE-24954
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> all calls are fortified with "isInTest" guards to avoid testing those calls 
> (!@#$#)
> https://github.com/apache/hive/blob/86fa9b30fe347c7fc78a2930f4d20ece2e124f03/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L1647
> this causes some weird behaviour:
> an out-of-the-box Hive installation creates TRANSLATED_TO_EXTERNAL external 
> tables for plain CREATE TABLE commands,
> while during most test runs CREATE TABLE creates regular 
> MANAGED tables...



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24951) Table created with Uppercase name using CTAS does not produce result for select queries

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24951:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> Table created with Uppercase name using CTAS does not produce result for 
> select queries
> ---
>
> Key: HIVE-24951
> URL: https://issues.apache.org/jira/browse/HIVE-24951
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
> CREATE EXTERNAL TABLE MY_TEST AS SELECT * FROM source
> Table created with Location but does not have any data moved to it.
> /warehouse/tablespace/external/hive/MY_TEST
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25884) Improve rule description for rules defined as subclasses

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25884?focusedWorklogId=721783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721783
 ]

ASF GitHub Bot logged work on HIVE-25884:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 09:11
Start Date: 07/Feb/22 09:11
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2957:
URL: https://github.com/apache/hive/pull/2957#discussion_r800446246



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java
##
@@ -130,7 +130,7 @@ public void onMatch(RelOptRuleCall call) {
 
   public static class HiveFilterJoinTransposeRule extends HiveFilterJoinRule {
 public HiveFilterJoinTransposeRule() {
-  super(RelOptRule.operand(Join.class, RelOptRule.any()), 
"HiveFilterJoinRule:no-filter", true,
+  super(RelOptRule.operand(Join.class, RelOptRule.any()), 
"HiveFilterJoinTransposeRule", true,

Review comment:
   this could also be handled by `getClass().getSimpleName()` - I think 
that would have worked in the superclass as well
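The suggestion can be illustrated with a toy hierarchy (the class names mirror the ones in the diff but everything else is simplified, with no Calcite types involved): for a nested subclass, `getClass().getSimpleName()` already yields just the inner class name.

```java
// Toy illustration of deriving a rule description from the class name:
// getSimpleName() on a nested subclass returns only the inner class name,
// so the description stays in sync with the class without a hand-written
// string literal.
public class HiveFilterJoinRule {
    public String description() {
        return getClass().getSimpleName();
    }

    public static class HiveFilterJoinTransposeRule extends HiveFilterJoinRule {
    }
}
```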




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721783)
Time Spent: 0.5h  (was: 20m)

> Improve rule description for rules defined as subclasses
> 
>
> Key: HIVE-25884
> URL: https://issues.apache.org/jira/browse/HIVE-25884
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Consider the instances of _HivePointLookupOptimizerRule_ (for joins, filters 
> and projects). 
> They use the [default 
> constructor|https://github.com/apache/calcite/blob/0065d7c179b98698f018f83b0af0845a6698fc54/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L79]
>  for _RelOptRule_, which builds the rule description from the class name, and 
> in case of nested classes, it takes only the inner class name.
> In this case, the names do not refer to _HivePointLookupOptimizerRule_ and 
> are too generic (e.g., _FilterCondition_); it's hard to link them back to the 
> rule they belong to without looking at the source code.
> This is particularly problematic now that we have more detailed logging for 
> CBO (see [HIVE-25816|https://issues.apache.org/jira/browse/HIVE-25816]), 
> where rule descriptions are printed.
> The aim of the PR is to improve the rule description by passing an explicit 
> string whenever the rule (class) name alone is not enough.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25766) java.util.NoSuchElementException in HiveFilterProjectTransposeRule if predicate has no InputRef

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25766?focusedWorklogId=721778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721778
 ]

ASF GitHub Bot logged work on HIVE-25766:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 09:03
Start Date: 07/Feb/22 09:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2839:
URL: https://github.com/apache/hive/pull/2839


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721778)
Time Spent: 2h  (was: 1h 50m)

> java.util.NoSuchElementException in HiveFilterProjectTransposeRule if 
> predicate has no InputRef
> ---
>
> Key: HIVE-25766
> URL: https://issues.apache.org/jira/browse/HIVE-25766
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The issue can be reproduced with the following query:
> {code:java}
> create table test1 (s string);
> create table test2 (m string);
> EXPLAIN CBO SELECT c.m
> FROM (
>   SELECT substr(from_unixtime(unix_timestamp(), '-MM-dd'), 1, 1) as m
>   FROM test1
>   WHERE substr(from_unixtime(unix_timestamp(), '-MM-dd'), 1, 1) = '2') c
> JOIN test2 d ON c.m = d.m;
> {code}
> It fails with the following exception:
> {noformat}
>  java.util.NoSuchElementException
>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1447)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.check(HiveFilterProjectTransposeRule.java:348)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.visit(HiveFilterProjectTransposeRule.java:306)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.visit(HiveFilterProjectTransposeRule.java:303)
>     at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72)
>     at org.apache.calcite.rel.RelVisitor.visit(RelVisitor.java:44)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.visit(HiveFilterProjectTransposeRule.java:316)
>     at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule.isRedundantIsNotNull(HiveFilterProjectTransposeRule.java:276)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule.onMatch(HiveFilterProjectTransposeRule.java:191){noformat}
> While checking whether the predicate to be transposed is redundant, the 
> current implementation expects at least one InputRef, but the predicate can 
> have none, as in this case.
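The failure mode boils down to calling `iterator().next()` on a possibly-empty set of input references; a guarded version is sketched below (the class and method names are hypothetical, not the actual HiveFilterProjectTransposeRule code):

```java
import java.util.Set;

// Sketch of the NoSuchElementException scenario: a predicate such as
// substr(from_unixtime(unix_timestamp(), ...), 1, 1) = '2' references no
// input column, so the collected InputRef set is empty and an unguarded
// iterator().next() throws. Checking isEmpty() first avoids that.
public class InputRefGuard {
    public static Integer firstInputRefOrNull(Set<Integer> inputRefs) {
        if (inputRefs.isEmpty()) {
            return null; // predicate has no InputRef at all
        }
        return inputRefs.iterator().next();
    }
}
```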



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25915) Query based MINOR compaction fails with NPE if the data is loaded into the ACID table

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25915?focusedWorklogId=721774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721774
 ]

ASF GitHub Bot logged work on HIVE-25915:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 08:57
Start Date: 07/Feb/22 08:57
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #3000:
URL: https://github.com/apache/hive/pull/3000#discussion_r800434472



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -501,7 +510,7 @@ protected Boolean findNextCompactionAndExecute(boolean 
collectGenericStats, bool
 LOG.error("Caught exception while trying to compact " + ci +
 ". Marking failed to avoid repeated failures", e);
 final CompactionType ctype = ci.type;
-markFailed(ci, e);
+markFailed(ci, e.getMessage());

Review comment:
   We shouldn't mark the compaction as failed, but rather as attempted/did 
not initiate




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721774)
Time Spent: 40m  (was: 0.5h)

> Query based MINOR compaction fails with NPE if the data is loaded into the 
> ACID table
> -
>
> Key: HIVE-25915
> URL: https://issues.apache.org/jira/browse/HIVE-25915
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  #  Create a table with import:
> {{CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10 
> BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');}}
>  # {{insert into temp_acid values 
> ('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');}}
> {{export table temp_acid to '/tmp/temp_acid';}}
> {{import table imported from '/tmp/temp_acid';}}
>  # Do some inserts:
> {{insert into imported values ('21', 'value21'),('84', 'value84'),('66', 
> 'value66'),('54', 'value54');
> insert into imported values ('22', 'value22'),('34', 'value34'),('35', 
> 'value35');
> insert into imported values ('75', 'value75'),('99', 'value99');}}
>  # {{Run a minor compaction}}
> If the data is loaded or imported into the table the way described above, 
> the rows in the ORC file don't contain the ACID metadata. The query-based 
> MINOR compaction fails on this kind of table, because the FileSinkOperator 
> throws an NPE when it tries to read the bucket metadata from the rows. 
> However, deleting and updating such a table is possible, so the bucketId 
> can somehow be calculated for these rows.
> The non-query-based MINOR compaction works fine on a table like this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25915) Query based MINOR compaction fails with NPE if the data is loaded into the ACID table

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25915?focusedWorklogId=721773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721773
 ]

ASF GitHub Bot logged work on HIVE-25915:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 08:55
Start Date: 07/Feb/22 08:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #3000:
URL: https://github.com/apache/hive/pull/3000#discussion_r800433132



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -433,6 +434,11 @@ private AcidDirectory getAcidDirectory(StorageDescriptor 
sd, ValidWriteIdList wr
   private CompactionType determineCompactionType(CompactionInfo ci, 
AcidDirectory dir, Map<String, String> tblproperties, long baseSize, long deltaSize) throws IOException 
{
 boolean noBase = false;
+
+//Minor compaction is not possible for tables having raw format (non-acid) 
data in them.
+if (dir.getOriginalFiles().size() > 0 || 
dir.getCurrentDirectories().stream().anyMatch(AcidUtils.ParsedDelta::isRawFormat))
 {

Review comment:
   could we extract this check into an Initiator and Worker common parent?
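One possible shape for the extraction asked about here, sketched with plain collections instead of Hive's AcidDirectory types; `hasRawFormatData`, its parameters, and the class name are hypothetical:

```java
import java.util.List;

// Sketch of a helper both Initiator and Worker could call. Minor compaction
// is ruled out when the directory contains original (pre-ACID) files or any
// raw-format delta; the boolean list stands in for mapping
// AcidUtils.ParsedDelta::isRawFormat over the current directories.
public class CompactionChecks {
    public static boolean hasRawFormatData(List<String> originalFiles,
                                           List<Boolean> deltaIsRawFormat) {
        return !originalFiles.isEmpty()
            || deltaIsRawFormat.stream().anyMatch(Boolean::booleanValue);
    }
}
```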




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721773)
Time Spent: 0.5h  (was: 20m)

> Query based MINOR compaction fails with NPE if the data is loaded into the 
> ACID table
> -
>
> Key: HIVE-25915
> URL: https://issues.apache.org/jira/browse/HIVE-25915
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  #  Create a table with import:
> {{CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10 
> BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');}}
>  # {{insert into temp_acid values 
> ('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');}}
> {{export table temp_acid to '/tmp/temp_acid';}}
> {{import table imported from '/tmp/temp_acid';}}
>  # Do some inserts:
> {{insert into imported values ('21', 'value21'),('84', 'value84'),('66', 
> 'value66'),('54', 'value54');
> insert into imported values ('22', 'value22'),('34', 'value34'),('35', 
> 'value35');
> insert into imported values ('75', 'value75'),('99', 'value99');}}
>  # {{Run a minor compaction}}
> If the data is loaded or imported into the table the way described above, 
> the rows in the ORC file don't contain the ACID metadata. The query-based 
> MINOR compaction fails on this kind of table, because the FileSinkOperator 
> throws an NPE when it tries to read the bucket metadata from the rows. 
> However, deleting and updating such a table is possible, so the bucketId 
> can somehow be calculated for these rows.
> The non-query-based MINOR compaction works fine on a table like this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25915) Query based MINOR compaction fails with NPE if the data is loaded into the ACID table

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25915?focusedWorklogId=721770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721770
 ]

ASF GitHub Bot logged work on HIVE-25915:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 08:52
Start Date: 07/Feb/22 08:52
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #3000:
URL: https://github.com/apache/hive/pull/3000#discussion_r800430818



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -242,6 +242,7 @@ private void scheduleCompactionIfRequired(CompactionInfo 
ci, Table t, Partition
 try {
   ValidWriteIdList validWriteIds = resolveValidWriteIds(t);
   CompactionType type = checkForCompaction(ci, validWriteIds, sd, 
t.getParameters(), runAs);
+  ci.type = type;

Review comment:
   I don't think this is necessary, as we are passing the compaction type in the 
request method. Type is set on the CompactionRequest object.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 721770)
Time Spent: 20m  (was: 10m)

> Query based MINOR compaction fails with NPE if the data is loaded into the 
> ACID table
> -
>
> Key: HIVE-25915
> URL: https://issues.apache.org/jira/browse/HIVE-25915
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  #  Create a table with import:
> {{CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10 
> BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');}}
>  # {{insert into temp_acid values 
> ('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');}}
> {{export table temp_acid to '/tmp/temp_acid';}}
> {{import table imported from '/tmp/temp_acid';}}
>  # Do some inserts:
> {{insert into imported values ('21', 'value21'),('84', 'value84'),('66', 
> 'value66'),('54', 'value54');
> insert into imported values ('22', 'value22'),('34', 'value34'),('35', 
> 'value35');
> insert into imported values ('75', 'value75'),('99', 'value99');}}
>  # {{Run a minor compaction}}
> If the data is loaded or imported into the table the way described 
> above, the rows in the ORC file don't contain the ACID metadata. The 
> query-based MINOR compaction fails on this kind of table because the 
> FileSinkOperator throws an NPE when it tries to read the bucket metadata 
> from the rows. However, deleting from and updating such a table is possible, 
> so the bucketId can somehow be calculated for rows like this.
> The non-query-based MINOR compaction works fine on a table like this.
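The observation above — that a bucket id can still be derived for rows written without ACID metadata — can be sketched as follows. This is a hypothetical illustration assuming the legacy Hive bucketing convention `(hashCode & Integer.MAX_VALUE) % numBuckets`; the class and method names are illustrative, not Hive's actual FileSinkOperator code path:

```java
/**
 * Hypothetical sketch: derive a bucket id for a row whose ORC file was
 * written without ACID metadata (e.g. populated via EXPORT/IMPORT),
 * assuming the legacy (hashCode & Integer.MAX_VALUE) % numBuckets scheme.
 */
public class BucketIdSketch {

    static int deriveBucketId(Object clusterKey, int numBuckets) {
        // Null keys conventionally hash to 0; the mask keeps the value
        // non-negative before the modulo.
        int hash = (clusterKey == null) ? 0 : clusterKey.hashCode();
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // The repro table is CLUSTERED BY(id) INTO 10 BUCKETS, id is a string.
        System.out.println(deriveBucketId("21", 10));
        System.out.println(deriveBucketId(null, 10));
    }
}
```

Falling back to a computation like this (rather than reading bucket metadata from the row) is presumably why DELETE and UPDATE succeed on such tables while the query-based MINOR compaction does not.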





[jira] [Resolved] (HIVE-25926) Move all logging from AcidMetricService to AcidMetricLogger

2022-02-07 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25926.
--
Resolution: Fixed

Committed to master branch. Thanks for your contribution [~vcsomor]!

> Move all logging from AcidMetricService to AcidMetricLogger
> ---
>
> Key: HIVE-25926
> URL: https://issues.apache.org/jira/browse/HIVE-25926
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some logging is left over in AcidMetricService (which used to both log 
> errors/warnings and emit metrics, and now just emits metrics). Logging should 
> be in the AcidMetricLogger as much as possible. Stragglers left over in 
> AcidMetricService:
>  * compaction_oldest_enqueue_age_in_sec
>  * COMPACTOR_FAILED_COMPACTION_RATIO_THRESHOLD logging
>  * Multiple Compaction Worker versions (in method: 
> detectMultipleWorkerVersions)





[jira] [Work logged] (HIVE-25926) Move all logging from AcidMetricService to AcidMetricLogger

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25926?focusedWorklogId=721768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721768
 ]

ASF GitHub Bot logged work on HIVE-25926:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 08:38
Start Date: 07/Feb/22 08:38
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2995:
URL: https://github.com/apache/hive/pull/2995


   




Issue Time Tracking
---

Worklog Id: (was: 721768)
Time Spent: 40m  (was: 0.5h)






[jira] [Updated] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread Fachuan Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fachuan Bai updated HIVE-25912:
---
Hadoop Flags: Incompatible change

> Drop external table at root of s3 bucket throws NPE
> ---
>
> Key: HIVE-25912
> URL: https://issues.apache.org/jira/browse/HIVE-25912
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: Hive version: 3.1.2
>Reporter: Fachuan Bai
>Assignee: Fachuan Bai
>Priority: Major
>  Labels: metastore, pull-request-available
> Attachments: hive bugs.png
>
>   Original Estimate: 96h
>  Time Spent: 50m
>  Remaining Estimate: 95h 10m
>
> I created the external Hive table using this command:
>  
> {code:java}
> CREATE EXTERNAL TABLE `fcbai`(
> `inv_item_sk` int,
> `inv_warehouse_sk` int,
> `inv_quantity_on_hand` int)
> PARTITIONED BY (
> `inv_date_sk` int) STORED AS ORC
> LOCATION
> 'hdfs://emr-master-1:8020/';
> {code}
>  
> The table was created successfully, but when I drop the table it throws an NPE:
>  
> {code:java}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.NullPointerException) 
> (state=08S01,code=1){code}
>  
> The same bug can be reproduced on other object storage file systems, such 
> as S3 or TOS:
> {code:java}
> CREATE EXTERNAL TABLE `fcbai`(
> `inv_item_sk` int,
> `inv_warehouse_sk` int,
> `inv_quantity_on_hand` int)
> PARTITIONED BY (
> `inv_date_sk` int) STORED AS ORC
> LOCATION
> 's3a://bucketname/'; // 'tos://bucketname/'{code}
>  
> Looking at the source code, I found this in 
>  common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
> {code:java}
> // check if sticky bit is set on the parent dir
> FileStatus parStatus = fs.getFileStatus(path.getParent());
> if (!shims.hasStickyBit(parStatus.getPermission())) {
>   // no sticky bit, so write permission on parent dir is sufficient
>   // no further checks needed
>   return;
> }{code}
>  
> Because I set the table location to the HDFS root path 
> (hdfs://emr-master-1:8020/), path.getParent() returns null, which causes 
> the NPE.
> I see four possible ways to fix the bug:
>  # Modify the create table function: if the location is the root directory, 
> fail table creation.
>  # Modify the FileUtils.checkDeletePermission function to check 
> path.getParent(); if it is null, return, so the drop succeeds.
>  # Modify the RangerHiveAuthorizer.checkPrivileges function of the Hive 
> Ranger plugin (in the Ranger repo): if the location is the root directory, 
> fail table creation.
>  # Modify the HDFS Path object so that path.getParent() does not return 
> null for a root URI.
> I recommend the first or second method. Any suggestions? Thanks.
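The root-path detection underlying options 1 and 2 can be sketched as follows; this is a minimal illustration using `java.net.URI` in place of Hadoop's `Path`, and the class and method names are assumptions for the sketch, not the actual Hive fix:

```java
import java.net.URI;

/**
 * Sketch: detect a root-path table location, the case where
 * Path.getParent() would return null and trip the NPE in the
 * parent-directory sticky-bit check.
 */
public class RootLocationCheck {

    static boolean isRootLocation(String location) {
        // A blank location is not treated as root, mirroring the
        // "is not blank AND has no parent" shape of the check.
        if (location == null || location.trim().isEmpty()) {
            return false;
        }
        // For a root URI like hdfs://host:8020/ or s3a://bucket/,
        // the path component is "/" (or empty), i.e. it has no parent.
        String p = URI.create(location.trim()).getPath();
        return p == null || p.isEmpty() || p.equals("/");
    }

    public static void main(String[] args) {
        System.out.println(isRootLocation("hdfs://emr-master-1:8020/"));      // true
        System.out.println(isRootLocation("s3a://bucketname/warehouse/t"));   // false
    }
}
```

For option 1 a check like this would reject the location at create-table time; for option 2 the same condition would let checkDeletePermission return early instead of dereferencing a null parent.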
>  
>  





[jira] [Updated] (HIVE-25912) Drop external table at root of s3 bucket throws NPE

2022-02-07 Thread Fachuan Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fachuan Bai updated HIVE-25912:
---
Hadoop Flags:   (was: Incompatible change)



