[jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=475100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475100
 ]

ASF GitHub Bot logged work on HIVE-16352:
-

Author: ASF GitHub Bot
Created on: 27/Aug/20 02:02
Start Date: 27/Aug/20 02:02
Worklog Time Spent: 10m 
  Work Description: gabrywu opened a new pull request #1436:
URL: https://github.com/apache/hive/pull/1436


   ### What changes were proposed in this pull request?
   1. Add AvroGenericRecordReader.nextRecord.
   2. Optimize AvroGenericRecordReader.next, adding the ability to skip invalid 
sync blocks.
   3. Add the enum value AVRO_SERDE_ERROR_SKIP to 
AvroSerdeUtils.AvroTableProperties.
   
   ### Why are the changes needed?
   
   When reading an Avro file that has a bad file format in Hive, we want a 
simple way to skip the invalid sync errors.
   https://issues.apache.org/jira/browse/HIVE-16352
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The default value of AVRO_SERDE_ERROR_SKIP is false, keeping the 
original logic.
   
   ### How was this patch tested?
   
   Added unit test cases in TestAvroGenericRecordReader.
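
   For illustration, a minimal sketch of the skip path described above, 
assuming a DataFileReader-based reader and a split-end offset; the helper name 
and the resync strategy are assumptions, not the actual patch:

{code:java}
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericRecord;

// Hypothetical helper: on an "Invalid sync!" failure, seek past the corrupt
// block to the next sync marker instead of rethrowing. Returns false when
// the split is exhausted.
static boolean skipPastBadBlock(DataFileReader<GenericRecord> reader,
                                long splitEnd) throws IOException {
  long lastGoodSync = reader.previousSync(); // position of the last valid sync
  if (lastGoodSync >= splitEnd) {
    return false;                            // no more data in this split
  }
  reader.sync(lastGoodSync + 1);             // resync to the next block
  return reader.hasNext();
}
{code}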
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 475100)
Time Spent: 0.5h  (was: 20m)

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>Reporter: Navdeep Poonia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=475092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475092
 ]

ASF GitHub Bot logged work on HIVE-16352:
-

Author: ASF GitHub Bot
Created on: 27/Aug/20 01:39
Start Date: 27/Aug/20 01:39
Worklog Time Spent: 10m 
  Work Description: gabrywu closed pull request #1434:
URL: https://github.com/apache/hive/pull/1434


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 475092)
Time Spent: 20m  (was: 10m)

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>Reporter: Navdeep Poonia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23668) Clean up Task for Hive Metrics

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23668?focusedWorklogId=475077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475077
 ]

ASF GitHub Bot logged work on HIVE-23668:
-

Author: ASF GitHub Bot
Created on: 27/Aug/20 00:41
Start Date: 27/Aug/20 00:41
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1129:
URL: https://github.com/apache/hive/pull/1129#issuecomment-681195608


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 475077)
Time Spent: 2.5h  (was: 2h 20m)

> Clean up Task for Hive Metrics
> --
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, 
> HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch, 
> HIVE-23668.06.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24076?focusedWorklogId=474992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474992
 ]

ASF GitHub Bot logged work on HIVE-24076:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 20:08
Start Date: 26/Aug/20 20:08
Worklog Time Spent: 10m 
  Work Description: nrg4878 closed pull request #1433:
URL: https://github.com/apache/hive/pull/1433


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474992)
Time Spent: 0.5h  (was: 20m)

> MetastoreDirectSql.getDatabase() needs a space in the query
> ---
>
> Key: HIVE-24076
> URL: https://issues.apache.org/jira/browse/HIVE-24076
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> String queryTextDbSelector= "select "
>   + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
>   + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", 
> \"DB_MANAGED_LOCATION_URI\""
>   + "FROM "+ DBS
> There needs to be a space before FROM so the query is correct. Currently it 
> falls back to JDO, so there is no lapse in functionality.
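
For reference, a sketch of the one-character fix (the DBS constant comes from 
the surrounding MetastoreDirectSql class; this is the obvious correction, not 
necessarily the committed patch verbatim):

{code:java}
String queryTextDbSelector = "select "
    + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
    + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", "
    + "\"DB_MANAGED_LOCATION_URI\" "   // trailing space so FROM is separated
    + "FROM " + DBS;
{code}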



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query

2020-08-26 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24076.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged into master.  Thank you for the review.

> MetastoreDirectSql.getDatabase() needs a space in the query
> ---
>
> Key: HIVE-24076
> URL: https://issues.apache.org/jira/browse/HIVE-24076
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> String queryTextDbSelector= "select "
>   + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
>   + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", 
> \"DB_MANAGED_LOCATION_URI\""
>   + "FROM "+ DBS
> There needs to be a space before FROM so the query is correct. Currently it 
> falls back to JDO, so there is no lapse in functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24076?focusedWorklogId=474972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474972
 ]

ASF GitHub Bot logged work on HIVE-24076:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 19:28
Start Date: 26/Aug/20 19:28
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1433:
URL: https://github.com/apache/hive/pull/1433#issuecomment-681078499


   Thank you for the review @yongzhi 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474972)
Time Spent: 20m  (was: 10m)

> MetastoreDirectSql.getDatabase() needs a space in the query
> ---
>
> Key: HIVE-24076
> URL: https://issues.apache.org/jira/browse/HIVE-24076
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> String queryTextDbSelector= "select "
>   + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
>   + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", 
> \"DB_MANAGED_LOCATION_URI\""
>   + "FROM "+ DBS
> There needs to be a space before FROM so the query is correct. Currently it 
> falls back to JDO, so there is no lapse in functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23971) Cleanup unreleased method signatures in IMetastoreClient

2020-08-26 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185408#comment-17185408
 ] 

Vihang Karajgaonkar commented on HIVE-23971:


This also affects the getTables API. Bumping this up to Blocker so that we make 
sure we fix it before releasing the next version of Hive.

> Cleanup unreleased method signatures in IMetastoreClient
> 
>
> Key: HIVE-23971
> URL: https://issues.apache.org/jira/browse/HIVE-23971
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Blocker
>
> There are many methods in IMetastoreClient which are simply wrappers around 
> another method. The code has become very intertwined and needs some cleanup. 
> For instance, I see the following variations of {{getPartitionsByNames}} in 
> {{IMetastoreClient}} 
> {noformat}
> List<Partition> getPartitionsByNames(String db_name, String tbl_name, 
> List<String> part_names, boolean getColStats, String engine)
> List<Partition> getPartitionsByNames(String catName, String db_name, String 
> tbl_name, List<String> part_names)
> List<Partition> getPartitionsByNames(String catName, String db_name, String 
> tbl_name, List<String> part_names, boolean getColStats, String engine)
> {noformat}
> The problem seems to be that every time a new field is added to the request 
> object {{GetPartitionsByNamesRequest}}, a new variant is introduced in 
> IMetastoreClient. Many of these methods are not released yet, and it would be 
> good to clean them up by using the request object as the method argument 
> instead of individual fields. Once we release, we will not be able to change 
> the method signatures since we annotate IMetastoreClient as a public API.
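
A sketch of the request-object style the description argues for; the 
consolidated signature and the thrift-style setter names are assumptions, not 
the committed API:

{code:java}
// Hypothetical consolidated signature: one request-object method instead of
// N overloads, so a new request field no longer forces a new variant.
List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req) throws TException;

// Hypothetical usage:
GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest("default", "sales");
req.setNames(Arrays.asList("ds=2020-08-26"));
req.setGet_col_stats(true);
req.setEngine("hive");
List<Partition> parts = client.getPartitionsByNames(req);
{code}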



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23971) Cleanup unreleased method signatures in IMetastoreClient

2020-08-26 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-23971:
---
Priority: Blocker  (was: Major)

> Cleanup unreleased method signatures in IMetastoreClient
> 
>
> Key: HIVE-23971
> URL: https://issues.apache.org/jira/browse/HIVE-23971
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Blocker
>
> There are many methods in IMetastoreClient which are simply wrappers around 
> another method. The code has become very intertwined and needs some cleanup. 
> For instance, I see the following variations of {{getPartitionsByNames}} in 
> {{IMetastoreClient}} 
> {noformat}
> List<Partition> getPartitionsByNames(String db_name, String tbl_name, 
> List<String> part_names, boolean getColStats, String engine)
> List<Partition> getPartitionsByNames(String catName, String db_name, String 
> tbl_name, List<String> part_names)
> List<Partition> getPartitionsByNames(String catName, String db_name, String 
> tbl_name, List<String> part_names, boolean getColStats, String engine)
> {noformat}
> The problem seems to be that every time a new field is added to the request 
> object {{GetPartitionsByNamesRequest}}, a new variant is introduced in 
> IMetastoreClient. Many of these methods are not released yet, and it would be 
> good to clean them up by using the request object as the method argument 
> instead of individual fields. Once we release, we will not be able to change 
> the method signatures since we annotate IMetastoreClient as a public API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-26 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185392#comment-17185392
 ] 

Prasanth Jayachandran commented on HIVE-24020:
--

Merged to master. Thanks [~vpnvishv] !

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic 
> partitioning on already existing partitions. I checked the code; we have the 
> following check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  
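
A sketch of the direction the description implies (hypothetical, not the 
merged patch): report the partition name at commit time whether or not it was 
newly created, and clear the set when the writer closes.

{code:java}
PartitionInfo partitionInfo = conn.createPartitionIfNotExists(partitionValues);
// Track the partition even when it already existed, so commitTransaction()
// reports it and TxnHandler moves its entries from TXN_COMPONENTS to
// COMPLETED_TXN_COMPONENTS, letting the Initiator trigger compaction.
addedPartitions.add(partitionInfo.getName());
if (partitionInfo.isExists() && LOG.isDebugEnabled()) {
  LOG.debug("Partition {} already exists for table {}",
      partitionInfo.getName(), fullyQualifiedTableName);
}

// On writer close, drop state so partition names do not leak across
// transactions (hypothetical placement inside close()):
addedPartitions.clear();
{code}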



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24061) Improve llap task scheduling for better cache hit rate

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24061.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~rajesh.balamohan] !

> Improve llap task scheduling for better cache hit rate 
> ---
>
> Key: HIVE-24061
> URL: https://issues.apache.org/jira/browse/HIVE-24061
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: perfomance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TaskInfo is initialized with the "requestTime" and locality delay. When lots 
> of vertices are at the same level, "taskInfo" details would be available 
> upfront. By the time it gets to scheduling, "requestTime + localityDelay" 
> won't be higher than the current time. Due to this, it misses the scheduling 
> delay details and ends up choosing a random node. This misses cache hits and 
> reads data from remote storage.
> E.g., this pattern was observed in Q75 of TPC-DS.
> Related lines of interest in scheduler: 
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  
> |https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java]
> {code:java}
>boolean shouldDelayForLocality = 
> request.shouldDelayForLocality(schedulerAttemptTime);
> ..
> ..
> boolean shouldDelayForLocality(long schedulerAttemptTime) {
>   return localityDelayTimeout > schedulerAttemptTime;
> }
> {code}
>  
> Ideally, "localityDelayTimeout" should be adjusted based on its first 
> scheduling opportunity.
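
A sketch of the suggested adjustment (the localityDelayMs field and the lazy
initialization are assumptions): anchor the locality window to the first 
scheduling attempt rather than to request-creation time.

{code:java}
// Hypothetical: start the locality delay at the first scheduling opportunity,
// so a task that waited in the queue still gets a real locality window.
private long localityDelayTimeout = -1;  // unset until the first attempt

boolean shouldDelayForLocality(long schedulerAttemptTime) {
  if (localityDelayTimeout < 0) {
    localityDelayTimeout = schedulerAttemptTime + localityDelayMs;
  }
  return localityDelayTimeout > schedulerAttemptTime;
}
{code}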



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24061) Improve llap task scheduling for better cache hit rate

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24061?focusedWorklogId=474928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474928
 ]

ASF GitHub Bot logged work on HIVE-24061:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 17:52
Start Date: 26/Aug/20 17:52
Worklog Time Spent: 10m 
  Work Description: prasanthj merged pull request #1431:
URL: https://github.com/apache/hive/pull/1431


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474928)
Time Spent: 50m  (was: 40m)

> Improve llap task scheduling for better cache hit rate 
> ---
>
> Key: HIVE-24061
> URL: https://issues.apache.org/jira/browse/HIVE-24061
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: perfomance, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TaskInfo is initialized with the "requestTime" and locality delay. When lots 
> of vertices are at the same level, "taskInfo" details would be available 
> upfront. By the time it gets to scheduling, "requestTime + localityDelay" 
> won't be higher than the current time. Due to this, it misses the scheduling 
> delay details and ends up choosing a random node. This misses cache hits and 
> reads data from remote storage.
> E.g., this pattern was observed in Q75 of TPC-DS.
> Related lines of interest in scheduler: 
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  
> |https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java]
> {code:java}
>boolean shouldDelayForLocality = 
> request.shouldDelayForLocality(schedulerAttemptTime);
> ..
> ..
> boolean shouldDelayForLocality(long schedulerAttemptTime) {
>   return localityDelayTimeout > schedulerAttemptTime;
> }
> {code}
>  
> Ideally, "localityDelayTimeout" should be adjusted based on its first 
> scheduling opportunity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24020.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic 
> partitioning on already existing partitions. I checked the code; we have the 
> following check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24068:
-
Fix Version/s: 4.0.0

> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well, since the AM could be unreachable; this can be handled in a 
> re-execution plugin.
> There is already an AM-loss retry execution plugin, but it only handles 
> managed AMs. It can be extended to handle unmanaged AMs as well.
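
A sketch of the retry shape described (all names hypothetical; the DAG has not 
started executing, so resubmission is safe):

{code:java}
import java.util.concurrent.Callable;

// Hypothetical re-execution helper: reacquire the session before each attempt,
// since the old AM may be unreachable, then resubmit the DAG.
static <T> T submitWithRetry(Callable<T> getSessionAndSubmitDag,
                             Runnable discardSession,
                             int maxRetries) throws Exception {
  Exception last = null;
  for (int attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return getSessionAndSubmitDag.call();  // getSession() + submitDAG()
    } catch (Exception e) {
      last = e;
      discardSession.run();                  // do not reuse a possibly dead AM
    }
  }
  throw last;
}
{code}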



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?focusedWorklogId=474927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474927
 ]

ASF GitHub Bot logged work on HIVE-24020:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 17:51
Start Date: 26/Aug/20 17:51
Worklog Time Spent: 10m 
  Work Description: prasanthj merged pull request #1382:
URL: https://github.com/apache/hive/pull/1382


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474927)
Time Spent: 1h  (was: 50m)

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic 
> partitioning on already existing partitions. I checked the code; we have the 
> following check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24068.
--
Resolution: Fixed

PR merged to master. Thanks [~kgyrtkirk]  for the review!

> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well, since the AM could be unreachable; this can be handled in a 
> re-execution plugin.
> There is already an AM-loss retry execution plugin, but it only handles 
> managed AMs. It can be extended to handle unmanaged AMs as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?focusedWorklogId=474926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474926
 ]

ASF GitHub Bot logged work on HIVE-24068:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 17:48
Start Date: 26/Aug/20 17:48
Worklog Time Spent: 10m 
  Work Description: prasanthj merged pull request #1428:
URL: https://github.com/apache/hive/pull/1428


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474926)
Time Spent: 40m  (was: 0.5h)

> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well, since the AM could be unreachable; this can be handled in a 
> re-execution plugin.
> There is already an AM-loss retry execution plugin, but it only handles 
> managed AMs. It can be extended to handle unmanaged AMs as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24059?focusedWorklogId=474913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474913
 ]

ASF GitHub Bot logged work on HIVE-24059:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 17:31
Start Date: 26/Aug/20 17:31
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on a change in pull request #1418:
URL: https://github.com/apache/hive/pull/1418#discussion_r477463702



##
File path: 
llap-common/src/java/org/apache/hadoop/hive/llap/security/ConfBasedJwtSharedSecretProvider.java
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.security;
+
+import com.google.common.base.Preconditions;
+import io.jsonwebtoken.security.Keys;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+
+import java.security.Key;
+
+/**
+ * Default implementation of {@link JwtSecretProvider}.
+ * It uses the same encryption and decryption secret which can be used to sign 
and verify JWT.
+ */
+public class ConfBasedJwtSharedSecretProvider implements JwtSecretProvider {
+
+  private Key jwtEncryptionKey;
+
+  @Override public Key getEncryptionSecret() {
+return jwtEncryptionKey;
+  }
+
+  @Override public Key getDecryptionSecret() {
+return jwtEncryptionKey;
+  }
+
+  @Override public void init(final Configuration conf) {
+final String sharedSecret = HiveConf.getVar(conf, 
HiveConf.ConfVars.LLAP_EXTERNAL_CLIENT_CLOUD_JWT_SHARED_SECRET);

Review comment:
   Use conf.getPassword(). This should be fetched from a jceks file. 
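
   A sketch of what the suggested change might look like 
(Configuration.getPassword() is the real Hadoop API that consults credential 
providers such as a jceks keystore; the rest is an assumption, not the actual 
patch):

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

@Override public void init(final Configuration conf) {
  final char[] secret;
  try {
    // getPassword() checks hadoop.security.credential.provider.path (e.g. a
    // jceks file) before falling back to the plain config value.
    secret = conf.getPassword(
        HiveConf.ConfVars.LLAP_EXTERNAL_CLIENT_CLOUD_JWT_SHARED_SECRET.varname);
  } catch (IOException e) {
    throw new RuntimeException("Unable to read JWT shared secret", e);
  }
  Preconditions.checkArgument(secret != null && secret.length > 0,
      "JWT shared secret must be configured");
  this.jwtEncryptionKey =
      Keys.hmacShaKeyFor(new String(secret).getBytes(StandardCharsets.UTF_8));
}
{code}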

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4880,6 +4880,22 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
LLAP_EXTERNAL_CLIENT_USE_HYBRID_CALENDAR("hive.llap.external.client.use.hybrid.calendar",
 false,
 "Whether to use hybrid calendar for parsing of data/timestamps."),
+
+// confs for llap-external-client cloud deployment
+
LLAP_EXTERNAL_CLIENT_CLOUD_RPC_PORT("hive.llap.external.client.cloud.rpc.port", 
30004,
+"The LLAP daemon RPC port for external clients when llap is running in 
cloud environment."),
+
LLAP_EXTERNAL_CLIENT_CLOUD_OUTPUT_SERVICE_PORT("hive.llap.external.client.cloud.output.service.port",
 30005,
+"LLAP output service port when llap is running in cloud 
environment"),
+LLAP_EXTERNAL_CLIENT_CLOUD_JWT_SHARED_SECRET_PROVIDER(
+"hive.llap.external.client.cloud.jwt.shared.secret.provider",
+
"org.apache.hadoop.hive.llap.security.ConfBasedJwtSharedSecretProvider",
+"Shared secret provider to be used to sign JWT"),
+
LLAP_EXTERNAL_CLIENT_CLOUD_JWT_SHARED_SECRET("hive.llap.external.client.cloud.jwt.shared.secret",
+"Let me give you this secret and you will get the access!",

Review comment:
   Maybe keep the default value empty. The system deploying this would 
have to randomly generate the secret and store it in a jceks file. With an 
empty default value we fail early instead of falling back to a default. 

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4880,6 +4880,22 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
LLAP_EXTERNAL_CLIENT_USE_HYBRID_CALENDAR("hive.llap.external.client.use.hybrid.calendar",
 false,
 "Whether to use hybrid calendar for parsing of data/timestamps."),
+
+// confs for llap-external-client cloud deployment
+
LLAP_EXTERNAL_CLIENT_CLOUD_RPC_PORT("hive.llap.external.client.cloud.rpc.port", 
30004,
+"The LLAP daemon RPC port for external clients when llap is running in 
cloud environment."),
+
LLAP_EXTERNAL_CLIENT_CLOUD_OUTPUT_SERVICE_PORT("hive.llap.external.client.cloud.output.service.port",
 30005,
+"LLAP output service port when llap is running in cloud 
environment"),
+LLAP_EXTERNAL_CLIENT_CLOUD_JWT_SHARED_SECRET_PROVIDER(
+"hive.llap.external.client.cloud.jwt.shared.secret.provider",
+

[jira] [Work logged] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?focusedWorklogId=474904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474904
 ]

ASF GitHub Bot logged work on HIVE-24020:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 17:21
Start Date: 26/Aug/20 17:21
Worklog Time Spent: 10m 
  Work Description: vpnvishv commented on pull request #1382:
URL: https://github.com/apache/hive/pull/1382#issuecomment-681015240


   @pvary @prasanthj Can you please merge this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474904)
Time Spent: 50m  (was: 40m)

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic 
> partitioning on already existing partitions. I checked the code; we have the 
> following check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24035) Add Jenkinsfile for branch-2.3

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24035?focusedWorklogId=474875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474875
 ]

ASF GitHub Bot logged work on HIVE-24035:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 16:12
Start Date: 26/Aug/20 16:12
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1398:
URL: https://github.com/apache/hive/pull/1398#issuecomment-680977910


   Yeah I remember seeing this in either branch-2 or branch-2.3 test failures 
before, but this time the number is much higher. Let me see if I can reproduce 
this locally.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474875)
Time Spent: 1h  (was: 50m)

> Add Jenkinsfile for branch-2.3
> --
>
> Key: HIVE-24035
> URL: https://issues.apache.org/jira/browse/HIVE-24035
> Project: Hive
>  Issue Type: Test
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> To enable precommit tests for github PR, we need to have a Jenkinsfile in the 
> repo. This is already done for master and branch-2. This adds the same for 
> branch-2.3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22758) Create database with permission error when doas set to true

2020-08-26 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185297#comment-17185297
 ] 

Chiran Ravani commented on HIVE-22758:
--

[~ngangam] You are right, this is not committed upstream; however, HIVE-20001 
has been backported for the HDP 3.x release.

> Create database with permission error when doas set to true
> ---
>
> Key: HIVE-22758
> URL: https://issues.apache.org/jira/browse/HIVE-22758
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-22758.1.patch
>
>
> With doAs set to true, running create database on an external location fails 
> with permission denied for write access on the directory for the hive user 
> (the user HMS is running as).
> Steps to reproduce the issue:
> 1. Set "Hive run as end user" (doAs) to true.
> 2. Connect to hive as some user other than admin, e.g. chiran.
> 3. Create a database with an external location:
> {code}
> create database externaldbexample location '/user/chiran/externaldbexample'
> {code}
> The above statement fails, as write access is not available to the hive 
> service user on HDFS, as shown below.
> {code}
> > create database externaldbexample location '/user/chiran/externaldbexample';
> INFO  : Compiling 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d): 
> create database externaldbexample location '/user/chiran/externaldbexample'
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d); 
> Time taken: 1.377 seconds
> INFO  : Executing 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d): 
> create database externaldbexample location '/user/chiran/externaldbexample'
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.reflect.UndeclaredThrowableException)
> INFO  : Completed executing 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d); 
> Time taken: 0.238 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.reflect.UndeclaredThrowableException) 
> (state=08S01,code=1)
> {code}
> From Hive Metastore service log, below is seen.
> {code}
> 2020-01-22T04:36:27,870 WARN  [pool-6-thread-6]: metastore.ObjectStore 
> (ObjectStore.java:getDatabase(1010)) - Failed to get database 
> hive.externaldbexample, returning NoSuchObjectExcept
> ion
> 2020-01-22T04:36:27,898 INFO  [pool-6-thread-6]: metastore.HiveMetaStore 
> (HiveMetaStore.java:run(1339)) - Creating database path in managed directory 
> hdfs://c470-node2.squadron.support.
> hortonworks.com:8020/user/chiran/externaldbexample
> 2020-01-22T04:36:27,903 INFO  [pool-6-thread-6]: utils.FileUtils 
> (FileUtils.java:mkdir(170)) - Creating directory if it doesn't exist: 
> hdfs://namenodeaddress:8020/user/chiran/externaldbexample
> 2020-01-22T04:36:27,932 ERROR [pool-6-thread-6]: utils.MetaStoreUtils 
> (MetaStoreUtils.java:logAndThrowMetaException(169)) - Got exception: 
> org.apache.hadoop.security.AccessControlException Permission denied: 
> user=hive, access=WRITE, inode="/user/chiran":chiran:chiran:drwxr-xr-x
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:255)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1859)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1843)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1802)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:59)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3150)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1126)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:707)
> at 
> 

[jira] [Work logged] (HIVE-24035) Add Jenkinsfile for branch-2.3

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24035?focusedWorklogId=474873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474873
 ]

ASF GitHub Bot logged work on HIVE-24035:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 16:04
Start Date: 26/Aug/20 16:04
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1398:
URL: https://github.com/apache/hive/pull/1398#issuecomment-680972857


   There are a lot of failures connected to some Calcite CNF 
(ClassNotFoundException) issues. I would guess that multiple versions of 
Calcite are present via shading etc.; on master there were once some issues 
because Druid also bundles a Calcite.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474873)
Time Spent: 50m  (was: 40m)

> Add Jenkinsfile for branch-2.3
> --
>
> Key: HIVE-24035
> URL: https://issues.apache.org/jira/browse/HIVE-24035
> Project: Hive
>  Issue Type: Test
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To enable precommit tests for github PR, we need to have a Jenkinsfile in the 
> repo. This is already done for master and branch-2. This adds the same for 
> branch-2.3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24073) Execution exception in sort-merge semijoin

2020-08-26 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24073:
---
Description: 
Working on HIVE-24041, we trigger an additional SJ conversion that leads to 
this exception at execution time:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
overwrite nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at 
org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
... 23 more
{code}

To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in the 
last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been merged.

  was:
Working on HIVE-24001, we trigger an additional SJ conversion that leads to 
this exception at execution time:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
overwrite nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at 
org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
... 23 more
{code}

To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in the 
last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been merged.


> Execution exception in sort-merge semijoin
> --
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
>  Issue Type: Bug
>  

[jira] [Work logged] (HIVE-24035) Add Jenkinsfile for branch-2.3

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24035?focusedWorklogId=474868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474868
 ]

ASF GitHub Bot logged work on HIVE-24035:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 15:45
Start Date: 26/Aug/20 15:45
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1398:
URL: https://github.com/apache/hive/pull/1398#issuecomment-680961898


   Thanks @kgyrtkirk! I got the error message a few times, but this time it 
seems the tests finished in a few hours, which is great. Lots of tests failed 
though, so I'll take a look at those. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474868)
Time Spent: 40m  (was: 0.5h)

> Add Jenkinsfile for branch-2.3
> --
>
> Key: HIVE-24035
> URL: https://issues.apache.org/jira/browse/HIVE-24035
> Project: Hive
>  Issue Type: Test
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> To enable precommit tests for github PR, we need to have a Jenkinsfile in the 
> repo. This is already done for master and branch-2. This adds the same for 
> branch-2.3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24064) Disable Materialized View Replication

2020-08-26 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24064:
---
Attachment: HIVE-24064.04.patch

> Disable Materialized View Replication
> -
>
> Key: HIVE-24064
> URL: https://issues.apache.org/jira/browse/HIVE-24064
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24064.01.patch, HIVE-24064.02.patch, 
> HIVE-24064.03.patch, HIVE-24064.04.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-08-26 Thread Navdeep Poonia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185231#comment-17185231
 ] 

Navdeep Poonia commented on HIVE-16352:
---

I agree with [~belugabehr]; this does not seem to be a generic issue that 
happens very often. As an alternative, the Avro repair util can be used for a 
one-time data fix, and the cause of the Avro block sync loss can be 
investigated. 

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>Reporter: Navdeep Poonia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-08-26 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185214#comment-17185214
 ] 

David Mollitor commented on HIVE-16352:
---

I am not in favor of this.  The worst kind of Hive error is a query that 
completes without error but with invalid results.  Better to just have the 
query fail, provide a clear error message and have the operator clear the bad 
Avro file manually and investigate why the file is corrupt.

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>Reporter: Navdeep Poonia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22224) Support Parquet-Avro Timestamp Type

2020-08-26 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185208#comment-17185208
 ] 

Felix Kizhakkel Jose commented on HIVE-22224:
-

[~bdscheller] [~chenxiang] Any update on this? What is the solution?

> Support Parquet-Avro Timestamp Type
> ---
>
> Key: HIVE-22224
> URL: https://issues.apache.org/jira/browse/HIVE-22224
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 2.3.5, 2.3.6
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: parquet, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a user creates an external table and imports parquet-avro data written 
> with Avro 1.8.2 (which supports logical_type) in Hive 2.3 or an earlier 
> version, Hive cannot read timestamp type column data correctly.
> Hive reads the value as a LongWritable, since it is actually stored as a 
> long (logical_type=timestamp-millis). So we may add some code in 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.java
>  to let Hive cast the long type to the timestamp type.
> Some code like below:
>  
> public Timestamp getPrimitiveJavaObject(Object o) {
>   if (o instanceof LongWritable) {
>     return new Timestamp(((LongWritable) o).get());
>   }
>   return o == null ? null : ((TimestampWritable) o).getTimestamp();
> }
>  
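
For readers, a self-contained version of that snippet; the wrapper class here 
is illustrative (in the proposal the method lives in 
WritableTimestampObjectInspector), and it assumes the Hive 2.3-era 
java.sql.Timestamp-based TimestampWritable:

{code}
import java.sql.Timestamp;
import org.apache.hadoop.hive.serde2.io.TimestampWritable;
import org.apache.hadoop.io.LongWritable;

public class TimestampCoercionSketch {
  // Values written as logical_type=timestamp-millis arrive as LongWritable,
  // so coerce the epoch millis into a Timestamp instead of failing the cast.
  public static Timestamp getPrimitiveJavaObject(Object o) {
    if (o instanceof LongWritable) {
      return new Timestamp(((LongWritable) o).get());
    }
    return o == null ? null : ((TimestampWritable) o).getTimestamp();
  }
}
{code}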



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-16352:
--
Labels: pull-request-available  (was: )

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>Reporter: Navdeep Poonia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a file is corrupted, it raises the error java.io.IOException: Invalid 
> sync! with Hive.
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more error-resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=474808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474808
 ]

ASF GitHub Bot logged work on HIVE-16352:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 13:40
Start Date: 26/Aug/20 13:40
Worklog Time Spent: 10m 
  Work Description: gabrywu opened a new pull request #1434:
URL: https://github.com/apache/hive/pull/1434


   ### What changes were proposed in this pull request?
   1. add AvroGenericRecordReader.nextRecord
   2. optimize AvroGenericRecordReader.next, adding the ability to skip 
invalid sync blocks
   3. add the enum value AVRO_SERDE_ERROR_SKIP to AvroSerdeUtils.AvroTableProperties
   
   ### Why are the changes needed?
   
   when reading an Avro file with a corrupted format in Hive, we want a simple 
way to skip the invalid sync errors:
   https://issues.apache.org/jira/browse/HIVE-16352
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   NO. The default value of AVRO_SERDE_ERROR_SKIP is false, keeping the 
original logic.
   
   ### How was this patch tested?
   
   add unit test cases in TestAvroGenericRecordReader.class
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474808)
Remaining Estimate: 0h
Time Spent: 10m

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>Reporter: Navdeep Poonia
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a file is corrupted, it raises the error java.io.IOException: Invalid 
> sync! with Hive.
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more error-resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24065?focusedWorklogId=474788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474788
 ]

ASF GitHub Bot logged work on HIVE-24065:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 12:31
Start Date: 26/Aug/20 12:31
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1423:
URL: https://github.com/apache/hive/pull/1423#issuecomment-680849960


   could you please take a look, @rbalamohan?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474788)
Time Spent: 1h 10m  (was: 1h)

> Bloom filters can be cached after deserialization in 
> VectorInBloomFilterColDynamicValue
> ---
>
> Key: HIVE-24065
> URL: https://issues.apache.org/jira/browse/HIVE-24065
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-08-05-10-05-25-080.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The same bloom filter is loaded multiple times across tasks. It would be good 
> to check if we can optimise this, to avoid repeated deserialization.
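
As a rough sketch of the idea (not the actual patch), the deserialized filter 
can be kept in a process-wide map keyed by, say, the dynamic value id, so 
tasks in the same JVM share one instance; all names here are illustrative:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Process-wide cache: deserialize a bloom filter at most once per key and
// let concurrent tasks reuse the result.
public final class BloomFilterCacheSketch<K, V> {
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

  public V getOrLoad(K key, Supplier<V> deserializer) {
    // computeIfAbsent runs the deserializer at most once per key
    return cache.computeIfAbsent(key, k -> deserializer.get());
  }
}
{code}

This only works if the cached filter is probed read-only once shared; any 
per-task mutation would need a defensive copy.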



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-23880.
-
Resolution: Fixed

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query 
> runtime, 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE   STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap   SUCCEEDED       1          1        0        0       0       0
> Map 1 ..      llap   SUCCEEDED    1263       1263        0        0       0       0
> Reducer 2     llap   RUNNING         1          0        1        0       0       0
> Map 4         llap   RUNNING      6154          0      207     5947       0       0
> Reducer 5     llap   INITED         43          0        0       43       0       0
> Reducer 6     llap   INITED          1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter leads to a bit set of 
> 436 465 696 bits, so merging 1263 bloom filters means running ~1263 * 
> 436 465 696 bitwise OR operations, which is a very hot codepath, but one 
> that can be parallelized.
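
A minimal sketch of that parallelization, assuming each filter's bits sit in a 
long[] of equal length (as in Hive's BloomKFilter); the merge is a pure 
bitwise OR, so distinct array indices can be combined independently:

{code}
import java.util.List;
import java.util.stream.IntStream;

// Parallel merge: OR every source bit set into the target, split across
// threads by array index; writes to distinct indices never conflict.
public final class ParallelBloomMergeSketch {
  public static void merge(long[] target, List<long[]> sources) {
    IntStream.range(0, target.length).parallel().forEach(i -> {
      long acc = target[i];
      for (long[] src : sources) {
        acc |= src[i]; // bitwise OR of word i across all filters
      }
      target[i] = acc;
    });
  }
}
{code}

Splitting into a few contiguous ranges per thread would cut scheduling 
overhead further; the parallel stream is just the shortest way to show the idea.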



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185150#comment-17185150
 ] 

László Bodor commented on HIVE-23880:
-

pushed to master, thanks for all the reviews and help [~pgaref], [~mustafaiman], 
[~zabetak], [~rajesh.balamohan], [~belugabehr]!


> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query 
> runtime, 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE   STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap   SUCCEEDED       1          1        0        0       0       0
> Map 1 ..      llap   SUCCEEDED    1263       1263        0        0       0       0
> Reducer 2     llap   RUNNING         1          0        1        0       0       0
> Map 4         llap   RUNNING      6154          0      207     5947       0       0
> Reducer 5     llap   INITED         43          0        0       43       0       0
> Reducer 6     llap   INITED          1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter leads to a bit set of 
> 436 465 696 bits, so merging 1263 bloom filters means running ~1263 * 
> 436 465 696 bitwise OR operations, which is a very hot codepath, but one 
> that can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23880:

Fix Version/s: 4.0.0

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query 
> runtime, 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE   STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap   SUCCEEDED       1          1        0        0       0       0
> Map 1 ..      llap   SUCCEEDED    1263       1263        0        0       0       0
> Reducer 2     llap   RUNNING         1          0        1        0       0       0
> Map 4         llap   RUNNING      6154          0      207     5947       0       0
> Reducer 5     llap   INITED         43          0        0       43       0       0
> Reducer 6     llap   INITED          1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter leads to a bit set of 
> 436 465 696 bits, so merging 1263 bloom filters means running ~1263 * 
> 436 465 696 bitwise OR operations, which is a very hot codepath, but one 
> that can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=474786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474786
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 12:24
Start Date: 26/Aug/20 12:24
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1280:
URL: https://github.com/apache/hive/pull/1280#issuecomment-680846635


   all checks passed, pushing this to master, thanks to all of you for the 
review



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474786)
Time Spent: 8h 40m  (was: 8.5h)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query 
> runtime, 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE   STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap   SUCCEEDED       1          1        0        0       0       0
> Map 1 ..      llap   SUCCEEDED    1263       1263        0        0       0       0
> Reducer 2     llap   RUNNING         1          0        1        0       0       0
> Map 4         llap   RUNNING      6154          0      207     5947       0       0
> Reducer 5     llap   INITED         43          0        0       43       0       0
> Reducer 6     llap   INITED          1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter leads to a bit set of 
> 436 465 696 bits, so merging 1263 bloom filters means running ~1263 * 
> 436 465 696 bitwise OR operations, which is a very hot codepath, but one 
> that can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24076:
--
Labels: pull-request-available  (was: )

> MetastoreDirectSql.getDatabase() needs a space in the query
> ---
>
> Key: HIVE-24076
> URL: https://issues.apache.org/jira/browse/HIVE-24076
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> String queryTextDbSelector= "select "
>   + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
>   + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", 
> \"DB_MANAGED_LOCATION_URI\""
>   + "FROM "+ DBS
> There needs to be a space before FROM so the query is right. Currently it 
> falls back to JDO, so there is no lapse in functionality.
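
For clarity, a hedged sketch of the one-character fix, with the space restored 
ahead of FROM; DBS stands in for the quoted table-name constant used by 
MetastoreDirectSql:

{code}
class DirectSqlQueryFixSketch {
  private static final String DBS = "\"DBS\"";

  static String dbSelectQuery() {
    return "select "
        + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
        + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", "
        + "\"DB_MANAGED_LOCATION_URI\" " // trailing space keeps FROM separated
        + "FROM " + DBS;
  }
}
{code}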



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=474785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474785
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 12:24
Start Date: 26/Aug/20 12:24
Worklog Time Spent: 10m 
  Work Description: abstractdog closed pull request #1280:
URL: https://github.com/apache/hive/pull/1280


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474785)
Time Spent: 8.5h  (was: 8h 20m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query 
> runtime, 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE   STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap   SUCCEEDED       1          1        0        0       0       0
> Map 1 ..      llap   SUCCEEDED    1263       1263        0        0       0       0
> Reducer 2     llap   RUNNING         1          0        1        0       0       0
> Map 4         llap   RUNNING      6154          0      207     5947       0       0
> Reducer 5     llap   INITED         43          0        0       43       0       0
> Reducer 6     llap   INITED          1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter leads to a bit set of 
> 436 465 696 bits, so merging 1263 bloom filters means running ~1263 * 
> 436 465 696 bitwise OR operations, which is a very hot codepath, but one 
> that can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24076?focusedWorklogId=474787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474787
 ]

ASF GitHub Bot logged work on HIVE-24076:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 12:24
Start Date: 26/Aug/20 12:24
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request #1433:
URL: https://github.com/apache/hive/pull/1433


   …tring (Naveen Gangam)
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474787)
Remaining Estimate: 0h
Time Spent: 10m

> MetastoreDirectSql.getDatabase() needs a space in the query
> ---
>
> Key: HIVE-24076
> URL: https://issues.apache.org/jira/browse/HIVE-24076
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> String queryTextDbSelector= "select "
>   + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
>   + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", 
> \"DB_MANAGED_LOCATION_URI\""
>   + "FROM "+ DBS
> There needs to be a space before FROM so the query is right. Currently it 
> falls back to JDO, so there is no lapse in functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query

2020-08-26 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-24076:



> MetastoreDirectSql.getDatabase() needs a space in the query
> ---
>
> Key: HIVE-24076
> URL: https://issues.apache.org/jira/browse/HIVE-24076
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>
> String queryTextDbSelector= "select "
>   + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
>   + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", 
> \"DB_MANAGED_LOCATION_URI\""
>   + "FROM "+ DBS
> There needs to be a space before FROM so the query is right. Currently it 
> falls back to JDO, so there is no lapse in functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-08-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185117#comment-17185117
 ] 

László Bodor commented on HIVE-23938:
-

[~belugabehr]: could you please take a look at this simple patch?

GC logs are attached for reference: JDK 8 with the old VM args ( 
[^gc_2020-07-29-12.jdk8.log] ) and JDK 11 with the new VM args ( 
[^gc_2020-07-27-13.log] ).

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: gc_2020-07-27-13.log, gc_2020-07-29-12.jdk8.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> ... 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?focusedWorklogId=474769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474769
 ]

ASF GitHub Bot logged work on HIVE-23618:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 11:12
Start Date: 26/Aug/20 11:12
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1237:
URL: https://github.com/apache/hive/pull/1237#discussion_r477220540



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -3060,13 +3049,19 @@ public void testConstraints() throws IOException {
   assertEquals(fks.size(), 2);
  List<SQLNotNullConstraint> nns = 
metaStoreClientMirror.getNotNullConstraints(new 
NotNullConstraintsRequest(DEFAULT_CATALOG_NAME, replDbName, "tbl3"));
  assertEquals(nns.size(), 1);
+  List<SQLCheckConstraint> cks = 
metaStoreClientMirror.getCheckConstraints(new 
CheckConstraintsRequest(DEFAULT_CATALOG_NAME, replDbName, "tbl7"));
+  assertEquals(cks.size(), 1);
+  List<SQLDefaultConstraint> dks = 
metaStoreClientMirror.getDefaultConstraints(new 
DefaultConstraintsRequest(DEFAULT_CATALOG_NAME, replDbName, "tbl8"));
+  assertEquals(dks.size(), 1);
 } catch (TException te) {
   assertNull(te);
 }
 
 run("CREATE TABLE " + dbName + ".tbl4(a string, b string, primary key (a, 
b) disable novalidate rely)", driver);
 run("CREATE TABLE " + dbName + ".tbl5(a string, b string, foreign key (a, 
b) references " + dbName + ".tbl4(a, b) disable novalidate)", driver);
 run("CREATE TABLE " + dbName + ".tbl6(a string, b string not null disable, 
unique (a) disable)", driver);
+run("CREATE TABLE " + dbName + ".tbl9(a string, price double CHECK (price 
> 0 AND price <= 1000))", driver);

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474769)
Time Spent: 1h 10m  (was: 1h)

> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This should follow a similar approach to the notNull/unique constraints. This 
> will also include event replication for these constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?focusedWorklogId=474768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474768
 ]

ASF GitHub Bot logged work on HIVE-23618:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 11:12
Start Date: 26/Aug/20 11:12
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1237:
URL: https://github.com/apache/hive/pull/1237#discussion_r477220458



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -31,20 +31,7 @@
 import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
 import org.apache.hadoop.hive.metastore.MetaStoreTestUtils;
 import org.apache.hadoop.hive.metastore.PersistenceManagerProvider;
-import org.apache.hadoop.hive.metastore.api.Database;
-import org.apache.hadoop.hive.metastore.api.ForeignKeysRequest;
-import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
-import org.apache.hadoop.hive.metastore.api.NotNullConstraintsRequest;
-import org.apache.hadoop.hive.metastore.api.NotificationEvent;
-import org.apache.hadoop.hive.metastore.api.NotificationEventResponse;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.PrimaryKeysRequest;
-import org.apache.hadoop.hive.metastore.api.SQLForeignKey;
-import org.apache.hadoop.hive.metastore.api.SQLNotNullConstraint;
-import org.apache.hadoop.hive.metastore.api.SQLPrimaryKey;
-import org.apache.hadoop.hive.metastore.api.SQLUniqueConstraint;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.metastore.api.UniqueConstraintsRequest;
+import org.apache.hadoop.hive.metastore.api.*;

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474768)
Time Spent: 1h  (was: 50m)

> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This should follow a similar approach to the notNull/unique constraints. This 
> will also include event replication for these constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?focusedWorklogId=474767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474767
 ]

ASF GitHub Bot logged work on HIVE-23618:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 11:11
Start Date: 26/Aug/20 11:11
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1237:
URL: https://github.com/apache/hive/pull/1237#discussion_r477218017



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -3060,13 +3049,19 @@ public void testConstraints() throws IOException {
   assertEquals(fks.size(), 2);
  List<SQLNotNullConstraint> nns = 
metaStoreClientMirror.getNotNullConstraints(new 
NotNullConstraintsRequest(DEFAULT_CATALOG_NAME, replDbName, "tbl3"));
  assertEquals(nns.size(), 1);
+  List<SQLCheckConstraint> cks = 
metaStoreClientMirror.getCheckConstraints(new 
CheckConstraintsRequest(DEFAULT_CATALOG_NAME, replDbName, "tbl7"));

Review comment:
   External tables don't support default/check constraints yet, so I skipped 
adding them for external tables.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/AddCheckConstraintHandler.java
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse.repl.load.message;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.metastore.api.SQLCheckConstraint;
+import org.apache.hadoop.hive.metastore.messaging.AddCheckConstraintMessage;
+import org.apache.hadoop.hive.ql.ddl.DDLWork;
+import org.apache.hadoop.hive.ql.ddl.table.constraint.Constraints;
+import 
org.apache.hadoop.hive.ql.ddl.table.constraint.add.AlterTableAddConstraintDesc;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.TaskFactory;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+public class AddCheckConstraintHandler extends AbstractMessageHandler {

Review comment:
   Added comments.
   
   I couldn't find any existing unit tests for other *Handler classes, and 
since TestReplicationScenarios in itests is covering this end-to-end, I skipped 
unit tests.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 474767)
Time Spent: 50m  (was: 40m)

> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This should follow a similar approach to the notNull/unique constraints. This 
> will also include event replication for these constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22758) Create database with permission error when doas set to true

2020-08-26 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185095#comment-17185095
 ] 

Naveen Gangam commented on HIVE-22758:
--

[~chiran54321] HIVE-20001 is not committed. How could this be caused by that 
issue? 
Also, could you please rebase the patch and create a pull request instead? Hive 
has now moved to using CI testing with PRs. Thanks

> Create database with permission error when doas set to true
> ---
>
> Key: HIVE-22758
> URL: https://issues.apache.org/jira/browse/HIVE-22758
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-22758.1.patch
>
>
> With doAs set to true, running create database on an external location fails 
> with permission denied for write access on the directory for the hive user 
> (the user HMS is running as).
> Steps to reproduce the issue:
> 1. Turn on, Hive run as end-user to true.
> 2. Connect to hive as some user other than admin, eg:- chiran
> 3. Create a database with external location
> {code}
> create database externaldbexample location '/user/chiran/externaldbexample'
> {code}
> The above statement fails because write access is not available to the hive 
> service user on HDFS, as shown below.
> {code}
> > create database externaldbexample location '/user/chiran/externaldbexample';
> INFO  : Compiling 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d): 
> create database externaldbexample location '/user/chiran/externaldbexample'
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d); 
> Time taken: 1.377 seconds
> INFO  : Executing 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d): 
> create database externaldbexample location '/user/chiran/externaldbexample'
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.reflect.UndeclaredThrowableException)
> INFO  : Completed executing 
> command(queryId=hive_20200122043626_5c95e1fd-ce00-45fd-b58d-54f5e579f87d); 
> Time taken: 0.238 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.reflect.UndeclaredThrowableException) 
> (state=08S01,code=1)
> {code}
> From Hive Metastore service log, below is seen.
> {code}
> 2020-01-22T04:36:27,870 WARN  [pool-6-thread-6]: metastore.ObjectStore 
> (ObjectStore.java:getDatabase(1010)) - Failed to get database 
> hive.externaldbexample, returning NoSuchObjectExcept
> ion
> 2020-01-22T04:36:27,898 INFO  [pool-6-thread-6]: metastore.HiveMetaStore 
> (HiveMetaStore.java:run(1339)) - Creating database path in managed directory 
> hdfs://c470-node2.squadron.support.
> hortonworks.com:8020/user/chiran/externaldbexample
> 2020-01-22T04:36:27,903 INFO  [pool-6-thread-6]: utils.FileUtils 
> (FileUtils.java:mkdir(170)) - Creating directory if it doesn't exist: 
> hdfs://namenodeaddress:8020/user/chiran/externaldbexample
> 2020-01-22T04:36:27,932 ERROR [pool-6-thread-6]: utils.MetaStoreUtils 
> (MetaStoreUtils.java:logAndThrowMetaException(169)) - Got exception: 
> org.apache.hadoop.security.AccessControlException Permission denied: 
> user=hive, access=WRITE, inode="/user/chiran":chiran:chiran:drwxr-xr-x
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:255)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1859)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1843)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1802)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:59)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3150)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1126)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:707)
> at 
> 
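
For context, a hedged sketch of the doAs idea: with impersonation enabled, the 
directory creation would run under the requesting user's identity via Hadoop's 
proxy-user API. The wiring below is illustrative, not the actual patch:

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical sketch: create the database directory as the requesting end
// user so HDFS checks permissions against that identity, not 'hive'.
public class DoAsMkdirSketch {
  public static boolean mkdirAsEndUser(Configuration conf, Path dbPath,
      String endUser) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.createProxyUser(
        endUser, UserGroupInformation.getLoginUser());
    return ugi.doAs((PrivilegedExceptionAction<Boolean>) () -> {
      // obtain the FileSystem inside doAs so it is bound to the proxy user
      FileSystem fs = FileSystem.get(dbPath.toUri(), conf);
      return fs.mkdirs(dbPath);
    });
  }
}
{code}

This relies on HDFS proxy-user configuration (hadoop.proxyuser.hive.*) 
permitting the impersonation.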

[jira] [Work logged] (HIVE-24064) Disable Materialized View Replication

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24064?focusedWorklogId=474736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474736
 ]

ASF GitHub Bot logged work on HIVE-24064:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 08:47
Start Date: 26/Aug/20 08:47
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1422:
URL: https://github.com/apache/hive/pull/1422#discussion_r477138389



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -2609,6 +2609,109 @@ public void testViewsReplication() throws IOException {
 verifyIfTableNotExist(replDbName, "virtual_view", metaStoreClientMirror);
   }
 
+  @Test
+  public void testMaterializedViewsReplication() throws Exception {
+boolean verifySetup_tmp = verifySetupSteps;
+verifySetupSteps = true;
+String testName = "materializedviewsreplication";
+String testName2 = testName + "2";
+String dbName = createDB(testName, driver);
+String dbName2 = createDB(testName2, driver); //for creating multi-db 
materialized view
+String replDbName = dbName + "_dupe";
+
+run("CREATE TABLE " + dbName + ".unptned(a string) STORED AS TEXTFILE", 
driver);
+run("CREATE TABLE " + dbName2 + ".unptned(a string) STORED AS TEXTFILE", 
driver);
+run("CREATE TABLE " + dbName + ".ptned(a string) partitioned by (b int) 
STORED AS TEXTFILE", driver);
+
+String[] unptn_data = new String[]{ "eleven", "twelve" };
+String[] ptn_data_1 = new String[]{ "thirteen", "fourteen", "fifteen"};
+String[] ptn_data_2 = new String[]{ "fifteen", "sixteen", "seventeen"};
+String[] empty = new String[]{};
+
+String unptn_locn = new Path(TEST_PATH, testName + 
"_unptn").toUri().getPath();
+String ptn_locn_1 = new Path(TEST_PATH, testName + 
"_ptn1").toUri().getPath();
+String ptn_locn_2 = new Path(TEST_PATH, testName + 
"_ptn2").toUri().getPath();
+
+createTestDataFile(unptn_locn, unptn_data);
+createTestDataFile(ptn_locn_1, ptn_data_1);
+createTestDataFile(ptn_locn_2, ptn_data_2);
+
+verifySetup("SELECT a from " + dbName + ".ptned", empty, driver);
+verifySetup("SELECT * from " + dbName + ".unptned", empty, driver);
+verifySetup("SELECT * from " + dbName2 + ".unptned", empty, driver);
+
+run("LOAD DATA LOCAL INPATH '" + unptn_locn + "' OVERWRITE INTO TABLE " + 
dbName + ".unptned", driver);
+run("LOAD DATA LOCAL INPATH '" + unptn_locn + "' OVERWRITE INTO TABLE " + 
dbName2 + ".unptned", driver);
+verifySetup("SELECT * from " + dbName + ".unptned", unptn_data, driver);
+verifySetup("SELECT * from " + dbName2 + ".unptned", unptn_data, driver);
+
+run("LOAD DATA LOCAL INPATH '" + ptn_locn_1 + "' OVERWRITE INTO TABLE " + 
dbName + ".ptned PARTITION(b=1)", driver);
+verifySetup("SELECT a from " + dbName + ".ptned WHERE b=1", ptn_data_1, 
driver);
+run("LOAD DATA LOCAL INPATH '" + ptn_locn_2 + "' OVERWRITE INTO TABLE " + 
dbName + ".ptned PARTITION(b=2)", driver);
+verifySetup("SELECT a from " + dbName + ".ptned WHERE b=2", ptn_data_2, 
driver);
+
+
+run("CREATE MATERIALIZED VIEW " + dbName + ".mat_view_boot disable rewrite 
 stored as textfile AS SELECT a FROM " + dbName + ".ptned where b=1", driver);
+verifySetup("SELECT a from " + dbName + ".mat_view_boot", ptn_data_1, 
driver);
+
+run("CREATE MATERIALIZED VIEW " + dbName + ".mat_view_boot2 disable 
rewrite  stored as textfile AS SELECT t1.a FROM " + dbName + ".unptned as t1 
join " + dbName2 + ".unptned as t2 on t1.a = t2.a", driver);
+verifySetup("SELECT a from " + dbName + ".mat_view_boot2", unptn_data, 
driver);
+
+Tuple bootstrapDump = bootstrapLoadAndVerify(dbName, replDbName);
+
+verifyRun("SELECT * from " + replDbName + ".unptned", unptn_data, 
driverMirror);
+verifyRun("SELECT a from " + replDbName + ".ptned where b=1", ptn_data_1, 
driverMirror);
+
+//verify source MVs are not on replica
+verifyIfTableNotExist(replDbName, "mat_view_boot", metaStoreClientMirror);
+verifyIfTableNotExist(replDbName, "mat_view_boot2", metaStoreClientMirror);
+
+//test alter materialized view with rename
+run("ALTER TABLE " + dbName + ".mat_view_boot RENAME TO " + dbName + 
".mat_view_rename", driver);
+
+//verify rename, i.e. new MV exists and old MV does not exist
+verifyIfTableNotExist(dbName, "mat_view_boot", metaStoreClient);
+verifyIfTableExist(dbName, "mat_view_rename", metaStoreClient);
+//verifySetup("SELECT a from " + dbName + ".mat_view_rename", ptn_data_1, 
driver);

Review comment:
   check why this is failing

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -2609,6 +2609,109 @@ public void testViewsReplication() throws IOException {
 

[jira] [Updated] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop

2020-08-26 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-24067:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks for the patch, [~pkumarsinha], and for the 
review, [~aasha].

> TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
> 
>
> Key: HIVE-24067
> URL: https://issues.apache.org/jira/browse/HIVE-24067
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24067.01.patch, HIVE-24067.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In TestReplicationScenariosExclusiveReplica, the drop database operation for 
> the primary db leads to a wrong-FS error because the ReplChangeManager is 
> associated with the replica FS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=474718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474718
 ]

ASF GitHub Bot logged work on HIVE-22782:
-

Author: ASF GitHub Bot
Created on: 26/Aug/20 07:55
Start Date: 26/Aug/20 07:55
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r477106020



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetAllTableConstraints.java
##
@@ -0,0 +1,145 @@
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.MetaStoreTestUtils;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.AllTableConstraintsRequest;
+import org.apache.hadoop.hive.metastore.api.Catalog;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.PrimaryKeysRequest;
+import org.apache.hadoop.hive.metastore.api.SQLAllTableConstraints;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.CatalogBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import static org.apache.hadoop.hive.metastore.Warehouse.DEFAULT_DATABASE_NAME;
+
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestGetAllTableConstraints extends MetaStoreClientTest {
+  private static final String OTHER_DATABASE = 
"test_constraints_other_database";
+  private static final String OTHER_CATALOG = "test_constraints_other_catalog";
+  private static final String DATABASE_IN_OTHER_CATALOG = 
"test_constraints_database_in_other_catalog";
+  private final AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private Table[] testTables = new Table[3];
+  private Database inOtherCatalog;
+
+  public TestGetAllTableConstraints(String name, AbstractMetaStoreService 
metaStore) throws Exception {
+this.metaStore = metaStore;
+  }
+  @Before
+  public void setUp() throws Exception {
+// Get new client
+client = metaStore.getClient();
+
+// Clean up the database
+client.dropDatabase(OTHER_DATABASE, true, true, true);
+// Drop every table in the default database
+for(String tableName : client.getAllTables(DEFAULT_DATABASE_NAME)) {
+  client.dropTable(DEFAULT_DATABASE_NAME, tableName, true, true, true);
+}
+
+client.dropDatabase(OTHER_CATALOG, DATABASE_IN_OTHER_CATALOG, true, true, 
true);
+try {
+  client.dropCatalog(OTHER_CATALOG);
+} catch (NoSuchObjectException e) {
+  // NOP
+}
+
+// Clean up trash
+metaStore.cleanWarehouseDirs();
+
+new DatabaseBuilder().setName(OTHER_DATABASE).create(client, 
metaStore.getConf());
+
+Catalog cat = new CatalogBuilder()
+.setName(OTHER_CATALOG)
+.setLocation(MetaStoreTestUtils.getTestWarehouseDir(OTHER_CATALOG))
+.build();
+client.createCatalog(cat);
+
+// For this one don't specify a location to make sure it gets put in the 
catalog directory
+inOtherCatalog = new DatabaseBuilder()
+.setName(DATABASE_IN_OTHER_CATALOG)
+.setCatalogName(OTHER_CATALOG)
+.create(client, metaStore.getConf());
+
+testTables[0] =
+new TableBuilder()
+.setTableName("test_table_1")
+.addCol("col1", "int")
+.addCol("col2", "varchar(32)")
+.create(client, metaStore.getConf());
+
+testTables[1] =
+new TableBuilder()
+.setDbName(OTHER_DATABASE)
+.setTableName("test_table_2")
+.addCol("col1", "int")
+.addCol("col2", "varchar(32)")
+.create(client, metaStore.getConf());
+
+testTables[2] =
+new TableBuilder()
+.inDb(inOtherCatalog)
+.setTableName("test_table_3")
+.addCol("col1", "int")
+.addCol("col2", "varchar(32)")
+.create(client, metaStore.getConf());
+
+// Reload tables from the MetaStore
+for(int i=0; i < testTables.length; i++) {
+  testTables[i] = client.getTable(testTables[i].getCatName(), 
testTables[i].getDbName(),
+  

[jira] [Updated] (HIVE-24064) Disable Materialized View Replication

2020-08-26 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24064:
---
Attachment: HIVE-24064.03.patch

> Disable Materialized View Replication
> -
>
> Key: HIVE-24064
> URL: https://issues.apache.org/jira/browse/HIVE-24064
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24064.01.patch, HIVE-24064.02.patch, 
> HIVE-24064.03.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop

2020-08-26 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184972#comment-17184972
 ] 

Aasha Medhi commented on HIVE-24067:


+1

> TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
> 
>
> Key: HIVE-24067
> URL: https://issues.apache.org/jira/browse/HIVE-24067
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24067.01.patch, HIVE-24067.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In TestReplicationScenariosExclusiveReplica, the drop database operation for 
> the primary db leads to a wrong-FS error because the ReplChangeManager is 
> associated with the replica FS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)