[jira] [Commented] (HIVE-24090) NPE while SJ reduction due to missing null check for col stats

2020-08-30 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187424#comment-17187424
 ] 

Vipin Vishvkarma commented on HIVE-24090:
-

[~zabetak] [~jcamachorodriguez] Can you please review the PR?

> NPE while SJ reduction due to missing null check for col stats
> --
>
> Key: HIVE-24090
> URL: https://issues.apache.org/jira/browse/HIVE-24090
> Project: Hive
>  Issue Type: Bug
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hitting an NPE during SJ reduction due to missing column stats
> {code:java}
> Error(1647)) - FAILED: NullPointerException null 
> java.lang.NullPointerException at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2111) 
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1629)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:498)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:209)
>  at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:144) 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12642)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11960)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24090) NPE while SJ reduction due to missing null check for col stats

2020-08-29 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186973#comment-17186973
 ] 

Vipin Vishvkarma commented on HIVE-24090:
-

[~zabetak] Yes, sorry, the stack trace is from a private branch. On master, the NPE 
will be at this line:
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L2071]
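
For context, the fix essentially boils down to a defensive null check on the column 
statistics before they are scaled. A minimal sketch of the idea (the method body and 
names are illustrative, not the actual StatsUtils code):

{code:java}
// Illustrative sketch only: guard against columns whose stats were never computed.
private static void updateStats(Statistics stats, long newNumRows) {
  List<ColStatistics> colStats = stats.getColumnStats();
  if (colStats != null) {
    for (ColStatistics cs : colStats) {
      if (cs == null) {
        // No column stats available for this column; skip it instead of
        // dereferencing null, which is what triggered the NPE during SJ reduction.
        continue;
      }
      // ... scale countDistinct / numNulls etc. relative to newNumRows ...
    }
  }
  stats.setNumRows(newNumRows);
}
{code}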

 

> NPE while SJ reduction due to missing null check for col stats
> --
>
> Key: HIVE-24090
> URL: https://issues.apache.org/jira/browse/HIVE-24090
> Project: Hive
>  Issue Type: Bug
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hitting an NPE during SJ reduction due to missing column stats
> {code:java}
> Error(1647)) - FAILED: NullPointerException null 
> java.lang.NullPointerException at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2111) 
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1629)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:498)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:209)
>  at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:144) 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12642)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11960)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24090) NPE while SJ reduction due to missing null check for col stats

2020-08-29 Thread Vipin Vishvkarma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vipin Vishvkarma reassigned HIVE-24090:
---


> NPE while SJ reduction due to missing null check for col stats
> --
>
> Key: HIVE-24090
> URL: https://issues.apache.org/jira/browse/HIVE-24090
> Project: Hive
>  Issue Type: Bug
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>
> Hitting an NPE during SJ reduction due to missing column stats
> {code:java}
> Error(1647)) - FAILED: NullPointerException null 
> java.lang.NullPointerException at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2111) 
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1629)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:498)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:209)
>  at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:144) 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12642)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11960)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-10 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174149#comment-17174149
 ] 

Vipin Vishvkarma commented on HIVE-24020:
-

[~pvary] [~lpinter] Can you please review?

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic partitions 
> into already existing partitions. Checking the code, we have the following 
> check in AbstractRecordWriter:
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The *addedPartitions* above is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue that has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information leaking 
> across transactions.
>  
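
For what it's worth, a rough sketch of the direction of the fix (field and method 
names below are illustrative, not the actual AbstractRecordWriter code):

{code:java}
// Illustrative sketch only: report every partition touched in the transaction batch,
// not just newly created ones, and reset the bookkeeping when the writer is closed.
private final Set<String> addedPartitions = new HashSet<>();

private void trackPartition(PartitionInfo partitionInfo) {
  // Add the partition even if it already existed, so that commitTransaction()
  // reports it via addDynamicPartitions() and its TXN_COMPONENTS entries get
  // moved to COMPLETED_TXN_COMPONENTS (otherwise the Initiator never sees them).
  addedPartitions.add(partitionInfo.getName());
  if (partitionInfo.isExists() && LOG.isDebugEnabled()) {
    LOG.debug("Partition {} already exists for table {}",
        partitionInfo.getName(), fullyQualifiedTableName);
  }
}

private void resetBatchState() {
  // Called on writer close: clear per-batch state so partition names do not
  // leak into the next transaction.
  addedPartitions.clear();
}
{code}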



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-10 Thread Vipin Vishvkarma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vipin Vishvkarma updated HIVE-24020:

Component/s: Transactions
 Streaming

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>
> This issue happens when we try to do streaming ingest with dynamic partitions 
> into already existing partitions. Checking the code, we have the following 
> check in AbstractRecordWriter:
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The *addedPartitions* above is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue that has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information leaking 
> across transactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-10 Thread Vipin Vishvkarma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vipin Vishvkarma reassigned HIVE-24020:
---


> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>
> This issue happens when we try to do streaming ingest with dynamic partitions 
> into already existing partitions. Checking the code, we have the following 
> check in AbstractRecordWriter:
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The *addedPartitions* above is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue that has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information leaking 
> across transactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-06-10 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132608#comment-17132608
 ] 

Vipin Vishvkarma commented on HIVE-21052:
-

[~dkuzmenko] Sorry for the slightly late reply.
 
[~asomani] and I have gone through the current patch/doc. To summarize the design 
(a rough illustrative sketch follows the list):
1. Add a dummy entry in TXN_COMPONENTS while taking a lock.
2. Remove the above and add the actual partitions in addDynamicPartitions.
3. On an abort before step 2, the entry in TXN_COMPONENTS will remain and 
signal that cleanup needs to be done.
4. The Initiator will add a row to COMPACTION_QUEUE (with type 'p') for the 
aborted txn above, with the state READY_FOR_CLEANING.
5. Introduce a new type of cleanup (p-type), which cleans up the above by doing 
a table-level scan and deleting the aborted dirs.
6. Add a thread pool in the Cleaner to run the above cleanup in parallel with 
the regular cleanup.
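
Very roughly, the marker lifecycle looks like the sketch below (the SQL, table and 
column names are assumptions loosely based on the metastore schema, not the actual 
TxnHandler code):

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

// Hypothetical illustration of the marker lifecycle summarized above; all SQL,
// table and column names are assumptions, not the actual TxnHandler implementation.
class AbortMarkerSketch {

  // Step 1: while taking the lock, leave a placeholder row so that an abort
  // before addDynamicPartitions still leaves a trace in TXN_COMPONENTS.
  void addMarkerOnLock(Connection conn, long txnId, String db, String table) throws Exception {
    try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION)"
            + " VALUES (?, ?, ?, NULL)")) {
      ps.setLong(1, txnId);
      ps.setString(2, db);
      ps.setString(3, table);
      ps.executeUpdate();
    }
  }

  // Step 2: addDynamicPartitions swaps the placeholder for the real partition rows.
  void replaceMarkerWithPartitions(Connection conn, long txnId, String db, String table,
      List<String> partitions) throws Exception {
    try (PreparedStatement del = conn.prepareStatement(
        "DELETE FROM TXN_COMPONENTS WHERE TC_TXNID = ? AND TC_PARTITION IS NULL")) {
      del.setLong(1, txnId);
      del.executeUpdate();
    }
    try (PreparedStatement ins = conn.prepareStatement(
        "INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION)"
            + " VALUES (?, ?, ?, ?)")) {
      for (String partition : partitions) {
        ins.setLong(1, txnId);
        ins.setString(2, db);
        ins.setString(3, table);
        ins.setString(4, partition);
        ins.addBatch();
      }
      ins.executeBatch();
    }
  }

  // Steps 3-6: if the txn aborts before step 2, the placeholder survives; the
  // Initiator turns it into a 'p'-type COMPACTION_QUEUE entry in READY_FOR_CLEANING,
  // and the Cleaner does a table-level scan to delete the aborted dirs.
}
{code}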

In the current patch, we have found some shortcomings/issues:
1. The current multi-threaded solution is not complete until we fix HIVE-21150.
2. The current solution allows parallel cleanup on the same partition, as 
regular cleanup only takes a shared lock. Either this needs to change, or, if 
parallel cleanup on the same partition is acceptable, why do we need an 
exclusive lock for p-type cleanup?
3. Only delta-dir cleanup is handled; cleanup of aborted IOW dirs is still 
missing in both the static and dynamic partition cases for MM tables, and that 
data can be read once the Cleaner removes the entry from the TXN table.

So for now, we have decided to go with a single-threaded cleaner and fix this 
for Hive 3 first, as our customers have been blocked by this.

For Hive 4 we need some input, as we don't have the expertise. Open questions:
1. Is there any concern with removing aborted base dirs, the way we remove 
aborted delta dirs for MM tables in the Worker?
2. Since we don't see much benefit from the current multi-threaded cleaner 
implementation, should we remove it for now?

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and 
> adding the corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> indicates a transaction was opened and then aborted, it must generate jobs 
> for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-8123) Support parquet ACID

2020-03-23 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063607#comment-17063607
 ] 

Vipin Vishvkarma edited comment on HIVE-8123 at 3/23/20, 4:50 PM:
--

We have picked up the work and progressed a bit on this. Here is a [design 
doc|https://tinyurl.com/wgo2hx3], feel free to comment on the same. Will upload 
a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]


was (Author: vpnvishv):
We have picked up the work and progressed a bit on this. Here is a [design 
doc|*https://tinyurl.com/wgo2hx3*], feel free to comment on the same. Will 
upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]

> Support parquet ACID
> 
>
> Key: HIVE-8123
> URL: https://issues.apache.org/jira/browse/HIVE-8123
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>Priority: Major
>
> Hive "ACID" work currently only works with ORC. It should work with Parquet 
> as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-8123) Support parquet ACID

2020-03-23 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063607#comment-17063607
 ] 

Vipin Vishvkarma edited comment on HIVE-8123 at 3/23/20, 4:49 PM:
--

We have picked up the work and progressed a bit on this. Here is a [design 
doc|*https://tinyurl.com/wgo2hx3*], feel free to comment on the same. Will 
upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]


was (Author: vpnvishv):
We have picked up the work and progressed a bit on this. Here is a design doc 
[[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]

> Support parquet ACID
> 
>
> Key: HIVE-8123
> URL: https://issues.apache.org/jira/browse/HIVE-8123
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>Priority: Major
>
> Hive "ACID" work currently only works with ORC. It should work with Parquet 
> as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-8123) Support parquet ACID

2020-03-20 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063607#comment-17063607
 ] 

Vipin Vishvkarma edited comment on HIVE-8123 at 3/20/20, 8:06 PM:
--

We have picked up the work and progressed a bit on this. Here is a design doc 
[[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]


was (Author: vpnvishv):
We have picked up the work and progressed a bit on this. Here is a [design doc 
| 
[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]

> Support parquet ACID
> 
>
> Key: HIVE-8123
> URL: https://issues.apache.org/jira/browse/HIVE-8123
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>Priority: Major
>
> Hive "ACID" work currently only works with ORC. It should work with Parquet 
> as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-8123) Support parquet ACID

2020-03-20 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063607#comment-17063607
 ] 

Vipin Vishvkarma edited comment on HIVE-8123 at 3/20/20, 8:05 PM:
--

We have picked up the work and progressed a bit on this. Here is a [design doc 
| 
[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]


was (Author: vpnvishv):
We have picked up the work and progressed a bit on this. Here is a [design 
doc|[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]

> Support parquet ACID
> 
>
> Key: HIVE-8123
> URL: https://issues.apache.org/jira/browse/HIVE-8123
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>Priority: Major
>
> Hive "ACID" work currently only works with ORC. It should work with Parquet 
> as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-8123) Support parquet ACID

2020-03-20 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063607#comment-17063607
 ] 

Vipin Vishvkarma edited comment on HIVE-8123 at 3/20/20, 8:04 PM:
--

We have picked up the work and progressed a bit on this. Here is a [design 
doc|[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
  
 cc: [~gates] [~gopalv] [~pvary]


was (Author: vpnvishv):
We have picked up the work and progressed a bit on this. Here is a [design 
doc|[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
 
cc: [~gates] [~gopalv] [~pvary]

> Support parquet ACID
> 
>
> Key: HIVE-8123
> URL: https://issues.apache.org/jira/browse/HIVE-8123
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>Priority: Major
>
> Hive "ACID" work currently only works with ORC. It should work with Parquet 
> as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-8123) Support parquet ACID

2020-03-20 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063607#comment-17063607
 ] 

Vipin Vishvkarma commented on HIVE-8123:


We have picked up the work and progressed a bit on this. Here is a [design 
doc|[https://docs.google.com/document/d/19XQ3W-jyXP2M_94ltAeIc3Gdgum2oxI-CikRsuItcr0/edit#]],
 feel free to comment on the same. Will upload a WIP patch in the next few days.
 
cc: [~gates] [~gopalv] [~pvary]

> Support parquet ACID
> 
>
> Key: HIVE-8123
> URL: https://issues.apache.org/jira/browse/HIVE-8123
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>Priority: Major
>
> Hive "ACID" work currently only works with ORC. It should work with Parquet 
> as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22081) Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there are too many Table/partitions are eligible for compaction

2019-11-12 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973063#comment-16973063
 ] 

Vipin Vishvkarma commented on HIVE-22081:
-

[~Rajkumar Singh] Will there be any performance improvement with this change? I 
don't see changes related to point 2 from the description in the final change, 
and we have used stream(), which is sequential in nature. I may be missing 
something here; can you please confirm?

> Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there 
> are too many Table/partitions are eligible for compaction 
> --
>
> Key: HIVE-22081
> URL: https://issues.apache.org/jira/browse/HIVE-22081
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21917.01.patch, HIVE-21917.02.patch, 
> HIVE-21917.03.patch, HIVE-22081.04.patch, HIVE-22081.patch
>
>
> If automatic compaction is turned on, the Initiator thread checks for 
> potential tables/partitions that are eligible for compaction and runs some 
> checks in a for loop before requesting compaction for the eligible ones. 
> Though the Initiator thread is configured to run at a 5-minute interval by 
> default, with many objects it keeps running, as these checks are IO-intensive 
> and hog CPU.
> In the proposed changes, I am planning to:
> 1. Pass fewer objects to the for loop by filtering them out based on the 
> condition we are checking within the loop.
> 2. Do async calls using futures to determine the compaction type (this is 
> where we do the FileSystem calls).
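
For point 2 above, a minimal sketch of the idea, assuming a hypothetical 
determineCompactionType() helper that does the FileSystem calls (this is not the 
actual Initiator code):

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical sketch: run the IO-intensive compaction-type check for each candidate
// on a bounded pool instead of sequentially inside the Initiator's for loop.
class CompactionTypeCheckSketch {

  enum CompactionType { MAJOR, MINOR, NONE }

  // Placeholder for the FileSystem-heavy check (delta/base dir inspection).
  CompactionType determineCompactionType(String tableOrPartition) {
    return CompactionType.NONE;
  }

  List<CompactionType> checkAll(List<String> candidates) {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<CompletableFuture<CompactionType>> futures = candidates.stream()
          .map(c -> CompletableFuture.supplyAsync(() -> determineCompactionType(c), pool))
          .collect(Collectors.toList());
      // join() waits for each async check; the checks themselves run in parallel.
      return futures.stream()
          .map(CompletableFuture::join)
          .collect(Collectors.toList());
    } finally {
      pool.shutdown();
    }
  }
}
{code}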



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21749) ACID: Provide an option to run Cleaner thread from Hive client

2019-07-21 Thread Vipin Vishvkarma (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889728#comment-16889728
 ] 

Vipin Vishvkarma commented on HIVE-21749:
-

[~vgumashta] Are you working on this?

> ACID: Provide an option to run Cleaner thread from Hive client
> --
>
> Key: HIVE-21749
> URL: https://issues.apache.org/jira/browse/HIVE-21749
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Major
>
> In some cases, it could be useful to trigger the cleaner thread manually. We 
> should provide an option for that.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)