[jira] [Created] (HIVE-22700) Compactions may leak memory when unauthorized

2020-01-07 Thread Laszlo Pinter (Jira)
Laszlo Pinter created HIVE-22700:


 Summary: Compactions may leak memory when unauthorized
 Key: HIVE-22700
 URL: https://issues.apache.org/jira/browse/HIVE-22700
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Laszlo Pinter
Assignee: Laszlo Pinter


The Initiator class periodically determines the compaction type. It either 
runs as the hive user or impersonates the owner of the table. When 
impersonation is used, Initiator#checkForCompaction may leak memory: if the 
impersonation call (ugi.doAs()) fails, FileSystem.closeAllForUGI does not run, 
so the file system cache is not cleaned.
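
The leak pattern described above can be sketched without any Hadoop 
dependency. This is only an illustration under stated assumptions: FS_CACHE, 
doAs and closeAllFor are hypothetical stand-ins for the file system cache, 
ugi.doAs() and FileSystem.closeAllForUGI, and the try/finally variant is one 
plausible way to guarantee cleanup, not necessarily what the actual patch does.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;

class ImpersonationCleanupSketch {
    // Hypothetical stand-in for the per-user file system cache.
    static final Set<String> FS_CACHE = new HashSet<>();

    // Stand-in for ugi.doAs(): a cache entry is created, then the action runs.
    static <T> T doAs(String user, Callable<T> action) throws Exception {
        FS_CACHE.add(user);
        return action.call();
    }

    // Stand-in for FileSystem.closeAllForUGI(ugi).
    static void closeAllFor(String user) {
        FS_CACHE.remove(user);
    }

    // Leaky shape: cleanup is skipped whenever doAs throws.
    static String leaky(String user, Callable<String> action) throws Exception {
        String result = doAs(user, action);
        closeAllFor(user);
        return result;
    }

    // Safe shape: cleanup runs on both the success and the failure path.
    static String safe(String user, Callable<String> action) throws Exception {
        try {
            return doAs(user, action);
        } finally {
            closeAllFor(user);
        }
    }

    public static void main(String[] args) {
        Callable<String> failing = () -> { throw new SecurityException("unauthorized"); };
        try { leaky("alice", failing); } catch (Exception e) { /* expected failure */ }
        System.out.println(FS_CACHE.contains("alice")); // true: the entry leaked
        try { safe("bob", failing); } catch (Exception e) { /* expected failure */ }
        System.out.println(FS_CACHE.contains("bob"));   // false: the entry was cleaned up
    }
}
```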



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22701) New Compaction for subsequent read's optimisations.

2020-01-07 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22701:
--

 Summary: New Compaction for subsequent read's optimisations.
 Key: HIVE-22701
 URL: https://issues.apache.org/jira/browse/HIVE-22701
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Reporter: Aditya Shah


Introduce a new compaction type, say "OPTIMIZE", that applies the following 
optimizations for better reads:

1. Sort data
2. Re-bucket data
3. Apply z-ordering
4. Remove ROW_IDs
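
For readers unfamiliar with item 3: z-ordering usually means sorting rows by 
a key built by interleaving the bits of several columns, so that rows close 
in any of those columns tend to land near each other on disk. A minimal 
sketch for two int columns follows; the class and method names are 
hypothetical, and the linked design doc may define the scheme differently.

```java
class ZOrderSketch {
    // Interleaves the 32 bits of x and y into one 64-bit key:
    // bits of x fill the even positions, bits of y the odd positions.
    static long zValue(int x, int y) {
        long z = 0L;
        for (int i = 0; i < 32; i++) {
            z |= (((long) (x >>> i)) & 1L) << (2 * i);
            z |= (((long) (y >>> i)) & 1L) << (2 * i + 1);
        }
        return z;
    }

    public static void main(String[] args) {
        System.out.println(zValue(3, 0)); // 5  (binary 0101)
        System.out.println(zValue(0, 3)); // 10 (binary 1010)
        System.out.println(zValue(1, 1)); // 3  (binary 0011)
    }
}
```

Sorting rows by zValue(colA, colB) then clusters them on both columns at 
once, instead of on colA first and colB second.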

I've attached a 
[design doc|https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing]
to the JIRA. Feel free to comment on it.

cc: [~t3rmin4t0r] [~pvary]  [~lpinter]  [~asomani]





Re: Review Request 71949: HIVE-20934: ACID: Query based compactor for minor compaction

2020-01-07 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71949/#review219140
---




itests/hive-unit/pom.xml
Lines 440 (patched)


nit: formatting?
question: Do we really need guava? I hate this dependency and, as a general 
rule, try to avoid it.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 1 (original), 1 (patched)


Hard to review the changes because of the formatting differences... Let's 
talk



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 10 (original), 10 (patched)


nit: Do you know what this change is?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 116 (original), 112 (patched)


' ' is needed after ','



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 198 (original), 191 (patched)


nit: unnecessary '+' in the middle



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 213 (original), 207 (patched)


nit: unnecessary '+'



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 259 (original), 252 (patched)


nit: unnecessary '+'



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Line 264 (original), 256 (patched)


'+' again



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 282 (patched)


I do not see this in the original code. What is this for?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java
Lines 1228 (patched)


Do we need this? With delete_delta we are not supposed to have synthetic 
rowIDs...



ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java
Lines 619 (patched)


nit: spaces...


- Peter Vary


On Jan. 4, 2020, 9:06 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71949/
> ---
> 
> (Updated Jan. 4, 2020, 9:06 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Karen Coppage, and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20934: ACID: Query based compactor for minor compaction
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/pom.xml bc20cd6168dd61222c75fb866deada26328986dd 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  445e39c260edc68f511550271a7ac471fae908fe 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  b7245e2c3570b362a00b65b23f3f84616d0a3d1e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 
> 33d723a02e28d69a69b88281038f69b5aecfe6a2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 
> 3c508ec6cf620aee6a7791c6ab52c331ad5ec6bd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> 2ac6232460fedb8351b5f0cfae2ce2d0f2e2d948 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 
> 0a96fc30b359043293017b235a36cd044ddb176e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 20b0ccd94b5f08aa2c1dace1301a8315bd202bf7 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 2b2cc1a2ba8377aa3681b1a3454a0d64369eef64 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 7a0e32463d28007cff5526ae037cc1447e50a50b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java 
> 38689ef86c607a36f8ec961a88578c13bfcd5b01 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
>  9b8420902fb688b218fa432d70f71302f9f180e6 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java 
> 1eab5b888deef2d0fb5c097941a1dafa51c7d46b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactorFactory.java
>  41cb4b64fbc79dcf81919

Re: Review Request 71761: HIVE-22489

2020-01-07 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71761/
---

(Updated Jan. 7, 2020, 11:22 a.m.)


Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-22489
https://issues.apache.org/jira/browse/HIVE-22489


Repository: hive-git


Description
---

Reduce Sink operator orders nulls first
===
1. Set the default null sort order from the Hive config when creating the 
Reduce Sink Desc.
2. Hash join uses 
`org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite`
 or `BinarySortableDeserializeRead` for serializing keys. For big table keys, 
ascending order with nulls first was hardcoded. This patch changes 
this behaviour to use `Operator.getConf().TableDesc.getProperties()` (in 
this case `MapJoinOperator`) to set up the ordering in 
`BinarySortableSerializeWrite`.
3. Use the null ordering set in ReduceRecordSource at the Reduce phase when 
comparing keys in `CommonMergeJoinOperator`. (This is the null ordering of the 
child Reduce Sink operators.)
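
To illustrate why the null ordering has to be configurable rather than 
hardcoded, here is a small sketch using plain Java comparators. It does not 
use the BinarySortable classes named above; NullOrder, keyComparator and 
sortKeys are hypothetical names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NullOrderSketch {
    enum NullOrder { NULLS_FIRST, NULLS_LAST }

    // Builds an ascending key comparator with the requested null placement.
    static Comparator<Integer> keyComparator(NullOrder order) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        return order == NullOrder.NULLS_FIRST
            ? Comparator.nullsFirst(natural)
            : Comparator.nullsLast(natural);
    }

    // Sorts a copy of the keys so the caller's list is left untouched.
    static List<Integer> sortKeys(List<Integer> keys, NullOrder order) {
        List<Integer> copy = new ArrayList<>(keys);
        copy.sort(keyComparator(order));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(3, null, 1);
        System.out.println(sortKeys(keys, NullOrder.NULLS_FIRST)); // [null, 1, 3]
        System.out.println(sortKeys(keys, NullOrder.NULLS_LAST));  // [1, 3, null]
    }
}
```

A merge join that assumes one placement while its inputs were sorted with the 
other would compare keys in the wrong order, which is why the operators need 
to read the same setting.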


Diffs (updated)
-

  accumulo-handler/src/test/results/positive/accumulo_queries.q.out 7c552621f2 
  contrib/src/test/results/clientpositive/udaf_example_group_concat.q.out 
6846720d95 
  hbase-handler/src/test/results/positive/hbase_queries.q.out a32ef81a7b 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 e997fa65cf 
  kudu-handler/src/test/results/positive/kudu_complex_queries.q.out 73fc3e514f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java 
3974627a24 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 
72446afeda 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
 2380d936f2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyMultiKeyOperator.java
 f587517b08 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerMultiKeyOperator.java
 cdee3fd957 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinLeftSemiMultiKeyOperator.java
 e5d9fdae19 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinOuterMultiKeyOperator.java
 29c531bd51 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashMap.java
 a4cda921a5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashMultiSet.java
 43f093d906 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashSet.java
 8dce5b82d3 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashTable.java
 a35401d9b2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringCommon.java
 1b108a8c14 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashMap.java
 446feb2526 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashMultiSet.java
 c28ef9be2b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashSet.java
 17bd5fda93 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java
 4ab8902a3f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedCreateHashTable.java
 21c355cb42 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongCommon.java
 de1ee15c3b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java
 42573f0898 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMultiSet.java
 829a03737d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashSet.java
 18e1435019 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringCommon.java
 da0e8365b1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMap.java
 6c4d8a81d1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMultiSet.java
 a6b754c7eb 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashSet.java
 fdcd83dde7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 5c409e4573 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java 
a50ad78e8f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
 0f95d7788c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
89b55001f0 
  
ql/src/java/org/apache/hadoop/hi

Re: Review Request 71949: HIVE-20934: ACID: Query based compactor for minor compaction

2020-01-07 Thread Laszlo Pinter via Review Board


> On Jan. 7, 2020, 10:21 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
> > Lines 282 (patched)
> > 
> >
> > I do not see this in the original code. What is this for?

I simplified this a bit. The OrcSplit.parse() call does some metadata 
preparation, such as fetching the writeID, transactionID and bucketID and 
setting them on the split. Originally, this was done in ComparatorCompactor, 
but in my view the comparator should be used only for comparing OrcSplits and 
not for preparing data. Since we are already iterating over the splits, it 
made sense to move the call here.
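
A rough sketch of the separation I mean, using hypothetical stand-in types 
rather than the real OrcSplit: the metadata is parsed once while iterating, 
and the comparator only reads fields. The "<writeId>:<bucketId>" name 
encoding is made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SplitPreparationSketch {
    // Hypothetical stand-in for a split; not the real OrcSplit.
    static final class Split {
        final String name; // made-up "<writeId>:<bucketId>" encoding
        long writeId;
        int bucketId;

        Split(String name) { this.name = name; }

        // Preparation step: derive the metadata from the name.
        void parse() {
            String[] parts = name.split(":");
            writeId = Long.parseLong(parts[0]);
            bucketId = Integer.parseInt(parts[1]);
        }
    }

    // The comparator only compares; it never prepares data.
    static final Comparator<Split> ORDER =
        Comparator.<Split>comparingLong(s -> s.writeId).thenComparingInt(s -> s.bucketId);

    static List<String> prepareAndSort(List<String> names) {
        List<Split> splits = new ArrayList<>();
        for (String n : names) {
            Split s = new Split(n);
            s.parse(); // done once, during the iteration we already have
            splits.add(s);
        }
        splits.sort(ORDER);
        List<String> sorted = new ArrayList<>();
        for (Split s : splits) {
            sorted.add(s.name);
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(prepareAndSort(List.of("7:0", "5:1", "5:0"))); // [5:0, 5:1, 7:0]
    }
}
```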


> On Jan. 7, 2020, 10:21 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java
> > Lines 1228 (patched)
> > 
> >
> > Do we need this? With delete_delta we are not supposed to have 
> > synthetic rowIDs...

That is correct, but I had to introduce this condition as well, to avoid an 
IllegalStateException. Without this check, the splits in a delete delta 
directory would be treated as original files, but without the correct parent.


- Laszlo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71949/#review219140
---


On Jan. 4, 2020, 9:06 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71949/
> ---
> 
> (Updated Jan. 4, 2020, 9:06 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Karen Coppage, and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20934: ACID: Query based compactor for minor compaction
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/pom.xml bc20cd6168dd61222c75fb866deada26328986dd 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  445e39c260edc68f511550271a7ac471fae908fe 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  b7245e2c3570b362a00b65b23f3f84616d0a3d1e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 
> 33d723a02e28d69a69b88281038f69b5aecfe6a2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 
> 3c508ec6cf620aee6a7791c6ab52c331ad5ec6bd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> 2ac6232460fedb8351b5f0cfae2ce2d0f2e2d948 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 
> 0a96fc30b359043293017b235a36cd044ddb176e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 20b0ccd94b5f08aa2c1dace1301a8315bd202bf7 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 2b2cc1a2ba8377aa3681b1a3454a0d64369eef64 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 7a0e32463d28007cff5526ae037cc1447e50a50b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java 
> 38689ef86c607a36f8ec961a88578c13bfcd5b01 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
>  9b8420902fb688b218fa432d70f71302f9f180e6 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java 
> 1eab5b888deef2d0fb5c097941a1dafa51c7d46b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactorFactory.java
>  41cb4b64fbc79dcf81919769c567b26a2e18cfe5 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
> d4c9121c9f17f8d083f1e1af1caf840678a3559d 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForOrcMmTable.java 
> d6435342aa1f56ba5495a657b4a43327fdc49645 
> 
> 
> Diff: https://reviews.apache.org/r/71949/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 71949: HIVE-20934: ACID: Query based compactor for minor compaction

2020-01-07 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71949/
---

(Updated Jan. 7, 2020, 2:24 p.m.)


Review request for hive, Denys Kuzmenko, Karen Coppage, and Peter Vary.


Changes
---

Fixed code review findings.


Repository: hive-git


Description
---

HIVE-20934: ACID: Query based compactor for minor compaction


Diffs (updated)
-

  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
 PRE-CREATION 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 445e39c260edc68f511550271a7ac471fae908fe 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
 b7245e2c3570b362a00b65b23f3f84616d0a3d1e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 
33d723a02e28d69a69b88281038f69b5aecfe6a2 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 
3c508ec6cf620aee6a7791c6ab52c331ad5ec6bd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
2ac6232460fedb8351b5f0cfae2ce2d0f2e2d948 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 
0a96fc30b359043293017b235a36cd044ddb176e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
ad6817c32bbfad1d27023b25912b1204f069a66a 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
2b2cc1a2ba8377aa3681b1a3454a0d64369eef64 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
7a0e32463d28007cff5526ae037cc1447e50a50b 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java 
38689ef86c607a36f8ec961a88578c13bfcd5b01 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java 
9b8420902fb688b218fa432d70f71302f9f180e6 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java 
1eab5b888deef2d0fb5c097941a1dafa51c7d46b 
  
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactorFactory.java 
41cb4b64fbc79dcf81919769c567b26a2e18cfe5 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
d4c9121c9f17f8d083f1e1af1caf840678a3559d 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForOrcMmTable.java 
d6435342aa1f56ba5495a657b4a43327fdc49645 


Diff: https://reviews.apache.org/r/71949/diff/2/

Changes: https://reviews.apache.org/r/71949/diff/1-2/


Testing
---


Thanks,

Laszlo Pinter



Review Request 71963: HIVE-22700: Compactions may leak memory when unauthorized

2020-01-07 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71963/
---

Review request for hive, Denys Kuzmenko, Karen Coppage, and Peter Vary.


Repository: hive-git


Description
---

HIVE-22700: Compactions may leak memory when unauthorized


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
7a0e32463d28007cff5526ae037cc1447e50a50b 


Diff: https://reviews.apache.org/r/71963/diff/1/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 71963: HIVE-22700: Compactions may leak memory when unauthorized

2020-01-07 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71963/#review219145
---


Ship it!




Ship It!

- Peter Vary


On Jan. 7, 2020, 2:51 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71963/
> ---
> 
> (Updated Jan. 7, 2020, 2:51 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Karen Coppage, and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22700: Compactions may leak memory when unauthorized
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 7a0e32463d28007cff5526ae037cc1447e50a50b 
> 
> 
> Diff: https://reviews.apache.org/r/71963/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 71932: HIVE-22652

2020-01-07 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71932/#review219151
---


Ship it!




Ship It!

- Jesús Camacho Rodríguez


On Jan. 7, 2020, 7:36 a.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71932/
> ---
> 
> (Updated Jan. 7, 2020, 7:36 a.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-22652
> https://issues.apache.org/jira/browse/HIVE-22652
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22652: TopNKey push through Group by with Grouping sets
> 
> Enable TopNKey (TNK) operator pushdown through Group By with grouping sets 
> by removing the lines that checked whether the GBY operator has GROUPING SETS.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
>  c79c371a8b 
>   ql/src/test/queries/clientpositive/topnkey_grouping_sets.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/topnkey_grouping_sets_functions.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/topnkey_grouping_sets_order.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/topnkey_grouping_sets.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/llap/topnkey_grouping_sets_functions.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/topnkey_grouping_sets_order.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_limit.q.out
>  c7e837905d 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query14.q.out d1e8c3806e 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query77.q.out aa080603e1 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query14.q.out 
> 59fcf951fe 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query77.q.out 
> 39da7ea903 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out 
> 65d3faa20f 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query27.q.out 
> e1a48eaeea 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query5.q.out 
> 13288d28b4 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query77.q.out 
> c2758b7033 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query80.q.out 
> 72a54928c2 
>   ql/src/test/results/clientpositive/perf/tez/query14.q.out 00bc4cb026 
>   ql/src/test/results/clientpositive/perf/tez/query27.q.out 774c0fd192 
>   ql/src/test/results/clientpositive/perf/tez/query5.q.out 03980ac2c0 
>   ql/src/test/results/clientpositive/perf/tez/query77.q.out fcfc5a33bc 
>   ql/src/test/results/clientpositive/perf/tez/query80.q.out 3020b58781 
>   ql/src/test/results/clientpositive/topnkey_grouping_sets.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/topnkey_grouping_sets_functions.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/topnkey_grouping_sets_order.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71932/diff/6/
> 
> 
> Testing
> ---
> 
> - New q test: topnkey_grouping_sets.q
> - Run `src/test/queries/clientpositive/perf/query*.q` tests with 
> TestTezPerfCliDriver, TestTezPerfConstraintsCliDriver
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>



[jira] [Created] (HIVE-22702) ALTER TABLE REMOVE PARTITION is inefficient

2020-01-07 Thread Michael Chirico (Jira)
Michael Chirico created HIVE-22702:
--

 Summary: ALTER TABLE REMOVE PARTITION is inefficient
 Key: HIVE-22702
 URL: https://issues.apache.org/jira/browse/HIVE-22702
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Reporter: Michael Chirico


I recently realized the poor partitioning of a table of mine was becoming a 
major bottleneck and endeavored to reset the partitioning.

At this point, the table had about 56K partitions (year|month|day|city 
combinations); moving to the more efficient year|month partitioning leaves 
about 24.

In the process, I was having trouble fixing the registration of the table 
because of the size of its partition DB; I happened upon this SO Q&A which 
addresses the same issue:

https://stackoverflow.com/questions/50715939/drop-table-in-hive-via-spark-hangs/50814566#comment105440563_50814566

I set about batching through ALTER TABLE x DROP PARTITION (...), PARTITION 
(...), 200 partitions at a time; it would run for about 2 hours to accomplish 
this, which strikes me as quite inefficient.

(apologies that I haven't done a fully proper analysis of the scaling 
efficiency in this ticket)

If I were designing it from scratch, I would:

* Keep the database of existing partitions sorted
* Sort the incoming partitions to remove
* Iterate via "shrinking binary search" (each partition is searched with binary 
search, and we can eliminate from the existing DB anything "less than" the 
current index when moving to the next iteration)
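
The three steps above can be sketched over in-memory lists. This is only an 
illustration of the idea: it does not model the metastore or its backing 
database, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ShrinkingBinarySearchSketch {
    // sortedExisting must already be sorted; returns what is left after removal.
    static List<String> removePartitions(List<String> sortedExisting, List<String> toRemove) {
        List<String> sortedRemovals = new ArrayList<>(toRemove);
        Collections.sort(sortedRemovals);

        boolean[] dropped = new boolean[sortedExisting.size()];
        int lo = 0; // everything before lo is "less than" the current removal
        for (String p : sortedRemovals) {
            List<String> window = sortedExisting.subList(lo, sortedExisting.size());
            int idx = Collections.binarySearch(window, p);
            if (idx >= 0) {
                dropped[lo + idx] = true;
                lo = lo + idx + 1;        // shrink past the match
            } else {
                lo = lo + (-idx - 1);     // shrink to the insertion point
            }
        }

        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < sortedExisting.size(); i++) {
            if (!dropped[i]) {
                remaining.add(sortedExisting.get(i));
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> existing = List.of("2019-01", "2019-02", "2019-03", "2019-04");
        List<String> removals = List.of("2019-03", "2019-01", "2020-01");
        System.out.println(removePartitions(existing, removals)); // [2019-02, 2019-04]
    }
}
```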

Is there something preventing this from being achieved?


