[jira] [Created] (HIVE-22987) ClassCastException in VectorCoalesce when DataTypePhysicalVariation is null

2020-03-05 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-22987:
---

 Summary: ClassCastException in VectorCoalesce when 
DataTypePhysicalVariation is null
 Key: HIVE-22987
 URL: https://issues.apache.org/jira/browse/HIVE-22987
 Project: Hive
  Issue Type: Bug
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan


ClassCastException in VectorCoalesce when DataTypePhysicalVariation is null



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22986) Prevent Decimal64 to Decimal conversion when other operations support Decimal64

2020-03-05 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-22986:
---

 Summary: Prevent Decimal64 to Decimal conversion when other 
operations support Decimal64
 Key: HIVE-22986
 URL: https://issues.apache.org/jira/browse/HIVE-22986
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan


Prevent Decimal64 to Decimal conversion when other operations support Decimal64





Re: Review Request 72200: TopN Key efficiency check might disable filter too soon

2020-03-05 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/#review219806
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Line 2421 (original), 2421 (patched)


I think we should check at least 100K rows before turning this off, so 
checking after 10K batches makes more sense to me. So let's have the default as 10K 
here.


- Ashutosh Chauhan


On March 5, 2020, 2:22 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72200/
> ---
> 
> (Updated March 5, 2020, 2:22 p.m.)
> 
> 
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, 
> and Rajesh Balamohan.
> 
> 
> Bugs: HIVE-22982
> https://issues.apache.org/jira/browse/HIVE-22982
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The check is triggered after every n batches but there can be multiple 
> filters, one for each partition. Some filters might have less data than the 
> others.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7ea2de9019c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
> 0f8eb173c66 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 
> a91bc7354a7 
> 
> 
> Diff: https://reviews.apache.org/r/72200/diff/1/
> 
> 
> Testing
> ---
> 
> manually
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



[jira] [Created] (HIVE-22985) Failed compaction always throws TxnAbortedException

2020-03-05 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-22985:


 Summary: Failed compaction always throws TxnAbortedException
 Key: HIVE-22985
 URL: https://issues.apache.org/jira/browse/HIVE-22985
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


If compaction fails, its txn is aborted; however, Worker still attempts to commit 
it in a finally block. This results in a TxnAbortedException [1] thrown 
from TxnHandler#commitTxn.

We need to add a check and only try to commit at the end if the txn is not 
aborted. (TxnHandler#commitTxn does nothing if the txn is already committed.)

[1]
{code:java}
ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - TxnAbortedException(message:Transaction txnid:16 already aborted)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.raiseTxnUnexpectedState(TxnHandler.java:4843)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1141)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.commit_txn(HiveMetaStore.java:8101)
...
at org.apache.hadoop.hive.ql.txn.compactor.Worker.commitTxn(Worker.java:291)
at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:269)

{code}
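
The proposed check could look roughly like the sketch below. This is only an illustration of the control flow, not the actual Worker code: TxnClient, RecordingTxnClient and runCompaction are hypothetical names made up for this example, and the real fix may track the aborted state differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the metastore client; not Hive's real API. */
interface TxnClient {
  void commitTxn(long txnId);
  void abortTxn(long txnId);
}

/** Records the calls it receives so the control flow can be observed. */
final class RecordingTxnClient implements TxnClient {
  final List<String> calls = new ArrayList<>();

  @Override
  public void commitTxn(long txnId) {
    calls.add("commit:" + txnId);
  }

  @Override
  public void abortTxn(long txnId) {
    calls.add("abort:" + txnId);
  }
}

final class CompactionWorkerSketch {
  /** Runs the job and commits in the finally block only if the txn was not aborted. */
  static void runCompaction(TxnClient client, long txnId, Runnable job) {
    boolean aborted = false;
    try {
      job.run();
    } catch (RuntimeException e) {
      // Mark the txn as aborted first, so a failing abort call cannot lead to a commit.
      aborted = true;
      client.abortTxn(txnId);
    } finally {
      if (!aborted) {
        client.commitTxn(txnId);
      }
    }
  }
}
```

With this guard, a failed job leads to a single abort call and no commit attempt, so commitTxn is never reached for an aborted txn.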





Review Request 72200: TopN Key efficiency check might disable filter too soon

2020-03-05 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/
---

Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and 
Rajesh Balamohan.


Bugs: HIVE-22982
https://issues.apache.org/jira/browse/HIVE-22982


Repository: hive-git


Description
---

The check is triggered after every n batches but there can be multiple filters, 
one for each partition. Some filters might have less data than the others.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7ea2de9019c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
0f8eb173c66 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7 


Diff: https://reviews.apache.org/r/72200/diff/1/


Testing
---

manually


Thanks,

Attila Magyar



Re: Review Request 72193: HIVE-22977: Merge delta files instead of running a query in major/minor compaction

2020-03-05 Thread Karen Coppage via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72193/#review219795
---



How feasible would it be to launch this process from CompactorMR instead of the 
QueryCompactor implementations, and fall back to QueryCompactors if it fails? 
Because it's not exactly a "query compaction" but more of a 
"FileMergeCompaction"...


ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java
Lines 39 (patched)


Should this class be public?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java
Lines 62-64 (patched)


If the readers aren't compatible, compaction fails silently? Consider 
returning a boolean or throwing an exception and falling back to query-based 
compaction.
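
To illustrate the boolean-return variant of this suggestion, here is a rough sketch. CompactionFallbackSketch and compact are hypothetical names, and the two strategies are passed in as plain functional interfaces rather than the real OrcFileMerger or QueryCompactor classes.

```java
import java.util.function.BooleanSupplier;

final class CompactionFallbackSketch {
  /**
   * Tries the file merge first and falls back to query-based compaction when
   * the merge reports that it could not be applied (for example, incompatible readers).
   * Returns the name of the strategy that was used.
   */
  static String compact(BooleanSupplier fileMerge, Runnable queryCompaction) {
    if (fileMerge.getAsBoolean()) {
      return "file-merge";
    }
    queryCompaction.run();
    return "query-based";
  }
}
```

The point is only that the caller can see the failure and react to it, instead of the merge returning silently.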



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java
Lines 67 (patched)


Does the Reader need to be closed as well? Not sure



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java
Lines 76 (patched)


nit: if nextBatch(batch) is true, will batch ever be null?
Idk, at the end of the day I guess it's good to be on the safe side:)



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java
Lines 104 (patched)


Output path should be logged once, not every time setupWriter is called, 
unless outPath's value changes somewhere...



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java
Lines 220 (patched)


Since there's no need to check for delete dirs for insert-only tables, 
maybe 
(a) skip the delete dir check in this function or
(b) split this function into 2: hasDeletes and hasAborted



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java
Lines 227 (patched)


Wouldn't it be more efficient to check
!directory.getAbortedDirectories().isEmpty()
at the very beginning of this function, before starting all the streaming 
and filtering?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java
Lines 293 (patched)


Idea:
Could use this in 
org.apache.hadoop.hive.ql.txn.compactor.MinorQueryCompactor#getCreateQueries
for delta creation. Or possibly for both deltas and delete deltas, if you 
added a param.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java
Lines 314-315 (patched)


Consider debug/info log message about merge starting



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java
Lines 318 (patched)


nit: rename e to something meaningful like listOfDeltaPaths


- Karen Coppage


On March 4, 2020, 3:31 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72193/
> ---
> 
> (Updated March 4, 2020, 3:31 p.m.)
> 
> 
> Review request for hive, Karen Coppage and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22977: Merge delta files instead of running a query in major/minor 
> compaction
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorOnTezTest.java
>  78174f345b35709cd654aa81578ab598e0d9ed9c 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  9659a3f0481dcb2446b197688459f0c1dba867fa 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMmCompactorOnTez.java
>  074430ce7fa0f0617e8fb50c334c14f33cc74d8a 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> c44f2b5026558c9f0a7d6fa03cb6950f24b77da2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java 
> 93850807137a4cfbd49beb256624b11801bd08d1 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java 
> 01cd2fc93d12002249253added06df70b0c40181 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
>  41fdd7e210bfc42c3e41e9f1240d34a51add33a9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMinorQueryCompactor.java
>  feb667cba960c0fdd19c030235eb31ebddfa7ca1 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java 
> PRE-CREATION 
>   

[jira] [Created] (HIVE-22984) Optimise FetchOperator when fetching large number of records

2020-03-05 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-22984:
---

 Summary: Optimise FetchOperator when fetching large number of 
records
 Key: HIVE-22984
 URL: https://issues.apache.org/jira/browse/HIVE-22984
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan
 Attachments: image-2020-03-05-19-11-16-318.png

!image-2020-03-05-19-11-16-318.png|width=676,height=456!

 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java#L149]

 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java#L545]
 
 
 





[jira] [Created] (HIVE-22983) Address the comments on ConstantPropagate

2020-03-05 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-22983:
--

 Summary: Address the comments on ConstantPropagate
 Key: HIVE-22983
 URL: https://issues.apache.org/jira/browse/HIVE-22983
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Zhihua Deng


ConstantPropagate traverses the DAG from root to child; a child is not 
visited until all of its parents have been visited.
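
The traversal order described above (a node is visited only after all of its parents) can be sketched with a Kahn-style topological walk. This is a generic illustration, not the ConstantPropagate walker itself; ParentsFirstTraversal and visitOrder are hypothetical names, and operators are represented by plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParentsFirstTraversal {
  /**
   * Returns the nodes in an order where every node appears after all of its parents.
   * The map gives the direct children of each node.
   */
  static List<String> visitOrder(Map<String, List<String>> children) {
    // Count, for every node, how many parents have not been visited yet.
    Map<String, Integer> unvisitedParents = new LinkedHashMap<>();
    for (String node : children.keySet()) {
      unvisitedParents.putIfAbsent(node, 0);
    }
    for (List<String> kids : children.values()) {
      for (String kid : kids) {
        unvisitedParents.merge(kid, 1, Integer::sum);
      }
    }

    // Roots have no parents, so they are ready immediately.
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> entry : unvisitedParents.entrySet()) {
      if (entry.getValue() == 0) {
        ready.add(entry.getKey());
      }
    }

    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String node = ready.remove();
      order.add(node);
      for (String kid : children.getOrDefault(node, Collections.emptyList())) {
        // A child becomes ready once its last unvisited parent has been processed.
        if (unvisitedParents.merge(kid, -1, Integer::sum) == 0) {
          ready.add(kid);
        }
      }
    }
    return order;
  }
}
```

For a diamond-shaped plan (one scan feeding two filters that meet in a join), the join is emitted only after both filters.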





[jira] [Created] (HIVE-22982) TopN Key efficiency check might disable filter too soon

2020-03-05 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22982:


 Summary: TopN Key efficiency check might disable filter too soon
 Key: HIVE-22982
 URL: https://issues.apache.org/jira/browse/HIVE-22982
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


The check is triggered after every n batches but there can be multiple filters, 
one for each partition. Some filters might have less data than the others.
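
One possible way to avoid judging a filter on too little data is to keep counters per filter and skip the efficiency check until that particular filter has seen enough rows. The sketch below only illustrates that idea and is not the patch itself; PerFilterEfficiencyCheck, minRowsBeforeCheck and maxForwardedRatio are hypothetical names, and the "disable when too many rows are forwarded" rule is an assumption about how efficiency is measured.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one set of counters per partition filter. */
final class PerFilterEfficiencyCheck {
  private static final class Counters {
    long seen;      // rows offered to this filter
    long forwarded; // rows this filter let through
  }

  private final Map<String, Counters> countersByPartition = new HashMap<>();
  private final long minRowsBeforeCheck;
  private final double maxForwardedRatio;

  PerFilterEfficiencyCheck(long minRowsBeforeCheck, double maxForwardedRatio) {
    this.minRowsBeforeCheck = minRowsBeforeCheck;
    this.maxForwardedRatio = maxForwardedRatio;
  }

  /** Records one row for the filter that belongs to the given partition. */
  void record(String partition, boolean wasForwarded) {
    Counters c = countersByPartition.computeIfAbsent(partition, k -> new Counters());
    c.seen++;
    if (wasForwarded) {
      c.forwarded++;
    }
  }

  /** Returns true only if this filter has seen enough rows and forwards too many of them. */
  boolean shouldDisable(String partition) {
    Counters c = countersByPartition.get(partition);
    if (c == null || c.seen < minRowsBeforeCheck) {
      return false; // not enough data for this filter yet
    }
    return (double) c.forwarded / c.seen > maxForwardedRatio;
  }
}
```

With this shape, a partition that has produced only a handful of rows is never disabled just because the global batch counter reached n.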





Review Request 72199: HIVE-22980 Support custom path filter for ORC tables

2020-03-05 Thread Oleksiy Sayankin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72199/
---

Review request for hive.


Repository: hive-git


Description
---

Initial commit


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java dbbe6f1ec5 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 9e6d47ebc5 


Diff: https://reviews.apache.org/r/72199/diff/1/


Testing
---


Thanks,

Oleksiy Sayankin



[jira] [Created] (HIVE-22981) DataFileReader is not closed in AvroGenericRecordReader#extractWriterTimezoneFromMetadata

2020-03-05 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-22981:


 Summary: DataFileReader is not closed in 
AvroGenericRecordReader#extractWriterTimezoneFromMetadata
 Key: HIVE-22981
 URL: https://issues.apache.org/jira/browse/HIVE-22981
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


The method looks like this:

{code}
  private ZoneId extractWriterTimezoneFromMetadata(JobConf job, FileSplit split,
      GenericDatumReader gdr) throws IOException {
    if (job == null || gdr == null || split == null || split.getPath() == null) {
      return null;
    }
    try {
      DataFileReader dataFileReader =
          new DataFileReader(new FsInput(split.getPath(), job), gdr);
      [...return...]
      }
    } catch (IOException e) {
      // Can't access metadata, carry on.
    }
    return null;
  }
{code}

The DataFileReader is never closed, which can cause a memory leak. We need a 
try-with-resources here.
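
As a rough illustration of the try-with-resources shape, the sketch below uses a hypothetical TrackedReader in place of the Avro reader, so that it runs without Avro on the classpath. The class names, the getMetaString stand-in and the "example.key" metadata key are all assumptions made for this example.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for a closeable file reader. */
final class TrackedReader implements Closeable {
  boolean closed;
  private final String timezone;

  TrackedReader(String timezone) {
    this.timezone = timezone;
  }

  /** Returns the stored value, or fails like a reader that cannot access metadata. */
  String getMetaString(String key) throws IOException {
    if (timezone == null) {
      throw new IOException("metadata unavailable for " + key);
    }
    return timezone;
  }

  @Override
  public void close() {
    closed = true;
  }
}

final class TimezoneExtractorSketch {
  /** Reads the value and always closes the reader, on both the success and the failure path. */
  static String extractWriterTimezone(TrackedReader source) {
    if (source == null) {
      return null;
    }
    try (TrackedReader reader = source) {
      return reader.getMetaString("example.key");
    } catch (IOException e) {
      // Can't access metadata, carry on.
      return null;
    }
  }
}
```

The reader is closed before the catch block runs, so the early return and the exception path both release it.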





[jira] [Created] (HIVE-22980) Support custom path filter for ORC tables

2020-03-05 Thread Oleksiy Sayankin (Jira)
Oleksiy Sayankin created HIVE-22980:
---

 Summary: Support custom path filter for ORC tables
 Key: HIVE-22980
 URL: https://issues.apache.org/jira/browse/HIVE-22980
 Project: Hive
  Issue Type: New Feature
  Components: ORC
Reporter: Oleksiy Sayankin
Assignee: Oleksiy Sayankin


The customer is looking for an option to specify a custom path filter for ORC 
tables. Details from the customer's requirements are given below.

Problem statement and approach, in the customer's words:

{quote} 
Currently, Orc file input format does not take in path filters set in the 
property "mapreduce.input.pathfilter.class" OR "mapred.input.pathfilter.class". 
So, we cannot use custom filters with Orc files. 

AcidUtils class has a static filter called "hiddenFilters" which is used by ORC 
to filter input paths. If we can pass the custom filter classes(set in the 
property mentioned above) to AcidUtils and replace hiddenFilter with a filter 
that does an "and" operation over hiddenFilter+customFilters, the filters would 
work well.

On local testing, mapreduce.input.pathfilter.class seems to be working for Text 
tables but not for ORC tables.
{quote}

Our analysis:

{{OrcInputFormat}} and {{FileInputFormat}} are different implementations of the 
{{InputFormat}} interface. The property "{{mapreduce.input.pathfilter.class}}" is 
respected only by {{FileInputFormat}}, not by any other implementation of 
{{InputFormat}}. The customer wants to be able to filter out rows 
based on paths/filenames; current ORC features such as bloom filters and indexes are 
not good enough for them to minimize the number of disk read operations.
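
The "and" combination described in the customer's approach can be sketched as follows. To keep the example self-contained, java.util.function.Predicate over file names stands in for Hadoop's PathFilter; CombinedPathFilterSketch, HIDDEN_FILTER and combine are hypothetical names, and the hidden-file rule shown (names starting with "_" or ".") is an assumption about what the built-in filter rejects.

```java
import java.util.List;
import java.util.function.Predicate;

final class CombinedPathFilterSketch {
  /** Stand-in for the built-in hidden-file filter. */
  static final Predicate<String> HIDDEN_FILTER =
      name -> !name.startsWith("_") && !name.startsWith(".");

  /** Accepts a name only if the base filter and every custom filter accept it. */
  static Predicate<String> combine(Predicate<String> base, List<Predicate<String>> custom) {
    Predicate<String> combined = base;
    for (Predicate<String> filter : custom) {
      combined = combined.and(filter);
    }
    return combined;
  }
}
```

Because the custom filters are only and-ed onto the base filter, they can narrow the set of accepted paths but never re-admit a hidden file.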






[jira] [Created] (HIVE-22979) Support total file size in statistics annotation

2020-03-05 Thread Prasanth Jayachandran (Jira)
Prasanth Jayachandran created HIVE-22979:


 Summary: Support total file size in statistics annotation
 Key: HIVE-22979
 URL: https://issues.apache.org/jira/browse/HIVE-22979
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran


Hive statistics annotation provides estimated statistics for each operator. The 
data size provided in TableScanOperator is the raw data size (after decompression 
and decoding), but some optimizations can be performed based on the 
total file size on disk (scan cost estimation).


