Nedzad Campara created HIVE-25588:
-------------------------------------
Summary: Hive 2.3.3 Fetch Task threshold not respected
Key: HIVE-25588
URL: https://issues.apache.org/jira/browse/HIVE-25588
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 2.3.3
Reporter: Nedzad Campara
It seems that "hive.fetch.task.conversion.threshold" is not respected in
Hive 2.3.3: Hive always runs a Fetch Task, regardless of the input size, as
long as the query meets the conditions for either the "more" or the "minimal"
setting of "hive.fetch.task.conversion".
Apologies if this has been reported already, but I could not find any issues
which mention this specifically.
To reproduce, set "hive.fetch.task.conversion.threshold=1". To my
understanding this should make practically every query run as an MR/Tez job,
but it does not; Hive still runs a Fetch Task.
Tested on various tables ranging from dozens of GB to dozens of TB in size,
with hundreds and thousands of partitions, in ORC and Parquet format. Example
table size from statistics:
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | numFiles | 234258 |
| | numPartitions | 171898 |
| | numRows | 1719836838331 |
| | rawDataSize | 515766839727247 |
| | totalSize | 189367471403333 |
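To make the expectation concrete, here is a minimal Python sketch of the
decision I would expect for this table. It is not Hive's actual code: the
function name and its parameters are hypothetical, it assumes the query
already qualifies for the chosen "hive.fetch.task.conversion" mode, and it
assumes the threshold is compared against the summed file lengths (taken here
from totalSize). Treating a negative threshold as "no limit" and the
inclusive boundary are assumptions as well.

```python
# Hypothetical sketch of the expected threshold check, NOT Hive's implementation.

def expect_fetch_task(conversion: str, input_bytes: int, threshold_bytes: int) -> bool:
    """Return True if a Fetch Task is expected instead of an MR/Tez job.

    Assumes the query already satisfies the conditions of the given
    hive.fetch.task.conversion mode ("minimal" or "more").
    """
    if conversion == "none":
        # Conversion disabled: never a Fetch Task.
        return False
    if threshold_bytes < 0:
        # Assumption: a negative threshold means no input-size limit.
        return True
    # Assumption: inputs at or below the threshold may still be fetched.
    return input_bytes <= threshold_bytes


TOTAL_SIZE = 189367471403333  # totalSize from the table statistics above

# With threshold=1 this table should never be served by a Fetch Task.
print(expect_fetch_task("more", TOTAL_SIZE, 1))
# The same should hold for a 1 GiB threshold.
print(expect_fetch_task("more", TOTAL_SIZE, 1024 ** 3))
```

Under these assumptions both calls return False, that is, an MR/Tez job is
expected, whereas Hive 2.3.3 runs a Fetch Task.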
Please let me know if any additional information is required.