Nedzad Campara created HIVE-25588:
-------------------------------------

             Summary: Hive 2.3.3 Fetch Task threshold not respected
                 Key: HIVE-25588
                 URL: https://issues.apache.org/jira/browse/HIVE-25588
             Project: Hive
          Issue Type: Bug
          Components: Physical Optimizer
    Affects Versions: 2.3.3
            Reporter: Nedzad Campara


It appears that "hive.fetch.task.conversion.threshold" is not respected in 
Hive 2.3.3: a Fetch Task is always used, regardless of the input size, as 
long as the query meets the conditions for the "more" or "minimal" setting 
of "hive.fetch.task.conversion".

Apologies if this has been reported already, but I could not find any issues 
which mention this specifically.

To reproduce, set "hive.fetch.task.conversion.threshold=1", which to my 
understanding should effectively always trigger an MR/Tez job. It does not; 
a Fetch Task is run instead.
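
To make the expected behaviour concrete, here is a minimal sketch of the size 
check as I understand it from the configuration documentation. It is a 
simplified model, not Hive's actual optimizer code; the function name 
expect_fetch_task is hypothetical, and the treatment of a negative threshold 
as "no limit" follows the documented description of the property.

```python
def expect_fetch_task(input_bytes: int, threshold_bytes: int) -> bool:
    """Return True if a fetch-task conversion is expected under this model.

    Assumption: the query already qualifies under the "minimal" or "more"
    setting of hive.fetch.task.conversion; only the size check is modelled.
    """
    # Documented behaviour: a negative threshold disables the size check.
    if threshold_bytes < 0:
        return True
    # Otherwise the conversion should apply only to inputs within the threshold.
    return input_bytes <= threshold_bytes


# totalSize taken from the example table statistics in this report.
TOTAL_SIZE = 189367471403333

print(expect_fetch_task(TOTAL_SIZE, 1))   # False: an MR/Tez job is expected
print(expect_fetch_task(TOTAL_SIZE, -1))  # True: no size limit applied
```

Under this model a threshold of 1 byte should rule out a Fetch Task for any 
non-empty table, which is the opposite of what is observed.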

Tested on various tables ranging from dozens of GB to dozens of TB in size, 
with hundreds and thousands of partitions, in ORC and Parquet format. Example 
table size from statistics:

| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | numFiles | 234258 |
| | numPartitions | 171898 |
| | numRows | 1719836838331 |
| | rawDataSize | 515766839727247 |
| | totalSize | 189367471403333 | 


Please let me know if any additional information is required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
