[jira] [Created] (HIVE-24819) CombineHiveInputFormat format seems to be returning row count in the multiple of Maps

Jitender Kumar (Jira) Tue, 23 Feb 2021 23:30:08 -0800

Jitender Kumar created HIVE-24819:
-------------------------------------

             Summary: CombineHiveInputFormat format seems to be returning row count in the multiple of Maps
                 Key: HIVE-24819
                 URL: https://issues.apache.org/jira/browse/HIVE-24819
             Project: Hive
          Issue Type: Bug
         Environment: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
            Reporter: Jitender Kumar



Hi Team,

This is the first time I am filing a bug in Apache Jira, so please pardon me if I
am unintentionally breaking any protocol.

I am facing the following issue (on a multi-node cluster) when I set
hive.tez.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.

For demonstration purposes, I executed the following query under several
configurations:

_select count(1) from dbname.personal_data_rc tablesample(1000 rows);_

*Case 1*

mapred.map.tasks=2

hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

*Output*

1000

*Case 2*

mapred.map.tasks=2

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

*Output*

2000

*Case 3*

mapred.map.tasks=3

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

*Output*

3000

Beyond 3 maps (and with the default setting), the output stays the same, i.e. a
multiple of 3.
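The three outputs above are consistent with the row limit being applied once per
input split rather than once per query. The Hive LanguageManual page on sampling
describes TABLESAMPLE(n ROWS) this way ("the row count given by user is applied
to each split"), but the report does not state how many splits each input format
produced, so this is only a hypothesis. A minimal Python model of that
assumption follows; the function name and the 5000-rows-per-split figure are
hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def sampled_row_count(rows_limit: int, split_sizes: Sequence[int]) -> int:
    """Total rows returned if TABLESAMPLE(rows_limit ROWS) is applied per split.

    Assumption (hypothesis, not confirmed in this report): each input split
    contributes at most `rows_limit` rows, so the total grows with the number
    of splits instead of being capped at `rows_limit` for the whole query.
    """
    return sum(min(rows_limit, size) for size in split_sizes)


# Hypothetical layouts: 1, 2 and 3 splits of 5000 rows each (made-up sizes).
for num_splits in (1, 2, 3):
    splits = [5000] * num_splits
    print(num_splits, sampled_row_count(1000, splits))
# prints "1 1000", then "2 2000", then "3 3000"
```

Under this model, a count of 2000 or 3000 would simply mean the combined input
was read as 2 or 3 splits, each contributing its own 1000 rows.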

Can you help me understand why, with TABLESAMPLE set to 1000 rows, the query
returns more rows than that? Is there another property that must be used with
CombineHiveInputFormat, or is this an issue with CombineHiveInputFormat itself?

I searched for a solution before filing this issue but could not find one.
Please share your input as soon as possible, as one of our clients is waiting
for a solution or an explanation.
For now, as a workaround, we have switched to the following setting:
*hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat*

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
