Rajesh Balamohan created HIVE-23936:
---------------------------------------
Summary: Provide approximate number of input records to be
processed in broadcast reader
Key: HIVE-23936
URL: https://issues.apache.org/jira/browse/HIVE-23936
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan
In some cases, upstream applications (e.g. Hive) load broadcast data into a
hashtable. Apps try to predict the number of hashtable entries carefully, but
these estimates can be very hard to get right at compile time.
Tez can help in such cases by providing an "approximate number of input
records" counter for the data to be processed in UnorderedKVInput. This would
avoid an expensive rehash when the hashtable size is not estimated correctly.
It would be good to start with the broadcast case and move on to the unordered
partitioned case later.
This would help in predicting the number of entries at runtime and in getting
better estimates for hashtable sizing.
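
As a rough illustration of why a runtime record count helps, the sketch below
pre-sizes a plain java.util.HashMap so that loading the expected number of
entries does not trigger a rehash. It is not Hive's or Tez's actual code:
HashTableSizing, initialCapacity, newPresizedMap and the approxInputRecords
value are hypothetical names, and the count is hard-coded here where a real
consumer would read it from whatever counter Tez ends up exposing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive or Tez.
final class HashTableSizing {
    // Same value as HashMap's default load factor.
    private static final float LOAD_FACTOR = 0.75f;
    // Largest table size java.util.HashMap will allocate.
    private static final int MAX_CAPACITY = 1 << 30;

    // Smallest initial capacity that holds expectedEntries without a resize.
    static int initialCapacity(long expectedEntries) {
        if (expectedEntries <= 0) {
            return 16; // fall back to HashMap's default capacity
        }
        long needed = (long) Math.ceil(expectedEntries / (double) LOAD_FACTOR);
        return (int) Math.min(needed, MAX_CAPACITY);
    }

    // Builds a map sized for the approximate record count.
    static <K, V> Map<K, V> newPresizedMap(long approxInputRecords) {
        return new HashMap<>(initialCapacity(approxInputRecords), LOAD_FACTOR);
    }

    public static void main(String[] args) {
        // Assumption: in practice this value would come from the proposed
        // "approximate number of input records" counter.
        long approxInputRecords = 1_000L;
        Map<Long, String> table = newPresizedMap(approxInputRecords);
        for (long i = 0; i < approxInputRecords; i++) {
            table.put(i, "row-" + i);
        }
        System.out.println("loaded " + table.size() + " entries");
    }
}
```

If the approximate count is too low, the map still works and simply resizes as
usual, so an imprecise counter costs performance but not correctness.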