Hey Folks,
Could you please take a look at the problem below? We are hitting
OutOfMemoryErrors while joining tables that are not managed by Hive.
I would appreciate any feedback.
Thanks
Mehant
On 10/7/13 12:04 PM, Mehant Baid wrote:
Hey Folks,
We are using hive-0.11 and are hitting java.lang.OutOfMemoryError. The
problem seems to be in CommonJoinResolver.java (processCurrentTask()).
In this function we try to convert a map-reduce join to a map join if
'n-1' of the tables involved in an 'n'-way join have a size below a
certain threshold.
If the tables are managed by Hive, then we have accurate sizes for
each table and can apply this optimization. But if the tables are
created using storage handlers (HBaseStorageHandler in our case), the
size is set to zero. Because of this, we assume that we can apply the
optimization and convert the map-reduce join to a map join. So we
build an in-memory hash table for all the keys. Since our table
created using the storage handler is large, it does not fit in memory
and we hit the error.
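
To make the failure mode concrete, here is a rough sketch of the
check as described above. It is a simplified model, not the actual
code in CommonJoinResolver; the class name, method name, and the
threshold value are all made up for illustration.

```java
// Simplified model of the size check described above; NOT Hive's code.
final class ZeroSizeJoinSketch {

    // Returns true if at least n-1 of the n tables report a size
    // strictly below the threshold, i.e. the join looks convertible.
    static boolean looksConvertible(long[] reportedSizes, long thresholdBytes) {
        int small = 0;
        for (long size : reportedSizes) {
            if (size < thresholdBytes) {
                small++;
            }
        }
        return small >= reportedSizes.length - 1;
    }

    public static void main(String[] args) {
        long threshold = 25_000_000L; // hypothetical threshold, in bytes

        // A 10 GB Hive-managed table joined with a storage-handler
        // table whose size is reported as 0 although it is large.
        long[] sizes = {10_000_000_000L, 0L};

        // The zero-size table counts as "small", so the check passes
        // and that table would be loaded into the in-memory hash table.
        System.out.println(looksConvertible(sizes, threshold));
    }
}
```

With a reported size of zero, the storage-handler table always passes
as a small table, regardless of how much data it really holds.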
Should I open a JIRA for this? One way to fix this is to set the size
of any table created using a storage handler to be equal to the map
join threshold. This way the table would be selected as the big table,
and we can proceed with the optimization if the other tables in the
join have sizes below the threshold. If we have multiple big tables,
then the optimization would be turned off.
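
A minimal sketch of the proposed fix, under the same caveat: this is
an illustration of the idea, not a patch against CommonJoinResolver.
The class, the method names, and the fromStorageHandler flag are
hypothetical.

```java
// Illustration of the proposed fix; names here are hypothetical.
final class UnknownSizeFixSketch {

    // A storage-handler table has no reliable size, so report the
    // threshold itself. It can then never pass as a small table.
    static long effectiveSize(long reportedSize, boolean fromStorageHandler,
                              long thresholdBytes) {
        return fromStorageHandler ? thresholdBytes : reportedSize;
    }

    // Returns the index of the table to stream as the big table, or -1
    // if more than one table is at or above the threshold (no map join).
    static int chooseBigTable(long[] reportedSizes, boolean[] fromStorageHandler,
                              long thresholdBytes) {
        int big = -1;      // the single table at or above the threshold
        int largest = 0;   // fallback when every table is small
        long largestSize = -1L;
        for (int i = 0; i < reportedSizes.length; i++) {
            long size = effectiveSize(reportedSizes[i], fromStorageHandler[i],
                                      thresholdBytes);
            if (size >= thresholdBytes) {
                if (big != -1) {
                    return -1; // multiple big tables: skip the optimization
                }
                big = i;
            }
            if (size > largestSize) {
                largestSize = size;
                largest = i;
            }
        }
        return big != -1 ? big : largest;
    }
}
```

For example, with a hypothetical threshold of 25,000,000 bytes, a
1,000-byte Hive table joined with a storage-handler table selects the
storage-handler table as the big table, while joining two
storage-handler tables returns -1 and the join stays a map-reduce
join.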
Thanks
Mehant