[ https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771569#action_12771569 ]

Ning Zhang commented on HIVE-900:
---------------------------------

@parasad, yes, that's definitely a good idea for scaling out mapjoin with a large 
number of mappers. Dhruba also suggested increasing the replication factor for 
the small file. But as you mentioned, we need to revert the replication factor 
once the mapjoin finishes or if any exception is caught. I'll investigate that 
as well. 
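A minimal sketch of the revert-on-finish-or-failure pattern discussed above, with a stand-in `FakeFileSystem` (hypothetical; in Hadoop this would go through `FileSystem.setReplication`): boost the small file's replication before the join, then restore the original factor in a `finally` block so the revert happens whether the join succeeds or throws.

```python
# Sketch only: FakeFileSystem and run_map_join are illustrative names,
# not part of Hive or Hadoop.

class FakeFileSystem:
    def __init__(self):
        self.replication = {}

    def set_replication(self, path, factor):
        self.replication[path] = factor

    def get_replication(self, path):
        # Assume a default replication factor of 3, as in stock HDFS.
        return self.replication.get(path, 3)

def run_map_join(fs, small_table_path, boosted_factor, job):
    original = fs.get_replication(small_table_path)
    fs.set_replication(small_table_path, boosted_factor)
    try:
        return job()
    finally:
        # Revert whether the join finishes normally or an exception is raised.
        fs.set_replication(small_table_path, original)
```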

> Map-side join failed if there are large number of mappers
> ---------------------------------------------------------
>
>                 Key: HIVE-900
>                 URL: https://issues.apache.org/jira/browse/HIVE-900
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Map-side join is efficient when joining a huge table with a small table: each 
> mapper reads the small table into main memory and performs the join locally. 
> However, if too many mappers are generated for the map join, a large number of 
> them will simultaneously send requests to read the same block of the small 
> table. Currently Hadoop has an upper limit on the number of concurrent 
> requests for the same block (250?). If that limit is reached, a 
> BlockMissingException is thrown, causing many mappers to be killed. Retrying 
> does not solve the problem but worsens it.
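A rough sizing sketch for the description above: if each block can serve at most a fixed number of concurrent readers (the description suggests ~250, though the exact limit is uncertain), then a map-join with N mappers needs roughly ceil(N / limit) replicas of the small table's blocks so no single replica is overwhelmed.

```python
import math

def min_replication(num_mappers, limit=250):
    # Number of replicas needed so that, spread evenly, no replica
    # sees more than `limit` simultaneous read requests.
    return math.ceil(num_mappers / limit)
```

For example, with the assumed limit of 250, a job with 1000 mappers would need the small file replicated roughly 4 ways.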
