[ https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769573#action_12769573 ]
Venky Iyer commented on HIVE-900:
---------------------------------

This is a high-priority bug for me, blocking fairly important work. Dhruba's workaround, downloading the data to the client and adding it to the DistributedCache, is a pretty good solution.

> Map-side join failed if there are large number of mappers
> ---------------------------------------------------------
>
>                 Key: HIVE-900
>                 URL: https://issues.apache.org/jira/browse/HIVE-900
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Map-side join is efficient when joining a huge table with a small table: each
> mapper reads the small table into main memory and performs the join locally.
> However, if too many mappers are generated for the map join, a large number
> of them simultaneously send requests to read the same block of the small
> table. Currently Hadoop has an upper limit on the number of concurrent
> requests for the same block (250?). If that limit is reached, a
> BlockMissingException is thrown, which causes many mappers to be killed.
> Retrying does not solve the problem but worsens it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
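The failure mode described above can be sketched with a small simulation. This is illustrative Python, not Hadoop code: the per-block request cap (`MAX_REQUESTS_PER_BLOCK = 250`), the function name, and the all-at-once request pattern are assumptions taken from the issue description, not from HDFS internals.

```python
# Illustrative sketch of the HIVE-900 failure mode: N mappers all try to
# read the same small-table block at once, but HDFS caps the number of
# concurrent requests per block (the issue suggests ~250).

MAX_REQUESTS_PER_BLOCK = 250  # assumed cap, taken from the issue text


def failed_mappers(num_mappers, cap=MAX_REQUESTS_PER_BLOCK):
    """Count mappers that would fail if all request the block simultaneously.

    Mappers beyond the cap see a BlockMissingException and are killed;
    retries only add more simultaneous requests, making things worse.
    """
    served = min(num_mappers, cap)
    return num_mappers - served


# With few mappers the map join succeeds; with many, most mappers fail.
print(failed_mappers(100))   # 0: under the cap, all requests are served
print(failed_mappers(1000))  # 750: everything past the cap fails
```

This also shows why Dhruba's workaround helps: with the small table pushed to each node via the DistributedCache, mappers read a local copy instead of hammering the same HDFS block.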