Yeah, I can do that, but the large table in my JOIN is huge and I'd like to avoid going that route.
On Oct 23, 2009, at 8:09 PM, Ning Zhang wrote:

> Yes, that's the plan. You can also try the workaround to remove
> mapjoin hints.
>
> Ning
>
> On Oct 23, 2009, at 7:52 PM, Venky Iyer (JIRA) wrote:
>
>> [ https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769573#action_12769573 ]
>>
>> Venky Iyer commented on HIVE-900:
>> ---------------------------------
>>
>> This is a high-priority bug for me, blocking me on fairly important
>> stuff. The workaround that Dhruba had, of downloading the data to the
>> client and adding it to the distributed cache, is a pretty good solution.
>>
>>> Map-side join failed if there are large number of mappers
>>> ---------------------------------------------------------
>>>
>>> Key: HIVE-900
>>> URL: https://issues.apache.org/jira/browse/HIVE-900
>>> Project: Hadoop Hive
>>> Issue Type: Improvement
>>> Reporter: Ning Zhang
>>> Assignee: Ning Zhang
>>>
>>> Map-side join is efficient when joining a huge table with a small
>>> table: each mapper reads the small table into main memory and does
>>> the join locally. However, if too many mappers are generated for the
>>> map join, a large number of mappers will simultaneously request the
>>> same block of the small table. Hadoop currently has an upper limit on
>>> the number of requests for the same block (250?). If that limit is
>>> reached, a BlockMissingException is thrown, which causes a lot of
>>> mappers to be killed. Retrying doesn't solve the problem but
>>> worsens it.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>

--
Venky Iyer
vi...@facebook.com
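For context, the mapjoin hint being discussed looks like the following sketch (the table and column names here are hypothetical, not from the issue):

```sql
-- Hypothetical example of a query using the MAPJOIN hint.
-- The hint asks Hive to load the small table `s` into each mapper's
-- memory and join map-side; with many mappers, each one reads the same
-- blocks of `s` from HDFS, which is what triggers the issue above.
-- Removing the hint falls back to a common (reduce-side) join.
SELECT /*+ MAPJOIN(s) */ b.user_id, s.country
FROM big_events b
JOIN small_dim s ON (b.user_id = s.user_id);
```

The trade-off the thread describes: dropping the hint avoids the BlockMissingException, but shuffles the large table through the reducers, which is what the first message is trying to avoid.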