Yeah, I can do that, but the big table in my JOIN is really large,
and I'd like to avoid having to do that.
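
(For reference, the "mapjoin hints" Ning mentions below are Hive's `/*+ MAPJOIN(...) */` comment hints; a minimal sketch of the workaround, with hypothetical table names `big_t` and `small_t`:

```sql
-- With the hint: Hive does a map-side join, loading small_t into
-- each mapper's memory -- this is what makes every mapper read the
-- same blocks of small_t simultaneously.
SELECT /*+ MAPJOIN(small_t) */ b.key, s.value
FROM big_t b JOIN small_t s ON (b.key = s.key);

-- Workaround: the same query without the hint falls back to a
-- common reduce-side join, avoiding the concurrent block reads.
SELECT b.key, s.value
FROM big_t b JOIN small_t s ON (b.key = s.key);
```

The reduce-side join shuffles both tables, so it is slower but does not hammer the small table's blocks.)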


On Oct 23, 2009, at 8:09 PM, Ning Zhang wrote:

> Yes, that's the plan. You can also try the workaround to remove
> mapjoin hints.
>
> Ning
>
> On Oct 23, 2009, at 7:52 PM, Venky Iyer (JIRA) wrote:
>
>>
>>   [ 
>> https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769573#action_12769573
>> ]
>>
>> Venky Iyer commented on HIVE-900:
>> ---------------------------------
>>
>> This is a high-priority bug for me, blocking me on fairly important
>> stuff. The workaround that Dhruba had, of downloading data to the
>> client and adding it to the DistributedCache, is a pretty good solution.
>>
>>> Map-side join failed if there are large number of mappers
>>> ---------------------------------------------------------
>>>
>>>               Key: HIVE-900
>>>               URL: https://issues.apache.org/jira/browse/HIVE-900
>>>           Project: Hadoop Hive
>>>        Issue Type: Improvement
>>>          Reporter: Ning Zhang
>>>          Assignee: Ning Zhang
>>>
>>> Map-side join is efficient when joining a huge table with a small
>>> table: each mapper reads the small table into main memory and
>>> performs the join locally. However, if too many mappers are
>>> generated for the map join, a large number of mappers will
>>> simultaneously send requests to read the same block of the small
>>> table. Hadoop currently has an upper limit on the number of
>>> concurrent requests for the same block (250?). If that limit is
>>> reached, a BlockMissingException is thrown, causing many mappers
>>> to be killed. Retrying doesn't solve the problem but worsens it.
>>
>> -- 
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>

--
Venky Iyer
vi...@facebook.com



