[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732293#comment-13732293
 ] 

Ashutosh Chauhan commented on HIVE-4838:
----------------------------------------

Actually, the memory monitoring I was talking about concerns the local task 
that generates the hashtable, which happens locally on the client. To generate 
the hashtable (which is then shipped to the task nodes) we launch a local job 
on the client in a separate process. The memory-management logic for this 
local task (not for the MR job, which actually does the join in the mapper) is 
convoluted. The local task monitors its own memory, but it seems 
MapredLocalTask catches the OOM exception anyway. One of these is not 
required. My thinking is that there shouldn't be any memory monitoring and we 
should just catch the OOM exception when the task fails. In any case, a join 
is converted into a mapjoin only when the small table is below a size 
threshold (governed by a config knob), so this OOM should be very rare. So my 
suggestion is to remove MemoryHandler altogether.
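
To make the two options concrete, here is a rough Java sketch of the 
trade-off described above. It is not the Hive code: the class name, method 
names, polling interval, threshold and exit codes are all made up for 
illustration, and a plain HashMap stands in for the real table container.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Hive.
class LocalHashTableSketch {

    // Option A: poll heap usage while loading and abort above a threshold.
    static Map<Integer, String> buildWithMonitoring(int rows, double maxUsedFraction) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        Map<Integer, String> table = new HashMap<>();
        for (int i = 0; i < rows; i++) {
            table.put(i, "row-" + i);
            if (i % 100_000 == 0) { // assumed polling interval
                MemoryUsage heap = memory.getHeapMemoryUsage();
                long max = heap.getMax(); // -1 when the JVM reports no limit
                if (max > 0 && (double) heap.getUsed() / max > maxUsedFraction) {
                    throw new IllegalStateException("hash table exceeds memory budget");
                }
            }
        }
        return table;
    }

    // Option B: no monitoring; let the JVM fail and catch the error once.
    // Returns 0 on success and a non-zero (made-up) exit code on OOM.
    static int buildCatchingOom(int rows) {
        try {
            Map<Integer, String> table = new HashMap<>();
            for (int i = 0; i < rows; i++) {
                table.put(i, "row-" + i);
            }
            return 0;
        } catch (OutOfMemoryError e) {
            // The table is unreachable here, so the heap can be reclaimed
            // before the failure is reported to the caller.
            return 3;
        }
    }
}
```

Doing both, as the current code appears to, means two mechanisms guard the 
same failure. Option B alone is the smaller amount of code.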

The ORC memory manager won't be a problem here, since ORC uses its memory 
manager only while writing data, and here we are dumping the hashtable in Java 
serialized format, so it won't be relevant. For a similar reason (this is a 
local task), java.opts and io.sort.mb aren't relevant either. 
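
For reference, "dumping the hashtable in Java serialized format" amounts to 
something like the sketch below. This is only an illustration using 
java.io.ObjectOutputStream on a plain HashMap. The class and method names are 
hypothetical, and the real code uses its own container and key/row classes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

// Hypothetical sketch; not the Hive implementation.
class HashTableDumpSketch {

    // Write the whole table to a file using standard Java serialization.
    static void dump(HashMap<String, String> table, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(table);
        }
    }

    // Read the table back, as a task node would after the file is shipped.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (HashMap<String, String>) in.readObject();
        }
    }

    // Convenience helper: dump to a temp file, load it again, clean up.
    static HashMap<String, String> roundTrip(HashMap<String, String> table) {
        try {
            Path file = Files.createTempFile("hashtable", ".ser");
            try {
                dump(table, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Nothing in this path goes through an ORC writer, which is why the ORC memory 
manager does not come into play.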
                
> Refactor MapJoin HashMap code to improve testability and readability
> --------------------------------------------------------------------
>
>                 Key: HIVE-4838
>                 URL: https://issues.apache.org/jira/browse/HIVE-4838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
> HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper needs to publicize. Additionally 
> HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required
> * There are unused classes (MRU, DCLLItem) and classes which duplicate 
> functionality (MapJoinSingleKey and MapJoinDoubleKeys)

