[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726497#comment-13726497
 ] 

Ashutosh Chauhan commented on HIVE-4838:
----------------------------------------

bq. The current code is using this static code because by using Java 
serialization there is no way to pass any "context" information down to the 
class when the read/write methods are being called. In the new patch I define 
my own read/write methods.

By tracking metadata info per key, is this going to increase the size of the 
hashtable? 
Earlier, metadata info was passed as one blob and loaded statically, so it could 
be looked up by every key. Agreed, that is not a clean way of doing it, but this 
patch now stores metadata info per key, which looks like it will increase the 
size of the hashtable.
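
For illustration only, here is a minimal sketch of the alternative being weighed: custom read/write methods that take the metadata as an explicit argument, so that a key holds only its own values and the metadata is written once per table. This is not code from the patch. The names SharedContextSketch, Context, Key, writeAll and readAll are hypothetical, and real MapJoin metadata is richer than a single field count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedContextSketch {

    /** Hypothetical stand-in for serialization metadata; one instance per table. */
    static final class Context {
        final int fieldCount;
        Context(int fieldCount) { this.fieldCount = fieldCount; }
    }

    /** Hypothetical key: stores only its field values, no metadata reference. */
    static final class Key {
        final int[] fields;
        Key(int... fields) { this.fields = fields; }

        // The context arrives as a parameter instead of via static state.
        void write(Context ctx, DataOutputStream out) throws IOException {
            for (int i = 0; i < ctx.fieldCount; i++) {
                out.writeInt(fields[i]);
            }
        }

        static Key read(Context ctx, DataInputStream in) throws IOException {
            int[] fields = new int[ctx.fieldCount];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = in.readInt();
            }
            return new Key(fields);
        }

        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(fields, ((Key) o).fields);
        }

        @Override public int hashCode() { return Arrays.hashCode(fields); }
    }

    /** Writes the metadata once, then the key count, then every key. */
    static byte[] writeAll(Context ctx, List<Key> keys) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(ctx.fieldCount);
            out.writeInt(keys.size());
            for (Key k : keys) {
                k.write(ctx, out);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the metadata once and hands the same Context to every key. */
    static List<Key> readAll(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            Context ctx = new Context(in.readInt());
            int n = in.readInt();
            List<Key> keys = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                keys.add(Key.read(ctx, in));
            }
            return keys;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the serialized size is 8 bytes of header plus 4 bytes per field per key, so the per-key cost does not depend on the metadata. Whether the patch behaves this way, or instead keeps a metadata reference in each key object, is exactly the question above.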
                
> Refactor MapJoin HashMap code to improve testability and readability
> --------------------------------------------------------------------
>
>                 Key: HIVE-4838
>                 URL: https://issues.apache.org/jira/browse/HIVE-4838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
> HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper 
> needs to publicize. Additionally HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required
> * There are unused classes (MRU, DCLLItem) and classes which duplicate 
> functionality (MapJoinSingleKey and MapJoinDoubleKeys)

