[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705401#comment-13705401
 ] 

Brock Noland commented on HIVE-4838:
------------------------------------

Hey thanks for the feedback!

Yes, I thought about those items as well. I have a patch just about ready, which 
I'd like to get in before the optimizations since it fixes some correctness 
bugs, but I'd love to pursue those two items in a follow-up JIRA. For example, 
the following code produces unexpected results :)

{noformat}
  public static void main(String[] args) {
    // Two keys with identical contents, including a null second field
    MapJoinDoubleKeys leftDouble = new MapJoinDoubleKeys(148, null);
    MapJoinDoubleKeys rightDouble = new MapJoinDoubleKeys(148, null);
    System.out.println(leftDouble.equals(rightDouble));
    // Two keys that share a null first field but differ in the second
    MapJoinObjectKey leftObject = new MapJoinObjectKey(new Object[]{null, "left"});
    MapJoinObjectKey rightObject = new MapJoinObjectKey(new Object[]{null, "right"});
    System.out.println(leftObject.equals(rightObject));
  }
{noformat}
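
For comparison, here is a minimal sketch of a key class whose equals and hashCode tolerate null fields. NullSafeKey is a hypothetical stand-in written for illustration; it is not one of the Hive classes above and does not describe what the patch does. It only shows the behavior one would normally expect from equals on keys like these.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Hive.
final class NullSafeKey {
  private final Object[] fields;

  NullSafeKey(Object[] fields) {
    // Defensive copy so later changes to the caller's array cannot alter the key
    this.fields = fields.clone();
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof NullSafeKey)) {
      return false;
    }
    // Arrays.equals compares element by element and treats two nulls as equal
    return Arrays.equals(fields, ((NullSafeKey) other).fields);
  }

  @Override
  public int hashCode() {
    // Arrays.hashCode handles null elements, keeping hashCode consistent with equals
    return Arrays.hashCode(fields);
  }

  public static void main(String[] args) {
    NullSafeKey a = new NullSafeKey(new Object[]{148, null});
    NullSafeKey b = new NullSafeKey(new Object[]{148, null});
    System.out.println(a.equals(b)); // true: same contents, null included

    NullSafeKey c = new NullSafeKey(new Object[]{null, "left"});
    NullSafeKey d = new NullSafeKey(new Object[]{null, "right"});
    System.out.println(c.equals(d)); // false: second fields differ
  }
}
```

Delegating to java.util.Arrays keeps the null handling in one place instead of repeating null checks for each field.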
                
> Refactor MapJoin HashMap code to improve testability and readability
> --------------------------------------------------------------------
>
>                 Key: HIVE-4838
>                 URL: https://issues.apache.org/jira/browse/HIVE-4838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper needs to publicize. Additionally, 
> HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required
> * There are unused classes (MRU, DCLLItem) and classes which duplicate 
> functionality (MapJoinSingleKey and MapJoinDoubleKeys)

