[
https://issues.apache.org/jira/browse/HIVE-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HIVE-6418:
-----------------------------------
Attachment: HIVE-6418.WIP.patch
First cut.
Introduces an alternative container that is essentially backed by a single
array. Initially the array just stores the context and all the
not-yet-deserialized writables.
On first access, it deserializes the writables. At that point it knows the row
count and can determine the row length from the first deserialized row (it
assumes all rows have the same length), so the array represents a matrix with
this row length.
For the simple case of one row, the container also serves as a list, so it can
return itself as that "row". Otherwise it returns a read-only sublist.
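To make the layout concrete, here is a minimal sketch of the idea described above. It is not the code in the attached patch: the class name LazyFlatRowContainer, the method names, and the use of a Function<byte[], List<Object>> in place of the real SerDe context and writables are all assumptions made for illustration.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; names and types are illustrative, not from the patch.
class LazyFlatRowContainer extends AbstractList<Object> {
    // Stands in for the stored deserialization context (assumption).
    private final Function<byte[], List<Object>> deserializer;
    // Serialized rows held until first access; null once deserialized.
    private List<byte[]> pending = new ArrayList<>();
    // Row-major matrix of rowCount x rowLength values.
    private Object[] array = new Object[0];
    private int rowCount;
    private int rowLength;

    LazyFlatRowContainer(Function<byte[], List<Object>> deserializer) {
        this.deserializer = deserializer;
    }

    void addSerializedRow(byte[] serialized) {
        if (pending == null) {
            throw new IllegalStateException("rows already deserialized");
        }
        pending.add(serialized);
    }

    // Deserialize everything on first access. The row count is known here,
    // and the row length is taken from the first row.
    private void ensureDeserialized() {
        if (pending == null) {
            return;
        }
        rowCount = pending.size();
        for (int r = 0; r < rowCount; r++) {
            List<Object> row = deserializer.apply(pending.get(r));
            if (r == 0) {
                rowLength = row.size();
                array = new Object[rowCount * rowLength];
            } else if (row.size() != rowLength) {
                throw new IllegalStateException("row " + r + " has a different length");
            }
            for (int c = 0; c < rowLength; c++) {
                array[r * rowLength + c] = row.get(c);
            }
        }
        pending = null;
    }

    int rowCount() {
        ensureDeserialized();
        return rowCount;
    }

    List<Object> row(int index) {
        ensureDeserialized();
        if (index < 0 || index >= rowCount) {
            throw new IndexOutOfBoundsException("row " + index);
        }
        if (rowCount == 1) {
            return this; // single row: the container itself is the row
        }
        int start = index * rowLength;
        return Collections.unmodifiableList(
            Arrays.asList(array).subList(start, start + rowLength));
    }

    // List view over the flat array; with one row this is exactly that row.
    @Override
    public Object get(int i) {
        ensureDeserialized();
        return array[i];
    }

    @Override
    public int size() {
        ensureDeserialized();
        return array.length;
    }
}
```

In this sketch no per-row list object is allocated for the single-row case, and the multi-row case only creates a lightweight view over the shared array. Both returned lists reject modification, since AbstractList does not support set() unless it is overridden.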
This works for Tez, because Tez doesn't have to serialize/deserialize the
hashtable. I am not sure the lazy part can be made to work for MR with its
extra stage (probably not), so MR uses the old container.
WIP:
Need to get rid of the index stored in each row: unless rowCount is made a
short, I presume it will round up to 8 bytes, and it's really useless anyway.
Also need to run more tests; so far I have only run some Tez tests.
> MapJoinRowContainer has large memory overhead in typical cases
> --------------------------------------------------------------
>
> Key: HIVE-6418
> URL: https://issues.apache.org/jira/browse/HIVE-6418
> Project: Hive
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-6418.WIP.patch
>
>