[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
[ https://issues.apache.org/jira/browse/HIVE-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6418:
    Resolution: Fixed
    Fix Version/s: 0.13.0
    Status: Resolved (was: Patch Available)

Committed to trunk.

MapJoinRowContainer has large memory overhead in typical cases
    Key: HIVE-6418
    URL: https://issues.apache.org/jira/browse/HIVE-6418
    Project: Hive
    Issue Type: Improvement
    Reporter: Sergey Shelukhin
    Assignee: Sergey Shelukhin
    Fix For: 0.13.0
    Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, HIVE-6418.05.patch, HIVE-6418.WIP.patch, HIVE-6418.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.05.patch

Added changes to the non-Tez out file. I think someone forgot to update the Tez files, because there are small unrelated changes in all outputs (database name in the pre/post hook), but they all pass.

    Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, HIVE-6418.05.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.04.patch

Updated; removed AbstractList. One Tez test passed, will run the rest.

    Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.04.patch

    Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.03.patch

Added a config setting.

    Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.03.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.01.patch

Added correct handling of 0-length rows (the main issue), got rid of some array allocations when deserializing, and made other minor changes and fixes. I have run the minitez tests and they passed except for mapjoin_mapjoin; with this patch, that one passes too. I am rerunning all tests now.

    Attachments: HIVE-6418.01.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.02.patch

Tiny fix to the serde to make the last Tez tests pass.

    Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.WIP.patch

First cut. Introduces an alternative container that basically has an array. Initially the array just stores the context and all the un-deserialized writables. On access, it deserializes the writables. At that point it knows the row count and can determine the row length from the first deserialized row (it assumes all rows have the same length), so the array represents a matrix with this row length. For the simple case of one row, the container also serves as a list, so it can return itself as that row. Otherwise it returns a read-only sublist.

This works for Tez, because Tez doesn't have to serialize/deserialize the hashtable. I am not sure the lazy part can be made to work for MR with its extra stage, probably not, so MR uses the old container.

WIP: need to get rid of the index stored in each row; unless rowCount is made a short, I presume it will round up to 8 bytes, and it's really useless. Also need to run more tests; I ran some Tez tests.

    Attachments: HIVE-6418.WIP.patch
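The layout described in this first-cut comment can be illustrated with a minimal Java sketch. This is not the code from the patch: the class name `FlatRowContainerSketch` and its methods are hypothetical, the lazy deserialization step is left out, and rows are passed in already deserialized. It only shows one flat array treated as a matrix with a fixed row length, where a single-row container returns itself as the row and a multi-row container returns a read-only sublist view. Basing it on `AbstractList` is an assumption made for brevity.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: all rows stored back to back in one Object[] (row-major "matrix"). */
final class FlatRowContainerSketch extends AbstractList<Object> {
    private final Object[] cells;  // rowCount * rowLength values
    private final int rowLength;   // taken from the first row; all rows are assumed to match
    private final int rowCount;

    FlatRowContainerSketch(List<? extends List<?>> rows) {
        rowCount = rows.size();
        rowLength = rowCount == 0 ? 0 : rows.get(0).size();
        cells = new Object[rowCount * rowLength];
        int pos = 0;
        for (List<?> row : rows) {
            if (row.size() != rowLength) {
                throw new IllegalArgumentException("all rows must have length " + rowLength);
            }
            for (Object value : row) {
                cells[pos++] = value;
            }
        }
    }

    int rowCount() {
        return rowCount;
    }

    /** Returns row i without copying any values. */
    List<Object> getRow(int i) {
        if (i < 0 || i >= rowCount) {
            throw new IndexOutOfBoundsException("row " + i);
        }
        if (rowCount == 1) {
            return this;                     // the container doubles as its only row
        }
        if (rowLength == 0) {
            return Collections.emptyList();  // 0-length rows carry no values
        }
        // AbstractList does not implement set/add/remove, so this view is read-only.
        return subList(i * rowLength, (i + 1) * rowLength);
    }

    @Override
    public Object get(int index) {
        return cells[index];
    }

    @Override
    public int size() {
        return cells.length;
    }
}
```

With two rows `("a", 1)` and `("b", 2)`, `getRow(1)` is a two-element view over the shared array, so no per-row list or per-row array is allocated up front; the only per-row object is the lightweight sublist view created on access.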
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Attachment: HIVE-6418.patch

Fixed some issues. Need to figure out whether the lazy part provides benefits, now that it requires a byte array copy. It would really depend on how many entries actually get used; perhaps it could be configurable.

    Attachments: HIVE-6418.WIP.patch, HIVE-6418.patch
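The trade-off raised here can be sketched as follows. Everything in this example is an assumption for illustration: the class `LazyRowSketch`, the `lazy` flag, and the comma-separated UTF-8 encoding (a stand-in for the real serde) are hypothetical. It also assumes the copy is needed because the caller may reuse its buffer, which the comment does not state. In lazy mode the row pays for a byte array copy up front and only deserializes if it is ever read; in eager mode it deserializes immediately and never copies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of lazy versus eager row deserialization. */
final class LazyRowSketch {
    private byte[] serialized;  // private copy of the bytes, held only in lazy mode
    private String[] fields;    // deserialized values, filled on first access

    LazyRowSketch(byte[] buffer, int offset, int length, boolean lazy) {
        if (lazy) {
            // Assumption: the caller may reuse 'buffer', so the bytes must be copied.
            serialized = Arrays.copyOfRange(buffer, offset, offset + length);
        } else {
            // Eager mode decodes straight from the caller's buffer; no copy is kept.
            fields = decode(buffer, offset, length);
        }
    }

    boolean isDeserialized() {
        return fields != null;
    }

    String[] fields() {
        if (fields == null) {
            fields = decode(serialized, 0, serialized.length);
            serialized = null;  // the raw bytes are no longer needed
        }
        return fields;
    }

    // Stand-in for a real serde: comma-separated UTF-8 text.
    private static String[] decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8).split(",", -1);
    }
}
```

If most entries are never probed, the lazy path skips their deserialization at the cost of one copy each; if most entries are read, the copy is pure overhead. That dependence on usage is why a switch between the two modes is plausible.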
[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6418:
    Status: Patch Available (was: Open)

    Attachments: HIVE-6418.WIP.patch, HIVE-6418.patch