[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6418:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.13.0

 Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
 HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, 
 HIVE-6418.05.patch, HIVE-6418.WIP.patch, HIVE-6418.patch

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-23 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.05.patch

Add the changes to the non-Tez .out file. I think someone forgot to update the Tez
.out files, because there are small unrelated changes in all the outputs (the
database name in the pre/post hook output), but they all pass.

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
 HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, 
 HIVE-6418.05.patch, HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-20 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.04.patch

Update: remove AbstractList.
One Tez test passed; will run the rest.

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
 HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-20 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.04.patch

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
 HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, 
 HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-18 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.03.patch

Added a config setting.

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
 HIVE-6418.03.patch, HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-13 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.01.patch

Added correct handling of 0-length rows (the main issue), got rid of some array
allocations when deserializing, and made other minor changes and fixes. I have run
the MiniTez tests and they passed except for mapjoin_mapjoin; with this patch, that
one passes too. I am rerunning all tests now.
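The comment does not say what went wrong with 0-length rows. One plausible failure mode for the array-as-matrix layout described in the first-cut comment on the WIP patch is sketched below in Java, under our own assumptions (hypothetical names; the actual patch may differ). If every row has no columns (for example, presumably, when the query needs no value columns from the small table), the backing array is empty, so a row count derived from its length comes out as 0 and the join would silently drop matches. Keeping the row count explicitly avoids that.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not the HIVE-6418 change itself: with rows packed into one
 * flat array, 0-length rows leave the array empty, so the row count must be kept separately.
 */
final class ZeroLengthRowsSketch {
    /** Derives the count from the array; wrong (0) whenever rows have no columns. */
    static int derivedRowCount(Object[] cells, int rowLength) {
        return rowLength == 0 ? 0 : cells.length / rowLength;
    }

    /**
     * Joins one big-table row with every stored small-table row for its key.
     * Uses the explicitly stored rowCount, so 0-length rows still yield output rows.
     */
    static List<List<Object>> joinOne(List<Object> bigRow, Object[] cells, int rowLength, int rowCount) {
        List<List<Object>> out = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            List<Object> joined = new ArrayList<>(bigRow);
            for (int j = 0; j < rowLength; j++) {
                joined.add(cells[i * rowLength + j]);
            }
            out.add(joined);
        }
        return out;
    }
}
```

With three matching small-table rows of width 0, the explicit count still produces three joined rows, while the derived count reports none.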


 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.01.patch, HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-13 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.02.patch

Tiny fix to the SerDe to make the last Tez tests pass.

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
 HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-12 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.WIP.patch

First cut.
Introduces an alternative container that is basically a wrapper around an array.
Initially it just stores the context and all the unserialized writables.
On access, it deserializes the writables. It knows the row count at that point
and can determine the row length from the first deserialized row (assuming all
rows have the same length), so the array represents a matrix with that row length.
For the simple case of one row, the container also serves as a list, so it can
return itself as that row. Otherwise it returns a read-only sublist.
This works for Tez, because Tez doesn't have to serialize/deserialize the hashtable.
I am not sure the lazy part can be made to work for MR with its extra stage
(probably not), so MR uses the old container.

WIP:
Need to get rid of the index stored in each row: unless rowCount is made a short,
I presume it will be rounded up to 8 bytes, and it's really useless.
Also need to run more tests; I ran some Tez tests.

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.WIP.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-12 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Attachment: HIVE-6418.patch

Fixed some issues. Need to figure out whether the lazy part provides benefits
now that it requires a byte array copy. It would really depend on how many
entries actually get used; perhaps it could be configurable.
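The trade-off in this comment can be sketched in Java as follows, purely as an illustration under our own assumptions: LazyRowsSketch, RowDeserializer and the boolean lazy flag are hypothetical and stand in for whatever container and config setting the patch actually uses. In lazy mode the serialized bytes must be copied, because the source buffer may be reused before first access; in eager mode they are decoded immediately from the shared buffer, with no copy.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the lazy-vs-eager trade-off; not Hive code. In lazy mode the
 * bytes must be copied, since the caller's buffer may be reused before first access.
 */
final class LazyRowsSketch<T> {
    /** Hypothetical decoder for one serialized value. */
    interface RowDeserializer<R> {
        R read(byte[] buf, int off, int len);
    }

    private final RowDeserializer<T> deserializer;
    private byte[] pending; // non-null only while still undeserialized (lazy mode)
    private T value;

    LazyRowsSketch(byte[] buffer, int offset, int length, RowDeserializer<T> deserializer, boolean lazy) {
        this.deserializer = deserializer;
        if (lazy) {
            // The extra cost: copy the bytes now, decode later (or never).
            pending = Arrays.copyOfRange(buffer, offset, offset + length);
        } else {
            // Eager: decode straight from the shared buffer, no copy needed.
            value = deserializer.read(buffer, offset, length);
        }
    }

    /** Decodes on the first call in lazy mode; later calls return the cached value. */
    T get() {
        if (pending != null) {
            value = deserializer.read(pending, 0, pending.length);
            pending = null; // drop the copy once decoded
        }
        return value;
    }
}
```

An entry that is never probed costs only the copy in lazy mode, while one that is read pays for both the copy and the decode, which is why the benefit depends on how many entries are actually used.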

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.WIP.patch, HIVE-6418.patch



[jira] [Updated] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases

2014-02-12 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6418:
---

Status: Patch Available  (was: Open)

 MapJoinRowContainer has large memory overhead in typical cases
 --

 Key: HIVE-6418
 URL: https://issues.apache.org/jira/browse/HIVE-6418
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6418.WIP.patch, HIVE-6418.patch
