[jira] [Commented] (CASSANDRA-5506) Reduce memory consumption of IndexSummary

2013-04-28 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643917#comment-13643917 ]

Jonathan Ellis commented on CASSANDRA-5506:
---

I'm pretty comfortable switching the representation for 1.2.5; let's make a 
separate ticket to move off heap for 2.0.

 Reduce memory consumption of IndexSummary
 -

 Key: CASSANDRA-5506
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5506
 Project: Cassandra
 Issue Type: Improvement
 Components: Core
 Reporter: Nick Puz
 Assignee: Jonathan Ellis
 Fix For: 1.2.5


 I am evaluating Cassandra for a use case with many tiny rows, which would 
 result in a node with 1-3TB of storage having billions of rows. Before 
 loading that much data I am hitting GC issues, and when looking at the heap 
 dump I noticed that 70+% of the memory was used by IndexSummaries. 
 The two major issues seem to be:
 1) The positions are stored as an ArrayList<Long>, which results in each 
 position taking 24 bytes (class + flags + 8 byte long). This might make sense 
 when the file is initially written, but once it has been serialized it would 
 be a lot more memory efficient to just have a long[] (really an int[] would 
 be fine unless sstables larger than 2GB are allowed).
 2) The DecoratedKey for a byte[16] key takes 195 bytes -- this is for the 
 overhead of the ByteBuffer in the key and overhead in the token.
 To somewhat work around the problem I have increased index_sample, but with 
 this many rows that didn't really help and starts to have diminishing returns. 
 NOTE: This heap dump was from Linux with a 64-bit Oracle VM. 
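
The two points above suggest replacing boxed collections with primitive arrays. The following is a minimal sketch of that idea, not the committed patch: the class name CompactSummarySketch and the method floorPosition are hypothetical, and keys are ordered by raw unsigned bytes for simplicity, whereas a real summary would order by token. Each position occupies one 8-byte slot in a long[] instead of a separate boxed Long, and each key is a plain byte[] with no ByteBuffer or token wrapper.

```java
/** Hypothetical sketch; this is not Cassandra's actual IndexSummary class. */
final class CompactSummarySketch {
    private final byte[][] keys;    // sampled keys, sorted ascending by unsigned byte order (assumption)
    private final long[] positions; // positions[i] = index-file offset recorded for keys[i]

    CompactSummarySketch(byte[][] sortedKeys, long[] positions) {
        if (sortedKeys.length != positions.length)
            throw new IllegalArgumentException("keys and positions must have the same length");
        this.keys = sortedKeys;
        this.positions = positions;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return a.length - b.length;
    }

    /** Position of the last sampled key <= target, or -1 if target sorts before every sample. */
    long floorPosition(byte[] target) {
        int lo = 0, hi = keys.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(keys[mid], target) <= 0) {
                found = mid;   // candidate; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : positions[found];
    }
}
```

A lookup binary-searches the sampled keys and returns the position to start scanning from; the caller would then read the index file from that offset.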

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5506) Reduce memory consumption of IndexSummary

2013-04-28 Thread Vijay (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643920#comment-13643920 ]

Vijay commented on CASSANDRA-5506:
--

+1 for the patch and +1 for a separate ticket.



[jira] [Commented] (CASSANDRA-5506) Reduce memory consumption of IndexSummary

2013-04-28 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644030#comment-13644030 ]

Jonathan Ellis commented on CASSANDRA-5506:
---

committed; created CASSANDRA-5521 for off-heap feature.



[jira] [Commented] (CASSANDRA-5506) Reduce memory consumption of IndexSummary

2013-04-27 Thread Vijay (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643906#comment-13643906 ]

Vijay commented on CASSANDRA-5506:
--

Instead of storing the long[] and byte[][] in memory, can we store only the 
indexes/pointers to the decorated keys in memory? That would help when we 
move the decorated keys and offsets off-heap.

For example: 
During the binary search, we can use the length of the index array 
(indexes.length) to find the midpoint in memory, then reference it back to 
the off-heap BB, which will be deserialized as needed (the summary 
effectively becomes a contiguous off-heap location).
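
One way to picture this proposal is the sketch below. It is only an illustration under stated assumptions, not the design that was later implemented: the class name OffHeapSummarySketch, the entry layout (int key length, key bytes, long position) and the raw-byte key ordering are all hypothetical. Only an int[] of entry offsets stays on the heap; keys and positions live in a single direct ByteBuffer and are read in place during the binary search.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of an off-heap summary; the layout and names are assumptions. */
final class OffHeapSummarySketch {
    private final int[] offsets;      // on-heap: offsets[i] = start of entry i inside the buffer
    private final ByteBuffer entries; // off-heap: [int keyLength][key bytes][long position] per entry

    OffHeapSummarySketch(byte[][] sortedKeys, long[] positions) {
        if (sortedKeys.length != positions.length)
            throw new IllegalArgumentException("keys and positions must have the same length");
        int size = 0;
        for (byte[] k : sortedKeys)
            size += 4 + k.length + 8;
        entries = ByteBuffer.allocateDirect(size);
        offsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            offsets[i] = entries.position();
            entries.putInt(sortedKeys[i].length).put(sortedKeys[i]).putLong(positions[i]);
        }
    }

    /** Compares the key of entry i with target using absolute reads; no key object is allocated. */
    private int compareKeyAt(int i, byte[] target) {
        int base = offsets[i];
        int len = entries.getInt(base);
        int n = Math.min(len, target.length);
        for (int j = 0; j < n; j++) {
            int cmp = (entries.get(base + 4 + j) & 0xff) - (target[j] & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return len - target.length;
    }

    /** Position of the last sampled key <= target, or -1 if target sorts before every sample. */
    long floorPosition(byte[] target) {
        int lo = 0, hi = offsets.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // midpoint comes from the on-heap offsets array
            if (compareKeyAt(mid, target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (found < 0)
            return -1;
        int base = offsets[found];
        return entries.getLong(base + 4 + entries.getInt(base));
    }
}
```

The on-heap cost in this sketch is 4 bytes per sampled entry for the offsets array; whether the offsets themselves should also move off-heap is left open here.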
