[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-20 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773174#comment-13773174 ]

Andrew Purtell commented on HBASE-9553:
---

bq. Maybe I should have done this testing before I filed this idea, going to 
close as Invalid.

This was an interesting issue though.

A negative result is just as interesting and informative as a positive one. In 
some cases, more.

 Pad HFile blocks to a fixed size before placing them into the blockcache
 

 Key: HBASE-9553
 URL: https://issues.apache.org/jira/browse/HBASE-9553
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl

 In order to make it easy on the garbage collector and to avoid full GC 
 (compaction) phases, we should make sure that all (or at least a large 
 percentage) of the HFile blocks cached in the block cache are exactly the 
 same size.
 Currently an HFile block is typically slightly larger than the declared block 
 size, as the block will accommodate the last KV on the block. The padding 
 would be a ColumnFamily option. In many cases 100 bytes would probably be a 
 good value to make all blocks exactly the same size (but of course it depends 
 on the max size of the KVs).
 This does not have to be perfect. The more of the blocks evicted and replaced 
 in the block cache that are exactly the same size, the easier it should be on 
 the GC.
 Thoughts?
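The padding described above could be sketched roughly like this (a hypothetical helper for illustration, not actual HBase code; the class and method names are made up):

```java
import java.util.Arrays;

public class BlockPadding {
    // Pad a serialized block up to a fixed target size so that cached
    // blocks are uniformly sized. Blocks already at or over the target
    // are left alone -- the "does not have to be perfect" case.
    static byte[] pad(byte[] block, int targetSize) {
        if (block.length >= targetSize) {
            return block;
        }
        return Arrays.copyOf(block, targetSize); // zero-padded copy
    }

    public static void main(String[] args) {
        byte[] block = new byte[64 * 1024 + 37];     // block slightly over 64k
        byte[] padded = pad(block, 64 * 1024 + 100); // 100-byte pad target
        System.out.println(padded.length);           // 65636
    }
}
```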

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-18 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771564#comment-13771564 ]

Lars Hofhansl commented on HBASE-9553:
--

So I did some simple tests with plain byte[]s:
# allocated chunks of 64k +- 100 bytes (variable)
# allocated chunks of 65636 (64k + 100) bytes (fixed)
# allocated chunks of 64k +- 1000 bytes (variable)
# allocated chunks of 66536 (64k + 1000) bytes (fixed)

Each run allocates and GCs 10m of those ~64k byte[]s.

With various GC settings there was no discernible difference between the 
fixed- and variable-sized blocks.
Maybe I should have done this testing before I filed this idea; going to close 
as Invalid.
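A minimal sketch of such an allocation-churn test might look like this (my reconstruction of the setup described above, not the actual test code; the idea is to run it under different GC flags and compare the GC logs):

```java
import java.util.Random;

public class AllocChurn {
    // Allocate `iters` byte[]s of either a fixed size (base + jitter) or a
    // variable size (base +/- jitter), letting each array become garbage
    // immediately. Returns the last allocated size.
    static int churn(long iters, boolean fixed, int jitter) {
        final int base = 64 * 1024;
        final Random rnd = new Random(42); // fixed seed for repeatability
        int last = 0;
        for (long i = 0; i < iters; i++) {
            int size = fixed ? base + jitter
                             : base - jitter + rnd.nextInt(2 * jitter + 1);
            byte[] b = new byte[size]; // previous array becomes garbage
            last = b.length;
        }
        return last;
    }

    public static void main(String[] args) {
        // The runs above used 10m allocations; a smaller count for a demo.
        System.out.println(churn(1_000_000, true, 100));  // fixed: always 65636
        System.out.println(churn(1_000_000, false, 100)); // variable: 64k +- 100
    }
}
```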




[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-17 Thread Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769273#comment-13769273 ]

Anoop Sam John commented on HBASE-9553:
---

What about when in-cache data block encoding is enabled? Won't the HFile block 
sizes then vary considerably from block to block?



[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-17 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770316#comment-13770316 ]

Todd Lipcon commented on HBASE-9553:


Interested to see the results here. When I tested block cache churn before, I 
didn't see heap fragmentation really crop up: 
http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-2/

For testing this improvement, it would be good to produce similar graphs of the 
CMS maximum chunk size metric from -XX:+PrintFLSStatistics output, and to show 
that the change results in less fragmentation over time for at least some 
workloads.



[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-17 Thread Matt Corgan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770389#comment-13770389 ]

Matt Corgan commented on HBASE-9553:


I don't know the code-level implementation details of any of the garbage 
collectors, but I imagine they do this to an extent already by dividing the 
heap into regions of different chunk sizes and placing blocks into slightly 
bigger slots than they need, effectively doing the padding by leaving empty 
space after each block.  Maybe not for tiny objects, but possibly for bigger 
ones.

I also worry it would be hard to pick a single size to round all the blocks to, 
because HBase allows a configurable block size and encoding per table. And even 
if all tables use the default block size and encoding, the encoding will result 
in different block sizes depending on the nature of the data in each table.

It would be a good question for the Mechanical Sympathy mailing list.



[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-16 Thread Nick Dimiduk (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769048#comment-13769048 ]

Nick Dimiduk commented on HBASE-9553:
-

I think it's worth giving a try. Why not take it one step further and 
self-manage a slice of the BlockCache with this pre-defined block size, a la 
MemStoreLAB? Reserve, say, 80% of the BlockCache for slab management and leave 
the rest for the awkward-sized blocks.

Instead of explicitly setting the buffer size, why not sample existing HFiles 
and calculate a guesstimate?
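The 80/20 split suggested above could be sketched along these lines (the numbers and names are illustrative only, not an actual BlockCache API):

```java
public class CacheSplit {
    // Divide a block cache budget into a slab-managed portion of whole
    // fixed-size slots and a remainder for awkward-sized blocks, per the
    // suggested 80/20 split. Returns { slot count, overflow bytes }.
    static long[] split(long capacityBytes, double slabFraction, int slabBlockSize) {
        long slabBytes = (long) (capacityBytes * slabFraction);
        long slabCount = slabBytes / slabBlockSize;  // whole fixed-size slots
        long slabUsed = slabCount * slabBlockSize;
        long overflow = capacityBytes - slabUsed;    // for odd-sized blocks
        return new long[] { slabCount, overflow };
    }

    public static void main(String[] args) {
        // 1 GB cache, 80% slab-managed, slots of 64k + 100 bytes.
        long[] r = split(1L << 30, 0.80, 64 * 1024 + 100);
        System.out.println(r[0] + " slots, " + r[1] + " overflow bytes");
    }
}
```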



[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-16 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769121#comment-13769121 ]

Lars Hofhansl commented on HBASE-9553:
--

The memstore stores small, variable-sized KVs, so a slab is essential there.
Not sure a slab is needed or even desired here, as we already have fixed-size 
(well, after we do some simple padding) chunks of memory. The padding is simple 
and low overhead.

We could calculate the standard deviation of the KV sizes and add that to the 
HFile's metadata. Then the padding could be a multiple of the standard 
deviation, subject to some maximum (like 2% of the HFile's blocksize or 
something).

For testing, I would generate data with KVs drawn from a simple size 
distribution and then measure the GC as we evict/replace blocks in the block 
cache.
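The stddev-based padding could look roughly like this (the formula and names are my reading of the suggestion above, not actual HBase code):

```java
public class PaddingFromStats {
    // Derive a pad amount as a multiple of the KV-size standard deviation
    // (which the writer would store in the HFile metadata), capped at a
    // fraction of the block size -- e.g. 2% as mentioned above.
    static int padBytes(double kvSizeStdDev, int blockSize,
                        int multiple, double maxFraction) {
        int pad = (int) Math.ceil(multiple * kvSizeStdDev);
        int cap = (int) (blockSize * maxFraction);
        return Math.min(pad, cap);
    }

    public static void main(String[] args) {
        int blockSize = 64 * 1024;
        // KV sizes with stddev 40 bytes, pad to 2 standard deviations,
        // capped at 2% of the block size.
        System.out.println(padBytes(40.0, blockSize, 2, 0.02)); // 80
    }
}
```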

[~vasu.mariy...@gmail.com], this is the idea I was talking about earlier today.




[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-16 Thread Jean-Marc Spaggiari (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769150#comment-13769150 ]

Jean-Marc Spaggiari commented on HBASE-9553:


The idea seems correct. Looking forward to seeing the results. I'm not sure we 
will get much improvement, but as Nick says, it's worth giving it a try.



[jira] [Commented] (HBASE-9553) Pad HFile blocks to a fixed size before placing them into the blockcache

2013-09-16 Thread Liang Xie (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769161#comment-13769161 ]

Liang Xie commented on HBASE-9553:
--

Probably it could beat the current implementation :) But IMHO the off-heap 
solution (e.g. BucketCache with off-heap enabled) is still better than padding. 
In one of our internal benchmarks, the off-heap block caching model cut the 
99th-percentile latency in half compared to the current on-heap block caching 
implementation.

ps: I remember (vaguely) that HotSpot can dynamically resize some internal 
structures, like the PLAB, to fit different object sizes. Maybe some VM experts 
could give more explanation :) Of course I agree that a change in application 
code would be better than depending on HotSpot :)
