[jira] [Comment Edited] (HBASE-17819) Reduce the heap overhead for BucketCache

2017-11-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237495#comment-16237495
 ] 

Anoop Sam John edited comment on HBASE-17819 at 11/3/17 11:52 AM:
--

To explain the approach: this is a bit different from the V2 patch. The major changes are
1. BucketEntry is extended to make a SharedMemory BucketEntry. In file mode there is
no need to keep the ref count, as that is not a shared-memory type, so I removed the
new states added for HBASE-11425 from BucketEntry. For the off-heap mode BucketEntry we
now have an extension which carries the new states. (See the sketch after the math below.)
2. Removed the CSLM that kept the per-HFile-name block info. evictBlocksByHfileName
takes a perf hit from this, as it now has to iterate through all the entries to check
whether each block entry belongs to the given file. To offset that, evictBlocksByHfileName
was changed into an async op: a dedicated eviction thread does the work. Anyway, even if
we don't remove these blocks, or their removal is delayed, the blocks will eventually be
removed, since eviction follows an LRU algo: when no space is left for adding new blocks,
eviction kicks in and removes unused blocks. Moreover, eviction of blocks on HFile close
is off by default (we have a config for this). On compaction, evictByHFiles now happens
for the compacted files. There will be a bit more delay before the blocks are actually
removed.
But with this approach we save a lot of heap memory per entry. The math is in the
comment above:
{quote}
Now - 32 + 64 + 40 + 40 = 176
After patch - 32 + 48 + 40 = 120
Tested with Java Instrumentation
{quote}
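
For illustration, a minimal Java sketch of change 1 (class and field names here are assumptions based on this comment, not necessarily the actual patch code): the base entry drops the ref-count states, and only the off-heap (shared memory) extension carries them.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// File mode: no ref count needed, so the HBASE-11425 states are gone
// from the base class.
class BucketEntry {
  int offsetBase;         // lower 32 bits of the bucket offset
  int length;             // block length
  byte offset1;           // high byte of the offset
  byte deserializerIndex; // which deserializer to use on read
  long accessCounter;     // drives the LRU ordering
  long cachedTime;
}

// Off-heap (shared memory) mode: extension that re-adds the new states.
class SharedMemoryBucketEntry extends BucketEntry {
  final AtomicInteger refCount = new AtomicInteger(0); // readers pinning the block
  volatile boolean markedForEvict;                     // evict once refCount hits 0
}
{code}

And a sketch of change 2, the async evictBlocksByHfileName (again with hypothetical names): callers only enqueue the file name, and the O(n) scan over all cached entries runs on a dedicated thread.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class HFileEvictionThread extends Thread {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

  HFileEvictionThread() {
    setDaemon(true);
  }

  // Called from evictBlocksByHfileName: returns immediately.
  void requestEviction(String hfileName) {
    queue.offer(hfileName);
  }

  @Override
  public void run() {
    while (!isInterrupted()) {
      try {
        String hfileName = queue.take();
        // Here: iterate the whole backingMap and evict every entry whose
        // BlockCacheKey carries this hfileName.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}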



> Reduce the heap overhead for BucketCache
> 
>
> Key: HBASE-17819
> URL: https://issues.apache.org/jira/browse/HBASE-17819
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17819_V1.patch, HBASE-17819_V2.patch, 
> HBASE-17819_V3.patch
>
>
> We keep the bucket entry map in BucketCache.  Below is the math of heapSize
> for the key and value in this map.
> BlockCacheKey
> ---
> String hfileName  -  Ref  - 4
> long offset  - 8
> BlockType blockType  - Ref  - 4
> boolean isPrimaryReplicaBlock  - 1
> Total  =  12 (Object) + 17 = 29
> BucketEntry
> 
> int offsetBase  -  4
> int length  - 4
> byte offset1  -  1
> byte deserialiserIndex  -  1
> long accessCounter  -  8
> BlockPriority priority  - Ref  - 4
> volatile boolean markedForEvict  -  1
> AtomicInteger refCount  -  16 + 4
> long cachedTime  -  8
> Total = 12 (Object) + 51 = 63
> ConcurrentHashMap Map.Entry  -  40
> blocksByHFile ConcurrentSkipListSet Entry  -  40
> Total = 29 + 63 + 80 = 172
> For 10 million blocks we will end up having 1.6GB of heap size.  
> This jira aims to reduce this as much as possible
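
A quick worked check of the quoted totals (plain arithmetic on the numbers above):

{quote}
29 (BlockCacheKey) + 63 (BucketEntry) + 40 (CHM Map.Entry) + 40 (CSLS entry) = 172 bytes per block
172 bytes x 10,000,000 blocks = 1,720,000,000 bytes, i.e. about 1.6 GiB
{quote}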





[jira] [Comment Edited] (HBASE-17819) Reduce the heap overhead for BucketCache

2017-07-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096195#comment-16096195
 ] 

Anoop Sam John edited comment on HBASE-17819 at 7/21/17 12:45 PM:
--

After alignment, the heap overhead of BlockCacheKey is 32 bytes, and even after
changing the ref to a byte it will still be 32. It is still worth doing, though:
above a 32 GB heap size compressed refs are not in play and refs might take 8
bytes, and then it will make a difference.
BucketEntry was 64 bytes of heap, and after the patch it will be 48.
We also remove 40 bytes per entry by dropping the blocksByHFile set.
So the math is:
Now -  32 + 64 + 40 + 40 = 176
After patch - 32 + 48 + 40 = 120

Tested with Java Instrumentation
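
For reference, a minimal sketch of measuring such sizes with java.lang.instrument (the agent class name is illustrative; the comment does not say exactly which tool was used):

{code:java}
import java.lang.instrument.Instrumentation;

// Package in a jar with "Premain-Class: SizeOfAgent" in the manifest and
// start the JVM with -javaagent:sizeof.jar.
public class SizeOfAgent {
  private static volatile Instrumentation inst;

  public static void premain(String agentArgs, Instrumentation instrumentation) {
    inst = instrumentation;
  }

  // Shallow size of one object, including header and alignment padding.
  public static long sizeOf(Object o) {
    return inst.getObjectSize(o);
  }
}
{code}

Calling sizeOf on a key or entry instance is what yields per-object numbers like the 64 vs. 48 bytes above.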





[jira] [Comment Edited] (HBASE-17819) Reduce the heap overhead for BucketCache

2017-07-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094930#comment-16094930
 ] 

Vladimir Rodionov edited comment on HBASE-17819 at 7/20/17 4:35 PM:


{quote}
BlockCacheKey
---
String hfileName - Ref - 4
long offset - 8
BlockType blockType - Ref - 4
boolean isPrimaryReplicaBlock - 1
Total = 12 (Object) + 17 = 29
BucketEntry

int offsetBase - 4
int length - 4
byte offset1 - 1
byte deserialiserIndex - 1
long accessCounter - 8
BlockPriority priority - Ref - 4
volatile boolean markedForEvict - 1
AtomicInteger refCount - 16 + 4
long cachedTime - 8
Total = 12 (Object) + 51 = 63
ConcurrentHashMap Map.Entry - 40
blocksByHFile ConcurrentSkipListSet Entry - 40
Total = 29 + 63 + 80 = 172
{quote}

Just a couple of corrections on your math, guys:

# compressed OOPs (obj ref = 4 bytes) work up to about 30.5 GB of heap size, and
many users already have more than that
# the object field layout is slightly different: n-byte types are aligned on
n-byte boundaries. So if you have, for example, boolean and long fields in an
object, the object's size is going to be 16 (overhead) + 8 + 8 = 32, not
16 + 1 + 8. You should also take into account that the total object size is
always a multiple of 8, so if you get 42 it is actually 48, because the next
object starts on an 8-byte boundary.

You can shave some bytes just by rearranging the object's fields in descending
size order: first the 8-byte types (obj ref, long, double), followed by the
4-byte types (int, float), the 2-byte types (short, char), and the 1-byte types
(boolean, byte) at the end; see the sketch below.
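
A small illustration of that reordering advice (exact sizes depend on the JVM, and HotSpot may reorder fields on its own, but the padding rules above still decide the final size):

{code:java}
// Declared smallest-first: the long must start on an 8-byte boundary,
// so padding gets inserted after the boolean.
class Unordered {
  boolean flag;  // 1 byte, then up to 7 bytes of padding before 'counter'
  long counter;  // 8 bytes, 8-byte aligned
  byte index;    // 1 byte, then tail padding to the next multiple of 8
}

// Same fields in descending size order: the small fields pack together
// and only tail padding remains.
class Ordered {
  long counter;  // 8 bytes
  boolean flag;  // 1 byte
  byte index;    // 1 byte, then tail padding to a multiple of 8
}
{code}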





