[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2017-07-30 Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106811#comment-16106811 ]

Anoop Sam John commented on HBASE-15248:


The block header, we know, is 33 bytes, and there will be 13 bytes of
metadata. And the CRCs, do we know how many bytes those add? Probably yes.
These are all extra, on top of the block size the user configured (4KB), but
at least we know how many extra bytes they are.
The bigger issue is that our block size is NOT an upper cap. It is not the
maximum size a block will have; it is the minimum. The check for ending the
current block and starting a new one is done without considering the current
cell's size. One more thing: we don't want to write duplicate cells (same key
but different mvcc numbers) into different blocks. IMO these are the main
obstacles to making the block size predictable. Because of this
unpredictability, users end up adding some extra headroom to every block size.
Remember that the default BucketCache bucket sizes are 5 KB and 9 KB for block
sizes of 4 KB and 8 KB: the extra 1 KB per block accounts for all these
extras.
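
To make the arithmetic concrete, here is a back-of-envelope sketch in plain
Java (not HBase code). The 33-byte header and 13-byte metadata figures come
from this thread; the checksum math assumes HBase's default of one 4-byte
CRC32 per 16 KB checksum chunk (hbase.hstore.bytes.per.checksum = 16384),
which is an assumption here.

// Back-of-envelope sketch of the per-block overhead discussed above.
// Header/metadata sizes are the figures quoted in this thread; the CRC
// math assumes one 4-byte CRC32 per 16 KB chunk (the default
// hbase.hstore.bytes.per.checksum), which is an assumption.
public final class BlockOverheadSketch {
  static final int HEADER_BYTES = 33;       // HFile block header
  static final int BLOCK_META_BYTES = 13;   // per-block metadata
  static final int BYTES_PER_CHECKSUM = 16 * 1024;
  static final int CRC_BYTES = 4;           // one CRC32 per chunk

  // Overhead added on top of the configured payload size.
  static int overhead(int payloadBytes) {
    int chunks = (payloadBytes + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
    return HEADER_BYTES + BLOCK_META_BYTES + chunks * CRC_BYTES;
  }

  public static void main(String[] args) {
    for (int kb : new int[] {4, 8, 64}) {
      int payload = kb * 1024;
      System.out.printf("%2d KB payload -> at least %d bytes (+%d)%n",
          kb, payload + overhead(payload), overhead(payload));
    }
  }
}

And that minimum is before any oversized cell pushes the block past the
boundary, which is the unpredictability described above.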

> BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a 
> BucketCache 'block' of 4k
> -------------------------------------------------------------------------
>
> Key: HBASE-15248
> URL: https://issues.apache.org/jira/browse/HBASE-15248
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: stack
>
> Chatting w/ a gentleman named Daniel Pol who is messing w/ bucketcache, he
> wants blocks to be the size specified in the configuration and no bigger. His
> hardware setup fetches pages of 4k, and so a block that has 4k of payload
> but then has a header plus the header of the next block (which helps figure
> what's next when scanning) ends up being 4203 bytes or something, and this
> then translates into two seeks per block fetch.
> This issue is about what it would take to stay inside our configured size 
> boundary writing out blocks.
> If not possible, give back better signal on what to do so you could fit 
> inside a particular constraint.





[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2017-07-30 stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106470#comment-16106470 ]

stack commented on HBASE-15248:
---

That is right [~anoop.hbase]. There is tail data too... the CRCs.

This issue is about "...what it would take to stay inside our configured size
boundary writing out blocks" or "...give back better signal on what to do so
you could fit inside a particular constraint." So if the user says 4k, then
we'd only write 4k blocks (the 4k would include metadata and CRCs...). As
[~ndimiduk] says, the 4k doesn't count compression + encryption... but maybe a
first pass could fix the case where there is no compression or encoding?
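
A hypothetical shape for that first pass, as a sketch (invented names, not
HFileWriter's actual logic): budget the configured BLOCKSIZE to include all
overhead, and close the current block before a cell would overflow it.

import java.util.List;

// Hypothetical writer loop: treat BLOCKSIZE as a hard budget that already
// includes header/metadata/CRC overhead, and cut the block BEFORE appending
// a cell that would overflow it. The BlockSink name is invented; this is
// not HFileWriter's actual code.
public final class HardCapBlockWriter {
  static final int OVERHEAD = 50;  // header + metadata + CRC estimate, see above

  interface BlockSink {
    void append(byte[] cell);
    void close();
  }

  static void write(List<byte[]> cells, BlockSink sink, int configuredSize) {
    int budget = configuredSize - OVERHEAD;  // payload room inside e.g. 4k
    int used = 0;
    for (byte[] cell : cells) {
      if (used > 0 && used + cell.length > budget) {
        sink.close();  // cut before overflowing, not after
        used = 0;
      }
      sink.append(cell);
      used += cell.length;
    }
    if (used > 0) sink.close();
  }
}

A single cell bigger than the budget would still spill past it, and the
same-key/different-mvcc constraint raised above would need a look-ahead before
cutting; the sketch ignores both.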







[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2017-07-20 Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094465#comment-16094465 ]

Anoop Sam John commented on HBASE-15248:


So what we save for one block is this block's data (cells) + this block's
header (33 bytes) + the block metadata (13 bytes). Correct, [~Stack]?
HBASE-15477 already removed the saving of the next block's header when writing
to the cache.
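
In other words (only the 33/13-byte figures come from this thread; the rest
is illustrative arithmetic):

// Cache-entry footprint per the figures above. Before HBASE-15477 a cached
// block also carried a copy of the NEXT block's 33-byte header; after, it
// does not.
public final class CacheEntrySize {
  public static void main(String[] args) {
    int data = 4096, header = 33, meta = 13;
    int before = data + header + header + meta;  // incl. next block's header
    int after = data + header + meta;            // HBASE-15477 dropped it
    System.out.println("before HBASE-15477: " + before + " bytes");  // 4175
    System.out.println("after HBASE-15477:  " + after + " bytes");   // 4142
  }
}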






[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2017-03-06 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898738#comment-15898738 ]

Hudson commented on HBASE-15248:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2626 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2626/])
Javadoc on what BLOCKSIZE means (related to HBASE-15248) (stack: rev 
4beae9a56e2d5d2992d017ba876d18f78415e067)
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java







[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2017-03-06 stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898383#comment-15898383 ]

stack commented on HBASE-15248:
---

Added this note to BLOCKSIZE:


--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
@@ -103,7 +103,10 @@ public class HColumnDescriptor implements Comparable<HColumnDescriptor> {
   /**
    * Size of storefile/hfile 'blocks'.  Default is {@link #DEFAULT_BLOCKSIZE}.
    * Use smaller block sizes for faster random-access at expense of larger
-   * indices (more memory consumption).
+   * indices (more memory consumption). Note that this is a soft limit and that
+   * blocks have overhead (metadata, CRCs) so blocks will tend to be the size
+   * specified here and then some; i.e. don't expect that setting BLOCKSIZE=4k
+   * means hbase data will align with an SSD's 4k page accesses (TODO).
    */
   public static final String BLOCKSIZE = "BLOCKSIZE";
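
For reference, the knob this documents is set per column family. A minimal
usage sketch against the HColumnDescriptor API of this era (later HBase
versions deprecate it in favor of ColumnFamilyDescriptorBuilder):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

// Minimal sketch of setting BLOCKSIZE per column family. Per the javadoc
// above, 4096 is a soft target: on-disk blocks end up ~4 KB plus
// header/metadata/CRC overhead.
public final class BlockSizeExample {
  public static void main(String[] args) {
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t1"));
    HColumnDescriptor family = new HColumnDescriptor("f1");
    family.setBlocksize(4 * 1024);  // soft limit, not a hard cap
    table.addFamily(family);
    System.out.println(table);
  }
}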








[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2016-12-19 stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762022#comment-15762022 ]

stack commented on HBASE-15248:
---

Not a blocker for enabling bucketcache by default, I'd say.






[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2016-12-19 Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761211#comment-15761211 ]

Anoop Sam John commented on HBASE-15248:


Should this block turning ON the off-heap BucketCache, Stack? I am trying to
get that done, but before we do, we have to make sure some important jiras are
solved. WDYT?






[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2016-12-16 Nick Dimiduk (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755447#comment-15755447 ]

Nick Dimiduk commented on HBASE-15248:
--

Yeah we have a disconnect -- BLOCKSIZE doesn't take compression or encoding 
into account.
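
To see the disconnect in numbers, a tiny plain-JDK illustration (java.util.zip
Deflater standing in for HBase's compression codecs, which it is not): the
writer cuts a block at ~4 KB of uncompressed payload, but the bytes that reach
disk are the compressed output.

import java.util.Arrays;
import java.util.zip.Deflater;

// Plain-JDK illustration (not HBase's codecs): a block is cut once ~4 KB of
// uncompressed cell data has been appended, but its on-disk size is whatever
// the compressor emits for that data.
public final class CompressionDisconnect {
  public static void main(String[] args) {
    byte[] payload = new byte[4096];
    Arrays.fill(payload, (byte) 'x');  // highly compressible payload

    Deflater deflater = new Deflater();
    deflater.setInput(payload);
    deflater.finish();
    byte[] out = new byte[8192];
    int onDisk = deflater.deflate(out);  // bytes this block would occupy
    deflater.end();

    System.out.println("uncompressed block payload: " + payload.length);
    System.out.println("compressed bytes on disk:   " + onDisk);  // far below 4096
  }
}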






[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2016-12-16 stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755063#comment-15755063 ]

stack commented on HBASE-15248:
---

bq. So right now u have removed the eager fetch of next block header right?

No, I did not. On study, removing the fetch of the next block header would be
too disruptive: HBASE-17185 "Purge the seek of the next block reading
HFileBlocks".

On this issue: the way we do sizing now -- i.e. we write bigger than what is
configured -- is confusing. This issue should be about rationalizing that.







[jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k

2016-12-16 Thread Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753974#comment-15753974 ]

Anoop Sam John commented on HBASE-15248:


BC has buckets and slots of different sizes in it, sized to match the
possible HFile block sizes. By default we consider 4K, 8K, ... 512K. But we
give an additional +1KB to each bucket size, and so to each slot; a 4K slot
actually becomes 5K. And yes, the extra bytes are consumed by the current
block's header and the next block's header.
So right now you have removed the eager fetch of the next block header, right,
[~saint@gmail.com]? We will still need the current block's header, though?
I am trying to solve the issues under the parent so that HBASE-17204 can
happen.
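
A small sketch of that sizing scheme (the size list mirrors the defaults
described above, 4K..512K each +1KB; treat the exact table and the fit logic
as assumptions, not BucketAllocator's actual code):

// Each slot size is an HFile block size plus 1 KB of headroom for
// header/metadata/CRC overhead, per the description above.
public final class BucketSizingSketch {
  static final int[] SLOT_SIZES = {
      4 * 1024 + 1024,    // 4 KB blocks -> 5 KB slots
      8 * 1024 + 1024,    // 8 KB blocks -> 9 KB slots
      16 * 1024 + 1024,
      64 * 1024 + 1024,   // default 64 KB BLOCKSIZE -> 65 KB slots
      512 * 1024 + 1024,  // largest default bucket
  };

  // Pick the smallest slot a serialized block fits into.
  static int slotFor(int blockBytes) {
    for (int size : SLOT_SIZES) {
      if (blockBytes <= size) {
        return size;
      }
    }
    throw new IllegalArgumentException("block too large: " + blockBytes);
  }

  public static void main(String[] args) {
    // A "4 KB" block plus 33 + 13 bytes of overhead still fits a 5 KB slot.
    System.out.println(slotFor(4096 + 33 + 13));  // prints 5120
  }
}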



