[jira] [Commented] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE

2017-04-25 Thread Hanjie Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982530#comment-15982530
 ] 

Hanjie Gu commented on HBASE-7404:
--

I have a question:
Why does each size in BucketSizeInfo have an extra 1 KB,
as in the code below (5 KB, ..., 17 KB, ..., 65 KB, ...)?
```
  // Default block size is 64K, so we choose more sizes near 64K, you'd better
  // reset it according to your cluster's block size distribution
  // TODO Support the view of block size distribution statistics
  private static final int DEFAULT_BUCKET_SIZES[] = { 4 * 1024 + 1024, 8 * 1024 + 1024,
  16 * 1024 + 1024, 32 * 1024 + 1024, 40 * 1024 + 1024, 48 * 1024 + 1024,
  56 * 1024 + 1024, 64 * 1024 + 1024, 96 * 1024 + 1024, 128 * 1024 + 1024,
  192 * 1024 + 1024, 256 * 1024 + 1024, 384 * 1024 + 1024,
  512 * 1024 + 1024 };
```

> Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE
> --
>
> Key: HBASE-7404
> URL: https://issues.apache.org/jira/browse/HBASE-7404
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.3
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.95.0
>
> Attachments: 7404-0.94-fixed-lines.txt, 7404-trunk-v10.patch, 
> 7404-trunk-v11.patch, 7404-trunk-v12.patch, 7404-trunk-v13.patch, 
> 7404-trunk-v13.txt, 7404-trunk-v14.patch, BucketCache.pdf, 
> hbase-7404-94v2.patch, HBASE-7404-backport-0.94.patch, 
> hbase-7404-trunkv2.patch, hbase-7404-trunkv9.patch, Introduction of Bucket 
> Cache.pdf
>
>
> First, thanks to @neil from Fusion-IO for sharing the source code.
> Usage:
> 1. Use the bucket cache as the main memory cache, configured as follows:
> – "hbase.bucketcache.ioengine" "heap" (or "offheap" to cache blocks in 
> off-heap memory)
> – "hbase.bucketcache.size" 0.4 (size of the bucket cache; 0.4 is a fraction of 
> the max heap size)
> 2. Use the bucket cache as a secondary cache, configured as follows:
> – "hbase.bucketcache.ioengine" "file:/disk1/hbase/cache.data" (the path of the 
> file that stores the block data)
> – "hbase.bucketcache.size" 1024 (size of the bucket cache in MB, so 1024 
> means 1 GB)
> – "hbase.bucketcache.combinedcache.enabled" false (the default is true)
> See org.apache.hadoop.hbase.io.hfile.CacheConfig and 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache for more configuration options.
> What is Bucket Cache?
> It can greatly reduce CMS GC activity and the heap fragmentation caused by GC.
> It supports a large cache space for high read performance by using a high-speed 
> disk such as Fusion-io.
> 1. It is a block cache implementation, like LruBlockCache.
> 2. It manages the storage position of blocks itself, through the Bucket Allocator.
> 3. Cached blocks can be stored in memory or in the file system.
> 4. Bucket Cache can be used as the main block cache (see CombinedBlockCache), 
> combined with LruBlockCache, to reduce CMS and the fragmentation caused by GC.
> 5. BucketCache can also be used as a secondary cache (e.g. using Fusion-io to 
> store blocks) to enlarge the cache space.
> What about SlabCache?
> We studied and tested SlabCache first, but the results were bad, because:
> 1. SlabCache uses SingleSizeCache; its memory utilization is low because block 
> sizes vary, especially when DataBlockEncoding is used.
> 2. SlabCache is used in DoubleBlockCache: a block is cached in both SlabCache 
> and LruBlockCache, and a block hit in SlabCache is put into LruBlockCache again, 
> so CMS and heap fragmentation do not get any better.
> 3. Direct (off-heap) memory performance is not as good as heap and may cause OOM, 
> so we recommend the "heap" engine.
> See the attachments and the patch for more details.
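
The two "hbase.bucketcache.size" settings quoted above use different units: 0.4 is read as a fraction of the max heap, while 1024 is read as megabytes. Below is a minimal sketch of that interpretation, assuming values below 1.0 are fractions and everything else is MB. The class and method names are hypothetical and are not HBase APIs; the real parsing lives in HBase itself.

```java
/** Hypothetical helper illustrating the two units; not part of HBase. */
final class BucketCacheSizeSketch {

    /**
     * Converts a configured "hbase.bucketcache.size" value to bytes.
     * Assumption: values below 1.0 are a fraction of the max heap,
     * anything else is a size in megabytes.
     */
    static long toBytes(double configured, long maxHeapBytes) {
        if (configured < 1.0) {
            return (long) (maxHeapBytes * configured);
        }
        return (long) (configured * 1024 * 1024);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("0.4 of heap -> " + toBytes(0.4, maxHeap) + " bytes");
        System.out.println("1024 MB     -> " + toBytes(1024, maxHeap) + " bytes");
    }
}
```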



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE

2017-04-25 Thread Hanjie Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982530#comment-15982530
 ] 

Hanjie Gu edited comment on HBASE-7404 at 4/25/17 8:00 AM:
---

I have a question:
Why does each size in BucketSizeInfo have an extra 1 KB,
as in the code below from BucketAllocator.java (5 KB, ..., 17 KB, ..., 65 KB, ...)?
```
  // Default block size is 64K, so we choose more sizes near 64K, you'd better
  // reset it according to your cluster's block size distribution
  // TODO Support the view of block size distribution statistics
  private static final int DEFAULT_BUCKET_SIZES[] = { 4 * 1024 + 1024, 8 * 1024 + 1024,
  16 * 1024 + 1024, 32 * 1024 + 1024, 40 * 1024 + 1024, 48 * 1024 + 1024,
  56 * 1024 + 1024, 64 * 1024 + 1024, 96 * 1024 + 1024, 128 * 1024 + 1024,
  192 * 1024 + 1024, 256 * 1024 + 1024, 384 * 1024 + 1024,
  512 * 1024 + 1024 };
```




[jira] [Updated] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE

2017-06-09 Thread Hanjie Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanjie Gu updated HBASE-7404:
-

Thanks for the response. I also asked the author of the bucket cache, and the answer was the same.



Sent from my Xiaomi phone. On June 9, 2017 at 10:01 AM, "Anoop Sam John (JIRA)" wrote:

    [ 
[1]https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043811#comment-16043811
 ]

Anoop Sam John commented on HBASE-7404:
---

Because the block size is not a hard limit. While writing HFiles, the cell 
currently being written can push the block past the configured block size. Only 
after that does the check notice that the size has been exceeded, and we move on 
to the next block. To accommodate this overshoot, each bucket size has 1 KB extra.
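
To make the effect of the extra 1 KB concrete, here is a minimal sketch. It assumes the allocator places a block in the smallest bucket size that can hold it; the class and method names are hypothetical and this is not the BucketAllocator source. The padded sizes are the defaults quoted in the question, and the unpadded ladder exists only for comparison.

```java
import java.util.Arrays;

/** Hypothetical sketch of bucket-size selection; not the BucketAllocator source. */
final class BucketSizeSketch {

    // Defaults quoted in the question: each nominal size plus 1 KB of headroom.
    static final int[] PADDED = {
        5 * 1024, 9 * 1024, 17 * 1024, 33 * 1024, 41 * 1024, 49 * 1024, 57 * 1024,
        65 * 1024, 97 * 1024, 129 * 1024, 193 * 1024, 257 * 1024, 385 * 1024, 513 * 1024
    };

    // The same ladder without the headroom, for comparison only.
    static final int[] UNPADDED = Arrays.stream(PADDED).map(s -> s - 1024).toArray();

    /** Returns the smallest bucket size that can hold blockLen bytes, or -1 if none can. */
    static int smallestFit(int[] sizes, int blockLen) {
        for (int size : sizes) { // sizes are sorted in ascending order
            if (blockLen <= size) {
                return size;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A nominal 64 KB block whose last cell pushed it 300 bytes over the limit.
        int block = 64 * 1024 + 300;
        System.out.println(smallestFit(PADDED, block) / 1024 + " KB bucket with headroom");
        System.out.println(smallestFit(UNPADDED, block) / 1024 + " KB bucket without headroom");
    }
}
```

Under these assumptions the slightly oversized block still lands in the 65 KB bucket. Without the headroom it would spill into the 96 KB bucket and waste roughly a third of it.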






[1] https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043811#comment-16043811
[2] https://issues.apache.org/jira/browse/HBASE-7404
[3] http://org.apache.hadoop.hbase.io
[4] http://org.apache.hadoop.hbase.io




[jira] [Commented] (HBASE-4811) Support reverse Scan

2017-06-21 Thread Hanjie Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057114#comment-16057114
 ] 

Hanjie Gu commented on HBASE-4811:
--

I am confused about the Description at the top.
In my opinion, a reversed scan should output everything in reverse order, from 
bottom to top, starting at the start key.
Take the given example, with the following rows:
aaa/c1:q1/value1
aaa/c1:q2/value2
bbb/c1:q1/value1
bbb/c1:q2/value2
ccc/c1:q1/value1
ccc/c1:q2/value2
ddd/c1:q1/value1
ddd/c1:q2/value2
eee/c1:q1/value1
eee/c1:q2/value2

Shouldn't a reversed scan from 'ddd' to 'bbb' (exclusive) produce the following output?
ddd/c1:q2/value2
ddd/c1:q1/value1
ccc/c1:q2/value2
ccc/c1:q1/value1

However, the Description shows this output:
ddd/c1:q1/value1
ddd/c1:q2/value2
ccc/c1:q1/value1
ccc/c1:q2/value2

Is the Description wrong, or have I misunderstood?

> Support reverse Scan
> 
>
> Key: HBASE-4811
> URL: https://issues.apache.org/jira/browse/HBASE-4811
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Affects Versions: 0.20.6, 0.94.7
> Reporter: John Carrino
> Assignee: chunhui shen
> Fix For: 0.98.0
>
> Attachments: 4811-0.94-v22.txt, 4811-0.94-v23.txt, 4811-0.94-v25.txt, 
> 4811-0.94-v3.txt, 4811-trunk-v10.txt, 4811-trunk-v29.patch, 
> 4811-trunk-v5.patch, HBase-4811-0.94.3modified.txt, hbase-4811-0.94 
> v21.patch, hbase-4811-0.94-v24.patch, HBase-4811-0.94-v2.txt, 
> hbase-4811-trunkv11.patch, hbase-4811-trunkv12.patch, 
> hbase-4811-trunkv13.patch, hbase-4811-trunkv14.patch, 
> hbase-4811-trunkv15.patch, hbase-4811-trunkv16.patch, 
> hbase-4811-trunkv17.patch, hbase-4811-trunkv18.patch, 
> hbase-4811-trunkv19.patch, hbase-4811-trunkv1.patch, 
> hbase-4811-trunkv20.patch, hbase-4811-trunkv21.patch, 
> hbase-4811-trunkv24.patch, hbase-4811-trunkv24.patch, 
> hbase-4811-trunkv25.patch, hbase-4811-trunkv26.patch, 
> hbase-4811-trunkv27.patch, hbase-4811-trunkv28.patch, 
> hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, hbase-4811-trunkv7.patch, 
> hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch
>
>
> A reversed scan scans the rows backward, 
> so StartRow must be greater than StopRow in a reversed scan.
> For example, for the following rows:
> aaa/c1:q1/value1
> aaa/c1:q2/value2
> bbb/c1:q1/value1
> bbb/c1:q2/value2
> ccc/c1:q1/value1
> ccc/c1:q2/value2
> ddd/c1:q1/value1
> ddd/c1:q2/value2
> eee/c1:q1/value1
> eee/c1:q2/value2
> you could do a reversed scan from 'ddd' to 'bbb' (exclusive) like this:
> Scan scan = new Scan();
> scan.setStartRow(Bytes.toBytes("ddd"));
> scan.setStopRow(Bytes.toBytes("bbb"));
> scan.setReversed(true);
> for (Result result : htable.getScanner(scan)) {
>   System.out.println(result);
> }
> You could also do the reversed scan from the shell like this:
> hbase> scan 'table', {REVERSED => true, STARTROW => 'ddd', STOPROW => 'bbb'}
> And the output is:
> ddd/c1:q1/value1
> ddd/c1:q2/value2
> ccc/c1:q1/value1
> ccc/c1:q2/value2
> All the documentation I find about HBase says that if you want forward and 
> reverse scans you should just build 2 tables, one ascending and one 
> descending.  Is there a fundamental reason that HBase only supports forward 
> Scan?  It seems like a lot of extra space overhead and coding overhead (to 
> keep them in sync) to support 2 tables.  
> I am assuming this has been discussed before, but I can't find the 
> discussions anywhere about it or why it would be infeasible.
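
On the ordering shown in the Description: as far as I understand HBase, a reversed scan walks the rows in descending order, but the cells inside each row still come back in their normal ascending order. Below is a minimal in-memory sketch of that behaviour using plain sorted maps instead of the HBase client. The class name and the sample table are hypothetical stand-ins for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of a reversed scan; not the HBase client. */
final class ReversedScanSketch {

    /** Rows from startRow (inclusive) down to stopRow (exclusive); cells stay ascending. */
    static List<String> reversedScan(
            NavigableMap<String, NavigableMap<String, String>> table,
            String startRow, String stopRow) {
        List<String> out = new ArrayList<>();
        // Select (stopRow, startRow] in forward order, then view the rows in descending order.
        NavigableMap<String, NavigableMap<String, String>> rows =
                table.subMap(stopRow, false, startRow, true).descendingMap();
        for (Map.Entry<String, NavigableMap<String, String>> row : rows.entrySet()) {
            // Cells within a row are NOT reversed.
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                out.add(row.getKey() + "/" + cell.getKey() + "/" + cell.getValue());
            }
        }
        return out;
    }

    /** Builds the ten sample cells from the Description. */
    static NavigableMap<String, NavigableMap<String, String>> sampleTable() {
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        for (String row : new String[] {"aaa", "bbb", "ccc", "ddd", "eee"}) {
            NavigableMap<String, String> cells = new TreeMap<>();
            cells.put("c1:q1", "value1");
            cells.put("c1:q2", "value2");
            table.put(row, cells);
        }
        return table;
    }

    public static void main(String[] args) {
        for (String line : reversedScan(sampleTable(), "ddd", "bbb")) {
            System.out.println(line);
        }
    }
}
```

Running the sketch prints ddd/c1:q1, ddd/c1:q2, ccc/c1:q1, ccc/c1:q2, in that order. That is the ordering given in the Description: only the row order is reversed.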



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-4811) Support reverse Scan

2017-06-21 Thread Hanjie Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057114#comment-16057114
 ] 

Hanjie Gu edited comment on HBASE-4811 at 6/21/17 7:50 AM:
---

I am confused about the Description at the top.
In my opinion, a reversed scan should output everything in reverse order, from 
bottom to top, starting at the start key.
Take the given example, with the following rows:
aaa/c1:q1/value1
aaa/c1:q2/value2
bbb/c1:q1/value1
bbb/c1:q2/value2
ccc/c1:q1/value1
ccc/c1:q2/value2
ddd/c1:q1/value1
ddd/c1:q2/value2
eee/c1:q1/value1
eee/c1:q2/value2

Shouldn't a reversed scan from 'ddd' to 'bbb' (exclusive) produce the following output?
ddd/c1:q2/value2
ddd/c1:q1/value1
ccc/c1:q2/value2
ccc/c1:q1/value1

However, the Description shows this output:
ddd/c1:q1/value1
ddd/c1:q2/value2
ccc/c1:q1/value1
ccc/c1:q2/value2

Is the Description wrong, or have I misunderstood?

