[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567489#comment-17567489 ]

Bryan Beaudreault commented on HBASE-18300:
-------------------------------------------

We mostly use SSDs for our clusters. For clusters with large disk requirements, this is wasteful. We have tried simply moving to slow spinning disks (like d3 instances), but the latencies during request spikes were too high. A tiered system like the one mentioned above should help. We are going to do some experiments with BucketCache in FileIO mode, but I anticipate missing the larger off-heap BucketCache layer.

> Implement a Multi TieredBucketCache
> -----------------------------------
>
> Key: HBASE-18300
> URL: https://issues.apache.org/jira/browse/HBASE-18300
> Project: HBase
> Issue Type: New Feature
> Components: BucketCache
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
>
> We did an internal brainstorming to study the feasibility of this. Some of
> our recent tests on SSDs like Optane show that they are vastly faster in
> random reads and can act as effective caches.
> In the current state we have a single tier of bucket cache, and the bucket
> cache can either be offheap or configured to work in file mode. (The file
> mode can have multiple files backing it.)
> So this model restricts us to using either the memory or the file, and not
> both.
> With the advent of faster devices like Optane SSDs and NVMe based devices, it
> is better that we try to utilize all those devices for the bucket cache so
> that we can avoid the impact of slower devices where the actual data
> resides on the HDFS data nodes.
> Combined with this, we can allow the user to configure the caching layer per
> family/table so that one can effectively make use of the caching tiers.
> Can upload a design doc here. Before that, would like to know the suggestions
> here. Thoughts!!!

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567487#comment-17567487 ]

Bryan Beaudreault commented on HBASE-18300:
-------------------------------------------

[~ram_krish] did you end up doing anything here? I'm thinking about this. I think it would be helpful with S3-backed clusters or other cloud-based clusters. Let's say you have huge disk requirements but low CPU requirements (low req/s). In the cloud, you may not have access to the perfect server configuration, and your options will be limited if you want to cut cost. Often on large clusters like this, you may have lots of old data which is not frequently accessed. TieredBucketCache could help here. I'm imagining something like:

* L1: small, 4-5 GB, to keep GC costs down
* L2: off-heap BucketCache with 50-150 GB
* L3: 500 GB or >1 TB of direct-attached NVMe SSDs (in AWS this might be i4i or other similar)
* HDFS: backed by S3 or slow EBS drives

Today we can accomplish this with BucketCache, but we miss out on the big L2 off-heap layer, which could really help accelerate reads. An alternative configuration would use fast EBS drives for L3 and slow spinning disks for HDFS (in AWS, like the d3 instance type).
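
The L1/L2/L3 layout proposed above can be illustrated with a minimal read-through sketch. This is not HBase's BucketCache code: the class `TieredCacheSketch`, its `Tier` type, the promote-on-hit and demote-on-eviction policy, and the use of block counts rather than bytes are all assumptions made for illustration. Each tier is an LRU map standing in for on-heap, off-heap, or file-backed storage, and the backing function stands in for an HDFS read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a tiered block cache; not HBase's BucketCache API. */
class TieredCacheSketch {
    /** One tier: a bounded LRU map standing in for heap, off-heap, or file storage. */
    static final class Tier {
        final String name;
        final int capacity; // counted in blocks, not bytes, to keep the sketch small
        // accessOrder=true: iteration starts at the least recently used entry
        final LinkedHashMap<String, byte[]> blocks = new LinkedHashMap<>(16, 0.75f, true);

        Tier(String name, int capacity) {
            this.name = name;
            this.capacity = capacity;
        }
    }

    final List<Tier> tiers = new ArrayList<>();
    final Function<String, byte[]> backingStore; // stands in for a read from HDFS
    String lastServedBy; // tier name that served the most recent get(), or "HDFS"

    TieredCacheSketch(Function<String, byte[]> backingStore) {
        this.backingStore = backingStore;
    }

    /** Tiers are added fastest first (L1, then L2, then L3). */
    TieredCacheSketch addTier(String name, int capacity) {
        tiers.add(new Tier(name, capacity));
        return this;
    }

    /** Looks in each tier from fastest to slowest, then falls back to the backing store. */
    byte[] get(String key) {
        for (Tier tier : tiers) {
            byte[] block = tier.blocks.remove(key);
            if (block != null) {
                lastServedBy = tier.name;
                put(0, key, block); // assumed policy: promote hits to the fastest tier
                return block;
            }
        }
        lastServedBy = "HDFS";
        byte[] block = backingStore.apply(key);
        put(0, key, block);
        return block;
    }

    /** Inserts into tier i; an overflowing tier demotes its LRU block to tier i + 1. */
    private void put(int i, String key, byte[] block) {
        if (i >= tiers.size()) {
            return; // fell off the slowest tier: the block is evicted entirely
        }
        Tier tier = tiers.get(i);
        tier.blocks.put(key, block);
        if (tier.blocks.size() > tier.capacity) {
            Map.Entry<String, byte[]> eldest = tier.blocks.entrySet().iterator().next();
            String victimKey = eldest.getKey();
            byte[] victimBlock = eldest.getValue();
            tier.blocks.remove(victimKey);
            put(i + 1, victimKey, victimBlock);
        }
    }
}
```

With three single-block tiers, reading `a`, then `b`, then `a` again serves the second read of `a` from L2, because `b` pushed it out of L1. A real implementation would also have to decide whether tiers are inclusive or exclusive and how to size them in bytes; the sketch picks exclusive tiers only because that is the simplest case to show.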
[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020850#comment-17020850 ]

Michael Stack commented on HBASE-18300:
---------------------------------------

Unscheduling; feature not being worked on.
[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298360#comment-16298360 ]

ramkrishna.s.vasudevan commented on HBASE-18300:
------------------------------------------------

Thanks. Just seeing this comment. Let's take it forward.
[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292133#comment-16292133 ]

Anoop Sam John commented on HBASE-18300:
----------------------------------------

As said in the HBASE-19357 comments, when we do this we should look at always putting the system tables' data blocks in the off-heap BC (when we have a tiered BC with file mode + off-heap mode).
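
The placement rule suggested above could look roughly like the following sketch. Everything here is hypothetical: `TierPlacementSketch`, its `Tier` enum, and `chooseTier` are illustrative names, not HBase APIs, and the check assumes system tables are the ones in the `hbase` namespace (for example `hbase:meta`), identified by a simple prefix match on the table name string.

```java
/** Hypothetical placement policy; the names are illustrative, not HBase APIs. */
class TierPlacementSketch {
    enum Tier { OFF_HEAP, FILE }

    /** Assumes system tables live in the "hbase" namespace, e.g. "hbase:meta". */
    static boolean isSystemTable(String tableName) {
        return tableName.startsWith("hbase:");
    }

    /** System table blocks always go off-heap; everything else goes to the file tier. */
    static Tier chooseTier(String tableName) {
        return isSystemTable(tableName) ? Tier.OFF_HEAP : Tier.FILE;
    }

    public static void main(String[] args) {
        System.out.println(chooseTier("hbase:meta"));
        System.out.println(chooseTier("usertable"));
    }
}
```

A real policy would presumably combine this with the per-family/table configuration mentioned in the issue description, so the fixed `FILE` default for user tables is only a placeholder.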
[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237192#comment-16237192 ]

ramkrishna.s.vasudevan commented on HBASE-18300:
------------------------------------------------

Just did some initial testing comparing the default bucket cache (10G offheap bucket cache) with a tiered bucket cache configured as:

L1 - 10G offheap
L2 - 24G Optane SSD
L3 - 24G Fultondale SSD (PCIe SSD)

HDFS is configured to write to and read from HDDs.

I can see that with 80G of data (total data set size) and 75 threads, we get a 23% improvement during random reads. But this may not reflect a realistic installation, where people can have HDFS itself on PCIe SSDs. In that case an L1 and L2 cache would be the ideal choice. In cloud-like deployments where there is only a file-based bucket cache, they can allow a memory-based bucket cache to be a tier as well.

As mentioned in the doc, there are some TODOs, and some bucket cache related cleanup JIRAs (heap space occupancy of the bucket cache, etc.) will help here too.
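
For context on the sizes quoted in this test, the arithmetic below computes what fraction of the 80G data set each setup could hold at most. It is only an upper bound under stated assumptions: tiers hold disjoint blocks, and bucket and metadata overhead is ignored. `TierCapacitySketch` and `coverage` are hypothetical names, and the result says nothing about where the reported 23% improvement comes from.

```java
/** Hypothetical helper for the capacity arithmetic; not part of HBase. */
class TierCapacitySketch {
    /** Fraction of the data set that fits in the given tiers, capped at 1.0. */
    static double coverage(double dataSetGb, double... tierSizesGb) {
        double total = 0;
        for (double size : tierSizesGb) {
            total += size;
        }
        return Math.min(1.0, total / dataSetGb);
    }

    public static void main(String[] args) {
        // Default setup from the test: a single 10G off-heap bucket cache.
        System.out.println(coverage(80, 10));
        // Tiered setup from the test: 10G off-heap + 24G Optane + 24G PCIe SSD.
        System.out.println(coverage(80, 10, 24, 24));
    }
}
```

Under those assumptions the single 10G cache can hold at most 12.5% of the data set, while the three tiers together (58G) can hold at most 72.5%.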
[jira] [Commented] (HBASE-18300) Implement a Multi TieredBucketCache
[ https://issues.apache.org/jira/browse/HBASE-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100294#comment-16100294 ]

ramkrishna.s.vasudevan commented on HBASE-18300:
------------------------------------------------

Please find the link to the doc describing the feature.
https://docs.google.com/document/d/1HF2GOSWXWoPapRwiKgMw2k516uyaTAWoEIunB72GVWU/edit?usp=sharing