[jira] [Commented] (HBASE-23887) BlockCache performance improve by reduce eviction rate
[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200361#comment-17200361 ] Vladimir Rodionov commented on HBASE-23887: --- {quote} Thats why this feature will work when the data really can't fit into BlockCache -> eviction rate really work hard and it usually means reading blocks evenly distributed. {quote} For such use cases (if they exist in the wild) the cache is no help and must be disabled. HBase can do it per table/CF. I do not see any improvement in this "feature". It is just a "use the cache slightly when the data is not cacheable" type of improvement. > BlockCache performance improve by reduce eviction rate > -- > > Key: HBASE-23887 > URL: https://issues.apache.org/jira/browse/HBASE-23887 > Project: HBase > Issue Type: Improvement > Components: BlockCache, Performance >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: 1582787018434_rs_metrics.jpg, > 1582801838065_rs_metrics_new.png, BC_LongRun.png, > BlockCacheEvictionProcess.gif, BlockCacheEvictionProcess.gif, cmp.png, > evict_BC100_vs_BC23.png, eviction_100p.png, eviction_100p.png, > eviction_100p.png, gc_100p.png, graph.png, image-2020-06-07-08-11-11-929.png, > image-2020-06-07-08-19-00-922.png, image-2020-06-07-12-07-24-903.png, > image-2020-06-07-12-07-30-307.png, image-2020-06-08-17-38-45-159.png, > image-2020-06-08-17-38-52-579.png, image-2020-06-08-18-35-48-366.png, > image-2020-06-14-20-51-11-905.png, image-2020-06-22-05-57-45-578.png, > ratio.png, ratio2.png, read_requests_100pBC_vs_23pBC.png, requests_100p.png, > requests_100p.png, requests_new2_100p.png, requests_new_100p.png, scan.png, > scan_and_gets.png, scan_and_gets2.png, wave.png > > > Hi! > I am here for the first time; please correct me if something is wrong. 
> All the latest information is here: > [https://docs.google.com/document/d/1X8jVnK_3lp9ibpX6lnISf_He-6xrHZL0jQQ7hoTV0-g/edit?usp=sharing] > I want to propose how to improve performance when the data in HFiles is much larger than > the BlockCache (a usual story in Big Data). The idea is to cache only part of the DATA > blocks. It is good because LruBlockCache starts to work and saves a huge amount > of GC. > Sometimes we have more data than can fit into the BlockCache, and it causes a > high rate of evictions. In this case we can skip caching block N and instead > cache the (N+1)th block. We would evict block N quite soon anyway, and that is why > skipping it is good for performance. > --- > Some information below is out of date > --- > > > Example: > Imagine we have a little cache that can fit only 1 block, and we are trying to > read 3 blocks with offsets: > 124 > 198 > 223 > Current way - we put block 124, then put 198, evict 124, put 223, evict > 198. A lot of work (5 actions). > With the feature - the last few digits are evenly distributed from 0 to 99. When we > take the modulus we get: > 124 -> 24 > 198 -> 98 > 223 -> 23 > It helps to sort them. Some part, for example below 50 (if we set > *hbase.lru.cache.data.block.percent* = 50), goes into the cache, and we skip the > others. It means we will not try to handle block 198 and we save CPU for > other work. As a result - we put block 124, then put 223, evict 124 (3 > actions). > See the picture in the attachment with the test below. Requests per second are higher, > GC is lower. > > The key point of the code: > Added the parameter *hbase.lru.cache.data.block.percent*, which by default = > 100 > > But if we set it to 1-99, the following logic will work: > > > {code:java} > public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean > inMemory) { > if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) > if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) > return; > ... 
> // the same code as usual > } > {code} > > Other parameters help to control when this logic is enabled. It means it > will work only while heavy reading is going on. > hbase.lru.cache.heavy.eviction.count.limit - sets how many times the eviction process has to > run before we start to avoid putting data into the BlockCache > hbase.lru.cache.heavy.eviction.bytes.size.limit - sets how many bytes have to be > evicted each time before we start to avoid putting data into the BlockCache > By default: if 10 times (100 seconds) more than 10 MB was evicted (each time), > then we start to skip 50% of data blocks. > When the heavy eviction process ends, the new logic is switched off and we put all blocks into the > BlockCache again. > > Description of the test: > 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 4 RegionServers > 4 tables by 64 regions by 1.88 Gb data in each = 600 Gb total (only FAST_DIFF) > Total BlockCache Size = 48 Gb (8 % of data in HFiles) > Random read in 20 threads > > I am going to make a Pull Request; I hope it is the right way to make a contribution to this cool product.
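The 124/198/223 walk-through above can be reproduced with a minimal standalone sketch. The class and method names here are hypothetical; in the proposed patch the check sits inside LruBlockCache#cacheBlock as quoted:

```java
public class BlockCachePercentSketch {
    // Mirror of the proposed check: cache a DATA block only when the low two
    // digits of its offset fall below hbase.lru.cache.data.block.percent.
    static boolean shouldCacheDataBlock(long offset, int cacheDataBlockPercent) {
        if (cacheDataBlockPercent == 100) {
            return true; // default: feature disabled, cache everything
        }
        return offset % 100 < cacheDataBlockPercent;
    }

    public static void main(String[] args) {
        int percent = 50; // hbase.lru.cache.data.block.percent = 50
        for (long offset : new long[] {124, 198, 223}) {
            System.out.println(offset + " % 100 = " + (offset % 100) + " -> "
                + (shouldCacheDataBlock(offset, percent) ? "cache" : "skip"));
        }
        // With percent = 50, blocks 124 and 223 pass and block 198 is skipped,
        // which is the "3 actions instead of 5" example from the description.
    }
}
```

Because a block offset is just a byte position in an HFile, offset % 100 acts as a cheap, stateless pseudo-random selector.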
[jira] [Commented] (HBASE-14847) Add FIFO compaction section to HBase book
[ https://issues.apache.org/jira/browse/HBASE-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179985#comment-17179985 ] Vladimir Rodionov commented on HBASE-14847: --- Sure, go ahead. > Add FIFO compaction section to HBase book > - > > Key: HBASE-14847 > URL: https://issues.apache.org/jira/browse/HBASE-14847 > Project: HBase > Issue Type: Task > Components: documentation >Affects Versions: 2.0.0 >Reporter: Vladimir Rodionov >Priority: Major > > HBASE-14468 introduced new compaction policy. Book needs to be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24101) Correct snapshot handling
[ https://issues.apache.org/jira/browse/HBASE-24101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-24101. --- Resolution: Not A Problem > Correct snapshot handling > - > > Key: HBASE-24101 > URL: https://issues.apache.org/jira/browse/HBASE-24101 > Project: HBase > Issue Type: Sub-task > Components: mob, snapshots >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > > Reopening this umbrella to address correct snapshot handling. Particularly, > the following scenario must be verified: > # load data to a table > # take snapshot > # major compact table > # run mob file cleaner chore > # load data to table > # restore table from snapshot into another table > # verify data integrity > # restore table from snapshot into original table > # verify data integrity -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-22749. --- Resolution: Fixed > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch, > HBASE-22749_nightly_Unit_Test_Results.csv, > HBASE-22749_nightly_unit_test_analyzer.pdf, HBase-MOB-2.0-v3.0.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in the Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # The Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in data loss (see HBASE-22075) > The design goal for MOB 2.0 was to provide a 100% MOB 1.0-compatible > implementation, which is free of the above drawbacks and can be used as a > drop-in replacement in existing MOB deployments. So, these are the design goals > of MOB 2.0: > # Make MOB compactions scalable without relying on the Yarn/Mapreduce framework > # Provide a unified compactor for both MOB and regular store files > # Make it more robust, especially w.r.t. data loss. > # Simplify and reduce the overall MOB code. > # Provide a 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just a > software upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24101) Correct snapshot handling
[ https://issues.apache.org/jira/browse/HBASE-24101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086728#comment-17086728 ] Vladimir Rodionov commented on HBASE-24101: --- Verified, not the issue. > Correct snapshot handling > - > > Key: HBASE-24101 > URL: https://issues.apache.org/jira/browse/HBASE-24101 > Project: HBase > Issue Type: Sub-task > Components: mob, snapshots >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > > Reopening this umbrella to address correct snapshot handling. Particularly, > the following scenario must be verified: > # load data to a table > # take snapshot > # major compact table > # run mob file cleaner chore > # load data to table > # restore table from snapshot into another table > # verify data integrity > # restore table from snapshot into original table > # verify data integrity -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24101) Correct snapshot handling
Vladimir Rodionov created HBASE-24101: - Summary: Correct snapshot handling Key: HBASE-24101 URL: https://issues.apache.org/jira/browse/HBASE-24101 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23723) Add tests for MOB compaction on a table created from snapshot
[ https://issues.apache.org/jira/browse/HBASE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073132#comment-17073132 ] Vladimir Rodionov commented on HBASE-23723: --- Reopening this umbrella to address correct snapshot handling. Particularly, the following scenario must be verified: #load data to a table #take snapshot #major compact table #run mob file cleaner chore #load data to table #restore table from snapshot into another table #verify data integrity #restore table from snapshot into original table #verify data integrity > Add tests for MOB compaction on a table created from snapshot > - > > Key: HBASE-23723 > URL: https://issues.apache.org/jira/browse/HBASE-23723 > Project: HBase > Issue Type: Sub-task > Components: Compaction, mob >Reporter: Vladimir Rodionov >Assignee: Sean Busbey >Priority: Blocker > > How does code handle snapshot naming convention for MOB files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-23723) Add tests for MOB compaction on a table created from snapshot
[ https://issues.apache.org/jira/browse/HBASE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073132#comment-17073132 ] Vladimir Rodionov edited comment on HBASE-23723 at 4/1/20, 7:38 PM: Reopening this umbrella to address correct snapshot handling. Particularly, the following scenario must be verified: # load data to a table # take snapshot # major compact table # run mob file cleaner chore # load data to table # restore table from snapshot into another table # verify data integrity # restore table from snapshot into original table # verify data integrity was (Author: vrodionov): Reopening this umbrella to address correct snapshot handling. Particularly, the following scenario must be verified: #load data to a table #take snapshot #major compact table #run mob file cleaner chore #load data to table #restore table from snapshot into another table #verify data integrity #restore table from snapshot into original table #verify data integrity > Add tests for MOB compaction on a table created from snapshot > - > > Key: HBASE-23723 > URL: https://issues.apache.org/jira/browse/HBASE-23723 > Project: HBase > Issue Type: Sub-task > Components: Compaction, mob >Reporter: Vladimir Rodionov >Assignee: Sean Busbey >Priority: Blocker > > How does code handle snapshot naming convention for MOB files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073122#comment-17073122 ] Vladimir Rodionov edited comment on HBASE-22749 at 4/1/20, 7:37 PM: Reopening this umbrella to address correct snapshot handling. Particularly, the following scenario must be verified: # load data to a table # take snapshot # major compact table # run mob file cleaner chore # load data to table # restore table from snapshot into another table # verify data integrity # restore table from snapshot into original table # verify data integrity was (Author: vrodionov): Reopening this umbrella to address correct snapshot handling > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch, > HBASE-22749_nightly_Unit_Test_Results.csv, > HBASE-22749_nightly_unit_test_analyzer.pdf, HBase-MOB-2.0-v3.0.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. 
> # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov reopened HBASE-22749: --- Reopening this umbrella to address correct snapshot handling > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch, > HBASE-22749_nightly_Unit_Test_Results.csv, > HBASE-22749_nightly_unit_test_analyzer.pdf, HBase-MOB-2.0-v3.0.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23363) MobCompactionChore takes a long time to complete once job
[ https://issues.apache.org/jira/browse/HBASE-23363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23363. --- Resolution: Won't Fix HBASE-22749 has introduced distributed MOB compaction, which significantly improves performance. Distributed MOB compaction will be back-ported to 2.x branches soon. > MobCompactionChore takes a long time to complete once job > - > > Key: HBASE-23363 > URL: https://issues.apache.org/jira/browse/HBASE-23363 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.1, 2.2.2 >Reporter: Bo Cui >Priority: Major > Attachments: image-2019-12-04-11-01-20-352.png > > > MOB table compaction is done in the Master > poolSize of the HBase ChoreService is 1 > if HBase has 1000 MOB tables, MobCompactionChore takes a long time to complete > one job, and other chores need to wait > !image-2019-12-04-11-01-20-352.png! > {code:java} > MobCompactionChore#chore() { >... >for (TableDescriptor htd : map.values()) { > ... > for (ColumnFamilyDescriptor hcd : htd.getColumnFamilies()) { > if (hcd.isMobEnabled()) { > MobUtils.doMobCompaction(...); > } > } > ... >} >... > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
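The single-chore-thread behavior the reporter describes can be demonstrated with a plain ScheduledThreadPoolExecutor as a simplified stand-in for HBase's ChoreService (timings and names are illustrative only, not HBase code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SingleChorePoolDemo {
    // Returns the elapsed time (ms) at which a second chore actually starts
    // when a pool of size 1 is already busy for `busyMs` milliseconds.
    static long delayedStartMillis(long busyMs, long scheduleDelayMs) {
        ScheduledExecutorService chorePool = Executors.newScheduledThreadPool(1);
        try {
            final long t0 = System.nanoTime();
            // First chore: stands in for a MobCompactionChore walking many tables.
            chorePool.schedule(() -> {
                try { Thread.sleep(busyMs); } catch (InterruptedException ignored) { }
            }, 0, TimeUnit.MILLISECONDS);
            // Second chore: eligible almost immediately, but it cannot start
            // until the single pool thread is free again.
            return chorePool.schedule(
                () -> (System.nanoTime() - t0) / 1_000_000L,
                scheduleDelayMs, TimeUnit.MILLISECONDS).get();
        } catch (Exception e) {
            return -1;
        } finally {
            chorePool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("second chore started after ~"
            + delayedStartMillis(300, 10) + " ms (scheduled at 10 ms)");
    }
}
```

With a busy "compaction" of 300 ms, the second chore starts only after roughly 300 ms even though it was scheduled at 10 ms, which is the waiting effect the ticket reports at the scale of 1000 MOB tables.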
[jira] [Updated] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22075: -- Resolution: Fixed Status: Resolved (was: Patch Available) This problem has been addressed in HBASE-22749. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: compaction, mob > Fix For: 2.1.10, 2.2.5, 2.0.7 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, > HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, > HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java > > > When MOB compaction fails during the last step (bulk load of a newly created > reference file) there is a high chance of data loss due to a partially loaded > reference file, whose cells refer to a (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with references to this file might already be loaded into > HBase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23887) BlockCache performance improve
[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043810#comment-17043810 ] Vladimir Rodionov commented on HBASE-23887: --- So, basically, you propose to cache only some percentage of data blocks, chosen randomly? Hmm. Do you do a lot of large scans? Large scans trash the block cache if they are not set to bypass it. Disabling the block cache for large scan operations can help. > BlockCache performance improve > -- > > Key: HBASE-23887 > URL: https://issues.apache.org/jira/browse/HBASE-23887 > Project: HBase > Issue Type: New Feature >Reporter: Danil Lipovoy >Priority: Minor > Attachments: cmp.png > > > Hi! > I am here for the first time; please correct me if something is wrong. > I want to propose how to improve performance when the data in HFiles is much larger than > the BlockCache (a usual story in Big Data). The idea is to cache only part of the DATA > blocks. It is good because LruBlockCache starts to work and saves a huge amount > of GC. See the picture in the attachment with the test below. Requests per second are > higher, GC is lower. > > The key point of the code: > Added the parameter *hbase.lru.cache.data.block.percent*, which by default = > 100 > > But if we set it to 0-99, the following logic will work: > > > {code:java} > public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean > inMemory) { > if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) > if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) > return; > ... > // the same code as usual > } > {code} > > > Description of the test: > 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 4 RegionServers > 4 tables by 64 regions by 1.88 Gb data in each = 600 Gb total (only FAST_DIFF) > Total BlockCache Size = 48 Gb (8 % of data in HFiles) > Random read in 20 threads > > I am going to make a Pull Request; I hope it is the right way to make a contribution > to this cool product. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
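The proposal quoted above relies on the last two digits of block offsets being roughly uniform. A self-contained simulation (block count and size distribution are invented for illustration, not taken from HBase) suggests that under that assumption the offset % 100 filter admits close to the configured percentage of blocks:

```java
import java.util.Random;

public class OffsetModuloFraction {
    // Fraction of simulated HFile block offsets with (offset % 100) < percent.
    static double cachedFraction(int blocks, long meanBlockSize, int percent, long seed) {
        Random rnd = new Random(seed);
        long offset = 0;
        int cached = 0;
        for (int i = 0; i < blocks; i++) {
            if (offset % 100 < percent) {
                cached++;
            }
            // Next block starts where this one ends; vary the size a little so
            // the low digits of the offsets spread out, as they do in practice.
            offset += meanBlockSize + rnd.nextInt(2048) - 1024;
        }
        return (double) cached / blocks;
    }

    public static void main(String[] args) {
        // 100k blocks of ~64 KiB, hbase.lru.cache.data.block.percent = 50
        double f = cachedFraction(100_000, 65_536, 50, 42L);
        System.out.printf("fraction of blocks admitted to the cache: %.3f%n", f);
    }
}
```

The admitted fraction lands near percent/100 regardless of the mean block size, which is why a single modulus test works as a cheap sampling knob.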
[jira] [Assigned] (HBASE-23854) Documentation update of external_apis.adoc#example-scala-code
[ https://issues.apache.org/jira/browse/HBASE-23854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov reassigned HBASE-23854: - Assignee: (was: Vladimir Rodionov) > Documentation update of external_apis.adoc#example-scala-code > - > > Key: HBASE-23854 > URL: https://issues.apache.org/jira/browse/HBASE-23854 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Michael Heil >Priority: Trivial > Labels: beginner > Attachments: HBASE-23854.patch > > > Update the Example Scala Code in the Reference Guide as it contains > deprecated content such as > * new HBaseConfiguration() > * new HTable(conf, "mytable") > * add(Bytes.toBytes("ids"),Bytes.toBytes("id1"),Bytes.toBytes("one")) > Replace it with: > * HBaseConfiguration.create() > * TableName.valueOf({color:#6a8759}"mytable"{color}) > * > addColumn(Bytes.toBytes({color:#6a8759}"ids"{color}){color:#cc7832},{color}Bytes.toBytes({color:#6a8759}"id1"{color}){color:#cc7832},{color}Bytes.toBytes({color:#6a8759}"one"{color})) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23840) Revert optimized IO back to general compaction during upgrade/migration process
[ https://issues.apache.org/jira/browse/HBASE-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23840. --- Resolution: Fixed > Revert optimized IO back to general compaction during upgrade/migration > process > > > Key: HBASE-23840 > URL: https://issues.apache.org/jira/browse/HBASE-23840 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > Optimized IO compaction mode may leave an old MOB file whose size is above the > threshold as-is and not compact it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23840) Revert optimized IO back to general compaction during upgrade/migration process
[ https://issues.apache.org/jira/browse/HBASE-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-23840: -- Summary: Revert optimized IO back to general compaction during upgrade/migration process (was: Revert optimized IO backt to general compaction during upgrade/migration process ) > Revert optimized IO back to general compaction during upgrade/migration > process > > > Key: HBASE-23840 > URL: https://issues.apache.org/jira/browse/HBASE-23840 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > Optimized IO compaction mode may leave an old MOB file whose size is above the > threshold as-is and not compact it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23840) Revert optimized IO backt to general compaction during upgrade/migration process
Vladimir Rodionov created HBASE-23840: - Summary: Revert optimized IO backt to general compaction during upgrade/migration process Key: HBASE-23840 URL: https://issues.apache.org/jira/browse/HBASE-23840 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Optimized IO compaction mode may leave an old MOB file whose size is above the threshold as-is and not compact it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23724) Change code in StoreFileInfo to use regex matcher for mob files.
Vladimir Rodionov created HBASE-23724: - Summary: Change code in StoreFileInfo to use regex matcher for mob files. Key: HBASE-23724 URL: https://issues.apache.org/jira/browse/HBASE-23724 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Currently it sits on top of another regex with additional logic added. The code should be simplified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23723) Add tests for MOB compaction on a table created from snapshot
Vladimir Rodionov created HBASE-23723: - Summary: Add tests for MOB compaction on a table created from snapshot Key: HBASE-23723 URL: https://issues.apache.org/jira/browse/HBASE-23723 Project: HBase Issue Type: Sub-task Environment: How does code handle snapshot naming convention for MOB files. Reporter: Vladimir Rodionov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23571) Handle CompactType.MOB correctly
Vladimir Rodionov created HBASE-23571: - Summary: Handle CompactType.MOB correctly Key: HBASE-23571 URL: https://issues.apache.org/jira/browse/HBASE-23571 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov This is a client-facing feature; it should be supported or at least properly handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991906#comment-16991906 ] Vladimir Rodionov commented on HBASE-22749: --- Created new PR, old one was closed as obsolete. > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch, > HBase-MOB-2.0-v3.0.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23363) MobCompactionChore takes a long time to complete once job
[ https://issues.apache.org/jira/browse/HBASE-23363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987548#comment-16987548 ] Vladimir Rodionov commented on HBASE-23363: --- Please refer to HBASE-22749 for the new distributed MOB compaction implementation. It is going to be the next MOB implementation soon. I do not think anybody will be working on optimizing the old MOB compaction. > MobCompactionChore takes a long time to complete once job > - > > Key: HBASE-23363 > URL: https://issues.apache.org/jira/browse/HBASE-23363 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.1, 2.2.2 >Reporter: Bo Cui >Priority: Major > Attachments: image-2019-12-04-11-01-20-352.png > > > MOB table compaction is done in the Master > poolSize of the HBase ChoreService is 1 > if HBase has 1000 MOB tables, MobCompactionChore takes a long time to complete > one job, and other chores need to wait > !image-2019-12-04-11-01-20-352.png! > {code:java} > MobCompactionChore#chore() { >... >for (TableDescriptor htd : map.values()) { > ... > for (ColumnFamilyDescriptor hcd : htd.getColumnFamilies()) { > if (hcd.isMobEnabled()) { > MobUtils.doMobCompaction(...); > } > } > ... >} >... > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Status: Patch Available (was: Open) > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBASE-22749-master-v3.patch, HBASE-22749-master-v4.patch, > HBase-MOB-2.0-v3.0.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBASE-22749-master-v4.patch
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBASE-22749-master-v3.patch
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-23189) Finalize I/O optimized MOB compaction
[ https://issues.apache.org/jira/browse/HBASE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980442#comment-16980442 ] Vladimir Rodionov commented on HBASE-23189: --- Closing; passes stress tests up to 6M (above 6M, HBase fails with NotServingRegionExceptions, which is not related to the feature but is a master-branch stability issue). Will mark this feature as *experimental* in the release notes. > Finalize I/O optimized MOB compaction > - > > Key: HBASE-23189 > URL: https://issues.apache.org/jira/browse/HBASE-23189 > Project: HBase > Issue Type: Sub-task > Reporter: Vladimir Rodionov > Assignee: Vladimir Rodionov > Priority: Major > > +corresponding test cases > The current code for I/O optimized compaction has not been tested and verified yet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23189) Finalize I/O optimized MOB compaction
[ https://issues.apache.org/jira/browse/HBASE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23189. --- Resolution: Fixed
[jira] [Commented] (HBASE-23189) Finalize I/O optimized MOB compaction
[ https://issues.apache.org/jira/browse/HBASE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978874#comment-16978874 ] Vladimir Rodionov commented on HBASE-23189: --- Pushed first implementation to parent's PR branch.
[jira] [Updated] (HBASE-23189) Finalize I/O optimized MOB compaction
[ https://issues.apache.org/jira/browse/HBASE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-23189: -- Description: +corresponding test cases The current code for I/O optimized compaction has not been tested and verified yet. was: +corresponding test cases The current code for generational compaction has not been tested and verified yet.
[jira] [Updated] (HBASE-23189) Finalize I/O optimized MOB compaction
[ https://issues.apache.org/jira/browse/HBASE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-23189: -- Summary: Finalize I/O optimized MOB compaction (was: Finalize generational compaction)
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974684#comment-16974684 ] Vladimir Rodionov commented on HBASE-22749: --- Updated design document, bumped version to 3.0.
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v2.1.pdf)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v2.2.pdf)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v1.pdf)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v2.pdf)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v2.3.pdf)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBase-MOB-2.0-v3.0.pdf
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBase-MOB-2.0-v2.3.pdf
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973871#comment-16973871 ] Vladimir Rodionov commented on HBASE-22749: --- Updated design doc with new I/O optimized compaction algorithm description.
[jira] [Resolved] (HBASE-23267) Test case for MOB compaction in a regular mode.
[ https://issues.apache.org/jira/browse/HBASE-23267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23267. --- Resolution: Fixed Resolved. Pushed to the parent's PR branch. > Test case for MOB compaction in a regular mode. > --- > > Key: HBASE-23267 > URL: https://issues.apache.org/jira/browse/HBASE-23267 > Project: HBase > Issue Type: Sub-task > Reporter: Vladimir Rodionov > Assignee: Vladimir Rodionov > Priority: Major > > We need this test case too. > Test case description (similar to HBASE-23266): > {code}
> /**
>  * Mob file compaction chore in default regular mode test.
>  * 1. Enables non-batch mode (default) for regular MOB compaction,
>  *    sets batch size to 7 regions.
>  * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 sec.
>  * 3. Creates MOB table with 20 regions.
>  * 4. Loads MOB data (randomized keys, 1000 rows), flushes data.
>  * 5. Repeats 4. two more times.
>  * 6. Verifies that we have 20 * 3 = 60 mob files (equals the number of regions x 3).
>  * 7. Runs major MOB compaction.
>  * 8. Verifies that the number of MOB files in the mob directory is 20 * 4 = 80.
>  * 9. Waits for a period of time larger than the minimum age to archive.
>  * 10. Runs the MOB cleaner chore.
>  * 11. Verifies that the number of MOB files in the mob directory is 20.
>  * 12. Runs scanner and checks all 3 * 1000 rows.
>  */
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
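The expected file counts in steps 6 and 8 follow directly from the numbers in the test plan. A hypothetical sketch of that arithmetic (class and method names are made up, not part of the actual test):

```java
// Each load+flush round leaves one MOB file per region; major MOB compaction
// then writes one compacted file per region while the old files remain until
// the cleaner chore archives them (steps 9-11 of the plan).
public class MobFileCounts {
    // Step 6: files present after loading and flushing `flushRounds` times.
    static int filesAfterLoads(int regions, int flushRounds) {
        return regions * flushRounds;
    }

    // Step 8: files present right after major MOB compaction, before cleanup.
    static int filesAfterMajorCompaction(int regions, int flushRounds) {
        return regions * (flushRounds + 1);
    }

    public static void main(String[] args) {
        System.out.println(filesAfterLoads(20, 3));            // 60 files
        System.out.println(filesAfterMajorCompaction(20, 3));  // 80 files
    }
}
```

After the cleaner chore archives the pre-compaction files, only the 20 compacted files (one per region) remain, matching step 11.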
[jira] [Updated] (HBASE-23267) Test case for MOB compaction in a regular mode.
[ https://issues.apache.org/jira/browse/HBASE-23267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-23267: -- Description: We need this test case too. Test case description (similar to HBASE-23266): {code} /** * Mob file compaction chore in batch mode test. * 1. Enables non-batch mode (default) for regular MOB compaction, *Sets batch size to 7 regions. * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 sec * 3. Creates MOB table with 20 regions * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. * 5. Repeats 4. two more times * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of regions x 3) * 7. Runs major MOB compaction. * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 * 9. Waits for a period of time larger than minimum age to archive * 10. Runs Mob cleaner chore * 11 Verifies that number of MOB files in a mob directory is 20. * 12 Runs scanner and checks all 3 * 1000 rows. {code} was:We need this test case too. > Test case for MOB compaction in a regular mode. > --- > > Key: HBASE-23267 > URL: https://issues.apache.org/jira/browse/HBASE-23267 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > We need this test case too. > Test case description (similar to HBASE-23266): > {code} > /** > * Mob file compaction chore in batch mode test. > * 1. Enables non-batch mode (default) for regular MOB compaction, > *Sets batch size to 7 regions. > * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 > sec > * 3. Creates MOB table with 20 regions > * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. > * 5. Repeats 4. two more times > * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of > regions x 3) > * 7. Runs major MOB compaction. > * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 > * 9. 
Waits for a period of time larger than minimum age to archive > * 10. Runs Mob cleaner chore > * 11 Verifies that number of MOB files in a mob directory is 20. > * 12 Runs scanner and checks all 3 * 1000 rows. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23267) Test case for MOB compaction in a regular mode.
[ https://issues.apache.org/jira/browse/HBASE-23267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-23267: -- Description: We need this test case too. Test case description (similar to HBASE-23266): {code} /** * Mob file compaction chore in default regular mode test. * 1. Enables non-batch mode (default) for regular MOB compaction, *Sets batch size to 7 regions. * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 sec * 3. Creates MOB table with 20 regions * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. * 5. Repeats 4. two more times * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of regions x 3) * 7. Runs major MOB compaction. * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 * 9. Waits for a period of time larger than minimum age to archive * 10. Runs Mob cleaner chore * 11 Verifies that number of MOB files in a mob directory is 20. * 12 Runs scanner and checks all 3 * 1000 rows. {code} was: We need this test case too. Test case description (similar to HBASE-23266): {code} /** * Mob file compaction chore in batch mode test. * 1. Enables non-batch mode (default) for regular MOB compaction, *Sets batch size to 7 regions. * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 sec * 3. Creates MOB table with 20 regions * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. * 5. Repeats 4. two more times * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of regions x 3) * 7. Runs major MOB compaction. * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 * 9. Waits for a period of time larger than minimum age to archive * 10. Runs Mob cleaner chore * 11 Verifies that number of MOB files in a mob directory is 20. * 12 Runs scanner and checks all 3 * 1000 rows. {code} > Test case for MOB compaction in a regular mode. 
> --- > > Key: HBASE-23267 > URL: https://issues.apache.org/jira/browse/HBASE-23267 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > We need this test case too. > Test case description (similar to HBASE-23266): > {code} > /** > * Mob file compaction chore in default regular mode test. > * 1. Enables non-batch mode (default) for regular MOB compaction, > *Sets batch size to 7 regions. > * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 > sec > * 3. Creates MOB table with 20 regions > * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. > * 5. Repeats 4. two more times > * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of > regions x 3) > * 7. Runs major MOB compaction. > * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 > * 9. Waits for a period of time larger than minimum age to archive > * 10. Runs Mob cleaner chore > * 11 Verifies that number of MOB files in a mob directory is 20. > * 12 Runs scanner and checks all 3 * 1000 rows. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-23267) Test case for MOB compaction in a regular mode.
[ https://issues.apache.org/jira/browse/HBASE-23267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-23267 started by Vladimir Rodionov. - > Test case for MOB compaction in a regular mode. > --- > > Key: HBASE-23267 > URL: https://issues.apache.org/jira/browse/HBASE-23267 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > We need this test case too. > Test case description (similar to HBASE-23266): > {code} > /** > * Mob file compaction chore in default regular mode test. > * 1. Enables non-batch mode (default) for regular MOB compaction, > *Sets batch size to 7 regions. > * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 > sec > * 3. Creates MOB table with 20 regions > * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. > * 5. Repeats 4. two more times > * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of > regions x 3) > * 7. Runs major MOB compaction. > * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 > * 9. Waits for a period of time larger than minimum age to archive > * 10. Runs Mob cleaner chore > * 11 Verifies that number of MOB files in a mob directory is 20. > * 12 Runs scanner and checks all 3 * 1000 rows. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23266) Test case for MOB compaction in a region's batch mode.
[ https://issues.apache.org/jira/browse/HBASE-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-23266: -- Description: Major MOB compaction in a general (non-generational) mode can be run in a batched mode (disabled by default). In this mode, only subset of regions at a time are compacted to mitigate possible compaction storms. We need test case for this mode. Test case description: {code} /** * Mob file compaction chore in batch mode test. * 1. Enables batch mode for regular MOB compaction, *Sets batch size to 7 regions. * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 sec * 3. Creates MOB table with 20 regions * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. * 5. Repeats 4. two more times * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of regions x 3) * 7. Runs major MOB compaction. * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 * 9. Waits for a period of time larger than minimum age to archive * 10. Runs Mob cleaner chore * 11 Verifies that number of MOB files in a mob directory is 20. * 12 Runs scanner and checks all 3 * 1000 rows. */ {code} was:Major MOB compaction in a general (non-generational) mode can be run in a batched mode (disabled by default). In this mode, only subset of regions at a time are compacted to mitigate possible compaction storms. We need test case for this mode. > Test case for MOB compaction in a region's batch mode. > -- > > Key: HBASE-23266 > URL: https://issues.apache.org/jira/browse/HBASE-23266 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > Major MOB compaction in a general (non-generational) mode can be run in a > batched mode (disabled by default). In this mode, only subset of regions at a > time are compacted to mitigate possible compaction storms. We need test case > for this mode. 
> Test case description: > {code} > /** > * Mob file compaction chore in batch mode test. > * 1. Enables batch mode for regular MOB compaction, > *Sets batch size to 7 regions. > * 2. Disables periodic MOB compactions, sets minimum age to archive to 10 > sec > * 3. Creates MOB table with 20 regions > * 4. Loads MOB data (randomized keys, 1000 rows), flushes data. > * 5. Repeats 4. two more times > * 6. Verifies that we have 20 *3 = 60 mob files (equals to number of > regions x 3) > * 7. Runs major MOB compaction. > * 8. Verifies that number of MOB files in a mob directory is 20 x4 = 80 > * 9. Waits for a period of time larger than minimum age to archive > * 10. Runs Mob cleaner chore > * 11 Verifies that number of MOB files in a mob directory is 20. > * 12 Runs scanner and checks all 3 * 1000 rows. > */ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
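With a batch size of 7 and 20 regions, the batched mode described above compacts at most 7 regions at a time, so the chore makes three passes of 7, 7, and 6 regions. A hedged sketch of that partitioning (the real chore's batching logic lives in HBase and may differ):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a batch size of 7 partitions 20 regions
// to mitigate compaction storms; not the actual HBase chore code.
public class RegionBatching {
    static List<Integer> batchSizes(int regions, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        // Take up to batchSize regions per pass until all are covered.
        for (int remaining = regions; remaining > 0; remaining -= batchSize) {
            sizes.add(Math.min(batchSize, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(20, 7)); // prints "[7, 7, 6]"
    }
}
```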
[jira] [Created] (HBASE-23267) Test case for MOB compaction in a regular mode.
Vladimir Rodionov created HBASE-23267: - Summary: Test case for MOB compaction in a regular mode. Key: HBASE-23267 URL: https://issues.apache.org/jira/browse/HBASE-23267 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov We need this test case too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-23189) Finalize generational compaction
[ https://issues.apache.org/jira/browse/HBASE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-23189 started by Vladimir Rodionov. - > Finalize generational compaction > > > Key: HBASE-23189 > URL: https://issues.apache.org/jira/browse/HBASE-23189 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > +corresponding test cases > The current code for generational compaction has not been tested and verified > yet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23266) Test case for MOB compaction in a region's batch mode.
[ https://issues.apache.org/jira/browse/HBASE-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23266. --- Resolution: Fixed Resolved. Pushed change to parent's PR branch. > Test case for MOB compaction in a region's batch mode. > -- > > Key: HBASE-23266 > URL: https://issues.apache.org/jira/browse/HBASE-23266 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > Major MOB compaction in a general (non-generational) mode can be run in a > batched mode (disabled by default). In this mode, only subset of regions at a > time are compacted to mitigate possible compaction storms. We need test case > for this mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-23266) Test case for MOB compaction in a region's batch mode.
[ https://issues.apache.org/jira/browse/HBASE-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-23266 started by Vladimir Rodionov. - > Test case for MOB compaction in a region's batch mode. > -- > > Key: HBASE-23266 > URL: https://issues.apache.org/jira/browse/HBASE-23266 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > Major MOB compaction in a general (non-generational) mode can be run in a > batched mode (disabled by default). In this mode, only subset of regions at a > time are compacted to mitigate possible compaction storms. We need test case > for this mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23266) Test case for MOB compaction in a region's batch mode.
Vladimir Rodionov created HBASE-23266: - Summary: Test case for MOB compaction in a region's batch mode. Key: HBASE-23266 URL: https://issues.apache.org/jira/browse/HBASE-23266 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Major MOB compaction in a general (non-generational) mode can be run in a batched mode (disabled by default). In this mode, only a subset of regions is compacted at a time to mitigate possible compaction storms. We need a test case for this mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23188) MobFileCleanerChore test case
[ https://issues.apache.org/jira/browse/HBASE-23188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23188. --- Resolution: Fixed Resolved. Pushed to parent PR branch. > MobFileCleanerChore test case > - > > Key: HBASE-23188 > URL: https://issues.apache.org/jira/browse/HBASE-23188 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Priority: Major > > The test should do the following: > a) properly remove obsolete files as expected > b) do not remove mob files from before the reference accounting added in > this change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23190) Convert MobCompactionTest into integration test
[ https://issues.apache.org/jira/browse/HBASE-23190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23190. --- Resolution: Fixed Resolved in the last parent PR commit (11/5). > Convert MobCompactionTest into integration test > --- > > Key: HBASE-23190 > URL: https://issues.apache.org/jira/browse/HBASE-23190 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23209) Simplify logic in DefaultMobStoreCompactor
[ https://issues.apache.org/jira/browse/HBASE-23209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964331#comment-16964331 ] Vladimir Rodionov commented on HBASE-23209: --- Change was pushed to the parent's PR branch. > Simplify logic in DefaultMobStoreCompactor > -- > > Key: HBASE-23209 > URL: https://issues.apache.org/jira/browse/HBASE-23209 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > The major compaction loop is quite large and has many branches, especially in > a non-MOB mode. Consider moving MOB data only in a MOB compaction mode and > simplifying the non-MOB case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23209) Simplify logic in DefaultMobStoreCompactor
[ https://issues.apache.org/jira/browse/HBASE-23209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-23209. --- Resolution: Fixed > Simplify logic in DefaultMobStoreCompactor > -- > > Key: HBASE-23209 > URL: https://issues.apache.org/jira/browse/HBASE-23209 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > The major compaction loop is quite large and has many branches, especially in > a non-MOB mode. Consider moving MOB data only in a MOB compaction mode and > simplifying the non-MOB case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23209) Simplify logic in DefaultMobStoreCompactor
[ https://issues.apache.org/jira/browse/HBASE-23209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964330#comment-16964330 ] Vladimir Rodionov commented on HBASE-23209: --- Reduced the code by leaving handling of a changed mobStoreThreshold to MOB compaction only. Now, during regular compactions, we do not check whether the MOB threshold was changed and do not handle that case. > Simplify logic in DefaultMobStoreCompactor > -- > > Key: HBASE-23209 > URL: https://issues.apache.org/jira/browse/HBASE-23209 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > > The major compaction loop is quite large and has many branches, especially in > a non-MOB mode. Consider moving MOB data only in a MOB compaction mode and > simplifying the non-MOB case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
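The simplification described in this comment can be illustrated with a hedged sketch. The method and parameter names below are hypothetical, not the actual DefaultMobStoreCompactor API:

```java
// Hedged illustration of the branching described above: only a MOB
// compaction reacts to a changed mobStoreThreshold, while a regular
// compaction keeps the threshold it started with. Names are hypothetical.
public class MobThresholdPolicy {
    static boolean shouldWriteToMob(long valueLength, long currentThreshold,
                                    long thresholdAtStart, boolean isMobCompaction) {
        // MOB compaction re-reads the (possibly changed) threshold;
        // regular compaction ignores changes, which simplifies its loop.
        long effective = isMobCompaction ? currentThreshold : thresholdAtStart;
        return valueLength > effective;
    }

    public static void main(String[] args) {
        // Threshold changed from 100 to 50 mid-run; value is 80 bytes:
        System.out.println(shouldWriteToMob(80, 50, 100, true));  // prints "true"
        System.out.println(shouldWriteToMob(80, 50, 100, false)); // prints "false"
    }
}
```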
[jira] [Created] (HBASE-23209) Simplify logic in DefaultMobStoreCompactor
Vladimir Rodionov created HBASE-23209: - Summary: Simplify logic in DefaultMobStoreCompactor Key: HBASE-23209 URL: https://issues.apache.org/jira/browse/HBASE-23209 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov The major compaction loop is quite large and has many branches, especially in a non-MOB mode. Consider moving MOB data only in a MOB compaction mode and simplifying the non-MOB case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23198) Documentation and release notes
Vladimir Rodionov created HBASE-23198: - Summary: Documentation and release notes Key: HBASE-23198 URL: https://issues.apache.org/jira/browse/HBASE-23198 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Document all the changes: algorithms, new configuration options, obsolete configurations, upgrade procedure and possibility of downgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-23188) MobFileCleanerChore test case
[ https://issues.apache.org/jira/browse/HBASE-23188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov reassigned HBASE-23188: - Assignee: (was: Vladimir Rodionov) > MobFileCleanerChore test case > - > > Key: HBASE-23188 > URL: https://issues.apache.org/jira/browse/HBASE-23188 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Priority: Major > > The test should do the following: > a) properly remove obsolete files as expected > b) do not remove mob files from before the reference accounting added in > this change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23190) Convert MobCompactionTest into integration test
Vladimir Rodionov created HBASE-23190: - Summary: Convert MobCompactionTest into integration test Key: HBASE-23190 URL: https://issues.apache.org/jira/browse/HBASE-23190 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23189) Finalize generational compaction
Vladimir Rodionov created HBASE-23189: - Summary: Finalize generational compaction Key: HBASE-23189 URL: https://issues.apache.org/jira/browse/HBASE-23189 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov +corresponding test cases The current code for generational compaction has not been tested and verified yet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23188) MobFileCleanerChore test case
Vladimir Rodionov created HBASE-23188: - Summary: MobFileCleanerChore test case Key: HBASE-23188 URL: https://issues.apache.org/jira/browse/HBASE-23188 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov The test should do the following: a) properly remove obsolete files as expected b) do not remove mob files from before the reference accounting added in this change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBASE-22749-master-v2.patch > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, HBase-MOB-2.0-v2.2.pdf, > HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Status: Patch Available (was: Open) Code cleanup, unit test fixes. > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, > HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, HBase-MOB-2.0-v2.2.pdf, > HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop-in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929511#comment-16929511 ] Vladimir Rodionov commented on HBASE-22749: --- PR has been created. > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, > HBase-MOB-2.0-v2.2.pdf, HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBASE-22749-master-v1.patch > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, > HBase-MOB-2.0-v2.2.pdf, HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBASE-22749-branch-2.2-v3.patch) > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v4.patch, > HBASE-22749-master-v1.patch, HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, > HBase-MOB-2.0-v2.2.pdf, HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBASE-22749-branch-2.2-v4.patch > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22749-branch-2.2-v3.patch, > HBASE-22749-branch-2.2-v4.patch, HBase-MOB-2.0-v1.pdf, > HBase-MOB-2.0-v2.1.pdf, HBase-MOB-2.0-v2.2.pdf, HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in a data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide 100% MOB 1.0 - compatible > implementation, which is free of the above drawbacks and can be used as a > drop in replacement in existing MOB deployments. So, these are design goals > of a MOB 2.0: > # Make MOB compactions scalable without relying on Yarn/Mapreduce framework > # Provide unified compactor for both MOB and regular store files > # Make it more robust especially w.r.t. to data losses. > # Simplify and reduce the overall MOB code. > # Provide 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > software upgrade. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928879#comment-16928879 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
v4 should build on 2.2.
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928832#comment-16928832 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
Nevertheless, it failed again:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (aggregate-into-a-jar-with-relocated-third-parties) on project hbase-shaded-client: Error creating shaded jar: duplicate entry: META-INF/services/org.apache.hadoop.hbase.shaded.com.fasterxml.jackson.core.ObjectCodec -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :hbase-shaded-client
{code}
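A duplicate META-INF/services entry is exactly the kind of collision the shade plugin's ServicesResourceTransformer exists to merge. As a sketch only (the real hbase-shaded-client pom is more elaborate, and the eventual fix for this build break may well have been different), the relevant plugin configuration looks like:

```xml
<!-- Illustrative maven-shade-plugin fragment, not the actual HBase pom.
     ServicesResourceTransformer concatenates META-INF/services files that
     appear in several input jars, instead of letting the shade goal fail
     on a "duplicate entry". -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.1.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```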
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928796#comment-16928796 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
It seems that the tip of branch-2.2 is broken - it is not patch-related. I tried to build branch-2.2 without the patch and it failed with multiple errors.
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928789#comment-16928789 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
Oops, will fix it.
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928139#comment-16928139 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
Uploaded the patch for branch-2.2. The master version will follow shortly.
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-22749:
--------------------------------------
    Attachment: HBASE-22749-branch-2.2-v3.patch
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922943#comment-16922943 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
Updated the design document to v2.2. Added a completely new MOB compaction algorithm section; the new algorithm can now reliably bound overall read/write I/O amplification (the major concern so far). The initial patch is almost done - I just need to fix the algorithm and run tests.
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-22749:
--------------------------------------
    Attachment: HBase-MOB-2.0-v2.2.pdf
[jira] [Comment Edited] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912908#comment-16912908 ]

Vladimir Rodionov edited comment on HBASE-22749 at 8/22/19 3:33 AM:
--------------------------------------------------------------------
That is a big list, [~busbey]. Below are some answers:
{quote}
region sizing - splitting, normalizers, etc
Need to expressly state whether or not this change to per-region accounting plans to alter the current assumption that use of the feature means the MOB data isn't counted when determining region size for decisions to normalize or split.
{quote}
This part has not been touched, meaning that MOB 2.0 does exactly what MOB 1.0 does: if MOB is not counted for normalize/split decisions now, it won't be in 2.0 either. Should it be? Probably yes, but that is not part of scalable compactions.
{quote}
write amplification
{quote}
Good question. Default (non-partial) major compaction has write amplification (WA) the same as, or similar to, regular HBase tiered compaction. I would not call it unbounded, but it is probably worse than in MOB 1.0. Partial MOB compaction will definitely have a bounded WA, comparable to what we have in MOB 1.0 (where compaction is done by partitions and partitions are date-based).

The idea of partial major MOB compaction is to keep the total number of MOB files in the system under control (say, around 1M) by not compacting MOB files that have reached some size threshold (say, 1GB). If you exclude all MOB files above T bytes from compaction, your WA is bounded by logK(T/S), where logK is the logarithm base K (K being the average number of files in a compaction selection), T is the maximum MOB file size (the threshold) and S is the average size of a memstore flush. This is an approximation, of course. How does it compare to MOB 1.0 partitioned compaction? By varying T we can get any WA we want. Say, if we set the limit on the number of MOB files to 10M, we can decrease T to 100MB, which gives us a total capacity for MOB data of 1PB. With a 100MB threshold, WA can be very low (low ones).

I will update the document and add more info on partial major MOB compactions, including the file selection policy.
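The logK(T/S) bound is easy to evaluate numerically. A small sketch (the numbers below are hypothetical, not HBase defaults; K, T and S are as defined in the comment):

```java
// Sketch of the write-amplification bound WA ~= log_K(T/S) discussed above.
// K: average number of files per compaction selection,
// T: maximum MOB file size (the threshold), S: average memstore flush size.
class MobWaBound {
    static double waBound(double k, double t, double s) {
        // log base K of (T/S)
        return Math.log(t / s) / Math.log(k);
    }

    public static void main(String[] args) {
        double mb = 1024 * 1024;
        // 1 GB threshold, 128 MB flushes, 3-way selections:
        // each MOB byte is rewritten roughly log_3(8), i.e. just under 2 times.
        System.out.println(waBound(3, 1024 * mb, 128 * mb));
        // Lowering T to 100 MB (the 10M-files / 1 PB example) shrinks the bound.
        System.out.println(waBound(3, 100 * mb, 64 * mb));
    }
}
```

The point of the bound is that files larger than T never participate in another rewrite, so T caps the number of merge rounds any byte can go through.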
[jira] [Commented] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912908#comment-16912908 ]

Vladimir Rodionov commented on HBASE-22749:
-------------------------------------------
That is a big list, [~busbey]. Below are some answers:
{quote}
region sizing - splitting, normalizers, etc
Need to expressly state whether or not this change to per-region accounting plans to alter the current assumption that use of the feature means the MOB data isn't counted when determining region size for decisions to normalize or split.
{quote}
This part has not been touched, meaning that MOB 2.0 does exactly what MOB 1.0 does: if MOB is not counted for normalize/split decisions now, it won't be in 2.0 either. Should it be? Probably yes, but that is not part of scalable compactions.
{quote}
write amplification
{quote}
Good question. Default (non-partial) major compaction has write amplification (WA) the same as, or similar to, regular HBase tiered compaction. I would not call it unbounded, but it is probably worse than in MOB 1.0. Partial MOB compaction will definitely have a bounded WA, comparable to what we have in MOB 1.0 (where compaction is done by partitions and partitions are date-based).

The idea of partial major MOB compaction is either to keep the total number of MOB files in the system under control (say, around 1M), or to not compact MOB files that have reached some size threshold (say, 1GB). The latter case is easier to explain. If you exclude all MOB files above 1GB from compaction, your WA is bounded by log2(T/S), where log2 is the logarithm base 2, T is the maximum MOB file size (the threshold) and S is the average size of a memstore flush. This is an approximation, of course. How does it compare to MOB 1.0 partitioned compaction? By varying T we can get any WA we want. Say, if we set the limit on the number of MOB files to 10M, we can decrease T to 100MB, which gives us a total capacity for MOB data of 1PB. With a 100MB threshold, WA can be very low (low ones).

I will update the document and add more info on partial major MOB compactions, including the file selection policy.
[jira] [Commented] (HBASE-22705) IllegalArgumentException exception occured during MobFileCache eviction
[ https://issues.apache.org/jira/browse/HBASE-22705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912881#comment-16912881 ]

Vladimir Rodionov commented on HBASE-22705:
-------------------------------------------
Apologies for the delay, [~pankaj2461]. An additional global lock on the MOB file cache is a last-resort approach - do not use it until you have explored other (lockless) options.

> IllegalArgumentException exception occurred during MobFileCache eviction
> ------------------------------------------------------------------------
>
>              Key: HBASE-22705
>              URL: https://issues.apache.org/jira/browse/HBASE-22705
>          Project: HBase
>       Issue Type: Bug
>       Components: mob
> Affects Versions: 2.0.5
>         Reporter: Pankaj Kumar
>         Assignee: Pankaj Kumar
>         Priority: Critical
>        Fix For: 2.3.0
>
>    Attachments: HBASE-22705.branch-2.patch
>
> IllegalArgumentException occurred during a scan operation:
> {noformat}
> 2019-07-08 01:46:57,764 | ERROR | RpcServer.FifoWFPBQ.default.handler=129,queue=9,port=21302 | Unexpected throwable object | org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2502)
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
> at java.util.ComparableTimSort.mergeHi(ComparableTimSort.java:866)
> at java.util.ComparableTimSort.mergeAt(ComparableTimSort.java:483)
> at java.util.ComparableTimSort.mergeForceCollapse(ComparableTimSort.java:422)
> at java.util.ComparableTimSort.sort(ComparableTimSort.java:222)
> at java.util.Arrays.sort(Arrays.java:1312)
> at java.util.Arrays.sort(Arrays.java:1506)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:141)
> at org.apache.hadoop.hbase.mob.MobFileCache.evict(MobFileCache.java:144)
> at org.apache.hadoop.hbase.mob.MobFileCache.openFile(MobFileCache.java:214)
> at org.apache.hadoop.hbase.regionserver.HMobStore.readCell(HMobStore.java:397)
> at org.apache.hadoop.hbase.regionserver.HMobStore.resolve(HMobStore.java:358)
> at org.apache.hadoop.hbase.regionserver.MobStoreScanner.next(MobStoreScanner.java:74)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:150)
> {noformat}
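"Comparison method violates its general contract!" is characteristic of TimSort observing a comparator whose answers change while the sort is running - for example when the sort key is a live access counter that other threads keep incrementing, as happens during cache eviction. An illustration of the risky pattern and one lockless way around it (snapshot the key before sorting); the class and method names here are made up, not the actual MobFileCache code:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only (hypothetical names, not HBase code): sorting on a field
// that other threads mutate mid-sort makes the comparator inconsistent, and
// TimSort may then throw IllegalArgumentException.
class SnapshotSort {
    static class CachedFile {
        final String name;
        final AtomicLong accessCount = new AtomicLong();
        CachedFile(String name, long count) {
            this.name = name;
            accessCount.set(count);
        }
    }

    // Risky pattern: the comparator re-reads the live counter on every call,
    // so concurrent increments can break transitivity during a single sort.
    static final Comparator<CachedFile> LIVE =
        (a, b) -> Long.compare(a.accessCount.get(), b.accessCount.get());

    // Lockless alternative: read each key exactly once, then sort the
    // immutable snapshot, so the comparator is consistent for the whole sort.
    static List<CachedFile> sortByCountSnapshot(Collection<CachedFile> files) {
        List<Map.Entry<Long, CachedFile>> snap = new ArrayList<>();
        for (CachedFile f : files) {
            snap.add(new AbstractMap.SimpleEntry<>(f.accessCount.get(), f));
        }
        snap.sort((x, y) -> Long.compare(x.getKey(), y.getKey()));
        List<CachedFile> out = new ArrayList<>(snap.size());
        for (Map.Entry<Long, CachedFile> e : snap) {
            out.add(e.getValue());
        }
        return out;
    }
}
```

The snapshot costs one extra allocation per element but removes the need for the global lock discussed above.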
[jira] [Updated] (HBASE-22826) Wrong FS: recovered.edits goes to wrong file system
[ https://issues.apache.org/jira/browse/HBASE-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22826: -- Description: When WAL is attached to a separate file system, recovered.edits go to the hbase root directory. PROBLEM * Customer environment HBase root directory : On WASB hbase.wal.dir : On HDFS The customer creates an HBase table and runs VIEW DDL on top of it. The recovered.edits go to the hbase root directory on WASB and region assignments fail. Customer is on HBase 2.0.4. The stack trace below is from a local reproduction: {code:java}2019-08-05 22:07:31,940 ERROR [RS_OPEN_META-regionserver/c47-node3:16020-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740 java.lang.IllegalArgumentException: Wrong FS: hdfs://c47-node2.squadron-labs.com:8020/hbasewal/hbase/meta/1588230740/recovered.edits, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:460) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678) at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:270) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.hbase.wal.WALSplitter.getSequenceIdFiles(WALSplitter.java:647) at org.apache.hadoop.hbase.wal.WALSplitter.writeRegionSequenceIdFile(WALSplitter.java:680) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:984) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:881) at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7149) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7108) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7080) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7038) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6989) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:283) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} was: When WAL is attached to a separate file system, recovered.edits go to the hbase root directory. PROBLEM * Customer environment HBase root directory : On WASB hbase.wal.dir : On HDFS The customer creates an HBase table and runs VIEW DDL on top of it. The recovered.edits go to the hbase root directory on WASB and region assignments fail. Customer is on HBase 2.0.4. 
{code:java}if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) { LOG.debug("writing seq id for {}", this.getRegionInfo().getEncodedName()); WALSplitter.writeRegionSequenceIdFile(fs.getFileSystem(), getWALRegionDir(), nextSeqId); //WALSplitter.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(), nextSeqId - 1);{code} {code:java}2019-08-05 22:07:31,940 ERROR [RS_OPEN_META-regionserver/c47-node3:16020-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740 java.lang.IllegalArgumentException: Wrong FS: hdfs://c47-node2.squadron-labs.com:8020/hbasewal/hbase/meta/1588230740/recovered.edits, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:460) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678) at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:270) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.hbase.wal.WALSplitter.getSequenceIdFiles(WALSplitter.java:647) at org.apache.hadoop.hbase.wal.WALSplitter.writeRegionSequenceIdFile(WALSplitter.java:68
[jira] [Resolved] (HBASE-22826) Wrong FS: recovered.edits goes to wrong file system
[ https://issues.apache.org/jira/browse/HBASE-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-22826. --- Resolution: Won't Fix > Wrong FS: recovered.edits goes to wrong file system > --- -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22826) Wrong FS: recovered.edits goes to wrong file system
[ https://issues.apache.org/jira/browse/HBASE-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904122#comment-16904122 ] Vladimir Rodionov commented on HBASE-22826: --- Basically, 2.0.x does not fully support WAL on a different file system. Those who want this feature should upgrade to 2.1. Won't fix, because 2.0 is EOL. > Wrong FS: recovered.edits goes to wrong file system > --- -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22826) Wrong FS: recovered.edits goes to wrong file system
Vladimir Rodionov created HBASE-22826: - Summary: Wrong FS: recovered.edits goes to wrong file system Key: HBASE-22826 URL: https://issues.apache.org/jira/browse/HBASE-22826 Project: HBase Issue Type: New Feature Affects Versions: 2.0.5 Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov When WAL is attached to a separate file system, recovered.edits go to the hbase root directory. PROBLEM * Customer environment HBase root directory : On WASB hbase.wal.dir : On HDFS The customer creates an HBase table and runs VIEW DDL on top of it. The recovered.edits go to the hbase root directory on WASB and region assignments fail. Customer is on HBase 2.0.4. {code:java}if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) { LOG.debug("writing seq id for {}", this.getRegionInfo().getEncodedName()); WALSplitter.writeRegionSequenceIdFile(fs.getFileSystem(), getWALRegionDir(), nextSeqId); //WALSplitter.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(), nextSeqId - 1);{code} {code:java}2019-08-05 22:07:31,940 ERROR [RS_OPEN_META-regionserver/c47-node3:16020-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740 java.lang.IllegalArgumentException: Wrong FS: hdfs://c47-node2.squadron-labs.com:8020/hbasewal/hbase/meta/1588230740/recovered.edits, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:460) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678) at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:270) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at 
org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.hbase.wal.WALSplitter.getSequenceIdFiles(WALSplitter.java:647) at org.apache.hadoop.hbase.wal.WALSplitter.writeRegionSequenceIdFile(WALSplitter.java:680) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:984) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:881) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7149) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7108) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7080) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7038) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6989) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:283) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
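The stack trace above boils down to a scheme mismatch: a path on hdfs:// is handed to a FileSystem instance bound to file:///, and Hadoop's FileSystem.checkPath rejects it. A minimal, Hadoop-free sketch of that check (names here are illustrative, not Hadoop's actual implementation):

```java
import java.net.URI;

public class WrongFsCheck {
    // Returns true when the path can belong to the given filesystem: either the
    // path is scheme-less (relative to the default FS) or the schemes match.
    // This mirrors the idea behind Hadoop's FileSystem.checkPath.
    static boolean sameFileSystem(URI fsUri, URI path) {
        String scheme = path.getScheme();
        return scheme == null || scheme.equalsIgnoreCase(fsUri.getScheme());
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        URI walDir = URI.create(
            "hdfs://c47-node2.squadron-labs.com:8020/hbasewal/hbase/meta/1588230740/recovered.edits");
        // Mismatch: this is exactly the situation the "Wrong FS ... expected:
        // file:///" exception reports above.
        System.out.println(sameFileSystem(localFs, walDir)); // prints false
    }
}
```

The fix direction is to resolve the FileSystem from the path itself (Hadoop's `Path.getFileSystem(conf)`) - e.g. using the WAL filesystem for the WAL region dir - rather than assuming the root filesystem, which is what the commented-out `getWalFileSystem()` line in the snippet hints at.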
[jira] [Updated] (HBASE-22749) Distributed MOB compactions
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Summary: Distributed MOB compactions (was: HBase MOB 2.0) > Distributed MOB compactions > > > Key: HBASE-22749 > URL: https://issues.apache.org/jira/browse/HBASE-22749 > Project: HBase > Issue Type: New Feature > Components: mob >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, > HBase-MOB-2.0-v2.pdf > > > There are several drawbacks in the original MOB 1.0 (Moderate Object > Storage) implementation, which can limit the adoption of the MOB feature: > # MOB compactions are executed in a Master as a chore, which limits > scalability because all I/O goes through a single HBase Master server. > # Yarn/Mapreduce framework is required to run MOB compactions in a scalable > way, but this won’t work in a stand-alone HBase cluster. > # Two separate compactors for MOB and for regular store files and their > interactions can result in data loss (see HBASE-22075) > The design goals for MOB 2.0 were to provide a 100% MOB 1.0-compatible > implementation, which is free of the above drawbacks and can be used as a > drop-in replacement in existing MOB deployments. So, these are the design goals > of MOB 2.0: > # Make MOB compactions scalable without relying on the Yarn/Mapreduce framework > # Provide a unified compactor for both MOB and regular store files > # Make it more robust, especially w.r.t. data loss. > # Simplify and reduce the overall MOB code. > # Provide a 100% compatible implementation with MOB 1.0. > # No migration of data should be required between MOB 1.0 and MOB 2.0 - just > a software upgrade. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
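Design goal 2 above (one compactor for both MOB and regular store files) can be illustrated with a toy sketch - hypothetical names and threshold, not the actual DefaultMobStoreCompactor code: a single pass writes oversized values to a MOB output and keeps only references in the regular output, so there is no second compaction path that could drift out of sync with the first and lose data.

```java
import java.util.ArrayList;
import java.util.List;

public class UnifiedCompactionSketch {
    static final int MOB_THRESHOLD_BYTES = 100; // hypothetical threshold

    // One pass over all cell values: anything above the MOB threshold goes to
    // the MOB output and the regular output keeps just a reference; everything
    // else is rewritten inline. A single code path handles both kinds of data.
    static List<String> compact(List<byte[]> values, List<byte[]> mobOut) {
        List<String> regularOut = new ArrayList<>();
        for (byte[] v : values) {
            if (v.length > MOB_THRESHOLD_BYTES) {
                mobOut.add(v);                               // large value -> MOB file
                regularOut.add("mobref:" + (mobOut.size() - 1));
            } else {
                regularOut.add("inline:" + v.length);        // small value stays inline
            }
        }
        return regularOut;
    }

    public static void main(String[] args) {
        List<byte[]> mobOut = new ArrayList<>();
        List<byte[]> values = new ArrayList<>();
        values.add(new byte[10]);
        values.add(new byte[500]);
        System.out.println(compact(values, mobOut)); // [inline:10, mobref:0]
        System.out.println(mobOut.size());           // 1
    }
}
```

Because the regionservers already run the regular compactor, folding MOB handling into the same pass is also what removes the Master bottleneck named in drawback 1.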
[jira] [Commented] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902451#comment-16902451 ] Vladimir Rodionov commented on HBASE-22749: --- Np, I will change the title :) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v2.1.pdf) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBase-MOB-2.0-v2.1.pdf -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBase-MOB-2.0-v2.1.pdf -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: (was: HBase-MOB-2.0-v2.1.pdf) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895629#comment-16895629 ] Vladimir Rodionov commented on HBASE-22749: --- Design doc v2.1 adds clarification on *CompactType.MOB* support. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749: -- Attachment: HBase-MOB-2.0-v2.1.pdf -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895593#comment-16895593 ] Vladimir Rodionov edited comment on HBASE-22749 at 7/29/19 9:28 PM:
---
{quote}
Why is it called MOB 2.0? Seems to be just a change in compaction.
{quote}
Compaction changes are just the first step. Yes, we are collecting feedback from the community on features to add or change in MOB, such as the already mentioned streaming access to MOB data.
{quote}
There is no more special compactor for MOB files, but the class that is doing the compaction is named DefaultMobStoreCompactor; i.e. a compactor that is 'default' but for 'MOB'?
{quote}
DefaultMobStoreCompactor does not do MOB compactions in the original MOB implementation - PartitionedMobCompactor does, and that class is now gone, along with the entire mob.compactions sub-package.
{quote}
On #3, to compact MOB, need to submit a major_compaction request. Does that mean we major compact all in the target table – MOB and other files? Can I do one or the other (MOB or HFiles).
{quote}
We are still considering support for *CompactType.MOB*. If there is a request to support it, we will add it. In that case, to start a MOB compaction the user must submit a *major_compact* request with type=CompactType.MOB.
Upd. Having thought about this - no, it is not possible to major-compact only MOB files. The CompactType.MOB request will have to compact both MOB and regular store files, and CompactType.NORMAL will compact only store files. This change can be added.
{quote}
After finishing 'Unified Compactor' section, how does this differ from what was there before? Why superior?
{quote}
Code reduction and unification is an advantage as well. But the overall "superiority" comes from the overall MOB 2.0 feature - not from the unified compactor alone. We describe the advantages in the design document.

> HBase MOB 2.0
> -------------
>
> Key: HBASE-22749
> URL: https://issues.apache.org/jira/browse/HBASE-22749
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.pdf
>
> There are several drawbacks in the original MOB 1.0 (Moderate Object Storage) implementation that can limit adoption of the MOB feature:
> # MOB compactions are executed in the Master as a chore, which limits scalability because all I/O goes through a single HBase Master server.
> # The Yarn/Mapreduce framework is required to run MOB compactions in a scalable way, but this won't work in a stand-alone HBase cluster.
> # There are two separate compactors, for MOB and for regular store files, whose interactions can result in data loss (see HBASE-22075).
> The design goal for MOB 2.0 was to provide a 100% MOB 1.0-compatible implementation that is free of the above drawbacks and can be used as a drop-in replacement in existing MOB deployments. The design goals of MOB 2.0:
> # Make MOB compactions scalable without relying on the Yarn/Mapreduce framework.
> # Provide a unified compactor for both MOB and regular store files.
> # Make it more robust, especially w.r.t. data loss.
> # Simplify and reduce the overall MOB code.
> # Provide a 100% compatible implementation with MOB 1.0.
> # No data migration should be required between MOB 1.0 and MOB 2.0 - just a software upgrade.
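The "Upd." paragraph above pins down the proposed semantics: a CompactType.MOB request compacts both MOB files and the regular store files that reference them, while CompactType.NORMAL touches only regular store files. A minimal toy model of that routing decision, for illustration only — this is not the HBase Admin API, and the class and method names below are made up:

```java
import java.util.EnumSet;
import java.util.Set;

public class CompactionRouting {
    // Mirrors the two request types discussed in the comment.
    enum CompactType { NORMAL, MOB }

    enum FileKind { STORE, MOB }

    // Hypothetical selector: which file kinds a major compaction request
    // covers, per the semantics proposed above.
    static Set<FileKind> filesToCompact(CompactType type) {
        switch (type) {
            case MOB:
                // A MOB request must rewrite MOB files AND the store files
                // holding the reference cells that point into them.
                return EnumSet.of(FileKind.STORE, FileKind.MOB);
            case NORMAL:
            default:
                // A normal request leaves MOB files untouched.
                return EnumSet.of(FileKind.STORE);
        }
    }

    public static void main(String[] args) {
        System.out.println(filesToCompact(CompactType.MOB));    // [STORE, MOB]
        System.out.println(filesToCompact(CompactType.NORMAL)); // [STORE]
    }
}
```

This also shows why MOB-only major compaction is impossible: rewriting a MOB file relocates cells, so the store files whose reference cells point at the old location must be rewritten in the same pass.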
[jira] [Commented] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895593#comment-16895593 ] Vladimir Rodionov commented on HBASE-22749:
---
{quote}
Why is it called MOB 2.0? Seems to be just a change in compaction.
{quote}
Compaction changes are just the first step. Yes, we are collecting feedback from the community on features to add or change in MOB, such as the already mentioned streaming access to MOB data.
{quote}
There is no more special compactor for MOB files, but the class that is doing the compaction is named DefaultMobStoreCompactor; i.e. a compactor that is 'default' but for 'MOB'?
{quote}
DefaultMobStoreCompactor does not do MOB compactions in the original MOB implementation - PartitionedMobCompactor does, and that class is now gone, along with the entire mob.compactions sub-package.
{quote}
On #3, to compact MOB, need to submit a major_compaction request. Does that mean we major compact all in the target table – MOB and other files? Can I do one or the other (MOB or HFiles).
{quote}
We are still considering support for *CompactType.MOB*. If there is a request to support it, we will add it. In that case, to start a MOB compaction the user must submit a *major_compact* request with type=CompactType.MOB.
{quote}
After finishing 'Unified Compactor' section, how does this differ from what was there before? Why superior?
{quote}
Code reduction and unification is an advantage as well. But the overall "superiority" comes from the overall MOB 2.0 feature - not from the unified compactor alone. We describe the advantages in the design document.
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749:
---
Attachment: HBase-MOB-2.0-v2.pdf
[jira] [Commented] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894234#comment-16894234 ] Vladimir Rodionov commented on HBASE-22749:
---
The patch for the master branch will follow around mid-August (when I return from vacation).
[jira] [Updated] (HBASE-22749) HBase MOB 2.0
[ https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-22749:
---
Attachment: HBase-MOB-2.0-v1.pdf
[jira] [Created] (HBASE-22749) HBase MOB 2.0
Vladimir Rodionov created HBASE-22749:
---
Summary: HBase MOB 2.0
Key: HBASE-22749
URL: https://issues.apache.org/jira/browse/HBASE-22749
Project: HBase
Issue Type: New Feature
Components: mob
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov

There are several drawbacks in the original MOB 1.0 (Moderate Object Storage) implementation that can limit adoption of the MOB feature:
# MOB compactions are executed in the Master as a chore, which limits scalability because all I/O goes through a single HBase Master server.
# The Yarn/Mapreduce framework is required to run MOB compactions in a scalable way, but this won't work in a stand-alone HBase cluster.
# There are two separate compactors, for MOB and for regular store files, whose interactions can result in data loss (see HBASE-22075).
The design goal for MOB 2.0 was to provide a 100% MOB 1.0-compatible implementation that is free of the above drawbacks and can be used as a drop-in replacement in existing MOB deployments. The design goals of MOB 2.0:
# Make MOB compactions scalable without relying on the Yarn/Mapreduce framework.
# Provide a unified compactor for both MOB and regular store files.
# Make it more robust, especially w.r.t. data loss.
# Simplify and reduce the overall MOB code.
# Provide a 100% compatible implementation with MOB 1.0.
# No data migration should be required between MOB 1.0 and MOB 2.0 - just a software upgrade.
[jira] [Commented] (HBASE-22705) IllegalArgumentException exception occured during MobFileCache eviction
[ https://issues.apache.org/jira/browse/HBASE-22705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892018#comment-16892018 ] Vladimir Rodionov commented on HBASE-22705:
---
On my list today.

> IllegalArgumentException exception occured during MobFileCache eviction
> -----------------------------------------------------------------------
>
> Key: HBASE-22705
> URL: https://issues.apache.org/jira/browse/HBASE-22705
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.0.5
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Priority: Critical
> Fix For: 2.3.0
>
> Attachments: HBASE-22705.branch-2.patch
>
> An IllegalArgumentException occurred during a scan operation:
> {noformat}
> 2019-07-08 01:46:57,764 | ERROR | RpcServer.FifoWFPBQ.default.handler=129,queue=9,port=21302 | Unexpected throwable object | org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2502)
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.ComparableTimSort.mergeHi(ComparableTimSort.java:866)
>     at java.util.ComparableTimSort.mergeAt(ComparableTimSort.java:483)
>     at java.util.ComparableTimSort.mergeForceCollapse(ComparableTimSort.java:422)
>     at java.util.ComparableTimSort.sort(ComparableTimSort.java:222)
>     at java.util.Arrays.sort(Arrays.java:1312)
>     at java.util.Arrays.sort(Arrays.java:1506)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:141)
>     at org.apache.hadoop.hbase.mob.MobFileCache.evict(MobFileCache.java:144)
>     at org.apache.hadoop.hbase.mob.MobFileCache.openFile(MobFileCache.java:214)
>     at org.apache.hadoop.hbase.regionserver.HMobStore.readCell(HMobStore.java:397)
>     at org.apache.hadoop.hbase.regionserver.HMobStore.resolve(HMobStore.java:358)
>     at org.apache.hadoop.hbase.regionserver.MobStoreScanner.next(MobStoreScanner.java:74)
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:150)
> {noformat}
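The "Comparison method violates its general contract!" error in the stack trace above is the classic symptom of sorting with keys that mutate mid-sort: MobFileCache.evict orders cached files by an access count that other threads keep bumping, so TimSort can observe inconsistent orderings and bail out. A sketch of the usual remedy — read each volatile key exactly once into an immutable snapshot, then sort the snapshots. The class names here are illustrative stand-ins, not the actual HBase types:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SnapshotSort {
    // Stand-in for a cached MOB file whose access count changes concurrently.
    static class CachedFile {
        final String name;
        final AtomicLong accessCount = new AtomicLong();
        CachedFile(String name, long count) { this.name = name; accessCount.set(count); }
    }

    // BROKEN pattern: comparing live mutable counts can violate the
    // comparator contract if a count changes while TimSort is running:
    //   files.sort(Comparator.comparingLong(f -> f.accessCount.get()));

    // FIX: capture each count once, then sort the frozen snapshots.
    record Snapshot(String name, long count) {}

    static List<Snapshot> evictionOrder(List<CachedFile> files) {
        List<Snapshot> snap = new ArrayList<>();
        for (CachedFile f : files) {
            snap.add(new Snapshot(f.name, f.accessCount.get())); // read once
        }
        snap.sort(Comparator.comparingLong(Snapshot::count));    // keys are now stable
        return snap;
    }

    public static void main(String[] args) {
        List<CachedFile> files = List.of(
            new CachedFile("a.mob", 5), new CachedFile("b.mob", 1), new CachedFile("c.mob", 3));
        // Least-used files come first, i.e. they are the eviction candidates.
        System.out.println(evictionOrder(files));
    }
}
```

The sort itself is no slower; the only cost is one small allocation per cached file, in exchange for a comparator whose inputs cannot change under TimSort's feet.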