Re: Major Compaction Tool
Hi, we wrote something similar. It just triggers major compactions with a given parallelism and distributes them across the cluster: https://github.com/flipkart-incubator/hbase-compactor

On Dec 16, 2017 10:01 AM, "Jean-Marc Spaggiari" wrote:
Re: Major Compaction Tool
Rahul, I had something in mind for months/years! It's a must-have! Thanks for taking on the task! I will register to the JIRA and come back very soon with tons of ideas and recommendations. You can count on me to test it too!

JMS

2017-12-15 17:44 GMT-05:00 rahul gidwani :
Re: Major Compaction Tool
The tool creates a Map of servers to the CompactionRequests that need to be performed. To pick what to compact next, you always select the server with the largest queue that is not currently compacting.

I created a JIRA for this tool: HBASE-19528.

On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu wrote:
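(A minimal Java sketch of the selection rule described above, assuming a hypothetical per-server queue map and a set of currently compacting servers; this is not the actual tool code, and the generic parameter stands in for whatever per-store request object the tool tracks.)

{code}
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;
import org.apache.hadoop.hbase.ServerName;

class CompactionScheduler {
  /** Pick the server with the deepest remaining queue that is not compacting right now. */
  static <R> Optional<ServerName> selectNextServer(Map<ServerName, Queue<R>> queues,
      Set<ServerName> currentlyCompacting) {
    return queues.entrySet().stream()
        .filter(e -> !currentlyCompacting.contains(e.getKey()))   // skip servers already compacting
        .filter(e -> !e.getValue().isEmpty())                     // skip drained queues
        .max(Comparator.comparingInt(
            (Map.Entry<ServerName, Queue<R>> e) -> e.getValue().size())) // largest queue wins
        .map(Map.Entry::getKey);
  }
}
{code}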
[jira] [Created] (HBASE-19530) New regions should always be added with state CLOSED
Appy created HBASE-19530:

Summary: New regions should always be added with state CLOSED
Key: HBASE-19530
URL: https://issues.apache.org/jira/browse/HBASE-19530
Project: HBase
Issue Type: Bug
Reporter: Appy

We shouldn't add regions with state null. In case of failures and recovery, it's not possible to determine what a null state meant, and things become uncertain. All operations should add regions in a well-defined state. For now, we'll set the default to CLOSED, since whatever operations add new regions would anyway be enabling them explicitly if needed.

fyi: [~stack]
[jira] [Created] (HBASE-19529) Handle null states in AM
Appy created HBASE-19529:

Summary: Handle null states in AM
Key: HBASE-19529
URL: https://issues.apache.org/jira/browse/HBASE-19529
Project: HBase
Issue Type: Bug
Reporter: Appy

From debugging in HBASE-19457, we found some questions that need concrete answers:
1) What does a region state of null in meta mean? Currently AM treats it as OFFLINE.
2) What does a table state of null in meta mean? Currently TSM treats it as ENABLED.

More importantly, we need to fix the holes in AM so that our state machine is well defined and doesn't end up in these uncertainties. Figuring out answers to the above questions will help in that direction.
Re: Major Compaction Tool
bq. with at most N distinct RegionServers compacting at a given time

If per-table balancing is not on, the regions for the underlying table may not be evenly distributed across the cluster. In that case, how would the tool decide which servers should perform the compactions?

I think you can log a JIRA for upstreaming this tool.

Thanks

On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani wrote:
[jira] [Created] (HBASE-19528) Major Compaction Tool
churro morales created HBASE-19528:

Summary: Major Compaction Tool
Key: HBASE-19528
URL: https://issues.apache.org/jira/browse/HBASE-19528
Project: HBase
Issue Type: New Feature
Reporter: churro morales
Assignee: churro morales
Fix For: 2.0.0, 3.0.0

The basic overview of how this tool works is:

Parameters:
- Table
- Stores
- ClusterConcurrency
- Timestamp

So you input a table, the desired concurrency and the list of stores you wish to major compact. The tool first checks the filesystem to see which stores need compaction based on the timestamp you provide (the default is the current time). It takes that list of stores that require compaction and executes those requests concurrently, with at most N distinct RegionServers compacting at a given time. Each thread waits for the compaction to complete before moving to the next queue. If a region split, merge or move happens, this tool ensures those regions get major compacted as well.

This helps us in two ways: we can limit how much I/O bandwidth we are using for major compaction cluster-wide, and we are guaranteed after the tool completes that all requested compactions complete regardless of moves, merges and splits.
Re: Major Compaction Tool
Hi Rahul,

That sounds like a very useful tool. It would be a good extension to the example I point folks to for scheduling HBase major compactions via Oozie (which just does a naive asynchronous table-wide major compaction today): https://github.com/cbaenziger/Oozie_MajorCompaction_Example

-Clay

From: dev@hbase.apache.org At: 12/15/17 17:02:02 To: dev@hbase.apache.org Subject: Major Compaction Tool
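(For comparison, a naive asynchronous table-wide major compaction of the kind Clay mentions boils down to a single Admin call, roughly like the sketch below; the table name is a placeholder, and the request returns before any compaction actually finishes.)

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class NaiveMajorCompact {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Asynchronous: queues a major compaction for every region of the table and returns.
      admin.majorCompact(TableName.valueOf("my_table"));
    }
  }
}
{code}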
Major Compaction Tool
Hi,

I was wondering if anyone was interested in a manual major compactor tool.

The basic overview of how this tool works is:

Parameters:
- Table
- Stores
- ClusterConcurrency
- Timestamp

So you input a table, the desired concurrency and the list of stores you wish to major compact. The tool first checks the filesystem to see which stores need compaction based on the timestamp you provide (the default is the current time). It takes that list of stores that require compaction and executes those requests concurrently, with at most N distinct RegionServers compacting at a given time. Each thread waits for the compaction to complete before moving to the next queue. If a region split, merge or move happens, this tool ensures those regions get major compacted as well.

We have started using this tool in production, but were wondering if there is any interest from you guys in getting this upstream.

This helps us in two ways: we can limit how much I/O bandwidth we are using for major compaction cluster-wide, and we are guaranteed after the tool completes that all requested compactions complete regardless of moves, merges and splits.
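(A rough Java sketch, assuming the HBase 2.x Admin API, of the per-server worker behavior described above: request a major compaction for each region in a hypothetical queue and poll until the region reports no compaction running. Error handling and the split/merge/move re-check are left out, so this is an illustration rather than the tool itself.)

{code}
import java.util.Queue;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.CompactionState;

class CompactionWorker implements Runnable {
  private final Admin admin;
  private final Queue<byte[]> regionNames; // hypothetical per-server queue of region names

  CompactionWorker(Admin admin, Queue<byte[]> regionNames) {
    this.admin = admin;
    this.regionNames = regionNames;
  }

  @Override
  public void run() {
    try {
      byte[] region;
      while ((region = regionNames.poll()) != null) {
        admin.majorCompactRegion(region);  // fire the major compaction request
        // Wait for this region to finish before moving to the next queue entry
        // (a real tool would also handle the window before the compaction starts).
        while (admin.getCompactionStateForRegion(region) != CompactionState.NONE) {
          Thread.sleep(10_000L);
        }
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
{code}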
[jira] [Created] (HBASE-19527) Make ExecutorService threads daemon=true.
stack created HBASE-19527:

Summary: Make ExecutorService threads daemon=true.
Key: HBASE-19527
URL: https://issues.apache.org/jira/browse/HBASE-19527
Project: HBase
Issue Type: Bug
Reporter: stack

Let me try this. The ExecutorService runs OPENs, CLOSEs, etc. If the Server is going down, there is no point in these threads sticking around (I think). Let me try this.
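(A generic Java sketch of one way to get daemon executor threads -- not the actual HBase change -- by installing a ThreadFactory that marks each thread as a daemon so the pool never blocks process shutdown.)

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public final class DaemonExecutors {
  /** Fixed-size pool whose threads are daemons, so they never keep a dying server process alive. */
  public static ExecutorService newDaemonFixedPool(String namePrefix, int threads) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = runnable -> {
      Thread t = new Thread(runnable, namePrefix + "-" + counter.incrementAndGet());
      t.setDaemon(true); // the key bit: daemon threads do not block JVM exit
      return t;
    };
    return Executors.newFixedThreadPool(threads, factory);
  }
}
{code}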
[jira] [Resolved] (HBASE-19272) Deal with HBCK tests disabled by HBASE-14614 AMv2 when HBCK works again...
[ https://issues.apache.org/jira/browse/HBASE-19272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-19272.
Resolution: Fixed
Assignee: stack

I pushed .001 to branch-2 and master. It just removes the named tests.

> Deal with HBCK tests disabled by HBASE-14614 AMv2 when HBCK works again...
>
> Key: HBASE-19272
> URL: https://issues.apache.org/jira/browse/HBASE-19272
> Project: HBase
> Issue Type: Sub-task
> Components: hbck
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0-beta-1
> Attachments: HBASE-19272.master.001.patch
>
> Disabled by HBASE-14614, enabling AMv2. See HBASE-18110.
> Here is the list:
> * TestHBaseFsckTwoRS
> * TestOfflineMetaRebuildBase
> * TestHBaseFsckReplicas
> * TestOfflineMetaRebuildOverlap
> * TestHBaseFsckOneRS
> * TestOfflineMetaRebuildHole
[jira] [Created] (HBASE-19526) Update hadoop version to 3.0 GA
Mike Drob created HBASE-19526:

Summary: Update hadoop version to 3.0 GA
Key: HBASE-19526
URL: https://issues.apache.org/jira/browse/HBASE-19526
Project: HBase
Issue Type: Task
Components: build, dependencies
Reporter: Mike Drob
Fix For: 2.0.0-beta-1

We're still building against hadoop 3.0-beta1, while the GA release recently came out. We should update; hopefully there are no surprises.
Re: EOL HBase 1.1
Thanks Nick for your cool curation of our branch-1. Nice job done,
S

On Thu, Dec 14, 2017 at 10:25 AM, Nick Dimiduk wrote:
> Hello,
>
> This is for folks who haven't followed our discussions and aren't reading closely the ANNOUNCE mail. The release line from branch-1.1 is now concluded. 1.1.13 was the final release for that line. 1.2 has been our stable release line for quite some time -- please upgrade!
>
> Thank you again to all the professional and volunteer contributors who made branch-1.1 possible.
>
> Thanks,
> Nick
[jira] [Resolved] (HBASE-18110) [AMv2] Reenable tests temporarily disabled
[ https://issues.apache.org/jira/browse/HBASE-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-18110.
Resolution: Fixed

Resolving as done. Accounting of all tests disabled has them all onlined again, unless explicitly noted as tests that no longer make sense in the hbase2 realm.

> [AMv2] Reenable tests temporarily disabled
>
> Key: HBASE-18110
> URL: https://issues.apache.org/jira/browse/HBASE-18110
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.0
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 2.0.0
>
> We disabled tests that didn't make sense or relied on behavior not supported by AMv2. Revisit and reenable after AMv2 gets committed. Here is the set (from https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.rsj53tx4vlwj):
> testAllFavoredNodesDead, testAllFavoredNodesDeadMasterRestarted and testMisplacedRegions in TestFavoredStochasticLoadBalancer ... not sure what this is about.
> testRegionNormalizationMergeOnCluster in TestSimpleRegionNormalizerOnCluster disabled for now till we fix up Merge.
> testMergeWithReplicas in TestRegionMergeTransactionOnCluster because we don't know how it is supposed to work.
> Admin#close does not update Master. Causes testHBaseFsckWithFewerMetaReplicaZnodes in TestMetaWithReplicas to fail (Master gets a report about a server closing when it didn't run the close -- gets freaked out).
> Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what offline is.
> Disabled/Ignore TestRSGroups.
> All tests that have to do w/ fsck: TestHBaseFsckTwoRS, TestOfflineMetaRebuildBase, TestHBaseFsckReplicas, TestOfflineMetaRebuildOverlap, testChangingReplicaCount in TestMetaWithReplicas (internally it is doing fscks which are killing RS)...
> FSCK test testHBaseFsckWithExcessMetaReplicas in TestMetaWithReplicas. So is testHBaseFsckWithFewerMetaReplicas in the same class.
> TestHBaseFsckOneRS is fsck. Disabled.
> TestOfflineMetaRebuildHole is about rebuilding a hole with fsck.
> Master carries meta:
> TestRegionRebalancing is disabled because it doesn't consider the fact that Master carries system tables only (the fix of average in RegionStates brought out the issue).
> Disabled testMetaAddressChange in TestMetaWithReplicas because it presumes you can move meta... you can't.
> TestAsyncTableGetMultiThreaded wants to move hbase:meta... Balancer does NPEs. AMv2 won't let you move hbase:meta off Master.
> Disabled parts of... testCreateTableWithMultipleReplicas in TestMasterOperationsForRegionReplicas. There is an issue w/ assigning more replicas if the number of replicas is changed on us. See '/* DISABLED! FOR NOW'.
> Disabled TestCorruptedRegionStoreFile. Depends on a half-implemented reopen of a region when a store file goes missing; TODO.
> testRetainAssignmentOnRestart in TestRestartCluster does not work. AMv2 does the retain semantic differently. Fix. TODO.
> TestMasterFailover needs to be rewritten for AMv2. It uses tricks not ordained when up on AMv2. The test is also hobbled by the fact that we religiously enforce that only Master can carry meta, something we are loose about in the old AM.
> Fix Ignores in TestServerCrashProcedure. Master is different now.
> Offlining is done differently now: because of this, disabled testOfflineRegion in TestAsyncRegionAdminApi.
> Skipping delete of table after test in TestAccessController3 because of access issues w/ AMv2. AMv1 seems to crash servers on exit too for the same lack of auth perms, but AMv2 gets hung up. TODO. See the cleanUp method.
> TestHCM#testMulti and TestHCM
> Fix TestMasterMetrics. Stuff is different now around startup which messes up this test. Disabled two of three tests.
> I tried to fix TestMasterBalanceThrottling but it looks like SimpleLoadBalancer is borked whether AMv2 or not.
> Disabled testPickers in TestFavoredStochasticBalancerPickers. It hangs.
[jira] [Created] (HBASE-19525) RS side changes for moving peer modification from zk watcher to procedure
Duo Zhang created HBASE-19525:

Summary: RS side changes for moving peer modification from zk watcher to procedure
Key: HBASE-19525
URL: https://issues.apache.org/jira/browse/HBASE-19525
Project: HBase
Issue Type: Sub-task
Components: proc-v2, Replication
Reporter: Duo Zhang
Assignee: Zheng Hu
Fix For: HBASE-19397
[jira] [Created] (HBASE-19524) Master side changes for moving peer modification from zk watcher to procedure
Duo Zhang created HBASE-19524:

Summary: Master side changes for moving peer modification from zk watcher to procedure
Key: HBASE-19524
URL: https://issues.apache.org/jira/browse/HBASE-19524
Project: HBase
Issue Type: Sub-task
Components: proc-v2, Replication
Reporter: Duo Zhang
Assignee: Duo Zhang
Fix For: HBASE-19397
[jira] [Resolved] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-14790.
Resolution: Fixed

More than two years, and finally we are done! Thanks to all who helped on this new feature!

> Implement a new DFSOutputStream for logging WAL only
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0-beta-1
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But in fact, we do not need most of its features if we only want to log the WAL. For example, we do not need pipeline recovery, since we could just close the old logger and open a new one. And also, we do not need to write multiple blocks, since we could also open a new logger if the old file is too large.
> And the most important thing is that it is hard to handle all the corner cases to avoid data loss or data inconsistency (such as HBASE-14004) when using the original DFSOutputStream, due to its complicated logic. And the complicated logic also forces us to use some magical tricks to increase performance. For example, we need to use multiple threads to call {{hflush}} when logging, and now we use 5 threads. But why 5 and not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when logging the WAL. For correctness, and also for performance.
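(For illustration only: the "multiple threads calling hflush" trick mentioned above reads roughly like the sketch below. The pool size of 5 matches the number quoted in the issue, but the class and wiring are assumptions, not the actual WAL code.)

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.FSDataOutputStream;

class WalSyncPool {
  private final ExecutorService syncers = Executors.newFixedThreadPool(5); // "why 5 and not 10 or 100?"
  private final FSDataOutputStream out;

  WalSyncPool(FSDataOutputStream out) {
    this.out = out;
  }

  /** Schedule a flush of buffered WAL edits to the DataNodes on one of the sync threads. */
  Future<?> requestSync() {
    return syncers.submit(() -> {
      out.hflush(); // push buffered bytes out to the pipeline
      return null;
    });
  }
}
{code}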
[jira] [Created] (HBASE-19523) TestLogRolling is flakey
Duo Zhang created HBASE-19523:

Summary: TestLogRolling is flakey
Key: HBASE-19523
URL: https://issues.apache.org/jira/browse/HBASE-19523
Project: HBase
Issue Type: Bug
Components: test
Reporter: Duo Zhang

https://builds.apache.org/job/PreCommit-HBASE-Build/10475/testReport/
[jira] [Created] (HBASE-19522) The complete order is wrong in AsyncBufferedMutatorImpl
Guanghao Zhang created HBASE-19522:

Summary: The complete order is wrong in AsyncBufferedMutatorImpl
Key: HBASE-19522
URL: https://issues.apache.org/jira/browse/HBASE-19522
Project: HBase
Issue Type: Bug
Reporter: Guanghao Zhang

{code}
List<CompletableFuture<Void>> toComplete = this.futures;
assert toSend.size() == toComplete.size();
this.mutations = new ArrayList<>();
this.futures = new ArrayList<>();
bufferedSize = 0L;
Iterator<CompletableFuture<Void>> toCompleteIter = toComplete.iterator();
for (CompletableFuture<?> future : table.batch(toSend)) {
  future.whenComplete((r, e) -> {
    CompletableFuture<Void> f = toCompleteIter.next(); // next() is called in the callback, so the completion order may differ from the future order
    if (e != null) {
      f.completeExceptionally(e);
    } else {
      f.complete(null);
    }
  });
}
{code}

Here we call table.batch to get a list of CompletableFutures, one for each mutation. Then we register a callback for each future. But the problem is that we call toCompleteIter.next() inside the callback, so we may complete the futures in the wrong order (not the same as the mutation order). Meanwhile, since ArrayList is not thread-safe, different threads may get the same future from toCompleteIter.next().
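(One possible fix, sketched here rather than taken from any committed patch, using the same variables as the snippet above: pair each buffered-mutator future with its batch future before registering the callback, so the shared iterator is never touched inside a callback.)

{code}
Iterator<CompletableFuture<Void>> toCompleteIter = toComplete.iterator();
for (CompletableFuture<?> future : table.batch(toSend)) {
  CompletableFuture<Void> f = toCompleteIter.next(); // resolved on the submitting thread, in mutation order
  future.whenComplete((r, e) -> {
    if (e != null) {
      f.completeExceptionally(e);
    } else {
      f.complete(null);
    }
  });
}
{code}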
Re: EOL HBase 1.1
Thanks for all the great work as branch-1.1 RM sir, it's definitely a successful release line!

Best Regards,
Yu

On 15 December 2017 at 02:25, Nick Dimiduk wrote:
> Hello,
>
> This is for folks who haven't followed our discussions and aren't reading closely the ANNOUNCE mail. The release line from branch-1.1 is now concluded. 1.1.13 was the final release for that line. 1.2 has been our stable release line for quite some time -- please upgrade!
>
> Thank you again to all the professional and volunteer contributors who made branch-1.1 possible.
>
> Thanks,
> Nick