[jira] [Created] (HBASE-19943) Only allow removing sync replication peer which is in DA state
Duo Zhang created HBASE-19943: - Summary: Only allow removing sync replication peer which is in DA state Key: HBASE-19943 URL: https://issues.apache.org/jira/browse/HBASE-19943 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang To simplify the logic of RemovePeerProcedure. Otherwise we may also need to reopen regions, which would mean RemovePeerProcedure could not fit both sync and normal replication peers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
RE: Considering branching for 1.5 and other branch-1 release planning
We at Huawei have been testing this for more than 8 months now, did not find any critical issues thus far and launched a service on Huawei public cloud also which is based HBase 1.3.1 version. With that, I am +1 on moving the stable pointer. Regards, Ashish -Original Message- From: Zach York [mailto:zyork.contribut...@gmail.com] Sent: Tuesday, February 06, 2018 3:28 AM To: dev@hbase.apache.org Subject: Re: Considering branching for 1.5 and other branch-1 release planning > If someone else is using 1.3 your feedback would be very valuable. EMR has shipped the 1.3 line since EMR 5.4.0 (March 08, 2017). We have not ran into any unresolved critical issues and it has been fairly stable overall. I would be a +1 on moving the stable pointer. Thanks, Zach On Mon, Feb 5, 2018 at 1:30 PM, Andrew Purtell wrote: > Thanks so much for the feedback Francis. > > I think we are just about there to move the stable pointer. > > > On Feb 5, 2018, at 9:48 AM, Francis Liu wrote: > > >> If someone else is using 1.3 your feedback would be very > > valuable. > > > > We are running 1.3 in production, full rollout ongoing. Ran into > > some issues but it's generally been stable. We'll prolly gonna be on > > 1.3 for a while. > > > > Cheers, > > Francis > > > > > >> On Sun, Feb 4, 2018 at 10:59 AM Andrew Purtell > >> > wrote: > >> > >> Hi Ted, > >> > >> If Hadoop 3 support is in place for an (eventual) 1.5.0 release, I > >> think that would be great. > >> > >> > >>> On Sun, Feb 4, 2018 at 10:55 AM, Ted Yu wrote: > >>> > >>> Andrew: > >>> Do you think making 1.5 release support hadoop 3 is among the goals ? > >>> > >>> Cheers > >>> > >>> On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell > >>> > >>> wrote: > >>> > The backport of RSGroups to branch-1 triggered the opening of the > 1.4 > >>> code > line as branch-1.4 and releases 1.4.0 and 1.4.1. 
> > After the commit of HBASE-19858 (Backport HBASE-14061 (Support > CF-level > Storage Policy) to branch-1), storage policy aware file placement > might > >>> be > useful enough to trigger a new minor release from branch-1. This > would > >> be > branch-1.5, and at least release 1.5.0. I am not sure about this yet. > >> It > needs testing. I'd like to mock up a couple of use cases and > determine > >> if > what we have is sufficient on its own or more changes will be needed. > I > want to get the idea of a 1.5 on your radar. though. > > Also, I would like to make one more release of branch-1.3 before > we > >>> retire > it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. > If > not, I will do it. I'm expecting 1.4 will supersede 1.3 but this > will > >> be > decided organically depending on uptake. > > -- > Best regards, > Andrew > > Words like orphans lost among the crosstalk, meaning torn from > truth's decrepit hands > - A23, Crosstalk > > >>> > >> > >> > >> > >> -- > >> Best regards, > >> Andrew > >> > >> Words like orphans lost among the crosstalk, meaning torn from > >> truth's decrepit hands > >> - A23, Crosstalk > >> >
[jira] [Created] (HBASE-19942) Fix flaky TestSimpleRpcScheduler
Guanghao Zhang created HBASE-19942: -- Summary: Fix flaky TestSimpleRpcScheduler Key: HBASE-19942 URL: https://issues.apache.org/jira/browse/HBASE-19942 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html] https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/1387/testReport/junit/org.apache.hadoop.hbase.ipc/TestSimpleRpcScheduler/testSoftAndHardQueueLimits/ h3. Stacktrace java.lang.AssertionError at org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler.testSoftAndHardQueueLimits(TestSimpleRpcScheduler.java:451) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19941) Flaky TestCreateTableProcedure times out in nightly, needs to be LargeTests
Umesh Agashe created HBASE-19941: Summary: Flaky TestCreateTableProcedure times out in nightly, needs to be LargeTests Key: HBASE-19941 URL: https://issues.apache.org/jira/browse/HBASE-19941 Project: HBase Issue Type: Bug Components: build Affects Versions: 2.0.0-beta-1 Reporter: Umesh Agashe Assignee: Umesh Agashe Fix For: 2.0.0-beta-2 Currently it is categorized as MediumTests, but sometimes running all the tests in this class takes more than 180 seconds. Here is a comparison of runtimes between local runs (on my dev machine) and nightly runs:
||Test||Local (seconds)||Nightly (seconds)||
|testSimpleCreateWithSplits|~1.5|~12|
|testRollbackAndDoubleExecutionOnMobTable|~4.7|~21|
|testSimpleCreate|~1.7|~11|
|testRollbackAndDoubleExecution|~4.3|~18|
|testMRegions|~26.4|Timed out after 90 seconds|
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-19841) Tests against hadoop3 fail with StreamLacksCapabilityException
[ https://issues.apache.org/jira/browse/HBASE-19841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-19841: --- Reopening. Breaks launching of MR jobs on a cluster. Here is what a good launch looks like: {code} ... 18/02/05 17:11:33 INFO impl.YarnClientImpl: Submitted application application_1517369646236_0009 18/02/05 17:11:33 INFO mapreduce.Job: The url to track the job: http://ve0524.halxg.cloudera.com:10134/proxy/application_1517369646236_0009/ 18/02/05 17:11:33 INFO mapreduce.Job: Running job: job_1517369646236_0009 18/02/05 17:11:40 INFO mapreduce.Job: Job job_1517369646236_0009 running in uber mode : false 18/02/05 17:11:40 INFO mapreduce.Job: map 0% reduce 0% 18/02/05 17:11:57 INFO mapreduce.Job: map 14% reduce 0% ... {code} ... but now it does this {code} 18/02/05 17:17:54 INFO mapreduce.Job: The url to track the job: http://ve0524.halxg.cloudera.com:10134/proxy/application_1517369646236_0011/ 18/02/05 17:17:54 INFO mapreduce.Job: Running job: job_1517369646236_0011 18/02/05 17:17:56 INFO mapreduce.Job: Job job_1517369646236_0011 running in uber mode : false 18/02/05 17:17:56 INFO mapreduce.Job: map 0% reduce 0% 18/02/05 17:17:56 INFO mapreduce.Job: Job job_1517369646236_0011 failed with state FAILED due to: Application application_1517369646236_0011 failed 2 times due to AM Container for appattempt_1517369646236_0011_02 exited with exitCode: -1000 Failing this attempt.Diagnostics: File file:/tmp/stack/.staging/job_1517369646236_0011/job.splitmetainfo does not exist java.io.FileNotFoundException: File file:/tmp/stack/.staging/job_1517369646236_0011/job.splitmetainfo does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625) at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) For more detailed output, check the application tracking page: http://ve0524.halxg.cloudera.com:8188/applicationhistory/app/application_1517369646236_0011 Then click on links to logs of each attempt. . Failing the application. 18/02/05 17:17:56 INFO mapreduce.Job: Counters: 0 {code} If I revert this patch, the submit runs again. I'd made the staging dir /tmp/stack and seemed to get further... The job staging was made in the local fs... but it seems like we are then looking for it up in hdfs. My guess is our stamping the fs as local until minihdfscluster starts works for the unit test case but it messes up the inference of fs that allows the above submission to work. I'd like to revert this if thats ok. 
> Tests against hadoop3 fail with StreamLacksCapabilityException > -- > > Key: HBASE-19841 > URL: https://issues.apache.org/jira/browse/HBASE-19841 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Mike Drob >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: 19841.007.patch, 19841.06.patch, 19841.v0.txt, > 19841.v1.txt, HBASE-19841.v10.patch, HBASE-19841.v11.patch, > HBASE-19841.v11.patch, HBASE-19841.v2.patch, HBASE-19841.v3.patch, > HBASE-19841.v4.patch, HBASE-19841.v5.patch, HBASE-19841.v7.patch, > HBASE-19841.v8.patch, HBASE-19841.v8.patch, HBASE-19841.v8.patch, > HBASE-19841.v9.patch > > > The following can be observed running against hadoop3: > {code} > java.io.IOException: cannot get log writer > at > org.apache.hadoop.hbase.regionserver.TestCompactingMemStore.compactingSetUp(TestCompactingMemStor
[jira] [Reopened] (HBASE-19927) TestFullLogReconstruction flakey
[ https://issues.apache.org/jira/browse/HBASE-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang reopened HBASE-19927: --- A bit strange {noformat} 2018-02-05 19:05:43,537 INFO [Time-limited test] regionserver.HRegionServer(2116): * STOPPING region server 'asf903.gq1.ygridcore.net,57911,1517857533524' * 2018-02-05 19:05:43,537 INFO [Time-limited test] regionserver.HRegionServer(2130): STOPPED: Shutdown requested 2018-02-05 19:05:43,538 INFO [Time-limited test] regionserver.HRegionServer(2116): * STOPPING region server 'asf903.gq1.ygridcore.net,50054,1517857533606' * 2018-02-05 19:05:43,538 INFO [RS:0;asf903:57911] regionserver.SplitLogWorker(160): Sending interrupt to stop the worker thread 2018-02-05 19:05:43,538 INFO [Time-limited test] regionserver.HRegionServer(2130): STOPPED: Shutdown requested 2018-02-05 19:05:43,538 INFO [Time-limited test] regionserver.HRegionServer(2116): * STOPPING region server 'asf903.gq1.ygridcore.net,42069,1517857533678' * 2018-02-05 19:05:43,974 ERROR [regionserver/asf903:0.logRoller] helpers.MarkerIgnoringBase(159): * ABORTING region server asf903.gq1.ygridcore.net,57911,1517857533524: IOE in log roller * {noformat} The aborting still happens after the stopping in shutdown. Let me check. > TestFullLogReconstruction flakey > > > Key: HBASE-19927 > URL: https://issues.apache.org/jira/browse/HBASE-19927 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: stack >Assignee: Duo Zhang >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19927.patch, js, out > > > Fails pretty frequently in hadoopqa builds. > There is a recent hang in > org.apache.hadoop.hbase.TestFullLogReconstruction.tearDownAfterClass(TestFullLogReconstruction.java:68) > In here... > https://builds.apache.org/job/PreCommit-HBASE-Build/11363/testReport/org.apache.hadoop.hbase/TestFullLogReconstruction/org_apache_hadoop_hbase_TestFullLogReconstruction/ > ... see here. 
> Thread 1250 (RS_CLOSE_META-edd281aedb18:59863-0): > State: TIMED_WAITING > Blocked count: 92 > Waited count: 278 > Stack: > java.lang.Object.wait(Native Method) > > org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:133) > > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718) > > org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:605) > > org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:154) > > org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeFlushMarker(WALUtil.java:81) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2645) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2356) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2328) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2319) > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1531) > org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1437) > > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104) > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > We missed a signal? We need to do an interrupt? The log is not all there in > hadoopqa builds so hard to see all that is going on. This test is not in the > flakey set either -- This message was sent by Atlassian JIRA (v7.6.3#76005)
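The thread dump above shows RS_CLOSE_META parked in Object.wait inside SyncFuture.get, and the comment asks whether a signal was missed. As a point of reference only, a JDK-only sketch (a hypothetical GuardedWait class, not HBase's SyncFuture) of the standard defense against a lost wakeup: re-check the predicate in a loop and bound the wait with a deadline, so a notify that fires before the wait, or a spurious wakeup, cannot hang the waiter forever.

```java
// Hypothetical illustration of the lost-wakeup hazard, not HBase code.
// A waiter must re-check its predicate in a loop and bound the wait;
// otherwise a signal delivered before wait() is simply lost.
public class GuardedWait {
    private boolean done = false;

    public synchronized void markDone() {
        done = true;
        notifyAll(); // wake every waiter; each re-checks the predicate
    }

    /** Returns true if 'done' was observed within timeoutMs. */
    public synchronized boolean await(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done) {                   // loop guards against spurious wakeups
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;             // time out instead of hanging forever
            }
            wait(remaining);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedWait g = new GuardedWait();
        g.markDone();                     // signal fires BEFORE anyone waits...
        System.out.println(g.await(100)); // ...but the predicate loop still sees it
    }
}
```

Without the `while (!done)` guard, the early `markDone()` in `main` would be lost and the waiter would block for the full timeout.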
[jira] [Resolved] (HBASE-19915) From split/ merge procedures daughter/ merged regions get created in OFFLINE state
[ https://issues.apache.org/jira/browse/HBASE-19915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy resolved HBASE-19915. -- Resolution: Fixed > From split/ merge procedures daughter/ merged regions get created in OFFLINE > state > -- > > Key: HBASE-19915 > URL: https://issues.apache.org/jira/browse/HBASE-19915 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19915.addendum.patch, > hbase-19915.master.001.patch, hbase-19915.master.001.patch > > > See HBASE-19530. When regions are created initial state should be CLOSED. Bug > was discovered while debugging flaky test > TestSplitTableRegionProcedure#testRollbackAndDoubleExecution with numOfSteps > set to 4. After updating daughter regions in meta when master is restarted, > startup sequence of master assigns all OFFLINE regions. As daughter regions > are stored with OFFLINE state, daughter regions are assigned. This is > followed by re-assignment of daughter regions from resumed > SplitTableRegionProcedure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-19915) From split/ merge procedures daughter/ merged regions get created in OFFLINE state
[ https://issues.apache.org/jira/browse/HBASE-19915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Umesh Agashe reopened HBASE-19915: -- As [~appy] pointed out, daughterA region gets stored with state CLOSED twice. > From split/ merge procedures daughter/ merged regions get created in OFFLINE > state > -- > > Key: HBASE-19915 > URL: https://issues.apache.org/jira/browse/HBASE-19915 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19915.master.001.patch, > hbase-19915.master.001.patch > > > See HBASE-19530. When regions are created initial state should be CLOSED. Bug > was discovered while debugging flaky test > TestSplitTableRegionProcedure#testRollbackAndDoubleExecution with numOfSteps > set to 4. After updating daughter regions in meta when master is restarted, > startup sequence of master assigns all OFFLINE regions. As daughter regions > are stored with OFFLINE state, daughter regions are assigned. This is > followed by re-assignment of daughter regions from resumed > SplitTableRegionProcedure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Considering branching for 1.5 and other branch-1 release planning
> If someone else is using 1.3 your feedback would be very valuable. EMR has shipped the 1.3 line since EMR 5.4.0 (March 08, 2017). We have not ran into any unresolved critical issues and it has been fairly stable overall. I would be a +1 on moving the stable pointer. Thanks, Zach On Mon, Feb 5, 2018 at 1:30 PM, Andrew Purtell wrote: > Thanks so much for the feedback Francis. > > I think we are just about there to move the stable pointer. > > > On Feb 5, 2018, at 9:48 AM, Francis Liu wrote: > > >> If someone else is using 1.3 your feedback would be very > > valuable. > > > > We are running 1.3 in production, full rollout ongoing. Ran into some > > issues but it's generally been stable. We'll prolly gonna be on 1.3 for a > > while. > > > > Cheers, > > Francis > > > > > >> On Sun, Feb 4, 2018 at 10:59 AM Andrew Purtell > wrote: > >> > >> Hi Ted, > >> > >> If Hadoop 3 support is in place for an (eventual) 1.5.0 release, I think > >> that would be great. > >> > >> > >>> On Sun, Feb 4, 2018 at 10:55 AM, Ted Yu wrote: > >>> > >>> Andrew: > >>> Do you think making 1.5 release support hadoop 3 is among the goals ? > >>> > >>> Cheers > >>> > >>> On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell > >>> wrote: > >>> > The backport of RSGroups to branch-1 triggered the opening of the 1.4 > >>> code > line as branch-1.4 and releases 1.4.0 and 1.4.1. > > After the commit of HBASE-19858 (Backport HBASE-14061 (Support > CF-level > Storage Policy) to branch-1), storage policy aware file placement > might > >>> be > useful enough to trigger a new minor release from branch-1. This would > >> be > branch-1.5, and at least release 1.5.0. I am not sure about this yet. > >> It > needs testing. I'd like to mock up a couple of use cases and determine > >> if > what we have is sufficient on its own or more changes will be needed. > I > want to get the idea of a 1.5 on your radar. though. > > Also, I would like to make one more release of branch-1.3 before we > >>> retire > it. 
Mikhail passed the reins. We might have a volunteer to RM 1.3.2. > If > not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will > >> be > decided organically depending on uptake. > > -- > Best regards, > Andrew > > Words like orphans lost among the crosstalk, meaning torn from truth's > decrepit hands > - A23, Crosstalk > > >>> > >> > >> > >> > >> -- > >> Best regards, > >> Andrew > >> > >> Words like orphans lost among the crosstalk, meaning torn from truth's > >> decrepit hands > >> - A23, Crosstalk > >> >
[jira] [Resolved] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
[ https://issues.apache.org/jira/browse/HBASE-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19931. --- Resolution: Fixed Re-resolving. Let HBASE-19840 be responsible for latest findings regards this test failing. > TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas > -- > > Key: HBASE-19931 > URL: https://issues.apache.org/jira/browse/HBASE-19931 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19931.branch-2.001.patch > > > Somehow we missed a test that depends on a run of HBCK. It fails 100% of the > time now because of HBASE-19726 Failed to start HMaster due to infinite > retrying on meta assign where we no longer update hbase:meta with the state > of hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It > broke HBCK here. > So, disable the test and just-in-case add meta as ENABLED to hbck though hbck > as is is not for hbase2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19907) TestMetaWithReplicas still flakey
[ https://issues.apache.org/jira/browse/HBASE-19907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19907. --- Resolution: Fixed Resolving. HBASE-19931 has taken up the baton on the new failure types for TestMetaWithReplicas. Pushed to branch-2 and master. > TestMetaWithReplicas still flakey > - > > Key: HBASE-19907 > URL: https://issues.apache.org/jira/browse/HBASE-19907 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19907.master.001.patch > > > Still fails because all meta replicas arrive at same server even though > supposedly protection against this added by me in HBASE-19840. > --- > Test set: org.apache.hadoop.hbase.client.TestMetaWithReplicas > --- > Tests run: 5, Failures: 0, Errors: 2, Skipped: 1, Time elapsed: 600.251 s <<< > FAILURE! - in org.apache.hadoop.hbase.client.TestMetaWithReplicas > org.apache.hadoop.hbase.client.TestMetaWithReplicas Time elapsed: 563.656 s > <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 600 > seconds > at > org.apache.hadoop.hbase.client.TestMetaWithReplicas.shutdownMetaAndDoValidations(TestMetaWithReplicas.java:255) > at > org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:181) > org.apache.hadoop.hbase.client.TestMetaWithReplicas Time elapsed: 563.656 s > <<< ERROR! > java.lang.Exception: Appears to be stuck in thread > NIOServerCxn.Factory:0.0.0.0/0.0.0.0:49912 > The move of hbase:meta actually moves it back to same server no good. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Considering branching for 1.5 and other branch-1 release planning
Thanks so much for the feedback Francis. I think we are just about there to move the stable pointer. On Feb 5, 2018, at 9:48 AM, Francis Liu wrote: >> If someone else is using 1.3 your feedback would be very > valuable. > > We are running 1.3 in production, full rollout ongoing. Ran into some > issues but it's generally been stable. We'll prolly gonna be on 1.3 for a > while. > > Cheers, > Francis > > >> On Sun, Feb 4, 2018 at 10:59 AM Andrew Purtell wrote: >> >> Hi Ted, >> >> If Hadoop 3 support is in place for an (eventual) 1.5.0 release, I think >> that would be great. >> >> >>> On Sun, Feb 4, 2018 at 10:55 AM, Ted Yu wrote: >>> >>> Andrew: >>> Do you think making 1.5 release support hadoop 3 is among the goals ? >>> >>> Cheers >>> >>> On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell >>> wrote: >>> The backport of RSGroups to branch-1 triggered the opening of the 1.4 >>> code line as branch-1.4 and releases 1.4.0 and 1.4.1. After the commit of HBASE-19858 (Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1), storage policy aware file placement might >>> be useful enough to trigger a new minor release from branch-1. This would >> be branch-1.5, and at least release 1.5.0. I am not sure about this yet. >> It needs testing. I'd like to mock up a couple of use cases and determine >> if what we have is sufficient on its own or more changes will be needed. I want to get the idea of a 1.5 on your radar. though. Also, I would like to make one more release of branch-1.3 before we >>> retire it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. If not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will >> be decided organically depending on uptake. 
-- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk >>> >> >> >> >> -- >> Best regards, >> Andrew >> >> Words like orphans lost among the crosstalk, meaning torn from truth's >> decrepit hands >> - A23, Crosstalk >>
[jira] [Created] (HBASE-19940) TestMetaShutdownHandler flakey
stack created HBASE-19940: - Summary: TestMetaShutdownHandler flakey Key: HBASE-19940 URL: https://issues.apache.org/jira/browse/HBASE-19940 Project: HBase Issue Type: Sub-task Reporter: stack Fails 13% of the time. One of the RS won't go down. It has an errant thread running. Not sure what. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
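For a hang like this, one way to identify the errant thread is to dump the live non-daemon threads, since only those keep a JVM (here, the region server under test) from exiting. A JDK-only sketch, not HBase tooling:

```java
// JDK-only sketch: list the live non-daemon threads that can keep a JVM
// (e.g. a RegionServer under test) from shutting down.
import java.util.ArrayList;
import java.util.List;

public class NonDaemonThreads {
    public static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The JVM only exits once every thread printed here has finished.
        for (String name : liveNonDaemonThreadNames()) {
            System.out.println(name);
        }
    }
}
```

Running this from a test's teardown (or comparing it against a jstack of the stuck process) narrows down which thread is refusing to die.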
[jira] [Created] (HBASE-19939) TestSplitTableRegion#testSplitWithoutPONR() and testRecoveryAndDoubleExecution() are failing with NPE
Umesh Agashe created HBASE-19939: Summary: TestSplitTableRegion#testSplitWithoutPONR() and testRecoveryAndDoubleExecution() are failing with NPE Key: HBASE-19939 URL: https://issues.apache.org/jira/browse/HBASE-19939 Project: HBase Issue Type: Improvement Components: amv2 Affects Versions: 2.0.0-beta-1 Reporter: Umesh Agashe Assignee: Umesh Agashe Fix For: 2.0.0-beta-2 Error is: {code:java} java.lang.AssertionError: found exception: java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=154, state=RUNNABLE:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS; SplitTableRegionProcedure table=testRecoveryAndDoubleExecution, parent=3d8d459ba395c2cf6b1e5c71aca92cfd, daughterA=c6531c10effa8e542159ab82a87bd75e, daughterB=ee34a9af88273b6c06e1a688fc50ed6e:java.lang.NullPointerException: at org.apache.hadoop.hbase.master.assignment.TestSplitTableRegionProcedure.testRecoveryAndDoubleExecution(TestSplitTableRegionProcedure.java:411){code} Exception from the output file: {code:java} 2018-02-05 18:00:48,205 ERROR [PEWorker-1] procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception: pid=19, state=RUNNABLE:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS; SplitTableRegionProcedure table=testSplitWithoutPONR, parent=57114194fb486a3988b232bcf10eb177, daughterA=749aa83c03b8f7c6b642cd73c5b51e43, daughterB=a53ec69e8dd2cfa6c0be2b9a7eb271bb java.lang.NullPointerException at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.splitStoreFiles(SplitTableRegionProcedure.java:617) at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.createDaughterRegions(SplitTableRegionProcedure.java:541) at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:241) at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89) at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180) at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734){code} The value of 'htd' is null: it is initialized in the constructor, but when the object is deserialized it stays null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
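The diagnosis at the end, 'htd' set only in the constructor and null after deserialization, is the classic procedure-recovery pitfall: state held only by the submitting constructor is gone once the procedure is rehydrated, unless the serialize/deserialize state methods round-trip it. A simplified, hypothetical sketch of that failure mode (ProcSketch and its String fields are stand-ins, not the real SplitTableRegionProcedure):

```java
// Hypothetical sketch of the HBASE-19939 failure mode, not the real class:
// a field set in the "submit" constructor is never written by
// serializeStateData, so the instance re-created on recovery (via the
// no-arg constructor) sees null and NPEs.
import java.util.HashMap;
import java.util.Map;

public class ProcSketch {
    private String htd;        // stands in for the TableDescriptor
    private String tableName;

    public ProcSketch() { }    // used when recovering persisted procedures
    public ProcSketch(String tableName, String htd) {
        this.tableName = tableName;
        this.htd = htd;        // only set on the original submit
    }

    // Persists only tableName; forgetting htd here is the bug.
    public Map<String, String> serializeStateData() {
        Map<String, String> state = new HashMap<>();
        state.put("tableName", tableName);
        return state;
    }

    public void deserializeStateData(Map<String, String> state) {
        this.tableName = state.get("tableName");
        // htd is never restored, so it stays null after recovery
    }

    public boolean htdIsNullAfterRecovery() {
        ProcSketch recovered = new ProcSketch();
        recovered.deserializeStateData(serializeStateData());
        return recovered.htd == null;
    }

    public static void main(String[] args) {
        ProcSketch original = new ProcSketch("testSplitWithoutPONR", "descriptor");
        System.out.println(original.htdIsNullAfterRecovery()); // the recovered copy lost htd
    }
}
```

The fix shape is to either persist the field in the state data or lazily re-fetch it from the authoritative source on first use after recovery.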
[jira] [Reopened] (HBASE-19840) Flakey TestMetaWithReplicas
[ https://issues.apache.org/jira/browse/HBASE-19840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-19840: --- I see some flakyness still. There is something weird going on. Two ServerNames seem to hash the same. Doesn't make sense (I made a test to try it). Reopening to figure. Pushing a bit more debug... in meantime. Reopening. > Flakey TestMetaWithReplicas > --- > > Key: HBASE-19840 > URL: https://issues.apache.org/jira/browse/HBASE-19840 > Project: HBase > Issue Type: Sub-task > Components: flakey, test >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19840.master.001.patch, > HBASE-19840.master.001.patch > > > Failing about 15% of the time.. In testShutdownHandling.. > [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html] > > Adding some debug. Its hard to follow what is going on in this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Considering branching for 1.5 and other branch-1 release planning
> If someone else is using 1.3 your feedback would be very valuable. We are running 1.3 in production, full rollout ongoing. Ran into some issues but it's generally been stable. We'll prolly gonna be on 1.3 for a while. Cheers, Francis On Sun, Feb 4, 2018 at 10:59 AM Andrew Purtell wrote: > Hi Ted, > > If Hadoop 3 support is in place for an (eventual) 1.5.0 release, I think > that would be great. > > > On Sun, Feb 4, 2018 at 10:55 AM, Ted Yu wrote: > > > Andrew: > > Do you think making 1.5 release support hadoop 3 is among the goals ? > > > > Cheers > > > > On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell > > wrote: > > > > > The backport of RSGroups to branch-1 triggered the opening of the 1.4 > > code > > > line as branch-1.4 and releases 1.4.0 and 1.4.1. > > > > > > After the commit of HBASE-19858 (Backport HBASE-14061 (Support CF-level > > > Storage Policy) to branch-1), storage policy aware file placement might > > be > > > useful enough to trigger a new minor release from branch-1. This would > be > > > branch-1.5, and at least release 1.5.0. I am not sure about this yet. > It > > > needs testing. I'd like to mock up a couple of use cases and determine > if > > > what we have is sufficient on its own or more changes will be needed. I > > > want to get the idea of a 1.5 on your radar. though. > > > > > > Also, I would like to make one more release of branch-1.3 before we > > retire > > > it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. If > > > not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will > be > > > decided organically depending on uptake. > > > > > > -- > > > Best regards, > > > Andrew > > > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > > decrepit hands > > >- A23, Crosstalk > > > > > > > > > -- > Best regards, > Andrew > > Words like orphans lost among the crosstalk, meaning torn from truth's > decrepit hands >- A23, Crosstalk >
[jira] [Resolved] (HBASE-19837) Flakey TestRegionLoad
[ https://issues.apache.org/jira/browse/HBASE-19837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19837. --- Resolution: Fixed Assignee: stack Resolving. Will open new issue if still flakey to refactor the test. > Flakey TestRegionLoad > - > > Key: HBASE-19837 > URL: https://issues.apache.org/jira/browse/HBASE-19837 > Project: HBase > Issue Type: Sub-task > Components: flakey, test >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: > 0001-HBASE-19837-Flakey-TestRegionLoad-ADDENDUM-Report-mo.patch, > 0001-HBASE-19837-Flakey-TestRegionLoad.patch, HBASE-19837.branch-2.001.patch > > > This one fails the most in the flakey list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19938) Allow write request from replication but reject write request from user client when S state.
Zheng Hu created HBASE-19938: Summary: Allow write request from replication but reject write request from user client when S state. Key: HBASE-19938 URL: https://issues.apache.org/jira/browse/HBASE-19938 Project: HBase Issue Type: Sub-task Reporter: Zheng Hu Assignee: Zheng Hu According to the doc, we should reject write requests when in S state; however, the replication data from the master cluster will turn into a batch mutation request (which is a write request). So, for a peer in S state, we need to distinguish write requests coming from replication from those coming from user clients. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
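The distinction the issue asks for can be sketched as a small gate: a peer in STANDBY (S) state accepts only writes shipped by replication and rejects direct client writes. The names below are illustrative stand-ins, not actual HBase APIs:

```java
// Hypothetical sketch of the gate HBASE-19938 describes; none of these
// names are real HBase APIs.
public class SyncReplicationWriteGate {
    enum PeerState { ACTIVE, DOWNGRADE_ACTIVE, STANDBY }

    /**
     * A STANDBY (S) peer only accepts writes shipped by replication;
     * direct client writes are rejected.
     */
    static boolean allowWrite(PeerState state, boolean fromReplication) {
        if (state == PeerState.STANDBY) {
            return fromReplication;
        }
        return true; // A and DA states accept client writes
    }

    public static void main(String[] args) {
        System.out.println(allowWrite(PeerState.STANDBY, true));  // replication write: allowed
        System.out.println(allowWrite(PeerState.STANDBY, false)); // client write: rejected
    }
}
```

The hard part the issue points at is the `fromReplication` bit itself: since replicated edits arrive as ordinary batch mutations, the server needs some marker on the request to tell the two apart.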
[jira] [Created] (HBASE-19937) Enable rsgroup NPE in CreateTableProcedure
Xiaolin Ha created HBASE-19937: -- Summary: Enable rsgroup NPE in CreateTableProcedure Key: HBASE-19937 URL: https://issues.apache.org/jira/browse/HBASE-19937 Project: HBase Issue Type: Bug Components: rsgroup Affects Versions: 2.0.0-beta-2 Reporter: Xiaolin Ha Assignee: Xiaolin Ha When rsgroup is enabled, it may throw an NPE as follows: 2018-02-02,16:12:45,688 ERROR org.apache.hadoop.hbase.procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=7, state=RUNNABLE:CREATE_TABLE_ASSIGN_REGIONS; CreateTableProcedure table=hbase:rsgroup java.lang.NullPointerException at org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.generateGroupMaps(RSGroupBasedLoadBalancer.java:254) at org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.roundRobinAssignment(RSGroupBasedLoadBalancer.java:162) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createRoundRobinAssignProcedures(AssignmentManager.java:603) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:108) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:51) at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738) As a result of CreateTableProcedure.rollbackState, it may then log TableExistsException warnings as follows: 2018-02-02,16:12:55,503 WARN org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: Failed to perform check java.io.IOException: 
Failed to create group table. org.apache.hadoop.hbase.TableExistsException: hbase:rsgroup at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.createRSGroupTable(RSGroupInfoManagerImpl.java:877) After some auto-retries, the RSGroupStartupWorker thread keeps looping and prints logs as follows: 2018-02-02,16:23:17,626 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true 2018-02-02,16:23:17,730 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true 2018-02-02,16:23:17,834 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true 2018-02-02,16:23:17,937 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true And the rsgroup shell commands report that rsgroup is currently in "offline mode". The root cause is that CreateTableProcedure uses RSGroupBasedLoadBalancer, whose member variables are initialized only after CreateTableProcedure completes, so the two depend on each other. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
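The bootstrap cycle in this report can be reduced to a small sketch: the group-aware balancer's metadata only exists once the rsgroup table is created, yet creating that very table consults the balancer, so dereferencing the not-yet-initialized state is exactly the reported NPE. All names here are hypothetical, not the real RSGroupBasedLoadBalancer:

```java
// Hypothetical sketch of the HBASE-19937 bootstrap cycle, not HBase code:
// the group-aware balancer needs metadata that only exists after the
// rsgroup table is created, but creating that table consults the balancer.
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class GroupBalancerSketch {
    private Map<String, List<String>> groupToServers; // null until the rsgroup table is readable

    boolean isOnline() {
        return groupToServers != null;
    }

    /** Returns candidate servers for a table, falling back while offline. */
    List<String> candidateServers(String table, List<String> allServers) {
        if (!isOnline()) {
            // Bootstrap guard: without this, dereferencing groupToServers
            // below is the NullPointerException in generateGroupMaps.
            return allServers;
        }
        return groupToServers.getOrDefault("default", Collections.emptyList());
    }

    public static void main(String[] args) {
        GroupBalancerSketch balancer = new GroupBalancerSketch(); // still offline
        List<String> servers = List.of("rs1", "rs2");
        // While creating the rsgroup table itself, fall back to all servers.
        System.out.println(balancer.candidateServers("hbase:rsgroup", servers));
    }
}
```

A guard of this shape breaks the cycle by letting the balancer degrade to plain assignment until its own metadata table is up.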