[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769625#comment-16769625 ]

Sergey Shelukhin commented on HBASE-21623:
------------------------------------------

[~wchevreuil] the problem is the "for (RegionInfo region : regions) {" loop. The "regions" are the regions assumed to be on the server; they are obtained in the previous procedure step, without any locks. So, if the RIT makes a change between getting "regions" and running the loop (transitioning the region from OPENING on server1 to OPENING on server2), SCP still has this region in the list.

> ServerCrashProcedure can stomp on a RIT for a wrong server
> ----------------------------------------------------------
>
> Key: HBASE-21623
> URL: https://issues.apache.org/jira/browse/HBASE-21623
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Critical
> Attachments: HBASE-21623.patch
>
>
> A server died while some region was being opened on it; eventually the open
> failed, and the RIT procedure started retrying on a different server.
> However, by then SCP for the dying server had already obtained the region
> from the list of regions on the old server, and proceeded to overwrite
> whatever the RIT was doing with a new server.
> {noformat}
> 2018-12-18 23:06:03,160 INFO [PEWorker-14] procedure2.ProcedureExecutor:
> Initialized subprocedures=[{pid=151404, ppid=151104, state=RUNNABLE,
> hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
> ...
> 2018-12-18 23:06:38,208 INFO [PEWorker-10] procedure.ServerCrashProcedure:
> Start pid=151632, state=RUNNABLE:SERVER_CRASH_START, hasLock=true;
> ServerCrashProcedure server=oldServer,17020,1545202098577, splitWal=true,
> meta=false
> ...
> 2018-12-18 23:06:41,953 WARN [RSProcedureDispatcher-pool4-t115]
> assignment.RegionRemoteProcedureBase: The remote operation pid=151404,
> ppid=151104, state=RUNNABLE, hasLock=false;
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region
> {ENCODED => region1, ... } to server oldServer,17020,1545202098577 failed
> org.apache.hadoop.hbase.regionserver.RegionServerAbortedException:
> org.apache.hadoop.hbase.regionserver.RegionServerAbortedException: Server
> oldServer,17020,1545202098577 aborting
> 2018-12-18 23:06:42,485 INFO [PEWorker-5] procedure2.ProcedureExecutor:
> Finished subprocedure(s) of pid=151104, ppid=150875,
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true;
> TransitRegionStateProcedure table=t1, region=region1, ASSIGN; resume parent
> processing.
> 2018-12-18 23:06:42,485 INFO [PEWorker-13]
> assignment.TransitRegionStateProcedure: Retry=1 of max=2147483647;
> pid=151104, ppid=150875,
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true;
> TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING,
> location=oldServer,17020,1545202098577
> 2018-12-18 23:06:42,500 INFO [PEWorker-13]
> assignment.TransitRegionStateProcedure: Starting pid=151104, ppid=150875,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true;
> TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING,
> location=null; forceNewPlan=true, retain=false
> 2018-12-18 23:06:42,657 INFO [PEWorker-2] assignment.RegionStateStore:
> pid=151104 updating hbase:meta row=region1, regionState=OPENING,
> regionLocation=newServer,17020,1545202111238
> ...
> 2018-12-18 23:06:43,094 INFO [PEWorker-4] procedure.ServerCrashProcedure:
> pid=151632, state=RUNNABLE:SERVER_CRASH_ASSIGN, hasLock=true;
> ServerCrashProcedure server=oldServer,17020,1545202098577, splitWal=true,
> meta=false found RIT pid=151104, ppid=150875,
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true;
> TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING,
> location=newServer,17020,1545202111238, table=t1, region=region1
> 2018-12-18 23:06:43,094 INFO [PEWorker-4] assignment.RegionStateStore:
> pid=151104 updating hbase:meta row=region1, regionState=ABNORMALLY_CLOSED
> {noformat}
> Later, the RIT overwrote the state again, it seems, and then the region got
> stuck in OPENING state forever, but I'm not sure yet if that's just due to
> this bug or if there was another bug after that. For now this can be
> addressed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
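The stale-list problem described in the comment above can be modelled in a few lines. This is only a toy sketch, not HBase code: `StaleSnapshotDemo`, `regionsOn` and `staleEntries` are hypothetical names, and a plain map stands in for the master's region states. It assumes, as the comment says, that the region list is taken in one step and iterated in a later one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the stale-snapshot race; none of these names are HBase APIs. */
class StaleSnapshotDemo {
    /** Current region -> server assignment, standing in for the master's region states. */
    static final Map<String, String> location = new HashMap<>();

    /** First step of the hypothetical crash handler: snapshot the regions on a server. */
    static List<String> regionsOn(String server) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : location.entrySet()) {
            if (e.getValue().equals(server)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Regions in the snapshot that no longer point to the crashed server. */
    static List<String> staleEntries(List<String> snapshot, String crashedServer) {
        List<String> stale = new ArrayList<>();
        for (String region : snapshot) {
            if (!crashedServer.equals(location.get(region))) {
                stale.add(region);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        location.put("region1", "oldServer");
        List<String> snapshot = regionsOn("oldServer"); // crash handler, earlier step
        location.put("region1", "newServer");           // RIT retries elsewhere
        // The later step still iterates the snapshot taken before the retry.
        System.out.println("stale: " + staleEntries(snapshot, "oldServer")); // stale: [region1]
    }
}
```

The snapshot still lists region1 even though its current location is no longer the crashed server, which is the window the log excerpt shows between 23:06:42,657 and 23:06:43,094.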
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769318#comment-16769318 ]

Wellington Chevreuil commented on HBASE-21623:
----------------------------------------------

Thanks for the explanation [~sershe], however I still think the locks should have prevented it. Maybe my reading of this code path is mistaken, but here is my interpretation of which pieces of code are involved:
{quote}
SCP: server1 crashed, what's on server1? looks like r1
{quote}
This would correspond to this part of the SCP code:
{noformat}
for (RegionInfo region : regions) {
  RegionStateNode regionNode = am.getRegionStates().getOrCreateRegionStateNode(region);
  regionNode.lock();
  try {
    if (regionNode.getProcedure() != null) {
      LOG.info("{} found RIT {}; {}", this, regionNode.getProcedure(), regionNode);
      regionNode.getProcedure().serverCrashed(env, regionNode, getServerName());
    } else {
      if (env.getMasterServices().getTableStateManager().isTableState(regionNode.getTable(),
          TableState.State.DISABLING, TableState.State.DISABLED)) {
        continue;
      }
      TransitRegionStateProcedure proc = TransitRegionStateProcedure.assign(env, region, null);
      regionNode.setProcedure(proc);
      addChildProcedure(proc);
    }
  } finally {
    regionNode.unlock();
  }
}
{noformat}
For this step:
{quote}RIT: open failed, OPENING r1 on server2 now
{quote}
the related code would be in TRSP execute -> executeFromState -> openRegion, where the execute method is enclosed by the region node lock:
{noformat}
protected Procedure[] execute(MasterProcedureEnv env)
    throws ProcedureSuspendedException, ProcedureYieldException, InterruptedException {
  RegionStateNode regionNode =
      env.getAssignmentManager().getRegionStates().getOrCreateRegionStateNode(getRegion());
  regionNode.lock();
  try {
    return super.execute(env);
  } finally {
    regionNode.unlock();
  }
}
{noformat}
So the point below would only really happen if the TRSP had already finished its execution and the regionNode lock had been released, wouldn't it? But in that case, SCP's *regionNode.getProcedure()* call should return null, not the previous RIT that had already completed, shouldn't it?
{quote}SCP: looks like a RIT on r1; hey RIT for r1, your server crashed! (*)
{quote}
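One way to read the two snippets quoted in this comment: the node lock is taken and released inside each `execute` call, so it is not held while the procedure waits for a remote open, yet the procedure stays attached to the node. The sketch below models only that reading. `LockScopeDemo`, `Node` and `executeStep` are hypothetical names, the procedure-level lock reported as `hasLock=true` in the logs is not modelled, and this is not a statement about what HBase does beyond the quoted code.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Toy model: a region node whose lock is held only while one step executes. */
class LockScopeDemo {
    static class Node {
        final ReentrantLock lock = new ReentrantLock();
        Object procedure; // the attached in-flight procedure, if any
        String location;  // server the region is currently assigned to
    }

    /** One step of a hypothetical transition procedure, wrapped in the node lock. */
    static void executeStep(Node node, Object proc, String newLocation) {
        node.lock.lock();
        try {
            node.procedure = proc;      // stays attached after the step returns
            node.location = newLocation;
        } finally {
            node.lock.unlock();         // released at the end of every step
        }
    }

    /** What a crash handler observes when it takes the lock between two steps. */
    static boolean seesAttachedProcedure(Node node) {
        node.lock.lock();
        try {
            return node.procedure != null;
        } finally {
            node.lock.unlock();
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        Object transition = new Object();
        executeStep(node, transition, "server1");
        executeStep(node, transition, "server2"); // retry on another server
        // No step is running now, so the lock is free while the procedure waits.
        System.out.println("lock held: " + node.lock.isLocked());                  // lock held: false
        System.out.println("procedure attached: " + seesAttachedProcedure(node)); // procedure attached: true
    }
}
```

Under these assumptions a crash handler can acquire the node lock between two steps and still find a non-null, unfinished procedure, which matches the `state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED` entry in the log.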
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768632#comment-16768632 ]

Sergey Shelukhin commented on HBASE-21623:
------------------------------------------

Do you mean the language-level lock, or the HBase-level region lock?
The RIT retried on a different server, so the fact that the region was at one time opening on the crashed server was irrelevant at that point. SCP doesn't appear to care about the HBase-level locks when it makes the decision to replace (and it couldn't, because whether it is updating the right or the wrong RIT, the latter would still be holding that lock, so the situation is no different).
The language-level lock protects individual procedure assignments from racing, assuming every piece of code under it does correct checks; auditing them all is out of the scope of this issue, so I was assuming you see some specific bug with that.

The race proceeds as follows (each individual step looks safe from lower-level races to me, given the region lock):
RIT: OPENING r1 on server1.
Server1: (silence)
SCP: server1 crashed, what's on server1? looks like r1
RIT: open failed, OPENING r1 on server2 now
Server2: opening...
SCP: looks like a RIT on r1; hey RIT for r1, your server crashed! (*)
RIT: oh well, OPENING r1 on server3 now

Which in this case also leads to
Server3: opening...
Server2: hey I opened r1!
RIT: who cares, it's on server3 now (as a side note, I'm adding a RS kill here in a separate JIRA, ignoring this is not safe)
Server3: hey I (also) opened r1!

The fix is for (*) to check which server has crashed. I don't think SCP can list regions and notify atomically without major changes, because they are separate state machine states.
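The check proposed for step (*) can be sketched as a guard on the crash notification. This is an illustration under assumptions, not the attached patch: `CrashGuardDemo`, `Transition` and the string server names are hypothetical, and the only idea taken from the comment is "check which server has crashed" before acting.

```java
/** Toy model of the proposed check; class and method names are hypothetical. */
class CrashGuardDemo {
    static class Transition {
        String targetServer;        // server the region is currently opening on
        int crashNotifications = 0; // how many crash notifications were acted on

        Transition(String targetServer) {
            this.targetServer = targetServer;
        }

        /** Acts on the crash only if it concerns the server this transition targets. */
        boolean serverCrashed(String crashedServer) {
            if (!crashedServer.equals(targetServer)) {
                return false;       // stale notification: leave the transition alone
            }
            crashNotifications++;
            targetServer = null;    // force a new assignment plan
            return true;
        }
    }

    public static void main(String[] args) {
        Transition rit = new Transition("server1");
        rit.targetServer = "server2";                     // open failed, retried on server2
        System.out.println(rit.serverCrashed("server1")); // false
        System.out.println(rit.serverCrashed("server2")); // true
    }
}
```

With such a guard, the late notification for server1 in the timeline above would be ignored, so the RIT would keep opening r1 on server2 instead of moving on to server3.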
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768079#comment-16768079 ]

Wellington Chevreuil commented on HBASE-21623:
----------------------------------------------

bq. Procedure is set under lock; the problem here is that the SCP updates an unrelated procedure.

Indeed, I had not noticed before that TRSP.executeFromState() is called from the "regionNode locked" code in TRSP.execute(), which I guess would prevent the scenario I mentioned in my previous question. But thinking about the current issue, shouldn't that also have been prevented by this lock? If there was already a TRSP for that region, it would hold the region node lock until it finished, so the SCP would only be able to reach the point below if the TRSP had already finished:
{noformat}
2018-12-18 23:06:43,094 INFO [PEWorker-4] procedure.ServerCrashProcedure: pid=151632, state=RUNNABLE:SERVER_CRASH_ASSIGN, hasLock=true; ServerCrashProcedure server=oldServer,17020,1545202098577, splitWal=true, meta=false found RIT pid=151104, ppid=150875, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING, location=newServer,17020,1545202111238, table=t1, region=region1
{noformat}
Or is it that the original TRSP finished, but regionNode.getProcedure() still returns it as non-null, causing this problem?
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766575#comment-16766575 ]

Sergey Shelukhin commented on HBASE-21623:
------------------------------------------

Can you elaborate on which pieces of code would race? The procedure is set under lock; the problem here is that the SCP updates an unrelated procedure.
It's possible that the procedure update for the RIT resets the procedure somewhere without checking, but I'm not sure how that would affect SCP in particular. There might be a different race condition.
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766332#comment-16766332 ]

Wellington Chevreuil commented on HBASE-21623:
----------------------------------------------

Per the logs provided in the initial description, the RCA looks just like what [~sershe] described, in which case the current patch would address it. What if the opposite scenario happens, where the SCP gets to the "CRASH_ASSIGN" state before the TRSP triggers the retry? Could there still be a small chance of a race condition where the TRSP's retried open changes the region's state/location before "proc.serverCrashed()" is called?
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750780#comment-16750780 ]

Sean Busbey commented on HBASE-21623:
-------------------------------------

Sorry, I haven't gotten to catch up on reviews for these AMv2 patches yet. They are all in my queue now though :)
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750409#comment-16750409 ] Sergey Shelukhin commented on HBASE-21623: -- [~busbey] does this patch make sense to you?
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750231#comment-16750231 ] Hadoop QA commented on HBASE-21623: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 36s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 39s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}154m 3s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}196m 27s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAsyncSnapshotAdminApi |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12952432/HBASE-21623.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux bf8fec464011 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / b5619a2a26 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/15698/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results |
[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for a wrong server
[ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725534#comment-16725534 ] Hadoop QA commented on HBASE-21623: ---
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 18s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12952432/HBASE-21623.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux c60be26fb3b1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 8991877bb2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15334/testReport/ |
| Max. process+thread count | 5075 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15334/console |
|