[jira] [Commented] (HBASE-20990) One operation in procedure batch throws an exception will cause all RegionTransitionProcedures receive the same exception

2018-08-06 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570866#comment-16570866
 ] 

Duo Zhang commented on HBASE-20990:
---

If the RS restarts, then it is the turn of SCP (ServerCrashProcedure). And in 
fact, if the RS is dead, a sync call cannot help either: you will get a 
connection refused or some other exception, and it is not clear whether this 
is an RS crash or a network error. SCP is the only way to change the target 
server.

And if the RPC call returns successfully, we can be sure that the procedure 
is executing on the RS side, just in a background thread pool. It is then the 
duty of the RS to report the result back to the master, unless it is dead.
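
To make the flow described above concrete, here is a minimal sketch, assuming a hypothetical `Master` and `RegionServer` pair that are not HBase classes. The dispatch call returns as soon as the work is queued, the work runs in a background thread pool, and the outcome comes back through a separate report call. The name `reportProcedureResult` is borrowed from this thread; its signature here is invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of "acknowledge now, report later"; not HBase code. */
final class AsyncDispatchSketch {

  /** Stand-in for the master: records whatever the region server reports back. */
  static final class Master {
    final Map<Long, String> reported = new ConcurrentHashMap<>();

    void reportProcedureResult(long procId, String outcome) {
      reported.put(procId, outcome);
    }
  }

  /** Stand-in for a region server with a background thread pool. */
  static final class RegionServer {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Master master;

    RegionServer(Master master) {
      this.master = master;
    }

    /** Returns once the work is queued; a normal return is the only acknowledgement. */
    void executeProcedure(long procId, Runnable work) {
      pool.submit(() -> {
        String outcome;
        try {
          work.run();
          outcome = "SUCCESS";
        } catch (RuntimeException e) {
          outcome = "FAILED: " + e.getMessage();
        }
        // Reporting the result is the region server's responsibility.
        master.reportProcedureResult(procId, outcome);
      });
    }

    /** Stops accepting work and waits for queued work; false on timeout or interrupt. */
    boolean shutdownAndWait() {
      pool.shutdown();
      try {
        return pool.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    Master master = new Master();
    RegionServer rs = new RegionServer(master);
    rs.executeProcedure(1L, () -> { });
    rs.executeProcedure(2L, () -> { throw new IllegalStateException("region not online"); });
    rs.shutdownAndWait();
    master.reported.forEach((pid, outcome) -> System.out.println("pid=" + pid + " " + outcome));
  }
}
```

The sketch deliberately leaves out the crash case: if the region server dies before reporting, nothing arrives at the master, which is the gap SCP is meant to cover.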

> One operation in procedure batch throws an exception will cause all 
> RegionTransitionProcedures receive the same exception
> -
>
> Key: HBASE-20990
> URL: https://issues.apache.org/jira/browse/HBASE-20990
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
>
> In AMv2, we batch open/close region operations and call the RS with the 
> executeProcedures API. But in this API, if one of the region operations 
> throws an exception, all the operations in the batch receive the same 
> exception, even though some of the operations in the batch are actually 
> executing normally on the RS.
> I think we should catch the exceptions separately, and call 
> remoteCallFailed or remoteCallCompleted on each RegionTransitionProcedure 
> accordingly. 
> Otherwise, there will be some very strange behavior, such as this:
> {code}
> 2018-07-18 02:56:18,506 WARN  [RSProcedureDispatcher-pool3-t1] 
> assignment.RegionTransitionProcedure(226): Remote call failed 
> e010125048016.bja,60020,1531848989401; pid=8362, ppid=8272, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure 
> table=IntegrationTestBigLinkedList, region=0beb8ea4e2f239fc082be7cefede1427, 
> target=e010125048016.bja,60020,1531848989401; rit=OPENING, 
> location=e010125048016.bja,60020,1531848989401; 
> exception=NotServingRegionException
> {code}
> The AssignProcedure failed with a NotServingRegionException, what??? This is 
> very strange: the AssignProcedure actually succeeded on the RS, and it was 
> another CloseRegion operation failing in the same batch that caused the 
> exception.
> To correct this, we need to modify the response of the executeProcedures API, 
> which is the ExecuteProceduresResponse proto, to return info (status, 
> exceptions) per operation.
> This issue alone won't cause much trouble, so there is no hurry to change the 
> behavior here, but we do need to consider it when we do some 
> refactoring of AMv2.
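
The per-operation handling proposed in the description can be sketched roughly as follows. This is an illustration only: `BatchSketch`, `OpResult` and `executeBatch` are hypothetical names, the procedure ids are made up, and nothing here reflects the actual ExecuteProceduresResponse proto. Each operation runs inside its own try/catch, so a failing close cannot be reported against an open that succeeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical sketch of per-operation results for a batch; not HBase code. */
final class BatchSketch {

  /** Outcome of a single operation; error is null when the operation succeeded. */
  static final class OpResult {
    final long procId;
    final Exception error;

    OpResult(long procId, Exception error) {
      this.procId = procId;
      this.error = error;
    }

    boolean succeeded() {
      return error == null;
    }
  }

  /** Runs every operation in its own try/catch and returns one result per operation. */
  static List<OpResult> executeBatch(Map<Long, Callable<Void>> ops) {
    List<OpResult> results = new ArrayList<>();
    for (Map.Entry<Long, Callable<Void>> op : ops.entrySet()) {
      try {
        op.getValue().call();
        results.add(new OpResult(op.getKey(), null));
      } catch (Exception e) {
        // Only this operation is marked as failed; the loop continues.
        results.add(new OpResult(op.getKey(), e));
      }
    }
    return results;
  }

  public static void main(String[] args) {
    Map<Long, Callable<Void>> ops = new LinkedHashMap<>();
    ops.put(1L, () -> null); // stands in for an open that succeeds
    ops.put(2L, () -> { throw new IllegalStateException("region not served"); }); // failing close
    for (OpResult r : executeBatch(ops)) {
      System.out.println("pid=" + r.procId
          + (r.succeeded() ? " completed" : " failed: " + r.error.getMessage()));
    }
  }
}
```

On the caller side, each result would then map to remoteCallCompleted or remoteCallFailed for the matching procedure, instead of one exception being applied to the whole batch.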



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20990) One operation in procedure batch throws an exception will cause all RegionTransitionProcedures receive the same exception

2018-07-31 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564712#comment-16564712
 ] 

Allan Yang commented on HBASE-20990:


{quote}
I prefer not returning anything when calling executeProcedure, instead, using 
reportRegionTransition and reportProcedureResult to send back the response...
{quote}
Then you need to record the exceptions in memory and send them back to the 
master when reporting. The sync RPC call becomes an async one; what if the RS 
restarts before sending this info? The procedure on the master would not even 
know whether the open/close operation is executing, or whether an RPC retry 
is needed.






[jira] [Commented] (HBASE-20990) One operation in procedure batch throws an exception will cause all RegionTransitionProcedures receive the same exception

2018-07-31 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564697#comment-16564697
 ] 

Duo Zhang commented on HBASE-20990:
---

I prefer not returning anything when calling executeProcedure, instead, using 
reportRegionTransition and reportProcedureResult to send back the response...

But the code is a bit complicated, as we did a lot of work to be compatible 
with 1.x RSes (and in the end the solution is to upgrade the RSes first, so 
that code is useless...)



