[ 
https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228991#comment-14228991
 ] 

zhangduo commented on HBASE-12588:
----------------------------------

[~stack] Maybe we need to call mutateRowsWithLocks instead of batchMutate in 
HBaseFsck#rebuildMeta?

batchMutate is usually called from a client over RPC, and the client does not 
know whether a batchMutate operation applies to only one HRegion. Even if the 
client did know, we cannot make this always work, because the regionserver may 
crash before sending an OperationStatus back to the client (a common issue for 
any system built on RPC), right?

I think RowProcessorEndpoint and MultiRowMutationEndpoint are designed for this 
purpose.

And maybe we need to search the code, as [~jeffreyz] did, to make sure we do 
not use batchMutate in the wrong way? Either change the call to 
mutateRowsWithLocks or check the returned OperationStatus. Thanks.
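
A minimal sketch of the "check the returned OperationStatus" option. To stay 
self-contained it does not use HBase classes: StatusCode and Status are 
hypothetical stand-ins, loosely modeled on HBase's OperationStatusCode and 
OperationStatus, and firstNotApplied is a helper name made up for this example.

```java
// Hypothetical stand-ins for illustration only; not the HBase classes.
class BatchStatusCheck {
    // Simplified status codes (assumption: the real enum has more values).
    enum StatusCode { NOT_RUN, SUCCESS, FAILURE }

    // One per-mutation result, as a batch call might return it.
    static final class Status {
        final StatusCode code;
        final String message;

        Status(StatusCode code, String message) {
            this.code = code;
            this.message = message;
        }
    }

    /**
     * Returns the index of the first mutation whose status is anything
     * other than SUCCESS, or -1 if every mutation was applied.
     */
    static int firstNotApplied(Status[] statuses) {
        for (int i = 0; i < statuses.length; i++) {
            if (statuses[i].code != StatusCode.SUCCESS) {
                return i;
            }
        }
        return -1;
    }
}
```

A caller would run this over the array returned by the batch call and throw 
(or retry) when the result is not -1, instead of ignoring the statuses.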

> Need to fail writes when row lock can't be acquired
> ---------------------------------------------------
>
>                 Key: HBASE-12588
>                 URL: https://issues.apache.org/jira/browse/HBASE-12588
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.8, 0.99.1
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: HBASE-12588.patch
>
>
> Currently we don't fail write operations when we can't acquire row locks, as 
> shown below in HRegion#doMiniBatchMutation. 
> {code}
> ...
>         RowLock rowLock = null;
>         try {
>           rowLock = getRowLock(mutation.getRow(), shouldBlock);
>         } catch (IOException ioe) {
>           LOG.warn("Failed getting lock in batch put, row="
>             + Bytes.toStringBinary(mutation.getRow()), ioe);
>         }
>         if (rowLock == null) {
>           // We failed to grab another lock
>           assert !shouldBlock : "Should never fail to get lock when blocking";
>           break; // stop acquiring more rows for this batch
>         } else {
>           acquiredRowLocks.add(rowLock);
>         }
> ...
> {code}
> We saw this issue when there was a meta corruption problem and checkRow 
> failed with the error:
> {noformat}
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out 
> of range for row lock on HRegion
> {noformat}
> The current code still continues with the writes. This is dangerous in all 
> cases, because row locks have to be acquired before update operations to 
> guarantee row update atomicity.
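
For the quoted description above, one possible shape of "fail the write when 
the row lock can't be acquired" is sketched here. It is not the attached patch 
and does not use HBase classes: FailFastRowLocks, its String row keys, and the 
[startKey, endKey) range check are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a region that owns rows in [startKey, endKey).
class FailFastRowLocks {
    private final String startKey;
    private final String endKey;
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    FailFastRowLocks(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    // Acquires the lock for one row; rejects rows outside this region.
    ReentrantLock getRowLock(String row) throws IOException {
        if (row.compareTo(startKey) < 0 || row.compareTo(endKey) >= 0) {
            throw new IOException("Requested row out of range: " + row);
        }
        ReentrantLock lock = locks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        return lock;
    }

    // Locks every row or none: on failure, releases what was taken and
    // rethrows, so the caller cannot go on to write without the locks.
    List<ReentrantLock> lockAll(List<String> rows) throws IOException {
        List<ReentrantLock> acquired = new ArrayList<>();
        try {
            for (String row : rows) {
                acquired.add(getRowLock(row));
            }
            return acquired;
        } catch (IOException e) {
            for (ReentrantLock lock : acquired) {
                lock.unlock();
            }
            throw e;
        }
    }

    // True if some thread currently holds the lock for this row.
    boolean isLocked(String row) {
        ReentrantLock lock = locks.get(row);
        return lock != null && lock.isLocked();
    }
}
```

The difference from the quoted loop is that the exception is propagated rather 
than logged, so a batch containing an out-of-range row fails as a whole.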



