Haoze Wu created HBASE-27520:
--------------------------------

             Summary: The potential delay of HDFS RPC in HRegion may cause data 
inconsistency
                 Key: HBASE-27520
                 URL: https://issues.apache.org/jira/browse/HBASE-27520
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.4.2
            Reporter: Haoze Wu


This is a follow-up for HBASE-26256 about the data inconsistency issue.

The data inconsistency issue is as follows: while the HBase shell is running the 
`create` command, the metadata is not finalized or committed until the shell 
command returns, yet other clients can already see the table as "created" when 
they run the `list` command. We recently did more analysis and found the root 
cause.

Here is the workflow. When a table is created, a CreateTableProcedure is 
instantiated in HMaster#createTable at line 2011:

 
{code:java}
return MasterProcedureUtil
 .submitProcedure(new MasterProcedureUtil.NonceProcedureRunnable(this, nonceGroup, nonce) {
   @Override
   protected void run() throws IOException {
     getMaster().getMasterCoprocessorHost().preCreateTable(desc, newRegions);

     LOG.info(getClientIdAuditPrefix() + " create " + desc); 

     // TODO: We can handle/merge duplicate requests, and differentiate the case of
     // TableExistsException by saying if the schema is the same or not.
     //
     // We need to wait for the procedure to potentially fail due to "prepare" sanity
     // checks. This will block only the beginning of the procedure. See HBASE-19953.
     ProcedurePrepareLatch latch = ProcedurePrepareLatch.createBlockingLatch();
     submitProcedure(
       new CreateTableProcedure(procedureExecutor.getEnvironment(), desc, newRegions, latch)); // line 2011
     latch.await();

     getMaster().getMasterCoprocessorHost().postCreateTable(desc, newRegions);
   }

   @Override
   protected String getDescription() {
     return "CreateTableProcedure";
   }
 });{code}
Its call stack is:

 

 
{code:java}
org.apache.hadoop.hbase.master.HMaster,createTable,1995
org.apache.hadoop.hbase.master.MasterRpcServices,createTable,705
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2,callBlockingMethod,108919
org.apache.hadoop.hbase.ipc.RpcServer,call,395
org.apache.hadoop.hbase.ipc.CallRunner,run,133
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler,run,338
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler,run,318 {code}
`CreateTableProcedure` is submitted at line 2011. It will eventually call 
CreateTableProcedure#executeFromState in HMaster.

 

CreateTableProcedure#executeFromState runs a state machine. The three states 
relevant here are exercised one after another (via the loop in 
ProcedureExecutor#executeProcedure):
 # `CREATE_TABLE_ADD_TO_META` state. It adds the table info to hbase:meta.
 # `CREATE_TABLE_ASSIGN_REGIONS` state. It creates a subprocedure, which makes 
the AssignmentManager in the master connect to the regionservers to distribute 
the region create command.
 # `CREATE_TABLE_UPDATE_DESC_CACHE` state. It marks the table as enabled, which 
finalizes the table creation.

Code snippet (CreateTableProcedure#executeFromState, line 105 ~ 120):

 
{code:java}
case CREATE_TABLE_ADD_TO_META:
 newRegions = addTableToMeta(env, tableDescriptor, newRegions);
 setNextState(CreateTableState.CREATE_TABLE_ASSIGN_REGIONS);
 break;
case CREATE_TABLE_ASSIGN_REGIONS:
 setEnablingState(env, getTableName());
 addChildProcedure(env.getAssignmentManager()
   .createRoundRobinAssignProcedures(newRegions));
 setNextState(CreateTableState.CREATE_TABLE_UPDATE_DESC_CACHE);
 break;
case CREATE_TABLE_UPDATE_DESC_CACHE:
 // XXX: this stage should be named as set table enabled, as now we will cache the
 // descriptor after writing fs layout.
 setEnabledState(env, getTableName());
 setNextState(CreateTableState.CREATE_TABLE_POST_OPERATION);
 break;{code}
When the issue described in HBASE-26256 appears, the procedure is in the 
`CREATE_TABLE_ASSIGN_REGIONS` state. The subprocedure that calls the 
regionservers is stuck for a while, so the state machine temporarily cannot 
proceed to the `CREATE_TABLE_UPDATE_DESC_CACHE` state. At this moment, however, 
when other clients run the `list` command, they essentially check only the 
table metadata held by this HMaster and therefore print this table, even though 
the state machine is still stuck at `CREATE_TABLE_ASSIGN_REGIONS` and the table 
is not yet fully created. This is how the inconsistency arises.
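
The window described above can be illustrated with a small toy model. This is 
only a sketch under stated assumptions: `ToyMaster`, its `State` enum, and its 
methods are hypothetical names and are not HBase APIs. The model also 
simplifies the real procedure by keeping the parent in the assign state until 
the (simulated) regionserver call completes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical classes, not the HBase implementation.
class ToyMaster {
  enum State { ADD_TO_META, ASSIGN_REGIONS, UPDATE_DESC_CACHE, DONE }

  // Stands in for the table metadata that the `list` command reads.
  private final Set<String> metaTables = new TreeSet<>();
  // Current state of each in-flight create procedure.
  private final Map<String, State> procedures = new HashMap<>();

  void submitCreate(String table) {
    procedures.put(table, State.ADD_TO_META);
  }

  // Executes the current state once. If regionServerStalled is true, the
  // assign step does not complete and the procedure stays where it is.
  void step(String table, boolean regionServerStalled) {
    State s = procedures.get(table);
    if (s == null) {
      throw new IllegalArgumentException("no procedure for " + table);
    }
    switch (s) {
      case ADD_TO_META:
        metaTables.add(table); // the table becomes visible to list() here
        procedures.put(table, State.ASSIGN_REGIONS);
        break;
      case ASSIGN_REGIONS:
        if (!regionServerStalled) {
          procedures.put(table, State.UPDATE_DESC_CACHE);
        }
        break;
      case UPDATE_DESC_CACHE:
        procedures.put(table, State.DONE);
        break;
      default:
        break; // DONE: nothing left to do
    }
  }

  // Mirrors a `list` that looks only at the metadata, not at procedure state.
  List<String> list() {
    return new ArrayList<>(metaTables);
  }

  State stateOf(String table) {
    return procedures.get(table);
  }

  public static void main(String[] args) {
    ToyMaster master = new ToyMaster();
    master.submitCreate("t1");
    master.step("t1", true); // ADD_TO_META
    master.step("t1", true); // ASSIGN_REGIONS, stalled
    System.out.println("list=" + master.list() + " state=" + master.stateOf("t1"));
  }
}
```

In this model, `list()` already returns the table while `stateOf` still reports 
`ASSIGN_REGIONS`, which corresponds to the inconsistency window.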

To fix this issue, we need to either slightly change the state machine or add 
some auxiliary flags for the tables to mark whether a table is still in the 
process of being created. We think the latter is better because changing the 
state machine may involve lots of modification.
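
One possible shape of the auxiliary-flag idea is sketched below. It is an 
illustration only, not a proposed patch: `FlaggedTableRegistry` and its methods 
are hypothetical names that do not exist in HBase, and where such a flag would 
actually be stored is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tracks a per-table "still being created" flag.
class FlaggedTableRegistry {
  // table name -> true while the create procedure has not finished
  private final Map<String, Boolean> creating = new TreeMap<>();

  // Called at the point corresponding to CREATE_TABLE_ADD_TO_META.
  synchronized void addToMeta(String table) {
    creating.put(table, Boolean.TRUE);
  }

  // Called at the point corresponding to CREATE_TABLE_UPDATE_DESC_CACHE.
  synchronized void markCreated(String table) {
    if (creating.replace(table, Boolean.FALSE) == null) {
      throw new IllegalStateException("unknown table " + table);
    }
  }

  // A `list` that hides tables whose creation is still in progress.
  synchronized List<String> listTables() {
    List<String> visible = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : creating.entrySet()) {
      if (!e.getValue()) {
        visible.add(e.getKey());
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    FlaggedTableRegistry registry = new FlaggedTableRegistry();
    registry.addToMeta("t1");
    System.out.println("during create: " + registry.listTables());
    registry.markCreated("t1");
    System.out.println("after create: " + registry.listTables());
  }
}
```

With a flag like this, the state machine itself stays unchanged; only the code 
path that answers `list` needs to consult the flag.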

 

 


