[ https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Han updated HBASE-28583:
---------------------------
    Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to a 3.0.0 
cluster (1 HM, 2 RS, 2 HDFS), I hit the following exception and the upgrade 
failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
        at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hmaster,16000,1715285771112: Unhandled exception. Starting shutdown. *****
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
        at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
{code}
h1. Reproduce

This bug can be reproduced deterministically with the following steps:

Start up an HBase 2.5.8 cluster (1 HM, 2 RS, 1 HDFS: Hadoop 2.10.2). Execute 
the list of commands in the attached file (commands.txt).

Stop the 2.5.8 cluster, then start up a 3.0.0 cluster (commit: 516c89e8597fb6).

The upgrade fails with the above exception.
h1. Root Cause

The incompatibility between 2.5.8 and 3.0.0 comes from a newly added 
*required* field in the proto file: *_old_table_schema_*.

2.5.8
{code:java}
// hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto

message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
}
{code}
3.0.0
{code:java}
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
  required TableSchema old_table_schema = 9;
}
{code}
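
To make the difference between the two definitions concrete, here is a minimal sketch in plain Java that models the "required" check as a lookup over field numbers. It is not the protobuf library: the class {{RequiredFieldModel}} and its method are hypothetical names used only for illustration, and the set of "present" fields is an assumed example of state written under the 2.5.8 schema, which has no field 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of proto2 "required" checking. NOT the protobuf library;
// all names here are hypothetical and exist only for illustration.
final class RequiredFieldModel {
  // Field number -> field name, for the fields marked "required" in each schema.
  static final Map<Integer, String> REQUIRED_2_5_8 = new LinkedHashMap<>();
  static final Map<Integer, String> REQUIRED_3_0_0 = new LinkedHashMap<>();

  static {
    REQUIRED_2_5_8.put(1, "user_info");
    REQUIRED_2_5_8.put(2, "snapshot");
    REQUIRED_2_5_8.put(3, "modified_table_schema");
    REQUIRED_3_0_0.putAll(REQUIRED_2_5_8);
    REQUIRED_3_0_0.put(9, "old_table_schema"); // the field added in 3.0.0
  }

  // Returns the names of required fields whose numbers are absent from the data.
  static List<String> missingRequired(Map<Integer, String> required, Set<Integer> present) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<Integer, String> e : required.entrySet()) {
      if (!present.contains(e.getKey())) {
        missing.add(e.getValue());
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Assumed example: field numbers present in state written under the 2.5.8 schema.
    Set<Integer> writtenBy258 = new TreeSet<>(Arrays.asList(1, 2, 3, 8));
    System.out.println(missingRequired(REQUIRED_2_5_8, writtenBy258)); // []
    System.out.println(missingRequired(REQUIRED_3_0_0, writtenBy258)); // [old_table_schema]
  }
}
```

Under this model the same bytes are complete for a 2.5.8 reader but incomplete for a 3.0.0 reader, which matches the "Message missing required fields: old_table_schema" error in the log.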
In certain scenarios, the persisted proto message does not contain the 
old_table_schema field.

How this data is generated is still unclear. I tried to minimize the command 
sequence but could not. It may be a complicated bug that requires a long 
command sequence to trigger.

I am wondering whether the *_old_table_schema_* field really needs to be 
declared required.
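
As a rough illustration of the alternative, the sketch below models what a reader could do if the field were optional: branch on presence instead of failing the whole load. This is an assumption-laden sketch, not HBase code. {{RestoreStateSketch}} is a hypothetical stand-in for the deserialized state, schemas are represented as plain strings, and what the master should actually do when the old schema is absent is left open.

```java
import java.util.Optional;

// Hypothetical stand-in for deserialized restore-snapshot state; not an HBase class.
final class RestoreStateSketch {
  final String modifiedTableSchema;
  // Empty when the writer (for example a 2.5.8 master) never set field 9.
  final Optional<String> oldTableSchema;

  RestoreStateSketch(String modifiedTableSchema, String oldTableSchemaOrNull) {
    this.modifiedTableSchema = modifiedTableSchema;
    this.oldTableSchema = Optional.ofNullable(oldTableSchemaOrNull);
  }

  // Branches on presence of the old schema rather than rejecting the message.
  String describe() {
    return oldTableSchema
        .map(s -> "old schema available: " + s)
        .orElse("old schema absent: state written by an older version");
  }

  public static void main(String[] args) {
    System.out.println(new RestoreStateSketch("t1-v2", null).describe());
    System.out.println(new RestoreStateSketch("t1-v2", "t1-v1").describe());
  }
}
```

With real generated proto2 classes, an optional field gets a {{has...()}} accessor that serves the same purpose as the {{Optional}} check here.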

 

I attached (1) the commands that trigger it (commands.txt), (2) the master 
log file, and (3) all log files in persistent.tar.gz.

I am still trying to find the root cause and would appreciate any 
suggestions. Thank you!


> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28583
>                 URL: https://issues.apache.org/jira/browse/HBASE-28583
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 3.0.0, 2.5.8
>            Reporter: Ke Han
>            Priority: Major
>         Attachments: commands.txt, hbase--master-cc13b0df0f3a.log, 
> persistent.tar.gz
>
>



