[ https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ke Han updated HBASE-28583:
---------------------------
    Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 2 HDFS), I hit the following exception and the upgrade failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
    at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hmaster,16000,1715285771112: Unhandled exception. Starting shutdown. *****
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
    at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
{code}
h1. Reproduce
This bug can be reproduced deterministically with the following steps:
# Start up an HBase 2.5.8 cluster (1 HM, 2 RS, 1 HDFS: hadoop 2.10.2).
# Execute the list of commands in the attached file.
# Stop the 2.5.8 cluster, then start up the 3.0.0 cluster (commit: 516c89e8597fb6).
# The upgrade fails with the above exception.

h1. Root Cause
The incompatibility between 2.5.8 and 3.0.0 comes from a newly added *required* field in the proto file: *_old_table_schema_*.

2.5.8
{code:java}
// hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
}
{code}
3.0.0
{code:java}
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
  required TableSchema old_table_schema = 9;
}
{code}
In certain scenarios, the persisted proto message does not contain the old_table_schema field. How this data is produced is still unclear: I tried to minimize the command sequence but failed, so it may be a complicated bug that needs a long command sequence to trigger.

I am wondering whether the *_old_table_schema_* field must be declared as required.

I attached (1) the commands to trigger it, (2) the master log file, and (3) all log files in persistent.tar.gz.

I am still trying to find the root cause, and I would appreciate any suggestions. Thank you!
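This illustrates why the 3.0.0 master rejects the data. A proto2 parser checks required fields after decoding (the {{AbstractParser.checkMessageInitialized}} frame in the trace). The 2.5.8 definition of RestoreSnapshotStateData above has no field 9, so no message serialized by 2.5.8 can carry old_table_schema on the wire.

The following is a hypothetical, stdlib-only diagnostic sketch, not an HBase or protobuf API. The class and method names are made up, and the payload bytes are placeholders rather than real serialized HBase data. It walks the protobuf wire format, collects the top-level field numbers, and compares them with the required fields of each schema version.

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic (not an HBase or protobuf API): lists the top-level
// field numbers in a serialized protobuf message and checks required fields.
public class RequiredFieldCheck {

    // Decodes a base-128 varint at pos[0] and advances pos[0] past it.
    public static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos[0] >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 10 bytes");
    }

    // Collects top-level field numbers; nested messages are skipped, not parsed.
    public static Set<Integer> presentFields(byte[] buf) {
        Set<Integer> fields = new TreeSet<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int wireType = (int) (tag & 0x7);
            fields.add((int) (tag >>> 3)); // field number = tag >> 3
            switch (wireType) {
                case 0: readVarint(buf, pos); break; // varint, e.g. bool restore_acl
                case 1: pos[0] += 8; break;          // fixed64
                case 2: {                            // length-delimited, e.g. TableSchema
                    int len = (int) readVarint(buf, pos); // read the length first, then skip
                    pos[0] += len;
                    break;
                }
                case 5: pos[0] += 4; break;          // fixed32
                default: throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
            if (pos[0] > buf.length) {
                throw new IllegalArgumentException("truncated field");
            }
        }
        return fields;
    }

    // Required field numbers that do not appear in the message.
    public static Set<Integer> missingRequired(Set<Integer> present, Set<Integer> required) {
        Set<Integer> missing = new TreeSet<>(required);
        missing.removeAll(present);
        return missing;
    }

    // Encodes a non-negative value as a base-128 varint.
    public static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Writes a length-delimited (wire type 2) field.
    public static void writeBytesField(ByteArrayOutputStream out, int field, byte[] payload) {
        writeVarint(out, ((long) field << 3) | 2);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
    }

    public static void main(String[] args) {
        // Shaped like a 2.5.8 RestoreSnapshotStateData: fields 1-3 and 8, no field 9.
        // Nested payloads are placeholder bytes, not real UserInformation etc.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBytesField(out, 1, new byte[] {0x0A, 0x01, 'u'}); // user_info
        writeBytesField(out, 2, new byte[] {0x0A, 0x01, 's'}); // snapshot
        writeBytesField(out, 3, new byte[] {0x0A, 0x01, 't'}); // modified_table_schema
        writeVarint(out, 8L << 3);                             // restore_acl tag (wire type 0)
        writeVarint(out, 1);                                   // restore_acl = true

        Set<Integer> present = presentFields(out.toByteArray());
        Set<Integer> required258 = new TreeSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> required300 = new TreeSet<>(Arrays.asList(1, 2, 3, 9));
        System.out.println("present fields:      " + present);
        System.out.println("missing under 2.5.8: " + missingRequired(present, required258));
        System.out.println("missing under 3.0.0: " + missingRequired(present, required300));
    }
}
{code}

Running main prints the present fields [1, 2, 3, 8]. Nothing is missing under the 2.5.8 required set, and [9] is missing under the 3.0.0 set, which mirrors "Message missing required fields: old_table_schema". Under proto2 rules, declaring the field optional instead would let such bytes parse. Any code that reads the field would then have to check the generated has-accessor and handle the case where it is absent.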
  was:
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 2 HDFS), I hit the following exception and the upgrade failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
    at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hmaster,16000,1715285771112: Unhandled exception. Starting shutdown. *****
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
    at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
{code}
h1. Reproduce
This bug can be reproduced deterministically with the following steps:
# Start up an HBase 2.5.8 cluster (1 HM, 2 RS, 1 HDFS: hadoop 2.10.2).
# Execute the list of commands in the attached file.
# Stop the 2.5.8 cluster, then start up the 3.0.0 cluster (commit: 516c89e8597fb6).
# The upgrade fails with the above exception.

h1. Root Cause
The incompatibility between 2.5.8 and 3.0.0 comes from a newly added *required* field in the proto file: _old_table_schema_.

2.5.8
{code:java}
// hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
}
{code}
3.0.0
{code:java}
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
  required TableSchema old_table_schema = 9;
}
{code}
In certain scenarios, the persisted proto message does not contain the old_table_schema field. How this data is produced is still unclear: I tried to minimize the command sequence but failed, so it may be a complicated bug that needs a long command sequence to trigger.

I attached (1) the commands to trigger it, (2) the master log file, and (3) all log files in persistent.tar.gz.
> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28583
>                 URL: https://issues.apache.org/jira/browse/HBASE-28583
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 3.0.0, 2.5.8
>            Reporter: Ke Han
>            Priority: Major
>         Attachments: commands.txt, hbase--master-cc13b0df0f3a.log, persistent.tar.gz

--
This message was sent by Atlassian Jira
(v8.20.10#820010)