Ke Han created HBASE-28583:
------------------------------
Summary: Upgrade from 2.5.8 to 3.0 crashes with
InvalidProtocolBufferException: Message missing required fields:
old_table_schema
Key: HBASE-28583
URL: https://issues.apache.org/jira/browse/HBASE-28583
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 2.5.8, 3.0.0
Reporter: Ke Han
Attachments: commands.txt, hbase--master-cc13b0df0f3a.log,
persistent.tar.gz
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to a 3.0.0 cluster
(1 HM, 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
    at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hmaster,16000,1715285771112: Unhandled exception. Starting shutdown. *****
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
    at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
{code}
h1. Reproduce
This bug can be reproduced deterministically with the following steps:
Start an HBase 2.5.8 cluster (1 HM, 2 RS, 1 HDFS: hadoop 2.10.2) and execute the
list of commands in the attached file (commands.txt).
Stop the 2.5.8 cluster, then start a 3.0.0 cluster (commit: 516c89e8597fb6).
The upgrade fails with the above exception.
h1. Root Cause
The incompatibility between 2.5.8 and 3.0.0 comes from a newly added *required*
field in the proto file: _old_table_schema_.
2.5.8
{code:java}
// hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
}
{code}
3.0.0
{code:java}
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
  required TableSchema old_table_schema = 9;
}
{code}
A RestoreSnapshotStateData message written by 2.5.8 does not contain the
old_table_schema field, so 3.0.0 fails to deserialize it when such procedure
state is still in the procedure store at master startup.
How this leftover data is generated is still unclear. I tried to minimize the
command sequence but failed. It may be a complicated bug that requires a
long command sequence to trigger.
I attached (1) the commands to trigger it, (2) the master log file, and (3) all
log files in persistent.tar.gz.
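One possible direction, shown only as a sketch and not as the committed fix, is to declare the new field as optional so that state data written by 2.5.8 (which has no field 9) still passes the protobuf initialization check. This assumes the 3.0.0 code can tolerate a missing old table schema; that has not been verified here.
{code:java}
// Sketch under the assumption above: same message as in 3.0.0, except that
// old_table_schema is optional instead of required.
message RestoreSnapshotStateData {
  required UserInformation user_info = 1;
  required SnapshotDescription snapshot = 2;
  required TableSchema modified_table_schema = 3;
  repeated RegionInfo region_info_for_restore = 4;
  repeated RegionInfo region_info_for_remove = 5;
  repeated RegionInfo region_info_for_add = 6;
  repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
  optional bool restore_acl = 8;
  // Absent in messages serialized by 2.5.8.
  optional TableSchema old_table_schema = 9;
}
{code}
With this change, RestoreSnapshotProcedure.deserializeStateData would need to check the generated hasOldTableSchema() accessor before reading the field, and decide how to proceed when it is absent.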