Todd Lipcon created KUDU-1501:
---------------------------------
Summary: RaftConsensusITest.TestMasterReplacesEvictedFollowers flaky with bootstrap replay error
Key: KUDU-1501
URL: https://issues.apache.org/jira/browse/KUDU-1501
Project: Kudu
Issue Type: Bug
Components: consensus, tablet
Affects Versions: 0.9.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
I looped this test a couple hundred times in TSAN and caught this failure,
which looks like it might be a serious bug:
W0624 01:56:54.399389 30339 consensus_peers.cc:326] T
fbfe7e1538d442fb9d4e0958c6ed3b7a P 9ee39d9aaf9840b9a502817a4cfe0a68 -> Peer
e9f2622cd2f64f95ad5ebc9706524bab (127.112.159.2:58860): Couldn't send request
to peer e9f2622cd2f64f95ad5ebc9706524bab for tablet
fbfe7e1538d442fb9d4e0958c6ed3b7a. Error code: TABLET_NOT_RUNNING (12). Status:
Illegal state: Tablet not RUNNING: FAILED: Corruption: Failed log replay.
Reason: Debug Info: Error playing entry 3 of segment 6 of tablet
fbfe7e1538d442fb9d4e0958c6ed3b7a. Segment path:
/tmp/kudutest-1000/raft_consensus-itest.RaftConsensusITest.TestMasterReplacesEvictedFollowers.1466733357563157-28831/raft_consensus-itest-cluster/ts-2/wals/fbfe7e1538d442fb9d4e0958c6ed3b7a.recovery/wal-000000006.
Entry: type: COMMIT commit { op_type: WRITE_OP commited_op_id { term: 1 index:
36 } result { ops { mutated_stores { mrs_id: 3 } } } }: CommitMsg was orphaned
but it referred to stores which need replay. Commit: op_type: WRITE_OP
commited_op_id { term: 1 index: 36 } result { ops { mutated_stores { mrs_id: 3
} } }. TabletMetadata: table_id: "9fb52e694c1d46e4991b49b78a3b8acf" tablet_id:
"fbfe7e1538d442fb9d4e0958c6ed3b7a" last_durable_mrs_id: 2 rowsets { id: 3
last_durable_dms_id: -1 columns { block { id: 1836738791030108424 } column_id:
10 } columns { block { id: 3331501504918373718 } column_id: 11 } columns {
block { id: 4024765891195703834 } column_id: 12 } undo_deltas { block { id:
3564657040239809453 } } bloom_block { id: 3499726779858777197 } } table_name:
"TestTable" schema { columns { id: 10 name: "key" type: INT32 is_key: true
is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION
cfile_block_size: 0 } columns { id: 11 name: "int_val" type: INT32 is_key:
false is_nullable: false encoding: AUTO_ENCODING compression:
DEFAULT_COMPRESSION cfile_block_size: 0 } columns { id: 12 name: "string_val"
type: STRING is_key: false is_nullable: true encoding: AUTO_ENCODING
compression: DEFAULT_COMPRESSION cfile_block_size: 0 } } schema_version: 0
tablet_data_state: TABLET_DATA_READY partition { partition_key_start: ""
partition_key_end: "" } partition_schema { range_schema { columns { id: 10 } }
}. Retrying in the next heartbeat period. Already tried 13 times.
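
To make the key part of that message easier to read: the orphaned COMMIT for op 1.36 mutated mrs_id 3, while the tablet metadata reports last_durable_mrs_id: 2. The sketch below is only an illustration of that comparison, assuming that a MemRowSet "needs replay" when its id is greater than the last durable one. It is not Kudu's bootstrap code, and the names OrphanedCommit and stores_needing_replay are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OrphanedCommit:
    """Hypothetical stand-in for the fields of the COMMIT entry quoted in the log."""
    term: int
    index: int
    mutated_mrs_ids: Tuple[int, ...]


def stores_needing_replay(commit: OrphanedCommit, last_durable_mrs_id: int) -> List[int]:
    # Assumption: a MemRowSet whose id is greater than last_durable_mrs_id
    # has not been flushed, so edits that touched it still need replay.
    return [m for m in commit.mutated_mrs_ids if m > last_durable_mrs_id]


# Values copied from the log line above: op 1.36 mutated mrs_id 3,
# and the tablet metadata has last_durable_mrs_id: 2.
commit = OrphanedCommit(term=1, index=36, mutated_mrs_ids=(3,))
pending = stores_needing_replay(commit, last_durable_mrs_id=2)
if pending:
    print(f"orphaned commit {commit.term}.{commit.index} "
          f"refers to stores which need replay: {pending}")
```

Under that assumption the list is non-empty for the values in this failure, which matches the Corruption status reported during log replay.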