Replication corrupting kudu database?

Ravi Ravi Wed, 26 Oct 2016 21:06:08 -0700

Hi,

I am trying to create a TPC-DS database on Kudu with a scale factor of
1000. My Kudu build comes from the 1.0.0-SNAPSHOT source. My cluster has
4 kudu-tserver instances, and the tables use a replication factor of 3.


I am using simple Impala insert statements to load the tables: insert into
kudu-table select * from parquet-table.

This insert overwrites the contents of several block_manager_instance files
in the cluster. These files get filled with messages of this type:
E1020 10:25:05.947288 19199 consensus_queue.cc:349] T
5acb121ecb9945f79fe25e30d49df4e3 P 5ba624b4123a464fa95a11bfdbef9210
[LEADER]: Error trying to read ahead of the log while preparing peer
request: Incomplete: Op with index 405031 is ahead of the local log (next
sequential op: 405031). Destination peer: Peer:
7aad1905ae144a029788546e2a104909, Is new: false, Last received: 304.405031,
Next index: 405032, Last known committed idx: 405031, Last exchange result:
ERROR, Needs remote bootstrap: false
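
To check which files on a node have been overwritten this way, something like the following sketch might help. It is not part of Kudu. It only assumes that the stray lines start with the same glog-style prefix as the message above (severity letter, MMDD, timestamp, thread id, source file and line). The names looks_like_log_text and scan are made up for illustration.

```python
import re

# Assumed glog-style prefix, modelled on the sample message above:
# severity letter, MMDD, HH:MM:SS.micros, thread id, source_file:line]
GLOG_PREFIX = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [^\s:]+:\d+\]"
)


def looks_like_log_text(data: bytes, min_hits: int = 1) -> bool:
    """Return True if at least min_hits lines start with a glog-style prefix."""
    text = data.decode("utf-8", errors="replace")
    hits = sum(1 for line in text.splitlines() if GLOG_PREFIX.match(line))
    return hits >= min_hits


def scan(paths):
    """Return the paths whose first 64 KiB contain glog-style lines."""
    flagged = []
    for path in paths:
        with open(path, "rb") as f:
            if looks_like_log_text(f.read(65536)):
                flagged.append(path)
    return flagged
```

Passing the paths of the block_manager_instance files on a node to scan() would list the ones that now contain log text rather than their original contents.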

After this, the tserver cannot be restarted: it reads these files on
startup and fails with an "invalid magic number" message. I have found no
way to recover, so I have to delete all files in the database and recreate
all tables.

This corruption does not occur when I set the replication factor to 1.

Is anyone aware of this problem? Is there a fix available?

I have attached one of the block_manager_instance files, if it helps.

Thanks,
Ravi
