Geoffrey Jacoby created PHOENIX-5027:
----------------------------------------
Summary: PhoenixIndexImportDirectMapper retried mappers can
succeed without inserting all index data
Key: PHOENIX-5027
URL: https://issues.apache.org/jira/browse/PHOENIX-5027
Project: Phoenix
Issue Type: Bug
Reporter: Geoffrey Jacoby
On two recent occasions I've rebuilt a large global immutable index by doing a
DROP/CREATE and ended up with missing index data, though it doesn't happen
every time. Here's what happened:
1. PhoenixMRJobSubmitter correctly detects that an index rebuild is necessary and
invokes IndexTool.
2. IndexTool enqueues a MapReduce job using PhoenixIndexImportDirectMapper.
3. Some mappers fail because of timeouts due to heavy splitting on the new
index table
4. Those mappers are retried and succeed. The MR job as a whole completes
successfully.
5. RowCounter and IndexScrutinyTool show millions of rows are missing from the
index, with keys that imply they were part of the failed mappers
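To make step 5 concrete, here is a rough sketch of how missing keys can be attributed to mapper input ranges. It is not how RowCounter or IndexScrutinyTool work internally. The class and method names are hypothetical. It assumes the keys compared are data-table row keys for which an index row is expected, and that each mapper's input split is identified by its start key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of Phoenix.
public class MissingIndexKeys {

    // Keys that should have an index row but do not.
    static SortedSet<String> missing(Set<String> expected, Set<String> actual) {
        SortedSet<String> out = new TreeSet<>(expected);
        out.removeAll(actual);
        return out;
    }

    // Group each missing key under the split whose start key is the
    // greatest start key <= the missing key (assumed split model).
    static Map<String, List<String>> bySplit(SortedSet<String> missing,
                                             TreeSet<String> splitStarts) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String key : missing) {
            String start = splitStarts.floor(key);
            if (start == null) {
                start = ""; // key sorts before the first known split start
            }
            grouped.computeIfAbsent(start, k -> new ArrayList<>()).add(key);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up keys and split boundaries, purely for demonstration.
        TreeSet<String> splitStarts = new TreeSet<>(Arrays.asList("a", "m"));
        Set<String> expected = new HashSet<>(Arrays.asList("apple", "mango", "nut"));
        Set<String> actual = new HashSet<>(Arrays.asList("apple"));

        Map<String, List<String>> grouped =
                bySplit(missing(expected, actual), splitStarts);
        System.out.println(grouped);
    }
}
```

If every missing key falls under splits whose first attempt failed, that supports the reading that the retried mappers are the ones losing data.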
Aside from the timestamp glitch I pointed out in PHOENIX-5018, the code in
PhoenixIndexImportDirectMapper _looks_ idempotent on a rerun, so I've been
struggling to find the cause of the missing index data.
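For reference, this is the behaviour I would expect from an idempotent mapper, as a toy model only. It uses a plain in-memory map with upsert semantics in place of the index table. The class name, the "idx_" key prefix and the simulated timeout are all made up for illustration, and none of this is taken from PhoenixIndexImportDirectMapper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, for illustration only; not Phoenix code.
public class RetryIdempotence {

    // Writes one index entry per data row with upsert (put) semantics.
    // Throws after `failAfter` successful writes to simulate a timeout;
    // a negative value means the attempt never fails.
    static void runAttempt(Map<String, String> index,
                           List<String> dataRows,
                           int failAfter) {
        int written = 0;
        for (String row : dataRows) {
            if (written == failAfter) {
                throw new IllegalStateException("simulated timeout");
            }
            index.put("idx_" + row, row);
            written++;
        }
    }

    // Runs a failing first attempt followed by a full retry over the same rows.
    static Map<String, String> failThenRetry(List<String> dataRows, int failAfter) {
        Map<String, String> index = new TreeMap<>();
        try {
            runAttempt(index, dataRows, failAfter);
        } catch (IllegalStateException e) {
            // First attempt died partway through; partial writes remain.
        }
        runAttempt(index, dataRows, -1); // the retry re-writes every row
        return index;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4");
        System.out.println(failThenRetry(rows, 2).keySet());
    }
}
```

In this model the retry always leaves a complete index regardless of where the first attempt failed, which is why the missing rows are surprising: something in the real job must break one of these assumptions.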