Bill Farner created AURORA-1090:
-----------------------------------
Summary: Optimize or remove shard uniqueness check from StorageBackfill
Key: AURORA-1090
URL: https://issues.apache.org/jira/browse/AURORA-1090
Project: Aurora
Issue Type: Task
Components: Scheduler, Technical Debt
Reporter: Bill Farner
Priority: Critical
We have noticed that during scheduler startup, a significant amount of time can
be spent between the following log lines:
{noformat}
Performing shard uniqueness sanity check.
storage state machine transition PREPARED -> READY
{noformat}
Based on what the scheduler does between those two points, the expensive
operation appears to be {{guaranteeShardUniqueness}}.
This operation aims to validate the integrity of the storage, but its value is
dubious. There are many other things that could be done to validate integrity,
but they should probably not be done every time the scheduler loads its
database.
If the operation is kept, it can be dramatically optimized. It currently
performs an O(n^2) scan of tasks, and this could trivially be reduced to O(n).
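One way to get the O(n) version is a single pass over the tasks that records each (job, shard) pair in a hash set and flags any pair seen twice. The sketch below is illustrative only and does not reproduce Aurora's actual {{guaranteeShardUniqueness}} code. {{StoredTask}}, {{ShardKey}} and {{findDuplicates}} are hypothetical names, and it assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class ShardUniquenessCheck {

  // Hypothetical stand-in for a stored task: its id, owning job, and shard (instance) id.
  public record StoredTask(String taskId, String jobKey, int instanceId) {}

  // Hypothetical composite key identifying one shard of one job; records
  // provide value-based equals/hashCode, so they work as HashSet keys.
  private record ShardKey(String jobKey, int instanceId) {}

  // Single pass over the tasks: expected O(n) time instead of comparing
  // every task against every other task (O(n^2)).
  public static List<StoredTask> findDuplicates(List<StoredTask> tasks) {
    Set<ShardKey> seen = new HashSet<>();
    List<StoredTask> duplicates = new ArrayList<>();
    for (StoredTask task : tasks) {
      // Set.add returns false when an equal key is already present,
      // i.e. another task already occupies this job's shard.
      if (!seen.add(new ShardKey(task.jobKey(), task.instanceId()))) {
        duplicates.add(task);
      }
    }
    return duplicates;
  }
}
```

Given tasks for shards 0, 1 and 0 of the same job, {{findDuplicates}} returns only the third task. The hash set trades O(n) extra memory for the linear running time.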
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)