GitHub user danny0405 edited a comment on the discussion: RocksDB as The
Replica of MDT/RLI
> I don't like how we are coupling index choices and concurrency models.
Yeah, the simple bucket index is currently required to implement NBCC, and we
may need a more flexible and general design for concurrent modifications in
streaming concurrent-write scenarios.
> can you please explain in detail,how the failover and OCC handling are
> related.
I think we can categorize the concurrent write cases into two: write with
conflicts and write without conflicts.
* If the write detects conflicts, the whole job/task will trigger failover and
the RocksDB replica will re-bootstrap from scratch, which ensures the
consistency of the index backend, akin to the MDT RLI index. This requires
introducing specific early conflict detection within the checkpoint lifecycle:
  * persist the uncommitted write metadata under the Hudi table path;
  * in the last step of the #snapshot of the write function, send a request to
    the coordinator to detect conflicts;
  * use a customized conflict resolution strategy that combines all the existing
    uncommitted write metadata with the latest timeline to validate whether
    there are conflicts.
The pre-commit conflict resolution does not work well for Flink streaming
because it happens after a successful checkpoint: if there is a conflict, Hudi
deems the write failed while Flink deems it successful (as of the latest
successful checkpoint). To close this gap, early conflict resolution is
required here.
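
To make the early conflict detection above a bit more concrete, here is a rough Java sketch of the coordinator-side check. Everything in it is hypothetical and for illustration only: `EarlyConflictDetector`, `WriteMetadata` and `hasConflict` are not existing Hudi or Flink APIs, and it assumes that a "conflict" means two different writers touching the same file group.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a coordinator-side early conflict check; not a Hudi API. */
class EarlyConflictDetector {

    /** Assumed shape of the write metadata a task persists under the table path. */
    public static final class WriteMetadata {
        public final String writerId;
        public final Set<String> fileGroupIds;

        public WriteMetadata(String writerId, Set<String> fileGroupIds) {
            this.writerId = writerId;
            this.fileGroupIds = fileGroupIds;
        }
    }

    /**
     * Combines the uncommitted write metadata of other writers with the writes
     * other writers have already completed on the latest timeline, and reports
     * whether the candidate write overlaps any of their file groups.
     */
    public static boolean hasConflict(WriteMetadata candidate,
                                      List<WriteMetadata> uncommitted,
                                      List<WriteMetadata> committedOnTimeline) {
        Set<String> touchedByOthers = new HashSet<>();
        collectFromOthers(candidate.writerId, uncommitted, touchedByOthers);
        collectFromOthers(candidate.writerId, committedOnTimeline, touchedByOthers);
        // Assumption: any shared file group between two writers is a conflict.
        return !Collections.disjoint(touchedByOthers, candidate.fileGroupIds);
    }

    /** Adds the file groups of every writer except the candidate itself. */
    private static void collectFromOthers(String selfId,
                                          List<WriteMetadata> writes,
                                          Set<String> sink) {
        for (WriteMetadata w : writes) {
            if (!w.writerId.equals(selfId)) {
                sink.addAll(w.fileGroupIds);
            }
        }
    }
}
```

In this reading, the write function would call something like `hasConflict` (through the coordinator) at the end of #snapshot and fail the checkpoint on `true`, so that Flink and Hudi agree the write did not succeed.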
* If the write does not detect conflicts, there are still cases where another
concurrent write modifies the table with new record locations. The solution is
that we might need an early check of the index backend's freshness before each
write: maintain a mapping from job-id to instant time so we can incrementally
load the index changes made by concurrent writers (put the job-id in the commit
metadata or maintain it on the coordinator). This introduces a lot of
complexity, though; I'm expecting a more general solution for NBCC that is
index-type agnostic and does not get stuck in this concurrent index
modification trap.
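
As one possible reading of the job-id to instant-time mapping above, the Java sketch below tracks, per job, the latest instant whose index changes have already been applied, and returns the newer commits from other jobs that still need to be loaded. `IndexFreshnessTracker`, `Commit`, `pendingIndexChanges` and `markSynced` are made-up names, not Hudi APIs. The sketch also assumes that each commit carries its job-id (for example in the commit metadata) and that instant times are fixed-width strings that sort lexicographically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an index freshness check; not a Hudi API. */
class IndexFreshnessTracker {

    /** Assumed view of a completed commit: its instant time and the job that wrote it. */
    public static final class Commit {
        public final String instantTime;
        public final String jobId;

        public Commit(String instantTime, String jobId) {
            this.instantTime = instantTime;
            this.jobId = jobId;
        }
    }

    // job-id -> latest instant time whose index changes that job has applied
    private final Map<String, String> lastSyncedInstant = new HashMap<>();

    /** Returns commits by other jobs that are newer than what this job has synced. */
    public List<Commit> pendingIndexChanges(String jobId, List<Commit> completedTimeline) {
        String synced = lastSyncedInstant.getOrDefault(jobId, "");
        List<Commit> pending = new ArrayList<>();
        for (Commit c : completedTimeline) {
            // Assumption: instant times compare correctly as plain strings.
            boolean newer = c.instantTime.compareTo(synced) > 0;
            if (newer && !c.jobId.equals(jobId)) {
                pending.add(c);
            }
        }
        return pending;
    }

    /** Records that the job has applied index changes up to the given instant. */
    public void markSynced(String jobId, String instantTime) {
        lastSyncedInstant.merge(jobId, instantTime,
            (old, candidate) -> candidate.compareTo(old) > 0 ? candidate : old);
    }
}
```

Before each write, a writer would call `pendingIndexChanges`, apply those record-location changes to its RocksDB replica, and then call `markSynced` with the newest instant it applied.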
Here is the support table for concurrent modifications with Flink RLI:
| use case/concurrency mode | OCC | NBCC |
|---|---|---|
| write & write | Y (with early conflict detection and index refreshing) | N |
| write & compaction | Y | N |
| write & clustering | Y (with early index refreshing) | N |
GitHub link:
https://github.com/apache/hudi/discussions/18296#discussioncomment-16171800