Hi team,

Just to add a bit of a disclaimer - external observation would also be
affected in a similar fashion. But afaik, we've never claimed lossless
delivery of observation events - it's a best-effort service.

With CN1 (leader) and CN2 (other cluster node) participating in the
cluster, here's a rough timeline that I think can lead to issues:
1. Cluster is idle and the index is up to date till this point
2. CN2 changes some indexable data at /tree/node1
3. CN2's bkWrite (background write) succeeds
4. CN2 changes some indexable data at /tree/node2
5. CN2 crashes
6. CN1 changes some indexable data at /tree/node3
7. CN1 does a bkRead (background read)
8. Async index cycle runs and indexes /tree/node1 and /tree/node3
9. CN1 notices that CN2's lease has timed out and recovers the changes
   in /tree/node2

BUT, /tree/node2 never goes into the index.
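
To make the ordering problem concrete, here's a toy simulation of the
timeline above. It is only a sketch and not how Oak actually diffs
revisions: the `Change` and `ToyAsyncIndexer` classes, the `seq` numbers
and the "watermark" rule (index only visible changes newer than the last
indexed one) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    path: str
    seq: int        # hypothetical commit order of the change
    visible: bool   # True once the leader (CN1) can see the change


@dataclass
class ToyAsyncIndexer:
    """Toy indexer: indexes visible changes newer than its watermark."""
    watermark: int = 0
    indexed: set = field(default_factory=set)

    def run_cycle(self, changes):
        visible = [c for c in changes if c.visible]
        for c in visible:
            if c.seq > self.watermark:
                self.indexed.add(c.path)
        if visible:
            # Assumption: the next cycle only looks at newer changes.
            self.watermark = max(self.watermark, max(c.seq for c in visible))


def simulate(recover_first=False):
    indexer = ToyAsyncIndexer()
    node1 = Change("/tree/node1", seq=1, visible=True)   # CN2 wrote, bkWrite done
    node2 = Change("/tree/node2", seq=2, visible=False)  # CN2 crashed before bkWrite
    node3 = Change("/tree/node3", seq=3, visible=True)   # CN1 wrote and bkRead
    changes = [node1, node2, node3]

    if recover_first:
        node2.visible = True      # lucky timing: recovery beats the async cycle
    indexer.run_cycle(changes)    # async index cycle

    node2.visible = True          # lease timeout, CN1 recovers CN2's changes
    indexer.run_cycle(changes)    # a later async index cycle
    return indexer.indexed


print(sorted(simulate()))                    # /tree/node2 is missing
print(sorted(simulate(recover_first=True)))  # all three paths are indexed
```

In this toy model /tree/node2 becomes visible only after the indexer
has already moved past it, so no later cycle picks it up.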

Btw, recovery does push a journal entry, so if the timing is right,
things would work fine. But it's hard to imagine that bkRead + an async
cycle wouldn't have run before the crashed node's lease times out.

In the spirit of observation, I think it's hard to establish a good
way/scheme to put /tree/node2 in the observation queue.
That being said, at least for Lucene indexing, all we want is "refresh
indexed data /tree/node2". So maybe it's easier to solve this for
indexing.
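
As a rough sketch of what "refresh indexed data" could look like:
recovery hands the recovered paths to a refresh queue, and the next
index cycle re-indexes them in addition to its normal diff. All names
here (`RefreshQueue`, `add_recovered`, `index_cycle`) are hypothetical
and not existing Oak APIs, and the index is modelled as a plain set of
paths.

```python
class RefreshQueue:
    """Hypothetical holder for paths that recovery wants re-indexed."""

    def __init__(self):
        self._paths = set()

    def add_recovered(self, paths):
        self._paths.update(paths)

    def drain(self):
        # Hand out the pending paths and start over with an empty set.
        paths, self._paths = self._paths, set()
        return paths


def index_cycle(index, diff_paths, refresh_queue):
    """Index the regular diff plus anything recovery asked to refresh."""
    for path in set(diff_paths) | refresh_queue.drain():
        index.add(path)
    return index


# The index after the async cycle in the timeline above.
index = {"/tree/node1", "/tree/node3"}
queue = RefreshQueue()

# Recovery of the crashed node reports what it recovered.
queue.add_recovered(["/tree/node2"])

# The next cycle has an empty regular diff but still refreshes /tree/node2.
index_cycle(index, [], queue)
print(sorted(index))
```

The point is just that the indexer needs a list of paths to refresh,
not a replay of the original events, which is why this looks easier
than the observation case.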

Thoughts?

Thanks,
Vikas
PS: Haven't opened an issue yet as I wasn't sure whether we can solve
both observation and indexing in one go - or whether it's useful to
solve the indexing case alone.
