Hi team,

Just to add a disclaimer up front: external observation would also be affected by the issue described below in a similar fashion, but AFAIK we've never claimed lossless delivery of observation events; it's a best-effort service.
With CN1 (the leader) and CN2 (another cluster node) participating in the cluster, here's a rough timeline that I think can lead to issues:
# The cluster is idle and the index is up to date up to this point.
# CN2 changes some indexable data at /tree/node1.
# CN2 runs bkWrite successfully.
# CN2 changes some indexable data at /tree/node2.
# CN2 crashes before /tree/node2 is written out by a bkWrite.
# CN1 changes some indexable data at /tree/node3.
# CN1 runs bkRead.
# The async index cycle runs and indexes /tree/node1 and /tree/node3.
# CN1 notices that CN2's lease has timed out and recovers the changes in /tree/node2.

BUT /tree/node2 never makes it into the index.

Btw, recovery does push a journal entry, so if the timing is right (recovery runs before the async cycle), things would work fine. But in practice it's hard to imagine a bkRead plus an async cycle not having run before the crashed node's lease times out.

In the spirit of observation, I think it's hard to come up with a good scheme to put /tree/node2 into the observation queue. That being said, at least for Lucene indexing, all we need is "refresh the indexed data for /tree/node2". So maybe it's easier to solve this for indexing alone.

Thoughts?

Thanks,
Vikas

PS: I haven't opened an issue yet, as I wasn't sure whether we can solve both observation and indexing in one go, or whether it's useful to solve the indexing case alone.
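To make the race in the timeline concrete, here is a toy simulation. It is a sketch under heavy assumptions, not Oak code: the async indexer is modeled as processing only changes whose revision is newer than its last checkpoint, recovered changes keep their original (older) revision, and `ToyCluster`, `ToyAsyncIndexer`, `run_timeline` and `refresh_paths` are all hypothetical names. The `notify_indexer` flag models the "refresh the indexed data for /tree/node2" idea: recovery hands the recovered paths to the indexer, which re-indexes them regardless of revision.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    path: str
    rev: int   # toy revision number, assigned when the change is made
    node: str  # cluster node that made the change


@dataclass
class ToyCluster:
    """Toy model of what CN1 can see. NOT Oak's DocumentNodeStore."""
    visible: list = field(default_factory=list)      # changes CN1 can read
    pending_bk: list = field(default_factory=list)   # bkWritten by CN2, not yet bkRead by CN1
    unwritten: dict = field(default_factory=dict)    # node -> changes not yet bkWritten
    refresh_paths: set = field(default_factory=set)  # hypothetical: paths recovery asks to refresh
    clock: int = 0

    def change(self, node, path):
        self.clock += 1
        c = Change(path, self.clock, node)
        if node == "CN1":
            self.visible.append(c)  # CN1 sees its own changes immediately
        else:
            self.unwritten.setdefault(node, []).append(c)

    def bk_write(self, node):
        self.pending_bk.extend(self.unwritten.pop(node, []))

    def bk_read(self):
        self.visible.extend(self.pending_bk)
        self.pending_bk.clear()

    def recover(self, node, notify_indexer):
        recovered = self.unwritten.pop(node, [])
        self.visible.extend(recovered)  # recovered changes keep their OLD revision
        if notify_indexer:  # hypothetical fix: tell the indexer which paths to refresh
            self.refresh_paths.update(c.path for c in recovered)


@dataclass
class ToyAsyncIndexer:
    checkpoint: int = 0  # highest revision already covered by the index
    indexed: set = field(default_factory=set)

    def run_cycle(self, cluster):
        head = cluster.clock
        # Assumption: only changes newer than the last checkpoint are indexed.
        for c in cluster.visible:
            if self.checkpoint < c.rev <= head:
                self.indexed.add(c.path)
        # Hypothetical fix: refresh recovered paths, ignoring their revisions.
        self.indexed.update(cluster.refresh_paths)
        cluster.refresh_paths.clear()
        self.checkpoint = head


def run_timeline(notify_indexer, recover_before_cycle=False):
    cluster, indexer = ToyCluster(), ToyAsyncIndexer()
    cluster.change("CN2", "/tree/node1")
    cluster.bk_write("CN2")
    cluster.change("CN2", "/tree/node2")
    # CN2 crashes here: /tree/node2 is never bkWritten.
    cluster.change("CN1", "/tree/node3")
    cluster.bk_read()
    if recover_before_cycle:  # the "timing is right" case
        cluster.recover("CN2", notify_indexer)
    indexer.run_cycle(cluster)  # checkpoint moves past /tree/node2's revision
    if not recover_before_cycle:  # the usual case: lease times out later
        cluster.recover("CN2", notify_indexer)
    indexer.run_cycle(cluster)
    return sorted(indexer.indexed)
```

In this toy model, `run_timeline(notify_indexer=False)` returns only `/tree/node1` and `/tree/node3`, because the recovered change's revision is already behind the indexer's checkpoint. Either recovering before the first cycle (`recover_before_cycle=True`) or handing the recovered paths to the indexer (`notify_indexer=True`) gets `/tree/node2` indexed as well.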