On 4 November 2015 at 00:45, Davide Giannella <dav...@apache.org> wrote:
> Hello Team,
>
> Lucene index is always asynchronous and the async index could lag behind
> by definition.
>
> Sometimes we could have the same query better served by a property
> index, or traversing for example. In case the async index is lagging
> behind it could be that the traversing index is better suited to return
> the information as it will be more updated.
>
> As we know we run an async update every 5 seconds, we could come up with
> some algorithm to be used on the cost computing, that auto correct with
> some math the cost, increasing it the more the time passed since the
> last full execution of async index.
>
> WDYT?
>

Going down the property index route will, for a DocumentMK instance, bloat
the DocumentStore further. That already consumes 60% of a production
repository and, like many in-DB inverted indexes, is not an efficient
storage structure. It's probably OK for TarMK.

Traversals are a problem for production. They will create random outages
under any sort of concurrent load.

---

If the way indexing is performed were changed, the index could become NRT
or real time, depending on your point of view. E.g. local indexes, with
each Oak index in the cluster becoming a shard, and replication to cover
instance unavailability. No more indexing cycles: soft commits, with each
instance using an FS Directory and an update queue replacing the async
indexing queue. Query by map reduce. It might have to copy on write to
seed new instances where the number of instances falls below 3.

Best Regards
Ian

>
> Davide
>
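
For reference, a minimal sketch of the kind of cost correction proposed in the quoted message, assuming a simple linear penalty: the cost stays unchanged while the async index is within its expected interval, then grows by one base cost for every further interval that passes without a completed run. The class `LagAdjustedCost`, the method `adjust`, and its parameters are hypothetical names for illustration and are not part of Oak's API; only the 5-second interval comes from the thread.

```java
/**
 * Hypothetical helper, NOT part of Oak: scales an async index's reported
 * cost by how far the index has fallen behind its expected update interval.
 */
public final class LagAdjustedCost {

    private LagAdjustedCost() {
    }

    /**
     * @param baseCost               cost the async index reports when up to date
     * @param millisSinceLastRun     time since the last completed async index run
     * @param expectedIntervalMillis expected time between runs (5000 in the thread)
     * @return baseCost, increased linearly by the number of overdue intervals
     */
    public static double adjust(double baseCost,
                                long millisSinceLastRun,
                                long expectedIntervalMillis) {
        if (expectedIntervalMillis <= 0) {
            throw new IllegalArgumentException("expectedIntervalMillis must be positive");
        }
        // No penalty while the index is within its normal update window.
        long overdueMillis = Math.max(0L, millisSinceLastRun - expectedIntervalMillis);
        // Assumption: linear growth, one extra baseCost per missed interval.
        double missedIntervals = (double) overdueMillis / expectedIntervalMillis;
        return baseCost * (1.0 + missedIntervals);
    }
}
```

With these assumptions, `LagAdjustedCost.adjust(10.0, 15000, 5000)` yields 30.0: the index is two intervals overdue, so the cost triples. A linear penalty is only one option; whatever the curve, it makes a lagging async index lose out to property indexes or traversal, which is where the concerns above apply.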