On 4 November 2015 at 00:45, Davide Giannella <dav...@apache.org> wrote:

> Hello Team,
>
> The Lucene index is always asynchronous, and an async index can lag
> behind by definition.
>
> Sometimes the same query could be better served by a property
> index, or by traversing, for example. If the async index is lagging
> behind, the property index or the traversal may be better suited to
> return the information, as it will be more up to date.
>
> As we know the async update runs every 5 seconds, we could come up with
> an algorithm for the cost computation that automatically corrects the
> cost with some math, increasing it the more time has passed since the
> last full execution of the async index.
>
> WDYT?
>
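
A minimal sketch of the kind of cost correction proposed above, assuming
a linear penalty. The class, method, and formula are hypothetical and are
not part of the Oak API; only the 5 second interval comes from this
thread. The cost is left unchanged while the index is within one interval
of its last full run, and grows by one multiple of the base cost for each
further interval missed.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not an Oak class. */
final class LagAdjustedCost {

    /** Expected interval between async index runs (5 seconds, per the thread). */
    static final Duration ASYNC_INTERVAL = Duration.ofSeconds(5);

    private LagAdjustedCost() {
    }

    /**
     * Scales the base cost of the async index by how overdue it is.
     *
     * @param baseCost     cost the index would report if fully up to date
     * @param sinceLastRun time elapsed since the last full async index run
     * @return the base cost, increased linearly for every interval missed
     */
    static double adjust(double baseCost, Duration sinceLastRun) {
        double intervalsBehind =
                (double) sinceLastRun.toMillis() / ASYNC_INTERVAL.toMillis();
        // No penalty while the index is within one normal cycle.
        double overdue = Math.max(0.0, intervalsBehind - 1.0);
        // Assumed linear penalty; any monotonic function would fit the idea.
        return baseCost * (1.0 + overdue);
    }
}
```

For example, with a base cost of 10 and 15 seconds since the last run, the
index is two intervals overdue and the adjusted cost is 30, which makes a
competing index more likely to be picked.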


Going down the property index route for a DocumentMK instance will bloat
the DocumentStore further. That already consumes 60% of a production
repository and, like many in-DB inverted indexes, is not an efficient
storage structure. It's probably OK for TarMK.

Traversals are a problem for production. They will create random outages
under any sort of concurrent load.

---
If the way indexing is performed were changed, the index could become
NRT (near real time) or real time, depending on your point of view. E.g.
local indexes: each Oak index in the cluster becomes a shard, with
replication to cover instance unavailability. No more indexing cycles;
instead, soft commits, with each instance using a FS Directory and an
update queue replacing the async indexing queue. Query by map reduce. It
might have to copy on write to seed new instances where the number of
instances falls below 3.
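
A rough sketch of what "query by map reduce" over shards could look like,
assuming each instance exposes its local index through a small search
interface. The Shard and Hit types are hypothetical, not Oak or Lucene
classes, and de-duplication of hits returned by replicated shards is left
out.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical scatter-gather query over index shards; not an Oak class. */
final class ShardedQuery {

    /** One scored result from a shard (assumed shape). */
    static final class Hit {
        final String path;
        final double score;

        Hit(String path, double score) {
            this.path = path;
            this.score = score;
        }
    }

    /** A shard answers a query from its own local index (assumed interface). */
    interface Shard {
        List<Hit> search(String query, int limit);
    }

    private ShardedQuery() {
    }

    /**
     * Map: ask every shard for its best hits.
     * Reduce: merge all hits by descending score and keep the top ones.
     */
    static List<Hit> query(List<Shard> shards, String query, int limit) {
        return shards.stream()
                .flatMap(shard -> shard.search(query, limit).stream())
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Each shard only needs to return its own top hits, so the merge step stays
cheap no matter how large the individual indexes are.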



Best Regards
Ian



>
> Davide
>
