To the second part of your mail:

You touch on a couple of different topics:
A) a design where indexes are not stored in the repo - but specifically a 
design that uses an external (shared) indexer like ES
B) indexing latency
C) embedded Lucene vs embedded ES

I am not entirely sure what you are suggesting, TBH. On the one hand you seem
to suggest using an external indexer like ES (external to the repo); on the
other hand you mention embedded ES (which would lead to the repo and the
embedded ES being colocated in the same JVM, IIUC).

In order to understand, let me ask it this way:
A) Oak does support an external Solr(Cloud) instance as a shared indexer that 
is external to the repo. Same could be done with an external ES (in fact, 
Tommaso has written a POC for that). On the very high-level question of
whether the index and the repo should be separate: does this address your
concern? (meaning: if we leave the relative benefits of Solr vs ES aside for a
second)
B) indexing latency is a very different concern. In the current design it is 
relevant in deployments where there is high latency between the persistence
and Oak. OAK-4638 and OAK-4412 are about addressing this. I am not sure to
what extent separating out the indexer from the repo persistence would help.
C) you seem to suggest that an embedded ES has advantages over the embedded 
Lucene. What I do not understand in that comparison is where you would store
the index. If locally, we would be back to the JR2 design. If somewhere remote,
then why embed ES at all?

Thanks for clarifying
Cheers
Michael



>I am reluctant to disagree with you, but I feel I have no option, based on
>research, history and first hand experience over the past 10 years.
>
>Storing indexes in a repo is what Compass did from 2004 onwards, until
>after the third version they gave up trying to build a scalable and near
>real time search engine. Version 4 was a rewrite that became ElasticSearch
>0.4.0. The history is documented here
>https://en.wikipedia.org/wiki/Elasticsearch and was presented at Berlin
>Buzzwords in 2010 with a detailed description of why each approach fails. I
>have shared this information before. I am not sharing it to confront. I am
>sharing it because it pains me to see Oak repeating history. I don't feel I
>can stand by and watch in silence.
>
>If Oak does not want to use ES as a library, then learn from the history as
>it addresses your concerns (1,2, + brick wall) and those of Davide, and
>satisfies many of the other issues, potentially eliminating property
>indexes completely. It will however, only ever be as NRT as the root
>document commit period (1s), well above the 100ms data latency that a model
>like the one used by ES delivers under production load.
>
>IMHO, the Hybrid approach being proposed is a step along the same history
>that Compass started treading in 2004. It is an innovative solution to a
>constrained problem space.
>
>Sorry if I sound like a broken record. I did exactly what Oak has done/is
>doing from 2006 onwards, but without a vast user base I was able to be more
>agile.
>
>Apache is about doing, not standing by, about fact not fiction, about
>evidence and reasoned argument. If there is any interest, I have an Oak PoC
>somewhere that ports the Lucene index plugin to use embedded ES instances,
>1 per VM as an embedded ES cluster. It's not complete as I gave up on it
>when I realised data latency would be fixed by the Oak root document. My
>interest was proper real time indexing over the cluster.
