Zoie is not for distributed search. If you want to study LinkedIn's work in
this area with Lucene, you should look at Sensei.

The Zoie project also donated a BalancedSegmentMergePolicy to Lucene 2.x:

https://issues.apache.org/jira/browse/LUCENE-1924

but nobody had the energy to maintain it. Lucene is now at version 4, with
vast improvements in the area of segment merging.

You mention in-memory segments for fast NRT. Lucene 4 implements this by
default, and Elasticsearch adds further improvements for distributed NRT
get.
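The "wait for a specific version of the index" behaviour described in the quoted message can be sketched generically: the indexer hands out increasing generation numbers, the refresh side publishes the newest searchable generation, and only the thread that wrote something blocks until its generation is visible. This is a minimal Python sketch under that assumption; `GenerationTracker`, `publish` and `wait_for` are hypothetical names, not Zoie's or Elasticsearch's API.

```python
import threading
from typing import Optional


class GenerationTracker:
    """Hypothetical helper tracking the newest generation visible to search.

    Writers remember the generation their change was assigned; a refresh or
    reopen thread calls publish() once that generation is searchable. Only a
    thread that calls wait_for() blocks; other searches are unaffected.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._searchable = 0  # newest generation already visible to searchers

    def publish(self, generation: int) -> None:
        # Called after new segments become searchable; generations never go back.
        with self._cond:
            if generation > self._searchable:
                self._searchable = generation
                self._cond.notify_all()

    def wait_for(self, generation: int, timeout: Optional[float] = None) -> bool:
        # Block until `generation` is searchable; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._searchable >= generation, timeout=timeout
            )


if __name__ == "__main__":
    tracker = GenerationTracker()
    # Simulate a refresh thread making generation 3 searchable after 0.1 s.
    threading.Timer(0.1, tracker.publish, args=(3,)).start()
    print(tracker.wait_for(3, timeout=2.0))
```

The design keeps the cost on the writer who needs read-your-writes, instead of forcing a refresh for everybody. Lucene 4 ships a comparable facility (`ControlledRealTimeReopenThread.waitForGeneration`), which is worth checking against the Lucene documentation for the exact version in use.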

Note that not all searches are candidates for NRT. If you use mlockall and
the mmapfs index store type, you can move almost all of your ES/Lucene data
and files into RAM (if you can afford the hardware). But modifying data in
the index always invalidates the fielddata cache and possibly the
filter/facet caches, and new cache generations must be built, which is
expensive and hurts performance. There is a tradeoff, and the balance must
be struck very carefully to avoid stale results. This is hard when little is
known about the typical search workload of an application. ES allows you to
cache filters and to clear caches explicitly, so maybe this is an area to
experiment with. But it always depends.
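The storage and cache knobs mentioned above can be exercised over the REST API. The sketch below only builds the requests (method, path, JSON body) so they can be inspected before sending; the function names and the `base_url` are hypothetical, and the setting and parameter names (`index.store.type: mmapfs`, `filter`, `field_data` on `_cache/clear`, `bootstrap.mlockall`) follow ES 1.x-era naming, which later versions changed, so check them against the documentation for your version.

```python
import json
import urllib.request
from typing import Optional, Tuple
from urllib.parse import urlencode

# A request is (HTTP method, path, JSON body or None).
Request = Tuple[str, str, Optional[str]]


def create_mmapfs_index(index: str) -> Request:
    # Index-level store type. mlockall is a node setting instead
    # (bootstrap.mlockall: true in elasticsearch.yml, ES 1.x naming).
    body = {"settings": {"index": {"store": {"type": "mmapfs"}}}}
    return ("PUT", f"/{index}", json.dumps(body))


def clear_caches(index: str, filter_cache: bool = True, fielddata: bool = True) -> Request:
    # ES 1.x-era clear-cache parameters; assumed names, verify for your version.
    params = {"filter": str(filter_cache).lower(), "field_data": str(fielddata).lower()}
    return ("POST", f"/{index}/_cache/clear?{urlencode(params)}", None)


def send(base_url: str, req: Request) -> dict:
    # Hypothetical helper: sends a built request and decodes the JSON reply.
    method, path, body = req
    data = body.encode("utf-8") if body is not None else None
    http_req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(http_req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send("http://localhost:9200", clear_caches("docs", fielddata=False))` would drop only the filter cache of a hypothetical `docs` index, which is one way to experiment with the invalidation tradeoff without touching fielddata.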

Jörg


On Thu, Jun 26, 2014 at 11:25 AM, Nico Krijnen <n...@woodwing.com> wrote:

> Hi,
>
> We have recently migrated our application from 'bare Lucene + Zoie for
> realtime search' to Elastic Search. Elastic search is awesome and next to
> scalability, it gives us lots of additional features. The one thing we
> really miss though is realtime search.
>
> Search is the core of our application. All our data is stored in the index
> (primary data store). When a user adds a file or makes a change, their
> subsequent search must reflect that change. With Zoie, the data was indexed
> very quickly into a temporary Lucene memory index. Not having to write it to
> and read it from disk makes the documents available for search much faster
> than NRT Lucene. The memory index is flushed to disk asynchronously from time to
> time, not impacting indexing or search performance. Zoie also allows you to
> wait for a specific 'version of the index' to be available for searching.
> That way we could make the user's thread wait until their data was indexed
> in memory, only pausing the thread of that user without having any
> performance impact for all the other users.
>
> Result: realtime search and insanely fast indexing.
>
> With Elastic Search we have to do a refresh to make data available for
> search. Frequent refreshes, or even the default 1 second refresh interval,
> slow indexing down significantly. We don't know beforehand when our users
> will import documents or make lots of changes, so we cannot really increase
> the refresh interval when needed to make indexing faster. We know that
> 'get' is realtime and we make use of that as much as possible, but in lots
> of cases we really require a search to find the data.
>
> Our plan is to implement some mechanism in Elastic Search to get the same
> realtime search + fast indexing behavior that we had with Zoie. We need
> some pointers though on what would be the best place in Elastic Search to
> do something like this. After all, it hooks into low-level Elastic Search
> and Lucene internals.
>
> I can imagine that 'realtime-search while indexing' is important for many
> other Elastic Search users too. What are the chances of something like this
> getting merged back into the main branch?
>
> I'm planning to be at the Friday drinks tomorrow in Amsterdam. Is anyone
> attending with whom I could do some sparring on this matter?
>
> Thanks,
> Nico
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0ed50d5f-4ade-4d56-af06-6e2c26feff9b%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/0ed50d5f-4ade-4d56-af06-6e2c26feff9b%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
