On 09/09/2011 11:00, Simon Willnauer wrote:
> I created LUCENE-3424 for this. But I still would like to keep the
> discussion open here rather than moving this entirely to an issue.
> There is more to this than only the seq. ids.

I'm also concerned about the content of the transaction log. In Solr it stores javabin-encoded UpdateCommands (either SolrInputDocuments or delete/commit commands). Documents in the log are raw documents, i.e. captured before analysis.

This may have some merit for Solr (e.g. you could imagine running different analysis chains on the Solr slaves), but IMHO it's more of a hassle for Lucene, because it means the analysis has to be repeated over and over again on every client. If the analysis chain is costly (e.g. NLP), it would make sense to have an option to log documents post-analysis, i.e. as correctly typed stored values (e.g. string -> numeric) AND the resulting TokenStreams. This would also have the advantage of moving us towards the "dumb IndexWriter" concept, i.e. separating analysis from the core inverted-index functionality.

So I'd argue for recording post-analysis documents in the tlog, either exclusively or as the default option.
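To make the idea concrete, here is a minimal sketch of what a post-analysis tlog entry could look like. This is NOT existing Lucene or Solr code: the Token record and the read/write helpers are hypothetical, standing in for the attributes (term, position increment, offsets) one would pull off a real TokenStream via CharTermAttribute, PositionIncrementAttribute and OffsetAttribute. The point is only that once a field is analyzed, its token stream is a flat sequence that can be serialized and replayed on any replica without re-running the analysis chain.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-analysis token: in real Lucene these values would be
// read from the attributes of a TokenStream after analysis has run once.
record Token(String term, int posInc, int startOffset, int endOffset) {}

public class TlogSketch {

    // Serialize one analyzed field into a tlog entry, so replicas can
    // replay the tokens directly instead of re-analyzing the raw text.
    static byte[] writeTokens(List<Token> tokens) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(tokens.size());
            for (Token t : tokens) {
                out.writeUTF(t.term());
                out.writeInt(t.posInc());
                out.writeInt(t.startOffset());
                out.writeInt(t.endOffset());
            }
        }
        return bytes.toByteArray();
    }

    // Replay side: decode the entry back into the token sequence that
    // would be fed straight to the (analysis-free) IndexWriter.
    static List<Token> readTokens(byte[] data) throws IOException {
        List<Token> tokens = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                tokens.add(new Token(in.readUTF(), in.readInt(),
                                     in.readInt(), in.readInt()));
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        List<Token> analyzed = List.of(
            new Token("quick", 1, 0, 5),
            new Token("brown", 1, 6, 11));
        byte[] entry = writeTokens(analyzed);
        System.out.println(readTokens(entry).equals(analyzed));
    }
}
```

A real implementation would of course have to cover per-token payloads and custom attributes, which is where the format gets hairy; but for the common attributes the round-trip is this simple, and the replica never needs the analyzer at all.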

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
