On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> On 09/09/2011 11:00, Simon Willnauer wrote:
>>
>> I created LUCENE-3424 for this. But I still would like to keep the
>> discussion open here rather than moving this entirely to an issue.
>> There is more about this than only the seq. ids.
>
> I'm concerned also about the content of the transaction log. In Solr it uses
> javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit
> commands). Documents in the log are raw documents, i.e. before analysis.
>
> This may have some merits for Solr (e.g. you could imagine having different
> analysis chains on the Solr slaves), but IMHO it's more of a hassle for
> Lucene, because it means that the analysis has to be repeated over and over
> again on all clients. If the analysis chain is costly (e.g. NLP) then it
> would make sense to have an option to log documents post-analysis, i.e. as
> correctly typed stored values (e.g. string -> numeric) AND the resulting
> TokenStream-s. This has also the advantage of moving us towards the "dumb
> IndexWriter" concept, i.e. separating analysis from the core inverted index
> functionality.
>
> So I'd argue for recording post-analysis docs in the tlog, either
> exclusively or as a default option.
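
To make the post-analysis idea a bit more concrete, here is a rough sketch of what such a tlog record could look like. Everything in it is an assumption for illustration: PreAnalyzedTlogSketch, Token, AnalyzedField and the record layout are made up, and none of it is Lucene or Solr API. It only uses java.io to write a typed stored value plus the tokens the analysis chain already produced, so whoever replays the record would not have to run analysis again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Lucene or Solr.
final class PreAnalyzedTlogSketch {

    /** One token exactly as it left the analysis chain. */
    static final class Token {
        final String term;
        final int posInc, startOffset, endOffset;

        Token(String term, int posInc, int startOffset, int endOffset) {
            this.term = term;
            this.posInc = posInc;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** A field after analysis: a typed stored value (String, Long or null) plus its tokens. */
    static final class AnalyzedField {
        final String name;
        final Object stored;
        final List<Token> tokens;

        AnalyzedField(String name, Object stored, List<Token> tokens) {
            this.name = name;
            this.stored = stored;
            this.tokens = tokens;
        }
    }

    /** Serializes one post-analysis document into a single log record. */
    static byte[] encode(List<AnalyzedField> doc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(doc.size());
            for (AnalyzedField f : doc) {
                out.writeUTF(f.name);
                if (f.stored instanceof Long) {          // numeric stored value
                    out.writeByte(2);
                    out.writeLong((Long) f.stored);
                } else if (f.stored instanceof String) { // string stored value
                    out.writeByte(1);
                    out.writeUTF((String) f.stored);
                } else {                                 // nothing stored
                    out.writeByte(0);
                }
                out.writeInt(f.tokens.size());
                for (Token t : f.tokens) {
                    out.writeUTF(t.term);
                    out.writeInt(t.posInc);
                    out.writeInt(t.startOffset);
                    out.writeInt(t.endOffset);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads a record back; no analyzer is involved on this side. */
    static List<AnalyzedField> decode(byte[] record) {
        List<AnalyzedField> doc = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            int fieldCount = in.readInt();
            for (int i = 0; i < fieldCount; i++) {
                String name = in.readUTF();
                byte type = in.readByte();
                Object stored = null;
                if (type == 2) {
                    stored = Long.valueOf(in.readLong());
                } else if (type == 1) {
                    stored = in.readUTF();
                }
                int tokenCount = in.readInt();
                List<Token> tokens = new ArrayList<>();
                for (int j = 0; j < tokenCount; j++) {
                    tokens.add(new Token(in.readUTF(), in.readInt(), in.readInt(), in.readInt()));
                }
                doc.add(new AnalyzedField(name, stored, tokens));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc;
    }
}
```

The type byte is only there to show the "string -> numeric" point: the numeric value goes into the log as a long, not as the original string. A real format would also have to carry payloads, other token attributes and field options, which this sketch leaves out.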

I am not sure this should be the default option; I would need to see how it
is implemented first. If we can efficiently support such a pre-analyzed
document, I am all for it. But I think it should be possible to write opaque
documents too: other implementations and users of Lucene should be able to
write their own app-specific format.

simon

>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>