On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> On 09/09/2011 11:00, Simon Willnauer wrote:
>>
>> I created LUCENE-3424 for this. But I still would like to keep the
>> discussion open here rather than moving this entirely to an issue.
>> There is more about this than only the seq. ids.
>
> I'm concerned also about the content of the transaction log. In Solr it uses
> javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit
> commands). Documents in the log are raw documents, i.e. before analysis.
>
> This may have some merits for Solr (e.g. you could imagine having different
> analysis chains on the Solr slaves), but IMHO it's more of a hassle for
> Lucene, because it means that the analysis has to be repeated over and over
> again on all clients. If the analysis chain is costly (e.g. NLP) then it
> would make sense to have an option to log documents post-analysis, i.e. as
> correctly typed stored values (e.g. string -> numeric) AND the resulting
> TokenStream-s. This also has the advantage of moving us towards the "dumb
> IndexWriter" concept, i.e. separating analysis from the core inverted index
> functionality.
>
> So I'd argue for recording post-analysis docs in the tlog, either
> exclusively or as a default option.

I am not sure this should be the default option, but I would need to
see how it is implemented. If we can efficiently support such a
pre-analyzed document, I am all for it. But I think it should be
possible to write opaque documents too. Other implementations / users
of Lucene should be able to write their app-specific format too.
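
To make the two options concrete, here is a rough sketch of a log entry
format that carries either an opaque payload or a pre-analyzed field.
Everything in it is an assumption for illustration: TlogSketch, Token,
the kind tags and the byte layout are hypothetical names, not existing
Lucene or Solr APIs, and a real TokenStream carries more attributes
than a term and a position increment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Lucene or Solr.
class TlogSketch {
    static final byte OPAQUE = 0;       // app-specific bytes, never interpreted by the log
    static final byte PREANALYZED = 1;  // field name plus the tokens produced by analysis

    // One token as it left the analysis chain (simplified to two attributes).
    static final class Token {
        final String term;
        final int positionIncrement;
        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Every entry starts with an 8-byte sequence id and a one-byte kind tag.
    static byte[] encodeOpaque(long seqId, byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(seqId);
            out.writeByte(OPAQUE);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Logs the analysis result so that replaying the entry needs no analyzer.
    // writeUTF limits each string to 65535 encoded bytes, which is fine for a sketch.
    static byte[] encodePreanalyzed(long seqId, String field, List<Token> tokens) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(seqId);
            out.writeByte(PREANALYZED);
            out.writeUTF(field);
            out.writeInt(tokens.size());
            for (Token t : tokens) {
                out.writeUTF(t.term);
                out.writeInt(t.positionIncrement);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The kind tag sits right after the 8-byte sequence id.
    static byte kindOf(byte[] entry) {
        return entry[8];
    }

    static byte[] decodeOpaque(byte[] entry) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry));
            in.readLong(); // sequence id, not needed here
            if (in.readByte() != OPAQUE) {
                throw new IllegalArgumentException("not an opaque entry");
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<Token> decodeTokens(byte[] entry) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry));
            in.readLong(); // sequence id, not needed here
            if (in.readByte() != PREANALYZED) {
                throw new IllegalArgumentException("not a pre-analyzed entry");
            }
            in.readUTF();  // field name, not needed here
            int count = in.readInt();
            List<Token> tokens = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                String term = in.readUTF();
                tokens.add(new Token(term, in.readInt()));
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("hello", 1));
        tokens.add(new Token("world", 1));
        byte[] entry = encodePreanalyzed(1L, "body", tokens);
        System.out.println(decodeTokens(entry).size() + " tokens replayed");
    }
}
```

The kind tag is what keeps both camps happy: a reader that does not
understand an entry's kind can still skip or forward it, while a
consumer that wants to avoid re-running a costly analysis chain can
replay the logged tokens directly.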

simon
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
