+1
Indeed! All of these possibilities are needed.

One could do wild things if the log content is somehow typed. For example,
dictionary compression for fields that are tokenized (not only stored),
since we already have a Term dictionary that supports ords. We would keep
just a Token <-> ord map alongside the transaction log...
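
To make the Token <-> ord idea a bit more concrete, here is a rough sketch
in Java. It assumes post-analysis tokens arrive as plain strings; the
TokenDictionary class and its methods are made-up names for illustration,
not Lucene APIs. A real version would also have to persist the map and
keep it consistent with the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not a Lucene API: a bidirectional Token <-> ord map
 * so that a log entry can store small ints instead of repeated token text.
 */
final class TokenDictionary {
    // token -> ord, for encoding
    private final Map<String, Integer> ordByToken = new HashMap<>();
    // ord -> token, for decoding; the list index is the ord
    private final List<String> tokenByOrd = new ArrayList<>();

    /** Returns the ord for a token, assigning the next free ord on first sight. */
    int ordFor(String token) {
        Integer ord = ordByToken.get(token);
        if (ord == null) {
            ord = tokenByOrd.size();
            ordByToken.put(token, ord);
            tokenByOrd.add(token);
        }
        return ord;
    }

    /** Encodes an analyzed token sequence as ords, preserving order. */
    int[] encode(List<String> tokens) {
        int[] ords = new int[tokens.size()];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = ordFor(tokens.get(i));
        }
        return ords;
    }

    /** Decodes ords back to tokens; throws if an ord was never assigned. */
    List<String> decode(int[] ords) {
        List<String> tokens = new ArrayList<>(ords.length);
        for (int ord : ords) {
            tokens.add(tokenByOrd.get(ord));
        }
        return tokens;
    }

    /** Number of distinct tokens seen so far. */
    int size() {
        return tokenByOrd.size();
    }
}
```

The log would then carry the int[] from encode() per field, and a replaying
client would call decode() with the same dictionary. Repeated tokens cost
one int each instead of their full text.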

On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> On 09/09/2011 11:00, Simon Willnauer wrote:
>>
>> I created LUCENE-3424 for this. But I still would like to keep the
>> discussion open here rather than moving this entirely to an issue.
>> There is more about this than only the seq. ids.
>
> I'm concerned also about the content of the transaction log. In Solr it uses
> javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit
> commands). Documents in the log are raw documents, i.e. before analysis.
>
> This may have some merits for Solr (e.g. you could imagine having different
> analysis chains on the Solr slaves), but IMHO it's more of a hassle for
> Lucene, because it means that the analysis has to be repeated over and over
> again on all clients. If the analysis chain is costly (e.g. NLP) then it
> would make sense to have an option to log documents post-analysis, i.e. as
> correctly typed stored values (e.g. string -> numeric) AND the resulting
> TokenStream-s. This has also the advantage of moving us towards the "dumb
> IndexWriter" concept, i.e. separating analysis from the core inverted index
> functionality.
>
> So I'd argue for recording post-analysis docs in the tlog, either
> exclusively or as a default option.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>