Re: indexing api wrt Analyzer

John Wang Thu, 13 Mar 2008 13:13:15 -0700

Hi Grant:

    For our corpus, we don't rely much on idf in the scoring calculation,
so I don't see that being much of a problem.


    About performance: compare instantiating one IndexWriter for a batch of,
say, 1000 docs (iterating over the docs and calling addDocument on each) with
instantiating and closing 1000 IndexWriters, each doing one addDocument. Are
you saying the expected performance is the same? I thought that when you call
addDocument, the doc is added to memory and flushed only when a segment needs
to be merged or the writer closes.
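
    To make the question concrete, here is a toy Java sketch of the
behavior I am assuming. ToyWriter, maxBufferedDocs and FlushCountSketch are
made-up names for illustration only; this is not Lucene's IndexWriter and it
does not measure real I/O. It only counts how many flushes each usage
pattern would trigger if documents are buffered in memory and written out
when the buffer fills or the writer closes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer (NOT Lucene's IndexWriter).
// Assumption: docs are buffered in memory and flushed when the buffer
// reaches maxBufferedDocs or when the writer is closed.
class ToyWriter implements AutoCloseable {
    private final List<String> buffer = new ArrayList<>();
    private final int maxBufferedDocs;
    private int flushCount = 0;

    ToyWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    // Pretend to write the buffered docs out as one segment.
    private void flush() {
        if (buffer.isEmpty()) {
            return; // nothing buffered, so nothing to write
        }
        buffer.clear();
        flushCount++;
    }

    @Override
    public void close() {
        flush(); // closing forces out whatever is still buffered
    }

    int flushCount() {
        return flushCount;
    }
}

class FlushCountSketch {
    // One writer for the whole batch.
    public static int flushesWithSharedWriter(int numDocs, int maxBufferedDocs) {
        ToyWriter writer = new ToyWriter(maxBufferedDocs);
        for (int i = 0; i < numDocs; i++) {
            writer.addDocument("doc" + i);
        }
        writer.close();
        return writer.flushCount();
    }

    // A new writer opened and closed for every single document.
    public static int flushesWithWriterPerDoc(int numDocs, int maxBufferedDocs) {
        int total = 0;
        for (int i = 0; i < numDocs; i++) {
            ToyWriter writer = new ToyWriter(maxBufferedDocs);
            writer.addDocument("doc" + i);
            writer.close();
            total += writer.flushCount();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("shared writer:  " + flushesWithSharedWriter(1000, 100));
        System.out.println("writer per doc: " + flushesWithWriterPerDoc(1000, 100));
    }
}
```

    Under these assumptions, 1000 docs with a buffer of 100 cost 10 flushes
through one shared writer but 1000 flushes with a writer per doc, which is
why I would expect the two approaches to perform differently.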

    Maybe I am missing something.

Thanks

-john

On Thu, Mar 13, 2008 at 11:37 AM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:

>
> On Mar 13, 2008, at 11:03 AM, John Wang wrote:
>
> > Yes, but usually it's a good idea to add documents in batches rather
> > than reinstantiating the writer for every document and then closing it.
> >
> > It would be nice if one could tell the writer which analyzer to use.
> >
> > PerFieldAnalyzerWrapper wouldn't work because different analyzers may
> > apply to the same field depending on the doc, e.g.
> >
>
> Also, I don't know that it is wise to put different languages in the same
> field.  I can't prove it definitively, but it seems to me your corpus
> statistics could be skewed by terms that are spelled the same but have
> different meanings across languages.
