Hi Grant: For our corpus, we don't rely on idf in scoring calculation that much, so I don't see that being a problem that much.
About performance, instantiating 1 indexWriter for a batch of say 1000 docs, e.g. iterate over 1000 docs and do addDocument; comparing with instantiating and closing 1000 indexWriters each doing 1 addDocument. Are you saying the expected performance is the same? I thought when you call addDocument, it adds to memory and flush when segment needs to be merged or writer closes. Maybe I am missing something. Thanks -john On Thu, Mar 13, 2008 at 11:37 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Mar 13, 2008, at 11:03 AM, John Wang wrote: > > > Yes, but usually it's a good idea to add documents in batch and not > > having > > to reinstantiate the writer for every document and then closing it. > > > > It would be nice if one can specify to the writer which analyzer to > > use. > > > > PerfieldAnalyzer wouldn't work because different analyzers may apply > > on the > > same field depending on the doc, e.g. > > > > Also, I don't know that it is wise to put different langs in the same > field. I can't prove it definitively, but it seems to me your corpus > statistics could be skewed by terms that are spelled the same but have > different meanings across languages. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >