IndexWriter has an addDocument method that takes an Analyzer; for that document it overrides the analyzer given when the IndexWriter was constructed. See http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document,%20org.apache.lucene.analysis.Analyzer) .
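
To make the idea concrete, here is a minimal, self-contained Java sketch: one writer stays open for a whole batch, and a two-argument addDocument overrides the writer's default analyzer for a single document. It is a toy model, not Lucene code. ToyWriter, ToyAnalyzer, WHITESPACE and LOWERCASE are hypothetical names made up for this example, and nothing here reflects how IndexWriter buffers or flushes internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only: every type below is a hypothetical stand-in, not a Lucene class.
class PerDocumentAnalyzerSketch {

    /** Hypothetical stand-in for an analyzer: turns text into tokens. */
    interface ToyAnalyzer {
        List<String> tokenize(String text);
    }

    /** Splits on whitespace and keeps the original case. */
    static final ToyAnalyzer WHITESPACE =
            text -> Arrays.asList(text.trim().split("\\s+"));

    /** Splits on whitespace and lowercases every token. */
    static final ToyAnalyzer LOWERCASE =
            text -> Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));

    /** Hypothetical stand-in for a writer that is constructed with a default analyzer. */
    static class ToyWriter {
        private final ToyAnalyzer defaultAnalyzer;
        private final List<List<String>> documents = new ArrayList<>();

        ToyWriter(ToyAnalyzer defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        /** Adds a document using the analyzer given at construction. */
        void addDocument(String text) {
            addDocument(text, defaultAnalyzer);
        }

        /** Adds a document using the supplied analyzer instead of the default. */
        void addDocument(String text, ToyAnalyzer analyzer) {
            documents.add(analyzer.tokenize(text));
        }

        /** Returns the token lists of all documents added so far, in order. */
        List<List<String>> documents() {
            return documents;
        }
    }

    public static void main(String[] args) {
        // One writer for the whole batch; the analyzer is chosen per document.
        ToyWriter writer = new ToyWriter(WHITESPACE);
        writer.addDocument("Hello World");             // default analyzer
        writer.addDocument("Hello World", LOWERCASE);  // per-document override
        System.out.println(writer.documents());
    }
}
```

With the real API, the corresponding call would be writer.addDocument(doc, analyzer) on a single long-lived IndexWriter, as described in the javadoc linked above.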


On Mar 13, 2008, at 4:12 PM, John Wang wrote:

Hi Grant:

For our corpus, we don't rely much on idf in the scoring calculation,
so I don't see that being much of a problem.

About performance: compare instantiating one IndexWriter for a batch of,
say, 1000 docs (i.e. iterating over the 1000 docs and calling addDocument
on each) with instantiating and closing 1000 IndexWriters, each doing one
addDocument. Are you saying the expected performance is the same? I thought
that when you call addDocument, it adds to memory and flushes when a segment
needs to be merged or the writer closes.

   Maybe I am missing something.

Thanks

-john

On Thu, Mar 13, 2008 at 11:37 AM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:


On Mar 13, 2008, at 11:03 AM, John Wang wrote:

Yes, but usually it's a good idea to add documents in a batch rather than
reinstantiating the writer for every document and then closing it.

It would be nice if one could specify to the writer which analyzer to use.

PerFieldAnalyzerWrapper wouldn't work because different analyzers may apply
to the same field depending on the doc, e.g.


Also, I don't know that it is wise to put different langs in the same
field.  I can't prove it definitively, but it seems to me your corpus
statistics could be skewed by terms that are spelled the same but have
different meanings across languages.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





