Excellent! Exactly what I was looking for! Thanks Grant!
-John

On Thu, Mar 13, 2008 at 5:39 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> There is an addDocument method that takes an Analyzer and overrides
> the one used at construction of the IndexWriter. See
> http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document,%20org.apache.lucene.analysis.Analyzer)
>
> On Mar 13, 2008, at 4:12 PM, John Wang wrote:
>
> > Hi Grant:
> >
> > For our corpus, we don't rely on idf in the scoring calculation that
> > much, so I don't see that being much of a problem.
> >
> > About performance: compare instantiating 1 IndexWriter for a batch of,
> > say, 1000 docs (iterate over the 1000 docs and call addDocument) with
> > instantiating and closing 1000 IndexWriters, each doing 1 addDocument.
> > Are you saying the expected performance is the same? I thought that
> > when you call addDocument, it adds to memory and flushes when a segment
> > needs to be merged or the writer closes.
> >
> > Maybe I am missing something.
> >
> > Thanks
> >
> > -john
> >
> > On Thu, Mar 13, 2008 at 11:37 AM, Grant Ingersoll <[EMAIL PROTECTED]>
> > wrote:
> >
> >> On Mar 13, 2008, at 11:03 AM, John Wang wrote:
> >>
> >>> Yes, but usually it's a good idea to add documents in batches and not
> >>> have to reinstantiate the writer for every document and then close it.
> >>>
> >>> It would be nice if one could specify to the writer which analyzer to
> >>> use.
> >>>
> >>> PerFieldAnalyzerWrapper wouldn't work because different analyzers may
> >>> apply to the same field depending on the doc, e.g.
> >>>
> >>
> >> Also, I don't know that it is wise to put different langs in the same
> >> field. I can't prove it definitively, but it seems to me your corpus
> >> statistics could be skewed by terms that are spelled the same but have
> >> different meanings across languages.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
>
> --------------------------
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
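
For readers following the thread, here is a rough plain-Java sketch of the pattern discussed above: keep one writer open for the whole batch and pass a different analyzer only for the documents that need it. It deliberately does not use Lucene. ToyWriter, whitespace and lowercase are hypothetical names made up for this illustration, and the in-memory buffer only models the behaviour John describes. The real method is the IndexWriter.addDocument(Document, Analyzer) overload Grant links to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class PerDocumentAnalyzerSketch {

    /** Toy stand-in for an index writer. This is NOT Lucene's IndexWriter. */
    static class ToyWriter {
        private final Function<String, List<String>> defaultAnalyzer;
        // Analyzed documents, held in memory for the lifetime of the writer.
        private final List<List<String>> buffered = new ArrayList<>();

        ToyWriter(Function<String, List<String>> defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        /** Analyze with the analyzer given at construction. */
        void addDocument(String text) {
            addDocument(text, defaultAnalyzer);
        }

        /** Analyze with a per-document analyzer, overriding the default. */
        void addDocument(String text, Function<String, List<String>> analyzer) {
            buffered.add(analyzer.apply(text));
        }

        int size() {
            return buffered.size();
        }

        List<String> tokens(int i) {
            return buffered.get(i);
        }
    }

    /** Hypothetical analyzer: split on whitespace, keep case. */
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Hypothetical analyzer: lowercase, then split on whitespace. */
    static List<String> lowercase(String text) {
        return whitespace(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // One writer for the whole batch; the analyzer can vary per document.
        ToyWriter writer = new ToyWriter(PerDocumentAnalyzerSketch::whitespace);
        writer.addDocument("Hello World");
        writer.addDocument("Guten Tag", PerDocumentAnalyzerSketch::lowercase);
        for (int i = 0; i < writer.size(); i++) {
            System.out.println(writer.tokens(i));
        }
    }
}
```

With the real API, the second call would presumably pass a language-specific Analyzer chosen per document, while the writer itself is created and closed once per batch.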