Hi Otis: Thanks for your reply.
The problem is the setting of maxBufferedDocs (see the code snippet from my previous email). I wouldn't think it would affect things but it does. -John On 5/30/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
John, I can't spot anything wrong in the code. I assume you are certain there are no IOException, no issues with locks, and that your writer is really getting close()d. I haven't used IndexModifier, and since that's a layer on top of the real IndexWriter with the same API exposed, I would just use IndexWriter directly and see if that makes a difference. If it does, then maybe there is a bug in IndexModifier. Otis ----- Original Message ---- From: John Wang <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, May 30, 2006 5:45:39 AM Subject: merge factor and real time indexing Hi folks: I am working on an application that requires real time indexing, e.g. for every insert, I open the writer, add a document and then closes the writer. I want to control the number of files created, and according to the documentation, a small mergeFactor is desired. However, I am experiencing the opposite, see the following code segment: public static void main(String[] args) throws IOException{ int mfactor=10; int mbuffer=1000; IndexModifier writer=null; File dir=new File("/tmp/john/"); long start=System.currentTimeMillis(); for (int i=0;i<5000;++i){ try{ boolean create=!IndexReader.indexExists(dir); writer=new IndexModifier(dir,new StandardAnalyzer(),create); writer.setMergeFactor(mfactor); writer.setMaxBufferedDocs(mbuffer); Document doc=new Document(); doc.add(new Field("test","this is a test doc", Field.Store.YES,Field.Index.TOKENIZED,Field.TermVector.YES)); writer.addDocument(doc); } finally{ if (writer!=null){ writer.close(); } } } long end=System.currentTimeMillis(); System.out.println("took: "+(end-start)); } If I set the mfactor value to a high number, e.g. 1000, indexing takes much longer but the number of files decreases dramatically. Is this expected or are there any better ways of tuning the indexing parameters so that I limit the number of open files while gettting a decent indexing speed? Thanks -John --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]