The only way you might get the performance you want is to have multiple IndexWriters writing to different indexes and then merge them with addIndexes at the end. You would obviously have to handle the multithreading and the distribution of the parts of the log to each writer yourself.
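A rough sketch of that approach (a sketch only, assuming a Lucene 1.9/2.0-era API and a hypothetical indexChunk() helper that parses one slice of the log and adds Documents to its writer):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ParallelLogIndexer {

        public static void main(String[] args) throws Exception {
            final int numWriters = 4;  // e.g. one per CPU
            Directory[] partDirs = new Directory[numWriters];
            Thread[] workers = new Thread[numWriters];

            for (int i = 0; i < numWriters; i++) {
                partDirs[i] = FSDirectory.getDirectory("part-" + i, true);
                final Directory dir = partDirs[i];
                final int slice = i;
                workers[i] = new Thread() {
                    public void run() {
                        try {
                            // Each thread has its own IndexWriter on its own Directory.
                            IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
                            indexChunk(w, slice, numWriters);
                            w.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                };
                workers[i].start();
            }
            for (int i = 0; i < numWriters; i++) workers[i].join();

            // Merge the per-thread indexes into the final index.
            IndexWriter merged = new IndexWriter(
                    FSDirectory.getDirectory("final-index", true),
                    new StandardAnalyzer(), true);
            merged.addIndexes(partDirs);
            merged.close();
        }

        // Hypothetical placeholder: parse one slice of the log and
        // addDocument() each resulting Document to the given writer.
        private static void indexChunk(IndexWriter w, int slice, int of) {
            // ...
        }
    }

Each thread writes to its own Directory, so no synchronization is needed until the final addIndexes merge, which is where the segment-merging cost is paid.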
Mike
www.ardentia.com - the home of NetSearch

-----Original Message-----
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: 25 July 2006 22:23
To: java-user@lucene.apache.org
Subject: Re: Index Rows as Documents? Help me design a solution

Few comments -

> (from first posting in this thread)
> The indexing was taking much more than minutes for a 1 MB log file.
...
> I would expect to be able to index at least a GB of logs within 1 or 2 minutes.

1-2 minutes per GB would be 30-60 GB/hour, which for a single machine/JVM is a lot - well, at least I have not seen Lucene index this fast.

> doc.add(new Field("msisdn", columns[0], Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("messageid", columns[2], Field.Store.YES, Field.Index.TOKENIZED));

Is it really required to analyze the text for these fields - "msisdn" and "messageid"?

> doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));

This is storing the original text of every input line that is indexed - quite an overhead.

- Doron
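For illustration, the two suggestions above applied to the original snippet (a sketch only, assuming the same Lucene 1.9/2.0-era API and the poster's columns/line variables; Field.Index.UN_TOKENIZED indexes the value as a single term without running it through the analyzer):

    Document doc = new Document();
    // Keyword-style fields: indexed as single terms, no analysis needed.
    doc.add(new Field("msisdn", columns[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("messageid", columns[2], Field.Store.YES, Field.Index.UN_TOKENIZED));
    // The stored-only "line" field could simply be omitted if the raw line
    // does not have to be returned with search hits:
    // doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));

Whether the stored "line" field can be dropped depends on whether the original log line needs to be retrieved from the index at search time.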