Few comments - > (from first posting in this thread) > The indexing was taking much more than minutes for a 1 MB log file. ... > I would expect to be able to index at least a of GB of logs within 1 or 2 minutes.
1-2 minutes per GB would be 30-60 GB/Hour, which for a single machine/jvm is a lot - well at least I did not see Lucene index this fast. > doc.add(new Field("msisdn", columns[0], Field.Store.YES, Field.Index.TOKENIZED)); > doc.add(new Field("messageid", columns[2], Field.Store.YES, Field.Index.TOKENIZED)); Is it really required to analyze the text for these fields - "msisdn" , " messageid"? > doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO)); This is storing the original text of all input lines that are indexed - quite an overhead. - Doron --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]