Thanks for the suggestion, Erick!

As for why we can't use a relational database: we get all the logs
from an external application and, due to the nature of the business,
we need to continue maintaining the logs. Moreover, search requests
are very infrequent, so it doesn't make sense to replicate (almost)
the complete data in a database.

Back to the problem. Erick, here is a sample indexFile method (Is this
how I am supposed to index the file?):

   private static void indexFile(IndexWriter writer, File f) {
       BufferedReader br = null;
       try {
           System.out.println("Indexing " + f.getCanonicalPath());
           br = new BufferedReader(new FileReader(f));
           String line;
           while ((line = br.readLine()) != null) {
               String[] columns = line.split("#");
               // Rows that do not have exactly 4 columns are not useful for us
               if (columns.length == 4) {
                   Document doc = new Document();
                   // msisdn and messageid are stored and searchable
                   doc.add(new Field("msisdn", columns[0],
                           Field.Store.YES, Field.Index.TOKENIZED));
                   doc.add(new Field("messageid", columns[2],
                           Field.Store.YES, Field.Index.TOKENIZED));
                   // The raw line is stored for display only, not indexed
                   doc.add(new Field("line", line,
                           Field.Store.YES, Field.Index.NO));
                   writer.addDocument(doc);
               }
           }
       } catch (Exception e) {
           e.printStackTrace();
       } finally {
           // Close the reader so file handles are not leaked across files
           if (br != null) {
               try { br.close(); } catch (IOException ignored) { }
           }
       }
   }
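
One detail in the parsing step that may be worth checking: String.split("#") silently drops trailing empty strings, so a row such as "a#b#c#" (empty last column) yields 3 columns and would be skipped by the length check above. Whether such rows occur in these logs is an assumption on my part. If they do, a negative limit keeps the empty column. Below is a minimal sketch using only the JDK; LogLineParser and parse() are hypothetical names for illustration, not part of Lucene.

```java
// Sketch only: LogLineParser and parse() are hypothetical helper names.
final class LogLineParser {
    private static final int EXPECTED_COLUMNS = 4;

    /**
     * Splits a '#'-delimited log line, keeping trailing empty columns.
     * Returns null when the line does not have exactly 4 columns.
     */
    static String[] parse(String line) {
        // A negative limit tells split() not to discard trailing empty strings.
        String[] columns = line.split("#", -1);
        return columns.length == EXPECTED_COLUMNS ? columns : null;
    }
}
```

In indexFile, the loop body could then call LogLineParser.parse(line) and add a document only when the result is not null; the Lucene calls themselves would stay as they are.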

On 7/25/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
Indexing 1M of logs shouldn't take minutes, so you're probably right.

A problem I've seen is opening/indexing/closing your index writer too often.
You should do something like... (really bad pseudo code here)

IndexWriter IW = new IndexWriter(....);
for (lots and lots and lots of records) {
   IW.addDocument();
}

IW.optimize();
IW.close();


Others have had a problem where they open/write/close the index writer for
EACH document, which is painfully slow.

Also, you might play around with IndexWriter.setMergeFactor and
setMaxBufferedDocs. If you set them too high, you'll run out of memory, but
they can make a difference in how fast your index is built.


If none of this is relevant, can you post a bit of (perhaps pseudo) code?

Best
Erick


