The only way you might get the performance you want is to have multiple IndexWriters writing to different indexes and then merge them with addIndexes at the end. You would obviously have to handle the multithreading and the distribution of the parts of the log to each writer yourself.
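A rough sketch of that approach (a sketch only, assuming a Lucene 1.9/2.0-era API and a hypothetical indexChunk() helper that parses one slice of the log and adds Documents to its writer):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ParallelLogIndexer {

        public static void main(String[] args) throws Exception {
            final int numWriters = 4;  // e.g. one per CPU
            Directory[] partDirs = new Directory[numWriters];
            Thread[] workers = new Thread[numWriters];

            for (int i = 0; i < numWriters; i++) {
                partDirs[i] = FSDirectory.getDirectory("part-" + i, true);
                final Directory dir = partDirs[i];
                final int slice = i;
                workers[i] = new Thread() {
                    public void run() {
                        try {
                            // Each thread has its own IndexWriter on its own Directory.
                            IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
                            indexChunk(w, slice, numWriters);
                            w.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                };
                workers[i].start();
            }
            for (int i = 0; i < numWriters; i++) workers[i].join();

            // Merge the per-thread indexes into the final index.
            IndexWriter merged = new IndexWriter(
                    FSDirectory.getDirectory("final-index", true),
                    new StandardAnalyzer(), true);
            merged.addIndexes(partDirs);
            merged.close();
        }

        // Hypothetical placeholder: parse one slice of the log and
        // addDocument() each resulting Document to the given writer.
        private static void indexChunk(IndexWriter w, int slice, int of) {
            // ...
        }
    }

Each thread writes to its own Directory, so no synchronization is needed until the final addIndexes merge, which is where the segment-merging cost is paid.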
Mike
www.ardentia.com - the home of NetSearch

-----Original Message-----
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: 25 July 2006 22:23
To: java-user@lucene.apache.org
Subject: Re: Index Rows as Documents? Help me design a solution

Few comments -

> (from first posting in this thread)
> The indexing was taking much more than minutes for a 1 MB log file.
...
> I would expect to be able to index at least a GB of logs within 1 or 2 minutes.

1-2 minutes per GB would be 30-60 GB/hour, which for a single machine/JVM is a lot - well, at least I have not seen Lucene index this fast.

> doc.add(new Field("msisdn", columns[0], Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("messageid", columns[2], Field.Store.YES, Field.Index.TOKENIZED));

Is it really required to analyze the text for these fields - "msisdn" and "messageid"?

> doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));

This is storing the original text of every input line that is indexed - quite an overhead.

- Doron
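For illustration, the two suggestions above applied to the original snippet (a sketch only, assuming the same Lucene 1.9/2.0-era API and the poster's columns/line variables; Field.Index.UN_TOKENIZED indexes the value as a single term without running it through the analyzer):

    Document doc = new Document();
    // Keyword-style fields: indexed as single terms, no analysis needed.
    doc.add(new Field("msisdn", columns[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("messageid", columns[2], Field.Store.YES, Field.Index.UN_TOKENIZED));
    // The stored-only "line" field could simply be omitted if the raw line
    // does not have to be returned with search hits:
    // doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));

Whether the stored "line" field can be dropped depends on whether the original log line needs to be retrieved from the index at search time.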