Memory usage

2004-12-05 Thread Sreedhar, Dantam
Hi,

I am using the Lucene 1.3 final release.

Let us say there are 10,000 files, each 20 MB in size. So the total file
system size = 10,000 * 20 MB = 200 GB. I want to index these files.

Let us say the merge factor = 10.

Min heap size required by the JVM = 10 * 20 MB = 200 MB

From the article at
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html:


---

For instance, if we set mergeFactor to 10, a new segment will be created
on the disk for every 10 documents added to the index. When the 10th
segment of size 10 is added, all 10 will be merged into a single segment
of size 100. When 10 such segments of size 100 have been added, they
will be merged into a single segment containing 1000 documents, and so
on. Therefore, at any time, there will be no more than 9 segments in
each power of 10 index size. 


---
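
The merge pattern described in the quoted article can be traced with a
small stand-alone simulation. This is only a sketch of the article's
simplified model, not Lucene code: the class name MergeFactorSketch and
its methods are made up for illustration, and it tracks document counts
per segment only, so it says nothing about how much heap Lucene actually
uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: simulates the article's
// simplified description of how mergeFactor shapes on-disk segments.
final class MergeFactorSketch {

    // Returns the sizes (in documents) of the on-disk segments after
    // docCount documents have been added. Documents that have not yet
    // filled a segment of mergeFactor documents are not included.
    static List<Integer> segmentSizes(int docCount, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < docCount; i++) {
            buffered++;
            if (buffered == mergeFactor) {
                // A new segment is written for every mergeFactor documents.
                segments.add(buffered);
                buffered = 0;
                mergeTail(segments, mergeFactor);
            }
        }
        return segments;
    }

    // While the last mergeFactor segments all have the same size,
    // replace them with one segment holding all of their documents.
    static void mergeTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            for (int j = n - mergeFactor; j < n; j++) {
                if (segments.get(j) != size) {
                    return; // not enough equal-sized segments at the tail
                }
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(size * mergeFactor);
        }
    }

    public static void main(String[] args) {
        System.out.println(segmentSizes(990, 10));
        System.out.println(segmentSizes(1000, 10));
    }
}
```

Under this model, segmentSizes(990, 10) leaves nine segments of 100
documents and nine segments of 10, and adding documents up to the 1000th
collapses everything into a single segment of 1000.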

If Lucene is indexing the 1000th document, then the current segment
size would be 100 documents. At that time, how many documents would
Lucene hold in memory (10 documents or 100 documents)? If Lucene holds
100 documents, then the minimum heap memory required would be
100 * 20 MB = 2 GB, which seems unlikely.

Is the optimize process memory intensive? How much memory would Lucene
take while optimizing?

Is it safe to assume that the maximum heap size required by Lucene is
mergeFactor * maximum_file_size?
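
The figures in this message can be reproduced with a few lines of
arithmetic. The sketch below only restates the estimates asked about
above (documents held in memory times 20 MB per file); it is not a
confirmed bound on Lucene's heap usage, and the class and method names
are invented for illustration.

```java
// Hypothetical helper: restates the heap estimates from the question.
// It assumes every document held in memory costs its full file size,
// which is the poster's assumption, not a measured Lucene figure.
final class HeapEstimateSketch {

    static final long FILE_SIZE_MB = 20;

    // Estimated size in MB of docCount files of FILE_SIZE_MB each.
    static long estimateMb(long docCount) {
        return docCount * FILE_SIZE_MB;
    }

    public static void main(String[] args) {
        System.out.println("Whole corpus, 10,000 files (MB): " + estimateMb(10_000));
        System.out.println("10 documents in memory (MB): " + estimateMb(10));
        System.out.println("100 documents in memory (MB): " + estimateMb(100));
    }
}
```

With 1 GB taken as 1000 MB, these come to 200 GB, 200 MB, and 2 GB
respectively, matching the numbers used in the question.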

I am planning to use the default maxMergeDocs value, as stated in
the article.

Any help on the above questions is highly appreciated.

Thanks,
-Sreedhar







-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Efficient search on lucene mailing archives

2004-11-04 Thread Sreedhar, Dantam
When I want to search for anything, I use the following URL:

http://marc.theaimsgroup.com/

-Sreedhar

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 15, 2004 2:18 AM
To: Lucene Users List
Subject: Re: Efficient search on lucene mailing archives



On Oct 14, 2004, at 4:27 PM, David Spencer wrote:

 sam s wrote:

 Hi Folks,
 Is there any place where I can do a better search on the Lucene mailing
 archives?
 I tried JGuru and it looks like their search is paid.
 The Apache-maintained archives lack efficient searching.

 Of course one of the ironies is, shouldn't we be able to use Lucene to
 search the mailing list archives and even apache.org?

Eyebrowse uses Lucene and is set up for the Apache e-mail lists:

http://nagoya.apache.org/eyebrowse/SummarizeList?listId=30

It seems clunky to navigate, though, and it would be nice to have more
recent e-mails ranked higher than older mails.

Erik





Multisearcher question

2004-10-12 Thread Sreedhar, Dantam
Hi,

Index side information:

Number of indexes: two (for clarity, I call these index_a and
index_b).

Fields in index_a: x and y.
Fields in index_b: y and z.

I have written multisearch code like this:

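import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searcher;

Searcher search_a = new IndexSearcher(LOCATION_OF_INDEX_A);
Searcher search_b = new IndexSearcher(LOCATION_OF_INDEX_B);
// The array and the MultiSearcher need distinct names; declaring
// "searcher" twice in the same scope does not compile.
Searcher[] searchers = new Searcher[] { search_a, search_b };
MultiSearcher multiSearcher = new MultiSearcher(searchers);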

I am getting the following results:

x:query  - WORKS
x:query AND y:query - WORKS
x:query AND z:query - DOESN'T WORK

Is this expected behavior?

My question is: can MultiSearcher be used to search indexes with
different fields? If yes, could you please correct the above code?

Thanks,
-Sreedhar





RE: Multisearcher question

2004-10-12 Thread Sreedhar, Dantam
Thanks, Otis, for your reply.

If I want to solve the problem that I described in my previous mail,
what is the suggested approach?

Thanks,
-Sreedhar

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 12, 2004 6:35 PM
To: Lucene Users List
Subject: Re: Multisearcher question


Hello Sreedhar,

This is the expected behaviour.  The query is run against each index,
and it won't have any matches in either index, because neither index
has both fields.

Otis
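
The behaviour described in the reply above can be illustrated without
Lucene at all. The following is a minimal sketch, assuming each index is
just a list of documents (field name to text) and that an AND query must
be satisfied by a single document inside a single index; the class
PerIndexAndSketch and its data are hypothetical and only mirror the
index_a / index_b layout from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Lucene code: an AND query is evaluated
// against each index separately and the hits are then combined.
final class PerIndexAndSketch {

    // index_a has fields x and y; index_b has fields y and z.
    static final List<Map<String, String>> INDEX_A =
            List.of(Map.of("x", "query", "y", "query"));
    static final List<Map<String, String>> INDEX_B =
            List.of(Map.of("y", "query", "z", "query"));

    // True if the document contains every required field:term pair.
    static boolean matchesAll(Map<String, String> doc,
                              Map<String, String> requiredTerms) {
        for (Map.Entry<String, String> term : requiredTerms.entrySet()) {
            String text = doc.get(term.getKey());
            if (text == null || !text.contains(term.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Runs the same AND query against each index and collects the hits.
    static List<Map<String, String>> search(Map<String, String> requiredTerms) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (List<Map<String, String>> index : List.of(INDEX_A, INDEX_B)) {
            for (Map<String, String> doc : index) {
                if (matchesAll(doc, requiredTerms)) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("x:query             -> "
                + search(Map.of("x", "query")).size() + " hit(s)");
        System.out.println("x:query AND y:query -> "
                + search(Map.of("x", "query", "y", "query")).size() + " hit(s)");
        System.out.println("x:query AND z:query -> "
                + search(Map.of("x", "query", "z", "query")).size() + " hit(s)");
    }
}
```

In this model the first two queries each find the document in index_a,
while x:query AND z:query finds nothing, because no single document in
either index carries both field x and field z.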

