Hi All, 

We are planning to use Lucene in our project, but are not entirely sure about 
some of the design decisions we have made. The details are below; any 
comments/suggestions are more than welcome. 

The requirements of the project are below:

1. We have tens of thousands of files, ranging in size from 500 MB to a few 
terabytes, and the majority of the content in these files will not be accessed 
frequently. 

2. We are planning to keep the less frequently accessed content outside of our 
database and store it on the file system.

3. We also have code to get the binary positions of this content within the 
files. Using these binary positions, we can quickly retrieve the content and 
convert it into our domain objects. 
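
To illustrate point 3, the retrieval step amounts to a seek followed by a 
bounded read. The sketch below is a simplified stand-in, not our actual code: 
the class name RangeReader, the method readRange, and the convention that the 
stop position is exclusive are assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads the bytes in [start, stop) from a file.
class RangeReader {
    static byte[] readRange(Path file, long start, long stop) throws IOException {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("invalid range " + start + ".." + stop);
        }
        if (stop - start > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("range too large for one array");
        }
        byte[] buffer = new byte[(int) (stop - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);       // long offset, so files larger than 2 GB are fine
            raf.readFully(buffer); // throws EOFException if the file ends early
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("range-demo", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            byte[] piece = readRange(tmp, 3, 7);
            System.out.println(new String(piece, StandardCharsets.US_ASCII)); // prints 3456
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Converting the returned bytes into domain objects is left out, since that part 
is specific to our file format.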

We think Lucene provides a scalable solution for storing and indexing these 
binary positions. The idea is that each piece of content in the files will be 
a document, and each document will have at least an ID field to identify the 
content and a binary position field containing the start and stop positions of 
the content. Having done some performance testing, it seems to us that Lucene 
is well capable of doing this. 
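
As an illustration of the binary position field, one possible layout is two 
64-bit values packed into 16 bytes (the positions need to be longs because the 
files can reach terabytes, which is beyond the range of an int). This is only 
a sketch: the class name PositionCodec and the big-endian layout are 
assumptions, and how the bytes are attached to the Lucene document is 
deliberately not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for the binary position field: two big-endian longs.
class PositionCodec {
    static final int ENCODED_LENGTH = 2 * Long.BYTES; // 16 bytes

    static byte[] encode(long start, long stop) {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("invalid range " + start + ".." + stop);
        }
        return ByteBuffer.allocate(ENCODED_LENGTH).putLong(start).putLong(stop).array();
    }

    // Returns {start, stop} as written by encode().
    static long[] decode(byte[] encoded) {
        if (encoded.length != ENCODED_LENGTH) {
            throw new IllegalArgumentException("expected " + ENCODED_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        long start = buf.getLong();
        long stop = buf.getLong();
        return new long[] { start, stop };
    }

    public static void main(String[] args) {
        long start = 3_000_000_000_000L; // roughly 3 TB into a file
        long[] decoded = decode(encode(start, start + 4096));
        System.out.println(decoded[0] + " " + decoded[1]);
    }
}
```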

At the moment, we are planning to create one Lucene index per file, so if we 
have new files to be added to the system, we can simply generate a new index. 
The problem is to do with searching: this approach means that we need to create 
a new IndexSearcher every time a file is accessed through our web service. We 
know that opening a new IndexSearcher is rather expensive, and are thinking 
of using some kind of pooling mechanism. Our questions are:

1. Is this one-index-per-file approach a viable solution? What do you think 
about pooling IndexSearchers?

2. If we have many IndexSearchers open at the same time, would the memory 
usage go through the roof? I couldn't find any documentation on how Lucene 
allocates memory. 
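
To make the pooling idea concrete, here is a rough sketch of the kind of 
mechanism we have in mind: a bounded cache, keyed by file, that closes the 
least recently used entry when it is full. It uses only the JDK and does not 
call Lucene; SearcherPool, Opener and the capacity parameter are hypothetical 
names, and V stands in for whatever closeable object holds an open index.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pool: keeps at most `capacity` resources open, evicting in LRU order.
class SearcherPool<V extends Closeable> {
    // Hypothetical callback that opens the resource for one file.
    interface Opener<V> {
        V open(String fileId) throws IOException;
    }

    private final int capacity;
    private final Opener<V> opener;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, V> open = new LinkedHashMap<>(16, 0.75f, true);

    SearcherPool(int capacity, Opener<V> opener) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.opener = opener;
    }

    // Returns the cached resource for fileId, opening it on a miss.
    synchronized V get(String fileId) throws IOException {
        V resource = open.get(fileId);
        if (resource == null) {
            resource = opener.open(fileId);
            open.put(fileId, resource);
            evictIfNeeded();
        }
        return resource;
    }

    synchronized int size() {
        return open.size();
    }

    // Closes and removes least recently used entries until within capacity.
    private void evictIfNeeded() throws IOException {
        Iterator<Map.Entry<String, V>> it = open.entrySet().iterator();
        while (open.size() > capacity && it.hasNext()) {
            V eldest = it.next().getValue();
            it.remove();
            eldest.close();
        }
    }

    public static void main(String[] args) throws IOException {
        SearcherPool<Closeable> pool = new SearcherPool<Closeable>(2, id -> {
            System.out.println("open " + id);
            return () -> System.out.println("close " + id);
        });
        pool.get("a");
        pool.get("b");
        pool.get("a"); // cache hit, "a" becomes most recently used
        pool.get("c"); // over capacity, so "b" is closed
    }
}
```

A known gap in this sketch is that an evicted entry is closed even if another 
request is still using it, so a real version would need some form of reference 
counting. The capacity at least puts an upper bound on how many indexes are 
open at once, which is why question 2 matters to us.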

Thank you very much for your help. 

Many thanks,
Rui Wang