Hi, below are some hints from my experience:

1. If you use one index per file and have many IndexSearchers open at the same 
time, you may hit a 'too many open files' error. You will have to increase the 
OS file-max / open-file limit.

2. If these index files see little concurrent access, I think it is reasonable 
to open a new searcher for every access. However, if you use Lucene's sort 
feature, the field cache may consume a lot of memory, so too many IndexSearchers 
open at the same time could exhaust all the memory on your machine. One way to 
bound both problems is a small searcher cache; see the sketch just below.
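
Here is a rough sketch of such a cache, written against the Lucene 3.x API 
(untested; the class name is made up, and a real pool would also need 
reference counting so a searcher is never closed while a request is still 
using it). It keeps at most a fixed number of searchers open and closes the 
least recently used one when the limit is exceeded, which bounds both open 
file descriptors and field-cache memory:

import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

// Keeps at most maxOpen searchers open, one per index directory, and closes
// the least recently used one when the limit is exceeded.
public class SearcherCache {
    private final Map<String, IndexSearcher> cache;

    public SearcherCache(final int maxOpen) {
        // access-ordered LinkedHashMap gives us simple LRU eviction
        this.cache = new LinkedHashMap<String, IndexSearcher>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, IndexSearcher> eldest) {
                if (size() > maxOpen) {
                    // NOTE: a real pool would check nobody is still using it
                    close(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized IndexSearcher get(String indexDir) throws IOException {
        IndexSearcher searcher = cache.get(indexDir);
        if (searcher == null) {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexDir)));
            searcher = new IndexSearcher(reader);
            cache.put(indexDir, searcher);
        }
        return searcher;
    }

    private static void close(IndexSearcher searcher) {
        try {
            // closing the reader releases the index files and its field cache entries
            searcher.getIndexReader().close();
        } catch (IOException ignored) {
            // best-effort close on eviction
        }
    }
}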


--
gang liu
email: liuga...@gmail.com



At 2011-12-06 01:58:29,"Rui Wang" <rw...@ebi.ac.uk> wrote:
>Hi All, 
>
>We are planning to use Lucene in our project, but we are not entirely sure 
>about some of the design decisions we have made. The details are below; any 
>comments/suggestions are more than welcome. 
>
>The requirements of the project are below:
>
>1. We have tens of thousands of files, ranging in size from 500 MB to a few 
>terabytes, and the majority of the contents in these files will not be 
>accessed frequently. 
>
>2. We are planning to keep the less frequently accessed contents outside of 
>our database and store them on the file system.
>
>3. We also have code to get the binary positions of these contents in the 
>files. Using these binary positions, we can quickly retrieve the contents and 
>convert them into our domain objects. 
>
>We think Lucene provides a scalable solution for storing and indexing these 
>binary positions. The idea is that each piece of content in the files will be 
>a document, and each document will have at least an ID field to identify the 
>content and a binary position field containing the start and stop positions of 
>the content. Having done some performance testing, it seems to us that Lucene 
>is well capable of doing this. 
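
[Just to illustrate what I understand by that layout, roughly something like 
this per piece of content? Lucene 3.x style, field names made up for the 
example; the start/stop offsets are stored only, so they can be read back 
after a search without being searchable themselves:

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;

// One document per piece of content: an ID to look it up by, plus the byte
// range of that content inside the big data file. The offsets are stored
// (Field.Store.YES) but not indexed.
public static void addPosition(IndexWriter writer, String contentId,
                               long startOffset, long stopOffset) throws IOException {
    Document doc = new Document();
    doc.add(new Field("id", contentId, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new NumericField("start", Field.Store.YES, false).setLongValue(startOffset));
    doc.add(new NumericField("stop", Field.Store.YES, false).setLongValue(stopOffset));
    writer.addDocument(doc);
}

-- gang]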
>
>At the moment, we are planning to create one Lucene index per file, so if we 
>have new files to add to the system, we can simply generate a new index. The 
>problem is with searching: this approach means that we need to create a new 
>IndexSearcher every time a file is accessed through our web service. We know 
>that it is rather expensive to open a new IndexSearcher, and we are thinking 
>of using some kind of pooling mechanism. Our questions are:
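
[For the search side, I would expect it to look roughly like this: fetch a 
searcher for that file's index from whatever pool/cache you end up with, look 
up the document by ID, read back the stored offsets and seek straight into the 
data file. Again only a sketch against the Lucene 3.x API, method and field 
names made up, and it assumes each piece of content fits in memory:

import java.io.IOException;
import java.io.RandomAccessFile;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Look up the byte range for a content ID and read the raw bytes back
// from the data file.
public static byte[] readContent(IndexSearcher searcher, String dataFilePath,
                                 String contentId) throws IOException {
    TopDocs hits = searcher.search(new TermQuery(new Term("id", contentId)), 1);
    if (hits.totalHits == 0) {
        return null; // unknown ID
    }
    Document doc = searcher.doc(hits.scoreDocs[0].doc);
    long start = Long.parseLong(doc.get("start"));
    long stop = Long.parseLong(doc.get("stop"));

    RandomAccessFile file = new RandomAccessFile(dataFilePath, "r");
    try {
        byte[] bytes = new byte[(int) (stop - start)]; // assumes the piece fits in memory
        file.seek(start);
        file.readFully(bytes);
        return bytes;
    } finally {
        file.close();
    }
}

-- gang]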
>
>1. Is this one-index-per-file approach a viable solution? What do you think 
>about pooling IndexSearchers?
>
>2. If we have many IndexSearchers open at the same time, would the memory 
>usage go through the roof? I couldn't find any documentation on how Lucene 
>allocates memory. 
>
>Thank you very much for your help. 
>
>Many thanks,
>Rui Wang
