Hi All,

We are planning to use Lucene in our project, but we are not entirely sure about some of the design decisions we have made. The details are below; any comments or suggestions are more than welcome.
The requirements of the project are as follows:

1. We have tens of thousands of files, ranging in size from 500 MB to a few terabytes, and the majority of the content in these files will not be accessed frequently.
2. We plan to keep less frequently accessed content outside our database and store it on the file system.
3. We also have code that finds the binary position of each piece of content within a file. Using these positions, we can quickly retrieve the content and convert it into our domain objects.

We think Lucene provides a scalable solution for storing and indexing these binary positions. The idea is that each piece of content in a file will be a document, and each document will have at least an ID field to identify the content and a binary position field containing the start and stop positions of the content. Having done some performance testing, it seems to us that Lucene is well capable of doing this.

At the moment, we are planning to create one Lucene index per file, so that when new files are added to the system we can simply generate a new index. The problem is to do with searching: this approach means we need to create a new IndexSearcher every time a file is accessed through our web service. We know that opening a new IndexSearcher is rather expensive, and we are thinking of using some kind of pooling mechanism.

Our questions are:

1. Is this one-index-per-file approach a viable solution? What do you think about pooling IndexSearchers?
2. If we have many IndexSearchers open at the same time, would the memory usage go through the roof? I couldn't find any documentation on how Lucene allocates memory.

Thank you very much for your help.

Many thanks,
Rui Wang
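
P.S. To make the pooling idea concrete, here is a rough sketch of the kind of mechanism we have in mind. It is plain Java and calls no Lucene API: the class name LruHandlePool, the maxOpen limit and the opener function are all made up for illustration, and the Closeable handle only stands in for whatever would wrap an open IndexSearcher for one file. The pool keeps at most maxOpen handles open and closes the least recently used one when a new file is accessed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical LRU pool of open handles, keyed by data file name.
 * V is a placeholder for a wrapper around an open searcher; no Lucene calls are made here.
 */
final class LruHandlePool<K, V extends Closeable> {
    private final int maxOpen;
    private final Function<K, V> opener;
    // accessOrder = true: iteration runs from least to most recently used entry.
    private final LinkedHashMap<K, V> open = new LinkedHashMap<>(16, 0.75f, true);

    LruHandlePool(int maxOpen, Function<K, V> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns the cached handle for the key, opening it first if it is not in the pool. */
    synchronized V get(K key) throws IOException {
        V handle = open.get(key); // a hit also marks the key as most recently used
        if (handle == null) {
            handle = opener.apply(key);
            open.put(key, handle);
            evictIfNeeded();
        }
        return handle;
    }

    /** Closes and drops least recently used handles until the pool fits the limit. */
    private void evictIfNeeded() throws IOException {
        Iterator<Map.Entry<K, V>> it = open.entrySet().iterator();
        while (open.size() > maxOpen && it.hasNext()) {
            V eldest = it.next().getValue();
            it.remove();
            eldest.close();
        }
    }

    synchronized int size() {
        return open.size();
    }

    public static void main(String[] args) throws IOException {
        List<String> closed = new ArrayList<>();
        // The opener here only records which handle gets closed.
        LruHandlePool<String, Closeable> pool =
                new LruHandlePool<String, Closeable>(2, name -> () -> closed.add(name));
        pool.get("a.idx");
        pool.get("b.idx");
        pool.get("a.idx"); // reused, so b.idx is now the least recently used
        pool.get("c.idx"); // pool is full: b.idx is closed and dropped
        System.out.println("closed: " + closed + ", still open: " + pool.size());
    }
}
```

One thing the sketch ignores: an evicted handle is closed immediately, so a real version would have to make sure no request is still using a searcher at the moment it is closed.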