Hi. I'm the lead developer of SubEtha, a new open source mailing list manager written in Java (http://subetha.tigris.org/). I'm working on archive searching at the moment. I've used Lucene with great success in a previous application, but some characteristics of this app have me seeking architectural advice:

* While most installations will have only a handful of lists, a SourceForge-sized installation might have thousands or even tens of thousands of (likely sparse) lists.
* Searching is always constrained to a specific list; you never search through the archives of more than one list at a time.

I have a thread that wakes up periodically and indexes all newly arrived mail. Which of the following would be the best approach?

1) Build a wholly separate index per mailing list. For each search request, create a new IndexSearcher on the appropriate index and run the query.
2) Build a wholly separate index per mailing list. Cache IndexSearchers that are created when search requests come in for each mailing list. Close and remove IndexSearchers from the cache when a list's index gets updated.
3) Build a single index that holds all messages, storing the associated list id as a field. Use a Filter to limit each search to a specific list. Use a single cached IndexSearcher that is closed and removed when the update process runs.
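
To make option 2 concrete, it could be sketched roughly as below. This is plain Java with no Lucene dependency: the type parameter S stands in for IndexSearcher, and SearcherCache, opener, and maxOpen are hypothetical names used only for illustration. The maxOpen cap (least-recently-used eviction) is an assumption that goes beyond the option as described; it is one possible way to keep the number of open searchers bounded.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of per-list searchers; S stands in for Lucene's IndexSearcher. */
class SearcherCache<S extends Closeable> {
    // Hypothetical callback that opens a searcher on the index of the given list id.
    private final Function<String, S> opener;
    private final Map<String, S> cache;

    SearcherCache(final int maxOpen, Function<String, S> opener) {
        this.opener = opener;
        // Access-ordered map: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                if (size() > maxOpen) {
                    closeQuietly(eldest.getValue()); // assumption: evict and close beyond the cap
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached searcher for the list, opening one on first use. */
    synchronized S get(String listId) {
        S searcher = cache.get(listId);
        if (searcher == null) {
            searcher = opener.apply(listId);
            cache.put(listId, searcher);
        }
        return searcher;
    }

    /** Called by the indexing thread after it has updated a list's index. */
    synchronized void invalidate(String listId) {
        S old = cache.remove(listId);
        if (old != null) {
            closeQuietly(old);
        }
    }

    /** Number of searchers currently held open. */
    synchronized int size() {
        return cache.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            // Ignored in this sketch; real code would log it.
        }
    }
}
```

The sketch ignores one real problem: a searcher may be closed (by invalidation or eviction) while another thread is still running a query on it, so a real version would need some form of reference counting or deferred close.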

I'm guessing that #2 is the right answer, but I'm a little worried about what might happen on a server that indexes 10,000 lists. In a long-running process, this could result in 10,000 cached IndexSearchers. Would that mean too many open file handles? Does an IndexSearcher consume much memory? It's fair to say that anyone who wants this kind of capacity will have to do some tuning of OS parameters, but I would like to understand the bounds of the problem a bit better.

Any advice?

Thanks,
Jeff Schnitzer
SubEtha Mailing List Manager - http://subetha.tigris.org/

