Mark, I'm also planning a distributed index system. After reading some code, I think it's more efficient to get rid of Hits and work directly with TopDocs returned by ParallelMultiSearcher.search(), I dun need the cache anyway as I dun need stateful netvigation.
Another question is - does each Hits.doc(i) lead to an obj serialization/traffic/deserialization? Do we need a ValueListHolder to optimize that? I also wonder why many search() methods don't throw RemoteExceptions, any idea? Thanks Tea > Don, I think I finally understand your problem -- and mine -- with > MultiSearcher. I had tested an implementation of my system using > ParallelMultiSearcher to split a huge index over many computers. > I was very impressed by the results on my test data, but alarmed > after a trial with live data :) > > Consider MultiSearcher.search(Query Q). Suppose that Q aggregated > over ALL the Searchables in the MultiSearcher would return 1000 > documents. But, the Hits object created by search() will only cache > the first 100 documents. When Hits.doc(101) is called, Hits will > cache 200 documents -- then 400, 800, 1600 and so on. How does Hits > get these extra documents? By calling the MultiSearcher again. > > Now consider a MultiSearcher as described above with 2 Searchables. > With respect to Q, Searchable S has 1000 documents, Searchable T > has zero. So to fetch the 101st document, not only is S searched, > but T is too, even though the result of Q applied to T is still zero > and will always be zero. The same thing will happen when fetching > the 201st, 401st and 801st document. > > This accounts for my slow performance, and I think yours too. That > your observed degradation is a power of 2 is a clue. > > My performance is especially vulnerable because "slave" Searchables > in the MultiSearcher are Remote -- accessed via RMI. > > I guess I have to code smarter around MultiSearcher. One problem > you highlight is that Hits is final -- so it is not possible even to > modify the "100/200/400" cache size logic. > > Any ideas from anyone would be much appreciated. > > Mark Florence > CTO, AIRS > 800-897-7714 x 1703 > [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]