Hi, I'm a relative Lucene newbie and would appreciate some expert advice.

I would like to make files distributed across various local hosts on our intranet full-text searchable. My startup plan is to index these files locally on each host and then merge all the little indexes into a master index on a search host. Once that is complete I will delete all the little indexes and keep just the master index.

Then, over time, as new files are added to the various local hosts, I'll index each new file locally and once again send the little index to be merged into the master index on the search host. Once that is complete I will again delete the no-longer-needed little index.
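To make the plan above concrete, here is a toy, stdlib-only Java model of the merge step. It is not Lucene's API (the closest real call is `IndexWriter.addIndexes`), and it ignores segment files, merge policies and deletions; the class `ToyIndex` and its methods are hypothetical. It only shows the idea: each host builds a small inverted index, and merging shifts the small index's document IDs past the master's before appending its postings. In this toy the merge cost is proportional to the small index; a real Lucene merge can also rewrite existing segments, which is where the cost on a huge index may come from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT Lucene's API): an inverted index mapping each term to the
// ascending list of document IDs that contain it.
class ToyIndex {
    final Map<String, List<Integer>> postings = new TreeMap<>();
    int docCount = 0;

    // Lowercase, split on non-alphanumerics, record this doc once per term.
    int addDocument(String text) {
        int docId = docCount++;
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) continue;
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            if (list.isEmpty() || list.get(list.size() - 1) != docId) list.add(docId);
        }
        return docId;
    }

    // Merge a small (per-host) index into this master index: shift its doc IDs
    // past ours so they stay unique, then append its postings term by term.
    void merge(ToyIndex small) {
        int offset = docCount;
        for (Map.Entry<String, List<Integer>> e : small.postings.entrySet()) {
            List<Integer> target = postings.computeIfAbsent(e.getKey(), t -> new ArrayList<>());
            for (int id : e.getValue()) target.add(id + offset);
        }
        docCount += small.docCount;
    }

    // Return the doc IDs containing a single term (empty list if none).
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), List.of());
    }
}
```

After the merge the little index can be discarded, since the master now holds its postings under new IDs.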

I have been reading that merging indexes can be slow, and my master index will be huge, since the entire document collection may number in the tens of millions.

Is this true? If so, would it be better to send the (new) document to the search host and index it there, rather than sending a little index to be merged into the master index? Either way is fine with me, but which would be better for Lucene?
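A toy sketch of the two options, again stdlib-only Java and not Lucene's API (`TwoPaths` and its methods are hypothetical names, and the index is simplified to term -> set of document names). Both paths leave the master with the same postings; what differs is where the text analysis runs and what travels over the network: the raw document in option A, a tiny index in option B.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model (NOT Lucene's API): an index is a map of term -> document names.
class TwoPaths {
    // Lowercase, split on non-alphanumerics, record the doc under each term.
    static void index(Map<String, Set<String>> idx, String docName, String text) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) idx.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
        }
    }

    // Option A: ship the raw document to the search host and index it there.
    static void indexDirectly(Map<String, Set<String>> master, String docName, String text) {
        index(master, docName, text);
    }

    // Option B: build a tiny index on the local host, ship it, and merge it in.
    static void indexLocallyThenMerge(Map<String, Set<String>> master, String docName, String text) {
        Map<String, Set<String>> tiny = new TreeMap<>();
        index(tiny, docName, text); // analysis runs on the local host
        for (Map.Entry<String, Set<String>> e : tiny.entrySet()) { // merge runs on the search host
            master.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
        }
    }
}
```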

I've also read about remote searching with ParallelMultiSearcher and RemoteSearchable. It seems I might be able to keep the indexes on the local hosts and search them remotely that way, but the security of the files and their content is a big issue. I can't open any back doors (ports) to the files or their contents; only port 8080 with HTTPS is allowed.

So I guess I am asking two questions. First, is searching distributed indexes remotely with ParallelMultiSearcher easily doable, safe, and recommended (can RMI tunnel through port 8080?), or, given the security constraints, is maintaining a master index the better (or only) way?
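For comparison, a toy sketch of the distributed alternative: the query fans out to every host, each host scores its own documents, and the search host merges the hits by score. This is roughly the idea behind ParallelMultiSearcher, but the code is stdlib-only Java, not Lucene's API; `FederatedSearch`, `Hit`, and the term-count scoring are hypothetical simplifications, and the hosts are queried sequentially rather than in parallel. Each `searchHost` call stands in for a network round trip, which is why every host would have to accept incoming query connections.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy model (NOT Lucene's RemoteSearchable/ParallelMultiSearcher API).
class FederatedSearch {
    record Hit(String host, String doc, int score) {}

    // Runs "on" one host: score = occurrences of the term in each document.
    static List<Hit> searchHost(String host, Map<String, String> docs, String term) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, String> d : docs.entrySet()) {
            int score = 0;
            for (String t : d.getValue().toLowerCase().split("[^a-z0-9]+")) {
                if (t.equals(term)) score++;
            }
            if (score > 0) hits.add(new Hit(host, d.getKey(), score));
        }
        return hits;
    }

    // Runs on the search host: query every host, merge hits, keep the top N.
    static List<Hit> search(Map<String, Map<String, String>> hosts, String term, int topN) {
        String needle = term.toLowerCase();
        List<Hit> all = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> h : hosts.entrySet()) {
            all.addAll(searchHost(h.getKey(), h.getValue(), needle)); // one round trip per host
        }
        all.sort(Comparator.comparingInt(Hit::score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }
}
```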

Second, going with the master index approach, would it be better/faster to index a new document directly into the master index, or to index it locally and then merge a tiny index into the master index?

Thanks to any and all who take the time to advise me.

jim s.

