On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
The single server is important because I think it will take a lot of work to scale it to multiple servers. The index must allow for close to real-time updates and additions. It must also remain searchable at all times (other than during the brief period of single updates and additions). If it is easy to scale this to multiple servers, please tell me how.
It can take quite a bit of work to implement a multiple-server index system; we did it last year, building an operational wrapper around Lucene. Wish Solr had been around then. ;-)

I've done both the Windows and the Linux route. Windows certainly comes from a scale-up mentality, though we made it work in a scale-out model. Our requirements were the same as yours: near real-time updates and additions, always-on searchability, etc. It takes work, but it can be done. We're serving searches across 6 different types of indexes, with the indexes spread across the server farm (no single server has the full composite index). Our search availability for this year is damn near five nines.

If you haven't looked at 64-bit Windows, let me save you some time: you don't gain as much as you might expect. We appear to have hit the point of diminishing returns with Windows Server.

We'll apply a similar strategy to Solr, in that we'll likely run Solr clusters for our composite index. The best way to explain "how" is simply to refer you to Solr, from an operational perspective. The only thing our system has that Solr doesn't is rolling together results from multiple searchers, and that's simply the out-of-the-box configuration; it's not a major ordeal to change that to meet our needs.

Hope this helps.

-- j
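
For anyone wondering what "rolling together results from multiple searchers" might look like, here is a minimal sketch in plain Java. It is not the wrapper described above and not Solr code: `Hit`, `ResultMerger` and `mergeTopK` are hypothetical names, and the sketch assumes each index server has already returned its own hits with scores that can be compared across servers (with Lucene that generally requires consistent scoring statistics across the indexes, which is not automatic).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical names throughout: nothing here is part of Lucene or Solr.
public class ResultMerger {

    /** One search hit as returned by a single index server. */
    public static final class Hit {
        public final String server; // which index server produced the hit
        public final String docId;  // document identifier within that server
        public final float score;   // assumed comparable across servers

        public Hit(String server, String docId, float score) {
            this.server = server;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Merges per-server hit lists and returns the k best hits, best first. */
    public static List<Hit> mergeTopK(List<List<Hit>> perServerHits, int k) {
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perServerHits) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // evict the lowest-scoring hit
                }
            }
        }
        // Drain the heap and order the survivors by descending score.
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }
}
```

In a setup like this, a front-end node would send the query to every index server (ideally in parallel), collect each server's top k, and pass the lists to `mergeTopK`. Asking each server for k hits is enough to get a correct global top k.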