Thanks for all of the useful info on this topic. You have been very enlightening. My RAM requirements where obviously off the mark. Here is my current understanding of this issue:

A standard 32-bit processor has access to 4GB of RAM. If your CPU supports Physical Address Extension (PAE) the OS can access up to 64GB of RAM, although a single application is still limited to 4GB. In Windows, Address Windowing Extensions (AWE) solves this limitation but you must use OS specific calls to manage your memory. Unix/Linux has it's own memory mapping scheme that accomplishes something similar to AWE.

I have no problem requiring that the server i use be maxed to the gills. I just want to support Windows as well as Unix/Linux/Sun. I was shortsighted in my RAM comments.

As far as Solr: it looks like this will not necessarily solve my problem of handling an index so big on a single server--the entire index is replicated across each slave searcher--but if load is a factor (according to Hoss, load will probably be the big issue, not the size of the index) than this distribution is exactly what I need. I think that the average load I can expect will be pretty low though. Updates to the large index will also be relatively rare. 30-200 items added every morning with the occasional update scattered throughout the day. There will only be a few hundred searchers in total and they will not be hammering the system but consulting it here and there throughout the day.

When I said that Solr required hardlinks I was referring to the replication feature. My fault--replication was the benefit I was seeing from solar and I was unclear. I bet solr's caching helps a lot too and I will be looking into that. What I did not know was that Windows supports hard links when using ntfs. Fsutil.exe creates them. This gives me hope that cygwin might support them. I do not know if I can use the same cp -l -r trick, but it at least it looks hopeful. I did not realize that solr's replication feature was so external and easily changeable.

as far as rmi: splitting the index across several boxes seems relatively easy (the index can be naturally partitioned without much trouble) but I would still have to deal with sorting on a single box. I am not so worried about that anymore though. Warming a new searcher and tons of RAM should handle my sorts in an acceptable way.

It seems to me that one way or another I can make this happen.

I am wondering whether I should try and use a RAM directory. While I am sure it might be faster, what happens when the power goes out? System reboots? I would have to occasionally flush to disk anyway. Would it be better to use a normal directory and turn the maxbuffered docs way up?


Thanks,

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to