On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I've made a nice little archive application with Lucene. I made it to
handle our largest need: 2.5 million docs or so on a single server. Now
the powers that be say: let's use it for a 30+ million document archive
on a single server! (Each doc is maybe 10k max...some as small as 1 or
2k.) Please tell me why we are in trouble...please tell me why we are
not. I have tested up to 2 million docs without much trouble, but 30
million...the average search will include a sort on a field as
well...can I search 30+ million docs with a sort? Man, am I worried
about that. Maybe the server will have 8 procs and 12 billion gigs of
RAM. Maybe. Even so, Tomcat on Windows seems to be able to launch with
a max of only 1.5 or 1.6 GB of RAM (the usual 32-bit JVM heap ceiling).
What do you think? 30 million+ sounds like too much of a load to me for
a single server. Not that they care what I think...I only wrote the
thing (man, I hate my job, offer me a new one :) )...please...comments?
Cheers,
Miserable Mark
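
For concreteness, a field-sorted search in the Lucene API of this era
looks roughly like the sketch below; the index path and field names are
invented for illustration. The comments note where sorting eats memory,
which is the heart of the worry above: sorting on a string field builds
a FieldCache entry with one array slot per document, so 30M docs cost
at least 30M * 4 bytes (~120 MB) for the ordinal ints alone, plus the
unique term strings.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location and field names.
        IndexSearcher searcher = new IndexSearcher("/indexes/archive");
        Query query =
            new QueryParser("contents", new StandardAnalyzer()).parse("lucene");

        // The first sorted search on a field populates the FieldCache:
        // one int per document in the index, plus the unique terms for a
        // STRING sort. That cache lives as long as the underlying reader.
        Sort sort = new Sort(new SortField("date", SortField.STRING));
        Hits hits = searcher.search(query, sort);
        System.out.println("hits: " + hits.length());

        searcher.close();
    }
}
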
I don't really understand what you're so worried about. Either it'll
work well with the setup you have, or it won't. That's about the size
of it. ;)
Seriously, you have a number of relatively cheap options at hand to
improve search performance: store the index on a RAID 5 disk array so
you can read the indices very fast, use multi-core CPUs, add memory;
and if all that isn't good enough, you can always use a small cluster
(say, 4 nodes) of very, very inexpensive PCs, each with a GB of RAM (a
rough sketch of such a sharded search follows below). You don't have to
keep them inside the regular UPS/backup/vault-protected area, since the
indices can always be rebuilt (unlike, e.g., the data in transactional
systems), and the 4 of them together might cost about as much as an
entry-level server.
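
If you do split the index across a few boxes (or even just a few
directories on one machine), Lucene can merge results from several
indices with MultiSearcher/ParallelMultiSearcher. A rough sketch, with
the shard paths invented for illustration:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical shard locations: the 30M-doc archive split into
        // four smaller indices.
        String[] shardPaths = {
            "/indexes/shard0", "/indexes/shard1",
            "/indexes/shard2", "/indexes/shard3"
        };

        Searchable[] shards = new Searchable[shardPaths.length];
        for (int i = 0; i < shardPaths.length; i++) {
            shards[i] = new IndexSearcher(shardPaths[i]);
        }

        // ParallelMultiSearcher queries each shard concurrently and
        // merges the results; each shard only pays the sort/FieldCache
        // cost for its own slice of the documents.
        Searcher searcher = new ParallelMultiSearcher(shards);

        Query query =
            new QueryParser("contents", new StandardAnalyzer()).parse("lucene");
        Hits hits = searcher.search(query,
            new Sort(new SortField("date", SortField.STRING)));
        System.out.println("total hits across shards: " + hits.length());

        searcher.close(); // also closes the per-shard searchers
    }
}

Splitting the sort cache four ways is the main win here: each node's
FieldCache only has to cover ~7.5M docs instead of 30M.
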
I'll let the experts speak now. :)
t.n.a.