Hi,

I'm putting together a cheap indexing server for an "explorative" Lucene project and had a few questions about which route to go.

I am going with a Socket 939 platform. Does it make sense to get the dual-core Athlon 64 X2, or is it better to stick with a faster-clocked "plain" Athlon 64? Also, would Lucene benefit from running in 64-bit mode, or does it prefer "compatibility" 32-bit?

I figure most indexing apps will be heavily I/O bound, so I am stressing that while staying with commodity components: WD SATA disks (250GB, 16MB cache, SATA II 3Gb/s), starting out with 4 of these (plus system disks) on the onboard controller (RAID 0). If need be I can add two disk cages, 5 disks each, with two decent SATA RAID controllers (64/128MB cache, NCQ, that sort of thing); the nForce4 PCI-Express should stand up to this, I'm hoping. And of course I am limited to 4GB RAM.

I have three main applications in mind:

1. Indexing PubMed/Medline article abstracts. This would be an index of about 15 million records with a couple of identifier fields, a title and a 1-3 paragraph abstract. Mostly the searches will be keyword searches on the text fields. Potentially I could add full-length papers to this as well (a lot fewer records, though).

2. Indexing a couple hundred thousand MS Office documents and PDF files (Google Appliance sort of thing).

3. A genetic database repository a la LuceGene or SRS. This would have more complex records (i.e. many fields, but little data in each), which are mostly retrieved by unique identifier (very little text searching). This would probably run to a few tens of millions of records, maybe around 100 million eventually.

Given these applications, what else should I be thinking about, hardware-wise?

Thanks,
Dmitri
