To add from my experience: I've preferred Resin (stability & performance).

I always go for more RAM rather than more servers. It's cheaper in the long
run when it comes to man-hours and service, as well as MTBF for your
hardware.

Use Squid to proxy and load-balance your Java servers; this alleviated much
of my traffic. Smart Squid policies and configurations can keep queries from
ever hitting your servers if they're THAT common (which seems to happen more
often than not). There's a rough squid.conf sketch at the end of this mail.

I'm leaning much more toward using the distributed WebDB and multiple NDFS
servers for storage; I'm done with piling terabytes onto a single server.
Once you have millions of pages in your DB and you're trying to get the
fetch -> import -> analyze -> fetch cycle to keep up, you'll see what I
mean :)

Try converting search.jsp to XML and processing that through XSLT, or
feeding it into another process, so your search requests can complete
quickly without whatever extra page rendering you may have going on. It also
lets you incorporate other results, insert sponsored feeds, and do all sorts
of nifty stuff. There's a small Java/XSLT sketch at the end of this mail as
well.

-----Original Message-----
From: "Chirag Chaman" <[EMAIL PROTECTED]>
To: <[email protected]>
Date: Fri, 15 Apr 2005 10:52:37 -0400
Subject: RE: [Nutch-general] RE: Nutch - new public server

> >1. "Souped-up" DB server - Dual CPU, 4 GB RAM (min), RAID 5 or 10,
> >1-2 NICs
>
> This is the 'fetcher' server?
>
> This is your fetcher/crawler/indexer -- create the final segments here,
> then move them to the search servers. That way, if a search server goes
> down, simply move its segment to another server.
>
> >2. Basic Search Servers - Single/Dual CPU, Maximum RAM, Single IDE/SATA
> >drive (or 2 for redundancy)
>
> These are the 'fetched' segment backup and search servers?
> If I have 10 million pages per server, is this right: 2 KB * 10 million =
> 20 GB RAM? Or is 10 GB enough, with more added later if needed?
>
> Actually, you'll want 20 GB of RAM if you're trying to displace MSN as the
> fastest search engine. Believe it or not, Lucene is EXTREMELY fast even
> when reading from disk (who's the genius who wrote that software?). I
> would keep about 4-8 million pages per server and give about 1 GB per
> million. Let the Linux file caching system do its magic. After the first
> 20-30 searches, things should be pretty fast. Take a look at filangy.com -
> search is pretty fast and we're hitting the disk. The only drawback is
> that from disk we see things starting to slow down if more than 5-6
> searches happen simultaneously. That's 5-6 per second -- and we usually
> improve on that by adding another server. Given that 1 GB sticks are much
> cheaper than 2 GB sticks, you'll find that adding another cheap server is
> cheaper than adding more RAM. And 2 GB sticks are only supported in more
> high-end servers -- so cheap hardware can't be used any more.
>
> >3. Basic Web Servers - Single/Dual CPU, Medium RAM
>
> In these boxes I will put 1-2 GB RAM.
> I would like to put Apache2 and mod_jk2 in front -- is that a bottleneck,
> or a way to tune things such as caching of static images, web pages, etc.?
> Or is it better to put the Tomcats directly on the web?
>
> Go with Tomcat directly for now -- you don't want the search pages to take
> the Apache/mod_jk2 hit every time. Later you can split the static pages
> into a separate site that can run on Apache. For images, make a separate
> URL, image.domain.com, and load them from there.
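
PS: here is a minimal sketch of the kind of Squid accelerator setup I mean.
It uses Squid 2.5-era directives (later versions renamed the accel options),
and the hostnames, ports, and TTLs are made up for illustration -- adjust to
your own layout.

  # squid.conf fragment: Squid as an HTTP accelerator in front of the
  # search front-ends, round-robining between them.
  http_port 80
  httpd_accel_host virtual
  httpd_accel_port 8080
  httpd_accel_uses_host_header on

  # Two back-end servlet containers (Tomcat/Resin) on port 8080;
  # 'search1.internal' and 'search2.internal' are placeholder hostnames.
  cache_peer search1.internal parent 8080 0 no-query round-robin
  cache_peer search2.internal parent 8080 0 no-query round-robin

  # Serve identical popular queries from the cache for a few minutes
  # instead of hitting the index again.
  refresh_pattern ^/search 5 20% 10

  # Note: the stock config refuses to cache URLs containing '?'
  # (acl QUERY urlpath_regex cgi-bin \? / no_cache deny QUERY), so relax
  # that before expecting result pages to be cached, and add the usual
  # http_access rules.

The win is that the hot handful of repeated queries never reaches
Tomcat/Resin at all.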
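And a sketch of what I mean by rendering results through XSLT instead of
inside search.jsp -- this is plain JAXP, nothing Nutch-specific, and
results.xml / results.xsl / results.html are hypothetical file names
standing in for whatever your front end actually produces.

  // Take search results already serialized to XML and render them with an
  // XSLT stylesheet; the stylesheet decides the final markup (plain HTML,
  // a page with sponsored links merged in, etc.).
  import java.io.File;
  import javax.xml.transform.Transformer;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.stream.StreamResult;
  import javax.xml.transform.stream.StreamSource;

  public class RenderResults {
      public static void main(String[] args) throws Exception {
          TransformerFactory factory = TransformerFactory.newInstance();
          Transformer transformer =
              factory.newTransformer(new StreamSource(new File("results.xsl")));

          // results.xml: whatever the search servlet emits, e.g. one <hit>
          // element per match with title, url, snippet and score.
          transformer.transform(new StreamSource(new File("results.xml")),
                                new StreamResult(new File("results.html")));
      }
  }

In a servlet you would point the StreamResult at the HTTP response instead
of a file; the point is that the search code only has to emit XML and can
return immediately.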
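Finally, a back-of-the-envelope version of Chirag's sizing figures above
(roughly 1 GB of RAM per million pages, 4-8 million pages per search
server); the 30-million-page total is just an example, not a recommendation.

  // Rough capacity sketch: how many search servers and how much RAM each,
  // given an index size and a pages-per-server target.
  public class SearchSizing {
      public static void main(String[] args) {
          long totalPages     = 30000000L; // hypothetical index size
          long pagesPerServer =  6000000L; // middle of the 4-8 million range

          long servers     = (totalPages + pagesPerServer - 1) / pagesPerServer;
          long gbPerServer = pagesPerServer / 1000000L; // ~1 GB per million pages

          System.out.println(servers + " search servers, about " + gbPerServer
                  + " GB RAM each, plus headroom for the OS file cache");
      }
  }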
