So in the simplest terms, you're sort of agreeing with me: running multiple Nutch processes on a multi-core processor is more efficient than running one single process on heavily scaled hardware. Am I correct with this statement?
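
For reference, the sizing discussed further down the thread (one JVM per core, roughly 4GB of heap each, roughly 20M pages per instance) can be checked with a quick calculation. This is only a back-of-the-envelope sketch: the names `Server` and `plan_instances` are made up for illustration, and it ignores the RAM the OS and disk cache need, so treat the result as an upper bound rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical description of one search box."""
    cores: int
    ram_gb: int


def plan_instances(server: Server, heap_gb: int, pages_per_instance_m: int):
    """Return (instance_count, total_pages_in_millions).

    Assumes one single-threaded JVM per core, each with a fixed heap,
    so the count is limited by whichever runs out first: cores or RAM.
    """
    by_cores = server.cores
    by_ram = server.ram_gb // heap_gb  # whole heaps that fit in RAM
    instances = min(by_cores, by_ram)
    return instances, instances * pages_per_instance_m


# Numbers taken from the thread: 2 x quad core, 32GB RAM, ~4GB heap, 20M pages each.
instances, total_m = plan_instances(Server(cores=8, ram_gb=32), heap_gb=4, pages_per_instance_m=20)
print(f"{instances} instances, about {total_m}M pages")
```

With the figures quoted in the thread this gives 8 instances and about 160M pages, which matches the estimate below; in practice you would probably leave some RAM unallocated.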

----- Original Message ----
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: nutch-user@lucene.apache.org
Sent: Friday, June 13, 2008 12:16:38 AM
Subject: Re: Hardware Specifications

I'm not sure -- I try to avoid running a single Nutch job at a time, as I find overlapping is more efficient.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Sean Dean <[EMAIL PROTECTED]>
> To: nutch-user@lucene.apache.org
> Sent: Thursday, June 12, 2008 12:37:19 PM
> Subject: Re: Hardware Specifications
>
> I see.
>
> What happens with the utilization when only one job is running, does it stay about equal at a lower overall percentage or does it move predominately to one core?
>
> ----- Original Message ----
> From: "[EMAIL PROTECTED]"
> To: nutch-user@lucene.apache.org
> Sent: Thursday, June 12, 2008 12:17:10 AM
> Subject: Re: Hardware Specifications
>
> Hm, hm.
>
> I can't speak for Nutch's search (don't have it running at the moment), but I am looking at a cluster that is running a fetch job and a generate job concurrently and I see both cores on the dual-core server being utilized about equally.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Sean Dean
> > To: nutch-user@lucene.apache.org
> > Sent: Saturday, June 7, 2008 3:52:33 AM
> > Subject: Re: Hardware Specifications
> >
> > Hey Otis,
> >
> > I will first disclose that the OS I'm using for my Nutch implementation is FreeBSD 7 (amd64) and may differ from a standard 64-bit Linux distribution. The JDK however is your standard SUN 1.5.0-14 64-bit package.
> >
> > I find that the JVM does not treat Nutch as something that's truly multithreaded. Whichever task you ask it to do, be it serve results, fetch, inject, update, etc., it will always peg one core and not use anything else (sometimes it will share processing on another core but this is just the garbage collection thread inside the JVM).
> >
> > Having smaller indexes (15-20M) on multiple Nutch instances (with 4GB or so of RAM) doesn't fix this limitation, but it does cheat in that each instance runs as its own independent JVM and as such the OS will execute operations on the core which has the lowest utilization via the scheduler (in my case FreeBSD's ULE) for each instance.
> >
> > When you think about it this type of setup scales very well horizontally, much like Nutch/Hadoop itself. I find creating one huge index on the same machine and giving it everything it has in terms of resources has diminishing returns, and as my example points out never uses it all anyway.
> >
> > One negative about this setup though is detailed in NUTCH-92. This issue alone kills any attempt to scale your search engine for "main stream" commercial success (e.g. Google).
> >
> > ----- Original Message ----
> > From: "[EMAIL PROTECTED]"
> > To: nutch-user@lucene.apache.org
> > Sent: Friday, June 6, 2008 12:20:41 PM
> > Subject: Re: Hardware Specifications
> >
> > Dan, you left out one important "bit" - this is a 64-bit machine?
> >
> > Sean, out of curiosity... is this really better, on a multi-core 64-bit machine with 32GB of RAM, than running a single JVM instance, single Nutch instance, and letting the OS switch between cores?
> >
> > As for fetching/indexing/searching - you probably don't want to do this on the same set of machines. Use a set of machines for fetching/indexing, and a set of machines for serving search requests.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > > From: Sean Dean
> > > To: nutch-user@lucene.apache.org
> > > Sent: Thursday, June 5, 2008 3:45:41 PM
> > > Subject: Re: Hardware Specifications
> > >
> > > Another idea is to set up 8 separate Nutch instances on the same server, each with its own 20M index.
> > >
> > > The idea behind this is that one core per application will be used, although it's not pegged, and the RAM is used in ~4GB chunks (JVM setting) for each instance.
> > >
> > > This would be used for serving results only though; you would have to disable part or all of this when in fetching mode, but it would give you 160M pages and still very good speeds (about 4-5 per second or more as other factors come into play). Keep in mind we use 8 hard drives, each associated with its own instance on the server, but as long as the RAID FC setup you have is very fast the results should be comparable (maybe even faster).
> > >
> > > ----- Original Message ----
> > > From: Dennis Kubes
> > > To: nutch-user@lucene.apache.org
> > > Sent: Thursday, June 5, 2008 2:38:04 PM
> > > Subject: Re: Hardware Specifications
> > >
> > > In memory index 15M. On disk index, slower but still doable where response time isn't critical, ~350M pages maybe more.
> > >
> > > Dennis
> > >
> > > Dan Segel wrote:
> > > > We have a server that has 30TB of hard drive space connected through fiber, 2 quad core 2.5GHz, and 32GB of RAM. If fetching 5 searches per second how many million indexed pages do you think we can achieve?