Uroš Gruber wrote:
>> How fast do you need to go?
>>
>> I did a 1 million page crawl today with the trunk version of Nutch patched
>> with NUTCH-395 [1]. Total time for fetching was a little over 7 hrs.
>>
> How is that even possible?
> 
> I have a 3.2 GHz Pentium with 2 GB of RAM. I had the same speed problem;
> because of that I set up Nutch with a single node. About an hour ago the
> fetcher finished crawling 1.2 million pages. But this took

I am running on an AMD Athlon 64 3600+ with 1 GB of memory, so it's not even
"high end".
> During the map job I get about 24 pages/s. I didn't test it with this patch.
> But then the reduce job was slow as hell. I really don't understand what took
> so long. It is almost twice as slow as the map job.

Please try the trunk version for comparison and report back with your results.
(The patch is now applied to trunk.)

There are also other things that count (perhaps even more); please see [1].

> If I use local mode, the numbers are even worse.

My numbers are with the local job runner.

> I can't imagine how long it would take to crawl, let's say, 10 million pages.
> 
I'll let you know when mine is finished; I just started a 3rd segment of
1 million pages to test the trunk version (running with the local job runner).
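
For a rough sense of scale, the figures quoted in this thread can be turned into throughput and projected crawl time. This is only a back-of-the-envelope sketch: the function names are made up for illustration, "a little over 7 hrs" is rounded down to 7 (so the trunk rate is slightly optimistic), the 24 pages/s figure covers the map job only, and it assumes the rate stays constant as the crawl grows, which may well not hold.

```python
def pages_per_second(pages, hours):
    """Average fetch rate for a crawl of `pages` pages taking `hours` hours."""
    return pages / (hours * 3600)


def hours_to_crawl(pages, rate):
    """Hours needed to fetch `pages` pages at a constant `rate` (pages/s)."""
    return pages / rate / 3600


# 1 million pages in roughly 7 hours (trunk + NUTCH-395, local job runner).
trunk_rate = pages_per_second(1_000_000, 7)

# Rate reported for the map job in the quoted message.
map_rate = 24.0

for label, rate in [("trunk + NUTCH-395", trunk_rate), ("reported map job", map_rate)]:
    print(f"{label}: {rate:.1f} pages/s, "
          f"10 million pages in about {hours_to_crawl(10_000_000, rate):.0f} h")
```

Under those assumptions, 10 million pages is a matter of a few days at either rate, not counting the reduce phase.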

--
  Sami Siren


[1]http://www.mail-archive.com/[email protected]/msg06533.html

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
