Re: Retire the original Fetcher before the release?

Andrzej Bialecki Mon, 17 Mar 2008 08:17:51 -0700

Dennis Kubes wrote:

Andrzej Bialecki wrote:
Dennis Kubes wrote:
We continue to run on Fetcher1.
Since you're running large crawls, could you run one of them with Fetcher2 and comment on the results? Note that Fetcher2 needs a lot fewer threads than Fetcher - usually running a large crawl with < 100 threads is more than sufficient.
Excellent, it's about time to run another large fetch, so I will try it.


Also, note that the default settings prefer the old Fetcher, specifically:

* fetcher.threads.fetch - the old Fetcher would slowly run out of free threads at the end of the job, so you needed more threads to compensate for that. Fetcher2 doesn't have this problem, so reduce this number accordingly.


* turn off parsing in Fetcher - this is best done in a separate job anyway.

* set generate.max.per.host.by.ip and fetcher.threads.per.host.by.ip to the same value - they are different by default. IMHO this value should be false.
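
For illustration only, the three adjustments above might look roughly like this as overrides in conf/nutch-site.xml. This is a sketch under assumptions, not a tested configuration: the thread count of 50 is just an example value below the "< 100" mentioned earlier, and fetcher.parse is assumed to be the property that controls parsing inside the fetcher, so check both against the nutch-default.xml of your version.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Example value only: Fetcher2 needs far fewer threads than the old Fetcher. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>
  </property>

  <!-- Assumed property name: skip parsing during fetch and parse in a separate job. -->
  <property>
    <name>fetcher.parse</name>
    <value>false</value>
  </property>

  <!-- Keep these two in agreement; false means hosts are counted by name, not by IP. -->
  <property>
    <name>generate.max.per.host.by.ip</name>
    <value>false</value>
  </property>
  <property>
    <name>fetcher.threads.per.host.by.ip</name>
    <value>false</value>
  </property>
</configuration>
```

With the two by.ip properties set to the same value, the generator and the fetcher apply the same notion of "host", which is the point of the last item.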



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
