[ http://issues.apache.org/jira/browse/NUTCH-272?page=comments#action_12412620 ]

Stefan Neufeind commented on NUTCH-272:
---------------------------------------

Oh, I just discovered this new parameter was added in 0.8-dev :-)

But as I understand the description in nutch-default.xml, this limit only 
applies "per fetchlist". That would mean "for one run", right? So if I set it 
to 100 and fetch 10 rounds, I'd have at most 1000 documents? But what if there 
is one document on the first level (theoretically) with 200 links in it? In 
that case I suspect they are all written to the webdb as "to-do" in the first 
run, then in the next run the first 100 are fetched and the rest skipped, and 
in the round after that the next 100 are fetched? Is that right?
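
The scenario asked about above can be sketched as a small simulation. This is 
not Nutch code: the function name and its arguments are made up for 
illustration, and it assumes the limit caps each fetchlist independently and 
that no new links are discovered between rounds.

```python
def simulate_rounds(pending, max_per_fetchlist, rounds):
    """Return how many URLs are fetched in each round.

    pending: URLs already stored as "to-do" before the first round.
    max_per_fetchlist: cap applied to every fetchlist (assumed semantics).
    """
    fetched_per_round = []
    for _ in range(rounds):
        # Each round takes at most the per-fetchlist cap; the rest waits.
        batch = min(pending, max_per_fetchlist)
        fetched_per_round.append(batch)
        pending -= batch
    return fetched_per_round


# One page with 200 links, limit 100: two rounds are needed to fetch them all.
print(simulate_rounds(pending=200, max_per_fetchlist=100, rounds=3))
# Plenty of pending URLs, limit 100, 10 rounds: total fetched is 10 * 100.
print(sum(simulate_rounds(pending=5000, max_per_fetchlist=100, rounds=10)))
```

Under these assumptions a per-fetchlist cap only slows a large site down; the 
total keeps growing with every additional round.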

My idea was also to have this as a "per host" or "per site" setting, or to be 
able to override the value for a certain host ...
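
A rough sketch of what such a per-host limit with overrides could look like. 
Everything here is hypothetical (the names DEFAULT_MAX_PAGES, HOST_OVERRIDES 
and select_urls, the example hosts, and the idea of keeping a running count 
per host); none of it is an existing Nutch setting or API.

```python
from urllib.parse import urlparse

DEFAULT_MAX_PAGES = 100                     # hypothetical global emergency limit
HOST_OVERRIDES = {"docs.example.org": 500}  # hypothetical per-host override


def select_urls(candidates, already_fetched):
    """Keep only candidates whose host is still below its total page limit.

    already_fetched maps host -> pages fetched in all earlier rounds, so the
    limit applies across rounds instead of per fetchlist.
    """
    counts = dict(already_fetched)  # copy, so the caller's dict is untouched
    selected = []
    for url in candidates:
        host = urlparse(url).hostname
        limit = HOST_OVERRIDES.get(host, DEFAULT_MAX_PAGES)
        if counts.get(host, 0) < limit:
            counts[host] = counts.get(host, 0) + 1
            selected.append(url)
    return selected


candidates = [
    "http://a.example.com/1",
    "http://a.example.com/2",
    "http://docs.example.org/1",
]
fetched_so_far = {"a.example.com": 99, "docs.example.org": 99}
print(select_urls(candidates, fetched_so_far))
```

In this example a.example.com reaches the default limit after one more page, 
while docs.example.org keeps going because of its override.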

> Max. pages to crawl/fetch per site (emergency limit)
> ----------------------------------------------------
>
>          Key: NUTCH-272
>          URL: http://issues.apache.org/jira/browse/NUTCH-272
>      Project: Nutch
>         Type: Improvement

>     Reporter: Stefan Neufeind

>
> If I'm right, there is no way in place right now for setting an "emergency 
> limit" to fetch a certain max. number of pages per site. Is there an "easy" 
> way to implement such a limit, maybe as a plugin?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
