Sami Siren wrote:
> Andrzej Bialecki (JIRA) wrote:
>
>> [ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ]
>>
>> Andrzej Bialecki commented on NUTCH-293:
>> -----------------------------------------
>>
>> I'm working on this patch to commit it. Just a quick note to Sami: 
>> Math.max() is not optimal, because it always picks the longer 
>> wait period. We are interested in getting the right period: it may be 
>> longer, but it may also be shorter than the serverDelay. If it's 
>> shorter then we win, because we are allowed to crawl this site faster.
>>
>>  
>>
> I guess it depends on the angle you look at it from :)
> "don't be polite, just as polite as is required"
>
> I'm ok with the original logic.

Hmm. Let me try another explanation.

When you crawl, you _are_ interested in getting all pages as quickly as 
possible, right? Then you want to observe the minimum level of 
"politeness" per site, as specified by webmasters and netiquette, and 
not the maximum level of politeness.

If a site allows you to crawl it with a 5 sec. delay, then you won't be 
impolite if you do that, even though you apply a 20 sec. delay to all 
other sites, and you will reach your goal much quicker.
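
To make the difference concrete, here is a minimal sketch, not the actual
patch. The class and method names are hypothetical, and it assumes that a
non-positive value means the site did not request a delay of its own. It
compares "use the site's delay when there is one" with Math.max():

```java
public class CrawlDelayExample {

    /**
     * Hypothetical helper: returns the delay to apply to one site.
     * siteDelayMs is the delay the site itself asks for; a value <= 0
     * is assumed to mean "not specified", in which case we fall back
     * to the crawler-wide serverDelayMs.
     */
    static long effectiveDelayMs(long serverDelayMs, long siteDelayMs) {
        return siteDelayMs > 0 ? siteDelayMs : serverDelayMs;
    }

    public static void main(String[] args) {
        long serverDelayMs = 20_000L; // default delay for all other sites

        // Site allows a 5 sec. delay: we may crawl it faster.
        System.out.println(effectiveDelayMs(serverDelayMs, 5_000L));

        // Math.max() would keep the longer 20 sec. delay instead.
        System.out.println(Math.max(serverDelayMs, 5_000L));

        // Site asks for 60 sec.: we slow down, as with Math.max().
        System.out.println(effectiveDelayMs(serverDelayMs, 60_000L));

        // Site specifies nothing: fall back to the default.
        System.out.println(effectiveDelayMs(serverDelayMs, -1L));
    }
}
```

The two approaches only differ when the site's delay is shorter than the
default, which is exactly the case where we are allowed to go faster.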

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


