[ 
https://issues.apache.org/jira/browse/DROIDS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935305#action_12935305
 ] 

Paul Rogalinski commented on DROIDS-105:
----------------------------------------

Attaching a new patch set which adds caching functionality to the HttpProtocol 
and the HttpClientContentLoader. There are still the Advanced* subclasses, which 
might need similar treatment. From my point of view, I would like to get rid of 
them by merging them into the current base classes. If we do not pay attention 
to this, we'll end up like the Win32 API, with plenty of doSomethingEx and 
doSomethingEx2 methods :/
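
To illustrate the caching idea only, here is a minimal sketch of a caching 
decorator around a content loader. It is not the code from the patch: 
SimpleLoader, CachingLoader and CountingLoader are hypothetical names, and 
SimpleLoader is a simplified stand-in rather than the actual Droids interface. 
The sketch assumes entries never expire and that the wrapped loader never 
returns null.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for a content loader (not the Droids API).
// A real loader would do HTTP and report I/O failures.
interface SimpleLoader {
    boolean exists(URI uri); // roughly a HEAD request
    byte[] load(URI uri);    // roughly a GET request
}

// Decorator that remembers results per URI so repeated lookups of the same
// resource (e.g. robots.txt) do not reach the wrapped loader again.
// Assumptions: no expiry, non-null bodies, and two threads racing on the
// same uncached URI may both call the delegate once.
final class CachingLoader implements SimpleLoader {
    private final SimpleLoader delegate;
    private final Map<URI, Boolean> existsCache = new ConcurrentHashMap<>();
    private final Map<URI, byte[]> bodyCache = new ConcurrentHashMap<>();

    CachingLoader(SimpleLoader delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean exists(URI uri) {
        Boolean cached = existsCache.get(uri);
        if (cached == null) {
            cached = delegate.exists(uri);
            existsCache.put(uri, cached);
        }
        return cached;
    }

    @Override
    public byte[] load(URI uri) {
        byte[] cached = bodyCache.get(uri);
        if (cached == null) {
            cached = delegate.load(uri);
            bodyCache.put(uri, cached);
        }
        return cached;
    }
}

// Fake backend that counts calls instead of performing HTTP requests.
final class CountingLoader implements SimpleLoader {
    int requests = 0;

    @Override
    public boolean exists(URI uri) {
        requests++;
        return true;
    }

    @Override
    public byte[] load(URI uri) {
        requests++;
        return "User-agent: *\nDisallow:\n".getBytes(StandardCharsets.UTF_8);
    }
}
```

Wrapping a CountingLoader in a CachingLoader and asking for the same 
robots.txt URI three times (exists + load each time) leaves the backend with 
2 counted requests instead of 6.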

> missing caching for robots.txt
> ------------------------------
>
>                 Key: DROIDS-105
>                 URL: https://issues.apache.org/jira/browse/DROIDS-105
>             Project: Droids
>          Issue Type: Improvement
>          Components: core
>            Reporter: Paul Rogalinski
>         Attachments: Caching-Support-and-Robots_txt-fix.patch, 
> CachingContentLoader.java
>
>
> The current implementation of the HttpClient does not cache any requests to 
> the robots.txt file. With the CrawlingWorker, this results in 2 requests to 
> robots.txt (HEAD + GET) per crawled URL, so crawling 3 URLs sends the target 
> server 6 requests for robots.txt.
> Unfortunately, the contentLoader is declared final in HttpProtocol, so there 
> is no way to replace it with a caching Protocol like the one you'll find in 
> the attachment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
