It works perfectly Paul, thank you very much!

On Tuesday, May 3, 2016 at 1:21:43 PM UTC+2, Paul Tremberth wrote:
>
> Hi Antoine,
>
> you can override the start_requests method of your spider.
> The default is this
> <https://github.com/scrapy/scrapy/blob/ebef6d7c6dd8922210db8a4a44f48fe27ee0cd16/scrapy/spiders/__init__.py#L68>
> (explicitly disabling filtering):
>
>     def start_requests(self):
>         for url in self.start_urls:
>             yield self.make_requests_from_url(url)
>
>     def make_requests_from_url(self, url):
>         return Request(url, dont_filter=True)
>
> You can change it to this (the default for Request is dont_filter=False
> <https://github.com/scrapy/scrapy/blob/d42a98d3b590515bae30fb698e7aba2d7511608e/scrapy/http/request/__init__.py#L21>):
>
>     def start_requests(self):
>         for url in self.start_urls:
>             yield Request(url)
>
> Regards,
> Paul.
>
> On Monday, May 2, 2016 at 10:04:34 PM UTC+2, Antoine Brunel wrote:
>>
>> Hello,
>>
>> I found out that Scrapy's duplicate url filter RFPDupeFilter is disabled
>> for urls set in start_urls.
>> How can I enable it?
>>
>> Thanks!
>>
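As background for anyone reading along: the effect of dont_filter can be sketched in plain Python. This is only an illustration of the idea (remember a fingerprint of each URL, reject repeats, let dont_filter bypass the check), not Scrapy's actual RFPDupeFilter code or API; the class and method names below are made up for the example.

    import hashlib

    class SimpleDupeFilter:
        """Toy stand-in for a request fingerprint filter: remembers a
        hash of each URL seen and reports duplicates."""

        def __init__(self):
            self.seen = set()

        def request_seen(self, url, dont_filter=False):
            # dont_filter=True skips the check entirely, which is what
            # the default start_requests does for start_urls.
            if dont_filter:
                return False
            fp = hashlib.sha1(url.encode("utf-8")).hexdigest()
            if fp in self.seen:
                return True
            self.seen.add(fp)
            return False

    f = SimpleDupeFilter()
    print(f.request_seen("http://example.com"))                    # False: first visit
    print(f.request_seen("http://example.com"))                    # True: duplicate caught
    print(f.request_seen("http://example.com", dont_filter=True))  # False: filter bypassed

So with Paul's change, start_urls requests go through the filter like any other request; with the default, they always get through.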
