It works perfectly Paul, thank you very much!

On Tuesday, May 3, 2016 at 1:21:43 PM UTC+2, Paul Tremberth wrote:
>
> Hi Antoine,
>
> you can override the start_requests method of your spider.
> The default is this
> <https://github.com/scrapy/scrapy/blob/ebef6d7c6dd8922210db8a4a44f48fe27ee0cd16/scrapy/spiders/__init__.py#L68>
> (explicitly disabling filtering):
>
>     def start_requests(self):
>         for url in self.start_urls:
>             yield self.make_requests_from_url(url)
>
>     def make_requests_from_url(self, url):
>         return Request(url, dont_filter=True)
>
> You can change it to this (the default for Request is dont_filter=False
> <https://github.com/scrapy/scrapy/blob/d42a98d3b590515bae30fb698e7aba2d7511608e/scrapy/http/request/__init__.py#L21>):
>
>     def start_requests(self):
>         for url in self.start_urls:
>             yield Request(url)
>
> Regards,
> Paul.
>
> On Monday, May 2, 2016 at 10:04:34 PM UTC+2, Antoine Brunel wrote:
>>
>> Hello,
>>
>> I found out that Scrapy's duplicate url filter RFPDupeFilter is disabled
>> for urls set in start_urls.
>> How can I enable it?
>>
>> Thanks!
>>
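As background for anyone reading along: the effect of dont_filter can be sketched in plain Python. This is only an illustration of the idea (remember a fingerprint of each URL, reject repeats, let dont_filter bypass the check), not Scrapy's actual RFPDupeFilter code or API; the class and method names below are made up for the example.

    import hashlib

    class SimpleDupeFilter:
        """Toy stand-in for a request fingerprint filter: remembers a
        hash of each URL seen and reports duplicates."""

        def __init__(self):
            self.seen = set()

        def request_seen(self, url, dont_filter=False):
            # dont_filter=True skips the check entirely, which is what
            # the default start_requests does for start_urls.
            if dont_filter:
                return False
            fp = hashlib.sha1(url.encode("utf-8")).hexdigest()
            if fp in self.seen:
                return True
            self.seen.add(fp)
            return False

    f = SimpleDupeFilter()
    print(f.request_seen("http://example.com"))                    # False: first visit
    print(f.request_seen("http://example.com"))                    # True: duplicate caught
    print(f.request_seen("http://example.com", dont_filter=True))  # False: filter bypassed

So with Paul's change, start_urls requests go through the filter like any other request; with the default, they always get through.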
