ShapeR, try using the JOBDIR setting to store the requests queue on disk:

$ scrapy crawl myspider -s JOBDIR=myspider-job
The directory myspider-job will be created, and inside it there will be a
requests.queue directory and a requests.seen file.

Regards,
Rolando

On Mon, Jul 27, 2015 at 12:48 PM, fernando vasquez <[email protected]> wrote:
> You are not processing Requests as fast as you capture them. I had the
> same problem, although the cause could be different. In my case the Link
> Extractor was capturing duplicated Requests, so I decided to filter out
> the duplicates myself. The problem with Scrapy is that the duplicates
> filter only runs after the Link Extractor has already created the Request
> objects, so you end up with tons of Requests.
>
> In conclusion, you might have duplicated requests; just filter them out
> before the for loop.
>
> On Thursday, July 23, 2015 at 12:33:19 (UTC-5), ShapeR wrote:
>>
>> My spider has a serious memory leak. After 15 minutes of running, its
>> memory usage is 5 GB, and Scrapy reports (using prefs()) that there are
>> 900k Request objects, and that's about it. What can be the reason for
>> this high number of live Request objects? The Request count only goes
>> up and never goes down. All other object counts are close to zero.
>>
>> My spider looks like this:
>>
>> class ExternalLinkSpider(CrawlSpider):
>>     name = 'external_link_spider'
>>     allowed_domains = ['']
>>     start_urls = ['']
>>
>>     rules = (Rule(LxmlLinkExtractor(allow=()), callback='parse_obj',
>>                   follow=True),)
>>
>>     def parse_obj(self, response):
>>         if not isinstance(response, HtmlResponse):
>>             return
>>         for link in LxmlLinkExtractor(allow=(),
>>                 deny=self.allowed_domains).extract_links(response):
>>             if not link.nofollow:
>>                 yield LinkCrawlItem(domain=link.url)
>>
>> Here is the output of prefs():
>>
>> HtmlResponse              2  oldest: 0s ago
>> ExternalLinkSpider        1  oldest: 3285s ago
>> LinkCrawlItem             2  oldest: 0s ago
>> Request             1663405  oldest: 3284s ago
>>
>> Any ideas or suggestions?

--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
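[Editor's note] The deduplication advice above ("filter them before the for loop") could be sketched as a small seen-set filter. This is a hypothetical illustration, not code from the thread: the `Link` stand-in and the `DedupLinkFilter` class are assumptions, mimicking the `url` attribute of the link objects that `extract_links()` returns.

```python
from collections import namedtuple

# Minimal stand-in for a link extractor's link object (assumption,
# for illustration only); real Scrapy links also carry url/nofollow.
Link = namedtuple("Link", ["url", "nofollow"])


class DedupLinkFilter:
    """Remembers URLs already seen and drops duplicate links, so that
    duplicate Requests are never created in the first place."""

    def __init__(self):
        self.seen = set()

    def filter(self, links):
        fresh = []
        for link in links:
            if link.url not in self.seen:
                self.seen.add(link.url)
                fresh.append(link)
        return fresh
```

In a spider, one instance of such a filter could be created in `__init__` and applied to the result of `extract_links(response)` before the `for` loop that yields items, so repeated URLs never inflate the Request count.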
