ShapeR, try using the JOBDIR setting to store the requests queue on disk:

$ scrapy crawl myspider -s JOBDIR=myspider-job
The directory myspider-job will be created, and inside it there will be a
requests.queue directory and a requests.seen file.

Regards,
Rolando

On Mon, Jul 27, 2015 at 12:48 PM, fernando vasquez <[email protected]> wrote:
> You are not processing Requests as fast as you capture them. I had the
> same problem, although the cause could be different. In my case the Link
> Extractor was capturing duplicated Requests, so I decided to filter out
> the duplicates myself. The problem with Scrapy is that the duplicates
> filter only runs after the Link Extractor has already created the Request
> objects, so you end up with tons of Requests.
>
> In conclusion, you might have duplicated requests; just filter them out
> before the for loop.
>
> On Thursday, July 23, 2015 at 12:33:19 (UTC-5), ShapeR wrote:
>>
>> My spider has a serious memory leak. After 15 minutes of running, its
>> memory usage is 5 GB, and Scrapy reports (using prefs()) that there are
>> 900k Request objects, and that's about it. What can be the reason for
>> this high number of live Request objects? The Request count only goes
>> up and never goes down. All other object counts are close to zero.
>>
>> My spider looks like this:
>>
>> class ExternalLinkSpider(CrawlSpider):
>>     name = 'external_link_spider'
>>     allowed_domains = ['']
>>     start_urls = ['']
>>
>>     rules = (Rule(LxmlLinkExtractor(allow=()), callback='parse_obj',
>>                   follow=True),)
>>
>>     def parse_obj(self, response):
>>         if not isinstance(response, HtmlResponse):
>>             return
>>         for link in LxmlLinkExtractor(allow=(),
>>                 deny=self.allowed_domains).extract_links(response):
>>             if not link.nofollow:
>>                 yield LinkCrawlItem(domain=link.url)
>>
>> Here is the output of prefs():
>>
>> HtmlResponse              2  oldest: 0s ago
>> ExternalLinkSpider        1  oldest: 3285s ago
>> LinkCrawlItem             2  oldest: 0s ago
>> Request             1663405  oldest: 3284s ago
>>
>> Any ideas or suggestions?

--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
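[Editor's note] The deduplication advice above ("filter them before the for loop") could be sketched as a small seen-set filter. This is a hypothetical illustration, not code from the thread: the `Link` stand-in and the `DedupLinkFilter` class are assumptions, mimicking the `url` attribute of the link objects that `extract_links()` returns.

```python
from collections import namedtuple

# Minimal stand-in for a link extractor's link object (assumption,
# for illustration only); real Scrapy links also carry url/nofollow.
Link = namedtuple("Link", ["url", "nofollow"])


class DedupLinkFilter:
    """Remembers URLs already seen and drops duplicate links, so that
    duplicate Requests are never created in the first place."""

    def __init__(self):
        self.seen = set()

    def filter(self, links):
        fresh = []
        for link in links:
            if link.url not in self.seen:
                self.seen.add(link.url)
                fresh.append(link)
        return fresh
```

In a spider, one instance of such a filter could be created in `__init__` and applied to the result of `extract_links(response)` before the `for` loop that yields items, so repeated URLs never inflate the Request count.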
