Re: Splash spider never completes

Paul Tremberth Mon, 12 Dec 2016 01:20:00 -0800

Hi Sean Keane,

I believe you need to tell us a bit more about the type of crawl you are doing.
Is it a broad crawl with lots of domains?
Is it a CrawlSpider with rules that can pick up a lot of pages?


What about the download rate: is it stable, or does the crawl slow 
down?
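
A minimal sketch of settings that could help answer both questions, assuming a 
standard settings.py. The setting names are Scrapy's; the values are arbitrary 
placeholders, not recommendations:

```python
# settings.py (sketch; all values are illustrative assumptions)

# Log the crawl rate (pages/min, items/min) every 30 seconds instead of
# the default 60, so a slowdown is easier to spot in the log.
LOGSTATS_INTERVAL = 30.0

# Bound the crawl while debugging so it cannot run for days.
DEPTH_LIMIT = 3                  # do not follow links deeper than 3 hops
CLOSESPIDER_PAGECOUNT = 10000    # close the spider after this many responses
CLOSESPIDER_TIMEOUT = 6 * 3600   # or after 6 hours, whichever comes first
```

If the bounded crawl finishes cleanly, the spider is probably finding more 
pages than expected rather than hanging.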

While running the crawl, if you're on Python 2, you could also check with 
the telnet console what's going on:
https://doc.scrapy.org/en/latest/topics/telnetconsole.html
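
The console is normally enabled out of the box. A sketch of the relevant 
settings, assuming the documented defaults (check the page above for your 
Scrapy version):

```python
# settings.py (sketch; values are the defaults as documented, treat as assumptions)
TELNETCONSOLE_ENABLED = True        # the console extension is on by default
TELNETCONSOLE_HOST = '127.0.0.1'    # listen on localhost only
TELNETCONSOLE_PORT = [6023, 6073]   # port range; the first free port is used
```

With the crawl running, `telnet localhost 6023` opens a Python shell inside 
the Scrapy process, and calling `est()` there prints an engine status report 
(scheduler queue sizes, active downloads and so on).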

Hope this helps,

/Paul

On Thursday, December 8, 2016 at 4:30:22 AM UTC+1, Sean Keane wrote:
>
> I have a spider that I created that uses Splash and it seems to never 
> complete, i.e. it runs for two days and then I finally stop it. 
>
> I have the following settings for my spider:
>
> SPIDER_MIDDLEWARES = {
>     'scrapy_splash.SplashDeduplicateArgsMiddleware': 100
> }
>
> DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
>
>
> Can someone provide some advice on how I should debug the issue?
>
> Thanks
>
> Sean 
>

