I was not running two crawlers; however, there was some leftover state from a previous crawl in my CRAWL_ROOT directory. Clearing that out resolved the issue.
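In case it helps anyone hitting the same error: a minimal sketch of what "clearing that out" can look like, assuming the stale on-disk request queue lives under a directory like the one below. The path is hypothetical; for Scrapy's built-in stop/resume persistence it would be whatever JOBDIR points at.

import os
import shutil

# Hypothetical location of the previous crawl's persistent state; for
# Scrapy's built-in persistence this is the directory given as JOBDIR.
crawl_root = os.path.expanduser("~/crawls/my_spider")

# Removing the stale queue files keeps the scheduler from trying to
# unpack a truncated/corrupted size header (the struct.error quoted below).
if os.path.isdir(crawl_root):
    shutil.rmtree(crawl_root)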
On Wednesday, August 26, 2015 at 4:32:08 PM UTC-4, Ryan Compton wrote:
>
> I am running into the same thing on OSX, and I am not running two crawlers
> at the same time. Any ideas?
>
> On Tuesday, November 22, 2011 at 2:56:54 PM UTC-5, Pablo Hoffman wrote:
>>
>> Not really, the persistent scheduler feature is intended to support
>> stopping and resuming a crawl, not to distribute the crawl among
>> different nodes/processes (perhaps this should be clarified in the doc).
>>
>> In order to do that, you would have to do it yourself by storing the
>> URLs to visit, splitting the list into multiple chunks, and sending each
>> chunk to crawl in a separate process.
>>
>> Alternatively, there is a scrapy-redis extension [1] that (I think)
>> allows you to do what you want - worth checking, I guess.
>>
>> [1] https://github.com/darkrho/scrapy-redis
>>
>> On 11/22/2011 05:22 PM, Алексей Масленников wrote:
>>> Oh. So what can I do? Can I use a hack?
>>>
>>> On 22 Nov, 20:51, Pablo Hoffman <[email protected]> wrote:
>>>> Are you running two instances of the spider *at the same time*?
>>>> Because that's not supported.
>>>>
>>>> On 11/22/2011 11:18 AM, Алексей Масленников wrote:
>>>>> When I run two instances of the spider:
>>>>>
>>>>> ---<exception caught here> ---
>>>>> File "/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-i686.egg/twisted/internet/base.py", line 793, in runUntilCurrent
>>>>>     call.func(*call.args, **call.kw)
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/utils/reactor.py", line 41, in __call__
>>>>>     return self._func(*self._a, **self._kw)
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/engine.py", line 103, in _next_request
>>>>>     if not self._next_request_from_scheduler(spider):
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/engine.py", line 125, in _next_request_from_scheduler
>>>>>     request = slot.scheduler.next_request()
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/scheduler.py", line 55, in next_request
>>>>>     return self.mqs.pop() or self._dqpop()
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/scheduler.py", line 81, in _dqpop
>>>>>     d = self.dqs.pop()
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/utils/pqueue.py", line 38, in pop
>>>>>     m = q.pop()
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/squeue.py", line 18, in pop
>>>>>     s = super(SerializableQueue, self).pop()
>>>>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/utils/queue.py", line 160, in pop
>>>>>     size, = struct.unpack(self.SIZE_FORMAT, self.f.read())
>>>>> struct.error: unpack requires a string argument of length 4
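For reference, a rough sketch of the manual approach Pablo describes above (store the URLs to visit, split the list into chunks, and crawl each chunk in a separate process). The spider name, the url_file spider argument, the input file, and the process count are all assumptions of mine, not anything built into Scrapy:

import subprocess

def chunks(seq, n):
    """Split seq into n roughly equal chunks."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

def main(url_file="urls.txt", num_procs=4):
    # Load the full list of URLs to visit (one per line).
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]

    procs = []
    for i, chunk in enumerate(chunks(urls, num_procs)):
        # Write each chunk to its own file and hand it to a separate
        # `scrapy crawl` process via a (hypothetical) spider argument.
        chunk_file = "urls_chunk_%d.txt" % i
        with open(chunk_file, "w") as f:
            f.write("\n".join(chunk))
        procs.append(subprocess.Popen(
            ["scrapy", "crawl", "myspider", "-a", "url_file=%s" % chunk_file]))

    # Wait for all crawl processes to finish.
    for p in procs:
        p.wait()

if __name__ == "__main__":
    main()

The spider itself would need to accept that url_file argument and yield its start requests from the file; nothing here relies on the persistent scheduler or its on-disk queues.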
