I am running into the same thing on OS X, and I am not running two crawlers at the same time. Any ideas?
On Tuesday, November 22, 2011 at 2:56:54 PM UTC-5, Pablo Hoffman wrote:
>
> Not really, the persistent scheduler feature is intended to support
> stopping and resuming a crawl, not to distribute the crawl among
> different nodes/processes (perhaps this should be clarified in the docs).
>
> In order to do that, you would have to do it yourself: store the URLs
> to visit, split the list into multiple chunks, and send each chunk to
> crawl in a separate process.
>
> Alternatively, there is a scrapy-redis extension [1] that (I think)
> allows you to do what you want - worth checking, I guess.
>
> [1] https://github.com/darkrho/scrapy-redis
>
> On 11/22/2011 05:22 PM, Алексей Масленников wrote:
> > Oh, so what can I do? Can I use a hack?
> >
> > On 22 Nov, 20:51, Pablo Hoffman <[email protected]> wrote:
> >> Are you running two instances of the spider *at the same time*?
> >> Because that's not supported.
> >>
> >> On 11/22/2011 11:18 AM, Алексей Масленников wrote:
> >>
> >>> When I run two instances of the spider:
> >>>
> >>> --- <exception caught here> ---
> >>> File "/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-i686.egg/twisted/internet/base.py", line 793, in runUntilCurrent
> >>>   call.func(*call.args, **call.kw)
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/utils/reactor.py", line 41, in __call__
> >>>   return self._func(*self._a, **self._kw)
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/engine.py", line 103, in _next_request
> >>>   if not self._next_request_from_scheduler(spider):
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/engine.py", line 125, in _next_request_from_scheduler
> >>>   request = slot.scheduler.next_request()
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/scheduler.py", line 55, in next_request
> >>>   return self.mqs.pop() or self._dqpop()
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/core/scheduler.py", line 81, in _dqpop
> >>>   d = self.dqs.pop()
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/utils/pqueue.py", line 38, in pop
> >>>   m = q.pop()
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/squeue.py", line 18, in pop
> >>>   s = super(SerializableQueue, self).pop()
> >>> File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-py2.7.egg/scrapy/utils/queue.py", line 160, in pop
> >>>   size, = struct.unpack(self.SIZE_FORMAT, self.f.read())
> >>> struct.error: unpack requires a string argument of length 4
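
For what it's worth, Pablo's suggestion above (store the URLs, split the list into chunks, crawl each chunk in its own process) could look roughly like the sketch below. It is only an illustration: the spider name "myspider" and the "start_urls_file" spider argument are made up and would have to exist in your own spider; only the "scrapy crawl" command and its -a/-s options are standard Scrapy.

    # split_and_crawl.py -- hypothetical helper, not part of Scrapy: split a
    # URL list into N chunks and launch one independent crawl per chunk.
    import subprocess
    import sys

    def chunks(seq, n):
        # yield n roughly equal slices of seq
        k, m = divmod(len(seq), n)
        start = 0
        for i in range(n):
            end = start + k + (1 if i < m else 0)
            yield seq[start:end]
            start = end

    def main(url_file, num_procs):
        with open(url_file) as f:
            urls = [line.strip() for line in f if line.strip()]
        procs = []
        for i, chunk in enumerate(chunks(urls, num_procs)):
            chunk_file = 'urls_part_%d.txt' % i
            with open(chunk_file, 'w') as out:
                out.write('\n'.join(chunk))
            # one JOBDIR per process, so the on-disk request queues are
            # never shared between crawlers (sharing a JOBDIR is what breaks)
            procs.append(subprocess.Popen([
                'scrapy', 'crawl', 'myspider',             # assumed spider name
                '-a', 'start_urls_file=%s' % chunk_file,   # assumed spider argument
                '-s', 'JOBDIR=crawl_state_%d' % i,
            ]))
        for p in procs:
            p.wait()

    if __name__ == '__main__':
        main(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 2)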
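If you go the scrapy-redis route instead, its README documents a handful of settings along these lines; the names below reflect my understanding and may differ between versions, so check the README of whatever version you install:

    # settings.py -- sketch of a scrapy-redis setup (verify the exact setting
    # names against the scrapy-redis README for the version you install)
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"               # queue requests in Redis
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"   # shared duplicate filter
    SCHEDULER_PERSIST = True    # keep the Redis queue across restarts (assumed name)
    REDIS_HOST = 'localhost'
    REDIS_PORT = 6379

With the queue and dupe filter living in Redis, several spider processes can share one crawl, which is the distribution behaviour the JOBDIR-based scheduler was never meant to provide.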
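As for the traceback itself: judging from the last frame, the on-disk queue stores each serialized request together with a 4-byte size field, and self.f.read() came back with fewer than 4 bytes. That is exactly what happens when two processes pop from (and truncate) the same queue file, or when an interrupted crawl leaves a truncated file behind in the JOBDIR. A minimal stand-alone reproduction of that failure, using the same kind of record layout but none of Scrapy's code:

    # struct_error_demo.py -- reproduces the error from the traceback with a
    # length-suffixed record like the disk queue appears to use
    import struct

    SIZE_FORMAT = '>L'   # 4-byte unsigned size, matching the "length 4" in the error

    payload = b'serialized-request'
    record = payload + struct.pack(SIZE_FORMAT, len(payload))   # data, then its size

    # Reader A pops the record and truncates the file; reader B, racing on the
    # same file, now finds only part of a size field where 4 bytes should be.
    leftover = record[-2:]   # simulate the short read

    try:
        size, = struct.unpack(SIZE_FORMAT, leftover)
    except struct.error as e:
        # Python 2 prints: "unpack requires a string argument of length 4"
        print(e)

So if you are definitely not running two crawlers at once, it may simply be a leftover, truncated queue from a previously interrupted run; deleting that JOBDIR and starting fresh would be worth a try.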
