I was not running two crawlers; however, there were some leftover files from a 
previous crawl in my CRAWL_ROOT directory. Clearing that directory resolved the issue.
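(For anyone else who hits this: judging from the traceback below, the stale
on-disk request queue from the interrupted run appears to be what trips up
struct.unpack, so wiping that state before the next run was enough for me.
A minimal sketch, where "CRAWL_ROOT" is just a placeholder for whatever
directory your persistent scheduler writes to:

    import shutil

    # Wipe the scheduler state persisted by the previous (interrupted) crawl.
    # "CRAWL_ROOT" is a placeholder for the directory the scheduler writes to.
    shutil.rmtree("CRAWL_ROOT", ignore_errors=True)
)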

On Wednesday, August 26, 2015 at 4:32:08 PM UTC-4, Ryan Compton wrote:
>
> I am running into the same thing on OSX and I am not running two crawlers 
> at the same time. Any ideas?
>
> On Tuesday, November 22, 2011 at 2:56:54 PM UTC-5, Pablo Hoffman wrote:
>>
>> Not really, the persistent scheduler feature is intended to support 
>> stopping and resuming a crawl, not to distribute the crawl among 
>> different nodes/processes (perhaps this should be clarified in the doc).
>>
>> In order to do that, you would have to do it yourself by storing the 
>> urls to visit, splitting the list into multiple chunks, and sending each 
>> chunk to be crawled by a separate process.
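(A rough sketch of that manual approach, in case it helps someone. It assumes
a spider that accepts its start URLs through a "start_urls" spider argument,
which is a hypothetical argument name you would have to wire up yourself:

    import subprocess

    def chunks(seq, size):
        # split the url list into consecutive chunks of `size` items
        return [seq[i:i + size] for i in range(0, len(seq), size)]

    # one separate scrapy process per chunk, each with its own slice of urls
    urls = open("urls_to_visit.txt").read().split()
    procs = [
        subprocess.Popen([
            "scrapy", "crawl", "myspider",
            "-a", "start_urls=" + ",".join(chunk),
        ])
        for chunk in chunks(urls, 1000)
    ]
    for p in procs:
        p.wait()
)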
>>
>> Alternatively, there is a scrapy-redis extension [1] that (I think) allows 
>> you to do what you want - worth checking, I guess.
>>
>> [1] https://github.com/darkrho/scrapy-redis
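(For reference, wiring that extension in is mostly a couple of settings that
point Scrapy's scheduler and dupe filter at Redis, so that several processes
can share one queue. Setting names are taken from the scrapy-redis README at
the time of writing; double-check them against the version you install:

    # settings.py -- share scheduling and dupe filtering through Redis
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
    REDIS_URL = "redis://localhost:6379"
)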
>>
>> On 11/22/2011 05:22 PM, Алексей Масленников wrote:
>> > Oh, so what can I do? Can I use some kind of hack?
>> >
>> >> On Nov 22, 20:51, Pablo Hoffman <[email protected]> wrote:
>> >> Are you running two instances of the spider *at the same time*? Because 
>> >> that's not supported.
>> >>
>> >> On 11/22/2011 11:18 AM, Алексей Масленников wrote:
>> >>
>> >>> When I run two instances of the spider:
>> >>
>> >>> ---<exception caught here>    ---
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-
>> >>> linux-i686.egg/twisted/internet/base.py", line 793, in runUntilCurrent
>> >>>         call.func(*call.args, **call.kw)
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/utils/reactor.py", line 41, in __call__
>> >>>         return self._func(*self._a, **self._kw)
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/core/engine.py", line 103, in _next_request
>> >>>         if not self._next_request_from_scheduler(spider):
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/core/engine.py", line 125, in
>> >>> _next_request_from_scheduler
>> >>>         request = slot.scheduler.next_request()
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/core/scheduler.py", line 55, in next_request
>> >>>         return self.mqs.pop() or self._dqpop()
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/core/scheduler.py", line 81, in _dqpop
>> >>>         d = self.dqs.pop()
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/utils/pqueue.py", line 38, in pop
>> >>>         m = q.pop()
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/squeue.py", line 18, in pop
>> >>>         s = super(SerializableQueue, self).pop()
>> >>>       File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.15.0-
>> >>> py2.7.egg/scrapy/utils/queue.py", line 160, in pop
>> >>>         size, = struct.unpack(self.SIZE_FORMAT, self.f.read())
>> >>>     struct.error: unpack requires a string argument of length 4
