I am running multiple instances of a spider on EC2 using scrapyd. Recently, several spider jobs have been closing right after they are launched, and the log files show the following common symptom (please read past the traceback; my question follows):
2014-06-24 19:34:50+0000 [spider8] ERROR: Error caught on signal handler: <bound method ?.item_scraped of <scrapy.contrib.feedexport.FeedExporter object at 0x7fce018fa850>>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/scrapy/core/scraper.py", line 215, in _itemproc_finished
    item=output, response=response, spider=spider)
  File "/usr/lib/pymodules/python2.7/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
    return signal.send_catch_log_deferred(*a, **kw)
  File "/usr/lib/pymodules/python2.7/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
    *arguments, **named)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 139, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/pymodules/python2.7/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
    return receiver(*arguments, **named)
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/feedexport.py", line 190, in item_scraped
    slot.exporter.export_item(item)
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/exporter/__init__.py", line 87, in export_item
    itemdict = dict(self._get_serialized_fields(item))
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/exporter/__init__.py", line 71, in _get_serialized_fields
    field = item.fields[field_name]
exceptions.AttributeError: 'dict' object has no attribute 'fields'
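If I'm reading the last frame right, _get_serialized_fields does field = item.fields[field_name], i.e. the exporter assumes each scraped item is a scrapy Item (which carries a .fields attribute) rather than a plain dict. A minimal sketch of what I mean, using a hypothetical PageItem instead of my real item class:

    from scrapy.item import Item, Field

    class PageItem(Item):
        # Declared fields end up in PageItem.fields, which is exactly what
        # _get_serialized_fields reads via item.fields[field_name].
        url = Field()
        title = Field()

    # In a spider callback, yielding a plain dict gives the exporter nothing
    # to read .fields from (matching the AttributeError above):
    #     yield {'url': response.url, 'title': title}
    # whereas yielding an Item would satisfy it:
    #     yield PageItem(url=response.url, title=title)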
Looking at the scrapyd log, I can see that FeedExporter is enabled:
2014-06-24 19:34:49+0000 [scrapy] INFO: Enabled extensions: FeedExporter, LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
But when I run the spider locally using the crawl command:
scrapy crawl MySpider -a session_id=23 -a seed_id=5 -a seed_url=http://www.Blah.com/
the FeedExporter is not enabled (I determined this by setting the log level to DEBUG in settings.py). Is scrapyd overriding the EXTENSIONS setting and enabling the FeedExporter? If so, how can I explicitly turn it off? Should I disable it explicitly as described at http://doc.scrapy.org/en/latest/topics/extensions.html#disabling-an-extension ? I am using a custom pipeline to save all items to a MongoDB instance.
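If disabling it explicitly is the way to go, my reading of that docs page is that it would look roughly like this in settings.py (assuming scrapy.contrib.feedexport.FeedExporter, the path shown in the log above, is the right entry to target):

    # settings.py -- tentative, based on the "disabling an extension" docs page linked above
    EXTENSIONS = {
        'scrapy.contrib.feedexport.FeedExporter': None,
    }

Though I'd rather understand why scrapyd enables it in the first place before turning it off blindly.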
Thanks