I am running multiple instances of a spider on EC2 using scrapyd. Recently, 
several spider jobs have been closing right after they are launched, and the 
log files all show the same symptom (my question continues below the traceback): 

2014-06-24 19:34:50+0000 [spider8] ERROR: Error caught on signal handler: <bound method ?.item_scraped of <scrapy.contrib.feedexport.FeedExporter object at 0x7fce018fa850>>
        Traceback (most recent call last):
          File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/usr/lib/pymodules/python2.7/scrapy/core/scraper.py", line 215, in _itemproc_finished
            item=output, response=response, spider=spider)
          File "/usr/lib/pymodules/python2.7/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
            return signal.send_catch_log_deferred(*a, **kw)
          File "/usr/lib/pymodules/python2.7/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
            *arguments, **named)
        --- <exception caught here> ---
          File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 139, in maybeDeferred
            result = f(*args, **kw)
          File "/usr/lib/pymodules/python2.7/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
            return receiver(*arguments, **named)
          File "/usr/lib/pymodules/python2.7/scrapy/contrib/feedexport.py", line 190, in item_scraped
            slot.exporter.export_item(item)
          File "/usr/lib/pymodules/python2.7/scrapy/contrib/exporter/__init__.py", line 87, in export_item
            itemdict = dict(self._get_serialized_fields(item))
          File "/usr/lib/pymodules/python2.7/scrapy/contrib/exporter/__init__.py", line 71, in _get_serialized_fields
            field = item.fields[field_name]
        exceptions.AttributeError: 'dict' object has no attribute 'fields'

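If I read the last two frames right, the exporter is receiving plain Python 
dicts: _get_serialized_fields accesses item.fields, which exists on 
scrapy.item.Item subclasses but not on dicts, so the AttributeError fires 
whenever the feed exporter sees a dict. A minimal illustration (hypothetical 
item class, not my actual code):

    # Item subclasses carry a class-level `fields` mapping; plain dicts do
    # not, which is exactly the AttributeError in the traceback above.
    from scrapy.item import Item, Field

    class MyItem(Item):
        session_id = Field()
        seed_url = Field()

    MyItem().fields   # -> {'seed_url': {}, 'session_id': {}}, so export works
    {}.fields         # -> AttributeError: 'dict' object has no attribute 'fields'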

The scrapyd log shows that the FeedExporter extension is enabled:

2014-06-24 19:34:49+0000 [scrapy] INFO: Enabled extensions: FeedExporter, LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState


But when I run the spider locally using the crawl command: 

scrapy crawl MySpider -a session_id=23 -a seed_id=5 -a seed_url=http://www.Blah.com/

the FeedExporter is not enabled (I confirmed this by setting the log level 
to DEBUG in settings.py). Is scrapyd overriding the EXTENSIONS setting and 
enabling FeedExporter? If so, how can I explicitly turn it off? Should I 
disable it as described at 
http://doc.scrapy.org/en/latest/topics/extensions.html#disabling-an-extension 
? I am using a custom pipeline to save all items to a MongoDB instance, so I 
have no need for feed exports.
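
Concretely, would something like this in settings.py be the right way to 
turn it off? This is just a sketch following the pattern on that docs page, 
with the extension path copied from the traceback above:

    # settings.py -- sketch: disable an extension by mapping its dotted path
    # to None, per the "disabling an extension" docs. The path below is the
    # scrapy.contrib location that appears in my traceback.
    EXTENSIONS = {
        'scrapy.contrib.feedexport.FeedExporter': None,
    }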

Thanks
