I'm parsing different item types and would like to add a reference between
items. At the moment I'm using a pipeline for that:
class OrganizationAssignmentPipeline(object):
    """Assign organization_id to items"""

    def open_spider(self, spider):
        self.items = []
        self.organization_id = None

    def process_item(self, item, spider):
        if self.organization_id:
            if "organization_id" in item.fields:
                item["organization_id"] = self.organization_id
        else:
            if isinstance(item, Organization):
                self.organization_id = item["id"]
                # Assigning ids after the fact - does this work?
                for buffered in self.items:
                    if "organization_id" in buffered.fields:
                        buffered["organization_id"] = self.organization_id
                self.items = []
            else:
                self.items.append(item)
        return item
I have another pipeline after this that writes the items to a file.
At the moment the Organization item is scraped and processed first, but
what if it is scraped later? Is a pipeline the wrong approach for
postprocessing all items regardless of order? What would be the right
approach in this case?
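
For example, something like the following untested sketch is roughly what I
imagine by "postprocessing regardless of order": buffer every item, assign the
id once the spider closes, and do the export there as well instead of in a
separate pipeline. The Organization class is my own item type (the import path
below is just a placeholder), and items.jl is a placeholder filename.

from scrapy.exporters import JsonLinesItemExporter

from myproject.items import Organization  # placeholder import path


class BufferedOrganizationPipeline(object):
    """Collect all items, then assign organization_id when the spider closes."""

    def open_spider(self, spider):
        self.items = []
        self.organization_id = None

    def process_item(self, item, spider):
        # Remember the Organization id whenever it shows up; buffer everything
        # so the order in which items are scraped no longer matters.
        if isinstance(item, Organization):
            self.organization_id = item["id"]
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # By now all items have been collected, so the id can be assigned
        # even if the Organization was scraped last.
        with open("items.jl", "wb") as f:
            exporter = JsonLinesItemExporter(f)
            exporter.start_exporting()
            for item in self.items:
                if "organization_id" in item.fields:
                    item["organization_id"] = self.organization_id
                exporter.export_item(item)
            exporter.finish_exporting()

Would that kind of buffering be the recommended way, or is there a better
mechanism for this?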