I'm parsing different item types and would like to add a reference between
items. At the moment I'm using a pipeline for that:
class OrganizationAssignmentPipeline(object):
    """Assign organization_id to items"""

    def open_spider(self, spider):
        self.items = []
        self.organization_id = None

    def process_item(self, item, spider):
        if self.organization_id:
            if "organization_id" in item.fields:
                item["organization_id"] = self.organization_id
        else:
            if isinstance(item, Organization):
                self.organization_id = item["id"]
                # Assigning ids after the fact - does this work?
                for buffered in self.items:
                    if "organization_id" in buffered.fields:
                        buffered["organization_id"] = self.organization_id
                self.items = []
            else:
                self.items.append(item)
        return item
I have another pipeline after this that writes the items to a file.
At the moment the Organization item is scraped and processed first, but
what if it is scraped later? Is a pipeline the wrong approach for
postprocessing all items regardless of order? What would be the right
approach in this case?
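
For example, something like the following untested sketch is roughly what I
imagine by "postprocessing regardless of order": buffer every item, assign the
id once the spider closes, and do the export there as well instead of in a
separate pipeline. The Organization class is my own item type (the import path
below is just a placeholder), and items.jl is a placeholder filename.

from scrapy.exporters import JsonLinesItemExporter

from myproject.items import Organization  # placeholder import path


class BufferedOrganizationPipeline(object):
    """Collect all items, then assign organization_id when the spider closes."""

    def open_spider(self, spider):
        self.items = []
        self.organization_id = None

    def process_item(self, item, spider):
        # Remember the Organization id whenever it shows up; buffer everything
        # so the order in which items are scraped no longer matters.
        if isinstance(item, Organization):
            self.organization_id = item["id"]
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # By now all items have been collected, so the id can be assigned
        # even if the Organization was scraped last.
        with open("items.jl", "wb") as f:
            exporter = JsonLinesItemExporter(f)
            exporter.start_exporting()
            for item in self.items:
                if "organization_id" in item.fields:
                    item["organization_id"] = self.organization_id
                exporter.export_item(item)
            exporter.finish_exporting()

Would that kind of buffering be the recommended way, or is there a better
mechanism for this?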