On Thu, Dec 26, 2013 at 11:08 AM, Mrudul Tarwatkar <[email protected]> wrote:
> Are *Downloader Middlewares processed before the downloader? Before the
> URL is scraped?*

Before and after: it "wraps" the downloader. process_request is called
before the URL is downloaded, and process_response is called afterwards,
once the HTTP response has been fetched. By "scraping" we typically mean
extracting data, which happens in the spider, outside the downloader.

> Are *Pipelines processed after the URL is crawled (downloaded) and the
> spider items are set?*

Pipelines are called after the item has been scraped by the spider.

> Now, let's say *I store the fingerprint of every response in a visit_id
> item* using request_fingerprint in Scrapy.
> So if I want to write a *downloader middleware which avoids visiting
> already-visited URLs in subsequent runs of a spider*, how would I do it?

Like this one:
http://snipplr.com/view/67018/middleware-to-avoid-revisiting-already-visited-items/

Note that that is a spider middleware, not a downloader middleware: spider
middlewares wrap the spider, not the downloader.

Pablo.

-- 
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.
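Since the question asked about a downloader middleware specifically, here is a minimal sketch of that variant for comparison with the spider middleware linked above. It assumes fingerprints are kept between runs in a plain text file, one per line. The class SkipSeenRequestsMiddleware, the helper simple_fingerprint and the SEEN_FINGERPRINTS_FILE setting are all hypothetical, and simple_fingerprint is a simplified SHA1 stand-in for Scrapy's own request_fingerprint (unlike Scrapy's, it does not canonicalize the URL). process_request runs before the downloader and drops already-seen requests by raising IgnoreRequest; process_response runs after the response has been fetched and records its fingerprint.

```python
import hashlib
import os

try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:  # fallback only so the sketch runs without Scrapy installed
    class IgnoreRequest(Exception):
        """Stand-in for scrapy.exceptions.IgnoreRequest."""


def simple_fingerprint(request):
    """Hypothetical simplified fingerprint: SHA1 over method, URL and body.

    Scrapy's own request fingerprinting also canonicalizes the URL, so it is
    the better choice in a real project; this keeps the sketch self-contained.
    """
    sha = hashlib.sha1()
    sha.update(request.method.encode("utf-8"))
    sha.update(request.url.encode("utf-8"))
    sha.update(request.body or b"")
    return sha.hexdigest()


class SkipSeenRequestsMiddleware:
    """Downloader middleware that drops requests already fetched in earlier runs."""

    def __init__(self, path, fingerprint=simple_fingerprint):
        self.path = path
        self.fingerprint = fingerprint
        self.seen = set()
        # Load fingerprints recorded by previous runs, if any.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.seen = {line.strip() for line in fh if line.strip()}

    @classmethod
    def from_crawler(cls, crawler):
        # SEEN_FINGERPRINTS_FILE is a hypothetical project setting.
        return cls(crawler.settings.get("SEEN_FINGERPRINTS_FILE", "seen.txt"))

    def process_request(self, request, spider):
        # Called before the downloader: skip anything visited in a previous run.
        if self.fingerprint(request) in self.seen:
            raise IgnoreRequest("already visited: %s" % request.url)
        return None  # let the request continue to the downloader

    def process_response(self, request, response, spider):
        # Called after the downloader fetched the response: remember it.
        fp = self.fingerprint(request)
        if fp not in self.seen:
            self.seen.add(fp)
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(fp + "\n")
        return response
```

It would be enabled through the DOWNLOADER_MIDDLEWARES setting, e.g. {"myproject.middlewares.SkipSeenRequestsMiddleware": 543}, where the module path and the priority are placeholders. Recording in process_response rather than process_request means a request whose download failed is tried again on the next run; note that every response reaching process_response is recorded, including error statuses such as 404, unless you add a check on response.status.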
