I am writing a crawling spider, and for each URL visited and parsed, the 
saved item needs to include the originating URL.  

For example, let's say start_urls = ["http://www.A.com"] and the 
initial list of URLs to follow, as extracted by the SgmlLinkExtractor,
is ["http://www.B.com", "http://www.C.com"]. The spider engine would then 
schedule a visit to www.B.com, then www.C.com. When the spider crawls 
to www.B.com and the parse method extracts some data, I need the processed 
item to include a field with the originating URL, which in this case is
www.A.com.  
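
To make the desired result concrete, here is a minimal sketch in plain Python. It is not Scrapy code: the LINKS dictionary, the crawl function, and the "origin_url" field name are all made up for illustration, and the link graph simply stands in for whatever the link extractor would return.

```python
from collections import deque

# Hypothetical link graph standing in for the real pages:
# page URL -> links extracted from that page.
LINKS = {
    "http://www.A.com": ["http://www.B.com", "http://www.C.com"],
    "http://www.B.com": [],
    "http://www.C.com": [],
}


def crawl(start_urls, links):
    """Visit pages breadth-first; return one item per page with the URL it was reached from."""
    items = []
    seen = set(start_urls)
    # Each queue entry pairs a URL with its originating URL (None for start URLs).
    queue = deque((url, None) for url in start_urls)
    while queue:
        url, origin = queue.popleft()
        items.append({"url": url, "origin_url": origin})
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, url))
    return items


items = crawl(["http://www.A.com"], LINKS)
```

In this sketch the item for www.B.com carries "origin_url" set to http://www.A.com, which is the one-step look-back I am after.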

Like a breadcrumb trail, for each call to the parse method I need to look 
back one step. Is there an existing way to get this information? 

Much thanks
