I have a set of rules defined in a spider.
1. Rule one does not need to parse any data from the page; it just
follows links.
2. Rule two then executes on the links followed from Rule one and
parses the page.
Is it possible to pass any metadata or properties about Rule one down to
the Rule two callback? In my specific case I want the "referrer" URL, by
which I mean the URL of the link followed from Rule one.
A simple code example to illustrate my point (this would be found in the
spider):
    rules = (
        # Rule one: no callback, it only follows links.
        Rule(LinkExtractor(allow=some_dynamic_list)),
        # Rule two: parses the pages reached through Rule one.
        Rule(LinkExtractor(allow=some_dynamic_list_2), callback="parse_page"),
    )

    def parse_page(self, response):
        # response comes from the request generated by Rule two.
        #
        # I would like to be able to call something like response.referrer
        # to access the "parent" URL, basically the URL from Rule one where
        # the link was followed.
        #
        # If I could instead derive the Rule object(s) and figure it out
        # that way, that would be OK too.
        # I just need some ideas or thoughts from people who are familiar
        # with the API or know it inside out.
        pass
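
One direction I can think of, as a rough sketch rather than a confirmed answer: as far as I know, Scrapy's RefererMiddleware (enabled by default) sets a Referer header on requests generated from a response, and CrawlSpider stores the index of the matching rule under the "rule" key of the request meta. The helper names referrer_of and rule_index_of below are hypothetical, and the stand-in objects only imitate the response.request.headers and response.request.meta attributes so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def referrer_of(response):
    """Return the URL of the page that linked to `response`, or None.

    Assumes the Referer header was set on the request (Scrapy's
    RefererMiddleware normally does this) and may be stored as bytes.
    """
    raw = response.request.headers.get("Referer")
    if raw is None:
        return None
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw


def rule_index_of(response):
    """Return the index (into `rules`) of the rule that produced the request.

    Assumes CrawlSpider stored it under the "rule" meta key.
    """
    return response.request.meta.get("rule")


# Stand-in objects (hypothetical URL) that imitate the attributes used above.
fake_request = SimpleNamespace(
    headers={"Referer": b"http://example.com/category/1"},
    meta={"rule": 1},
)
fake_response = SimpleNamespace(request=fake_request)

print(referrer_of(fake_response))
print(rule_index_of(fake_response))
```

Inside parse_page this would be called as referrer_of(response). The header can be missing, for example when the referrer policy suppresses it, so the None case needs handling.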