Basic Rule and Spider question

baransja Fri, 30 Oct 2015 15:15:35 -0700

I have a set of rules defined in a spider.

1. Rule one does not need to parse any data from the page; it only 
follows links
2. Rule two then runs on the links found on the pages reached through 
Rule one and parses those pages


Is it possible to pass any metadata or properties about Rule one down to 
the Rule two callback? In my specific case I want the "referrer" URL, that 
is, the URL of the link followed by Rule one.

A simple code example to illustrate my point (this would be found in the 
spider):
    
    rules = (
        Rule(LinkExtractor(allow=some_dynamic_list)),
        Rule(LinkExtractor(allow=some_dynamic_list_2), callback="parse_page"),
    )

    def parse_page(self, response):
        # response comes from the request generated by the second rule.
        #
        # I would like to be able to call something like response.referrer
        # to access the "parent" URL, basically the URL from Rule one where
        # the link was followed.
        #
        # If I could instead somehow get at the Rule object(s) and figure it
        # out that way, that is OK too.
        # I just need some ideas or thoughts from people who are familiar
        # with the API or know it inside out.
        pass
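
One direction that might work, as a minimal sketch rather than a confirmed answer: Scrapy's RefererMiddleware (enabled by default) sets a Referer header on each request generated from a response, so the parent URL may be readable from response.request.headers. CrawlSpider also appears to record the index of the matching rule under the "rule" key of the request meta, but that is an implementation detail I am assuming here, not a documented API. The helper names get_referrer and get_rule_index and the stand-in objects below are hypothetical, used only so the snippet can be tried without running a crawl.

```python
from types import SimpleNamespace


def get_referrer(response):
    """Return the URL of the page whose link produced this response, or None.

    Assumes RefererMiddleware (on by default) set the Referer header on
    the request that led to this response.
    """
    referrer = response.request.headers.get("Referer")
    # Scrapy stores header values as bytes; decode them for convenience.
    if isinstance(referrer, bytes):
        referrer = referrer.decode("utf-8")
    return referrer


def get_rule_index(response):
    """Return the index of the rule that extracted the followed link, or None.

    Assumption: CrawlSpider stores this index under meta["rule"]; treat it
    as an implementation detail that may change between versions.
    """
    return response.meta.get("rule")


# Hypothetical stand-ins for a real Scrapy response and request.
fake_request = SimpleNamespace(headers={"Referer": b"http://example.com/list"})
fake_response = SimpleNamespace(request=fake_request, meta={"rule": 1})

print(get_referrer(fake_response))
print(get_rule_index(fake_response))
```

Inside parse_page, the same helpers would be called as get_referrer(response) and get_rule_index(response). If the Referer header is missing (for example, because the middleware is disabled), get_referrer returns None.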
