Hi scrapy people,
I am quite new to Scrapy. I have written one script that works and I am
still developing it.
Could you please explain one thing to me?
If I have code like this:
rules = [
    Rule(LxmlLinkExtractor(allow=("ecolex/ledge/view/SearchResults",)),
         follow=True),
    Rule(LxmlLinkExtractor(allow=("ecolex/ledge/view/RecordDetails",)),
         callback='found_items'),
]
what actually happens?
My understanding: for each pattern, all matching links are extracted. For
SearchResults, the spider only follows such links until it has reached them
all. If a link matching the RecordDetails pattern is seen on the website,
the spider applies the method 'found_items' to it for further processing.
My question is about task scheduling here.
Does it happen sequentially or in parallel?
I mean, does the spider scrape some data from a page matching
RecordDetails and, only after all items are scraped, switch to following
another link and scraping that?
This seems automagical. How does Scrapy know what to do first: scrape or
follow?
Is it a sequential job:
following one site -> scraping all content
following second site -> scraping all content
Or is there some parallelization, like:
following one site -> scraping all content & following second site ->
scraping all content
If it is not already the latter, I would like to make it so.
The question is: how could I do that?
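For context, my current (possibly wrong) understanding is that concurrency is mostly governed by a few knobs in settings.py; the values below are what I believe the Scrapy defaults to be, shown only to frame the question:

```python
# settings.py fragment -- concurrency knobs, values believed to be
# the Scrapy defaults (please correct me if I am wrong)

# max requests in flight across the whole crawler
CONCURRENT_REQUESTS = 16

# max simultaneous requests to any single domain
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# artificial wait between requests to the same site; 0 = no delay
DOWNLOAD_DELAY = 0
```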
Regards,
Szymon Roziewski
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.