Hi scrapy people,
I am quite new to Scrapy. I have written one script that works and I am
still developing it.
Could you please explain one thing to me?
If I have code like this:
rules = [
    Rule(LxmlLinkExtractor(allow=("ecolex/ledge/view/SearchResults",)),
         follow=True),
    Rule(LxmlLinkExtractor(allow=("ecolex/ledge/view/RecordDetails",)),
         callback='found_items'),
]
what actually happens?
My understanding: for each pattern, all matching links are extracted. For
SearchResults, the spider only follows such links until it has reached them
all. If a link matching the RecordDetails pattern is seen on the website,
the spider applies the method 'found_items' to it for further processing.
My question is about task scheduling here.
Does it happen sequentially or in parallel?
I mean, does the spider scrape some data from a page matching
RecordDetails and, only after all items are scraped, switch to following
another link and scraping that?
This seems automagical. How does Scrapy know what to do first: scrape or
follow?
Is it a sequential job:
following one site -> scraping all content
following second site -> scraping all content
Or is there some parallelization, like:
following one site -> scraping all content & following second site ->
scraping all content
If it is not already the latter, I would like to make it so.
The question is: how could I do that?
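For context, my current (possibly wrong) understanding is that concurrency is mostly governed by a few knobs in settings.py; the values below are what I believe the Scrapy defaults to be, shown only to frame the question:

```python
# settings.py fragment -- concurrency knobs, values believed to be
# the Scrapy defaults (please correct me if I am wrong)

# max requests in flight across the whole crawler
CONCURRENT_REQUESTS = 16

# max simultaneous requests to any single domain
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# artificial wait between requests to the same site; 0 = no delay
DOWNLOAD_DELAY = 0
```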
Regards,
Szymon Roziewski
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.