Re: How does follow and rules work

Szymon Roziewski Mon, 20 Oct 2014 06:52:21 -0700

With a rule such as
    Rule(LxmlLinkExtractor(allow=("ecolex.org/server2.php/libcat/docs", )),
         callback='get_file'),
I would like to download every file whose URL matches this pattern, i.e. doc,
pdf, txt and csv files.
In practice I only get the txt files.
This is my callback method:


    def get_file(self, response):
        item = FiledownloadItem()
        item["file_urls"] = [response.url]
        yield item
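
A possible explanation, offered as an assumption and not confirmed anywhere in
this thread: Scrapy's link extractors accept a deny_extensions argument, and
its default list skips links that end in common binary and document extensions
such as pdf and doc, while txt is not on that list. The sketch below is a
simplified standard-library model of that kind of filtering, not Scrapy's
implementation. The function filter_links, the abbreviated DEFAULT_DENY set
and the example URLs are all hypothetical.

```python
import re
from posixpath import splitext
from urllib.parse import urlparse

# Abbreviated, hypothetical stand-in for a default deny list of extensions.
DEFAULT_DENY = {"pdf", "doc", "zip", "jpg"}


def filter_links(urls, allow, deny_extensions=DEFAULT_DENY):
    """Keep URLs that match `allow` and whose extension is not denied."""
    pattern = re.compile(allow)
    kept = []
    for url in urls:
        # Extension of the URL path, lowercased and without the dot.
        ext = splitext(urlparse(url).path)[1].lower().lstrip(".")
        if pattern.search(url) and ext not in deny_extensions:
            kept.append(url)
    return kept


BASE = "http://www.ecolex.org/server2.php/libcat/docs/"  # hypothetical URLs
URLS = [BASE + name for name in ("a.txt", "b.pdf", "c.doc", "d.csv")]
ALLOW = r"ecolex\.org/server2\.php/libcat/docs"

# With the default deny list, the pdf and doc links are dropped.
default_result = filter_links(URLS, ALLOW)
# With an empty deny list, every matching link is kept.
all_result = filter_links(URLS, ALLOW, deny_extensions=())
```

In Scrapy itself the corresponding change would be to pass deny_extensions=()
(or a shorter list) to LxmlLinkExtractor. Whether that resolves the problem
described here is untested.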



On Friday, 17 October 2014 14:45:32 UTC+2, Szymon Roziewski wrote:
>
> Hi scrapy people,
>
> I am quite new to scrapy. I have written one script that works and I am
> still developing it.
>
> Could you please explain one thing to me?
>
> If I have code like this:
>     rules = [
>         Rule(LxmlLinkExtractor(allow=("ecolex/ledge/view/SearchResults", )),
>              follow=True),
>         Rule(LxmlLinkExtractor(allow=("ecolex/ledge/view/RecordDetails", )),
>              callback='found_items'),
>     ]
>
> what actually happens?
>
> My understanding is that links are extracted for each pattern, and that for
> SearchResults the spider only follows the matching links until no new links
> remain.
>
> If a link matching the RecordDetails pattern is found on a page, the spider
> applies the 'found_items' method to it for further processing.
>
> My question is about task scheduling.
>
> Does this happen sequentially or in parallel?
>
> I mean, does the spider scrape the data from a page matching RecordDetails
> and, only after all items are scraped, switch to following another link and
> scraping it?
>
> This is somewhat automagical. How does scrapy know what to do first, to
> scrape or to follow?
>
> Is it a sequential job:
>
> following one site -> scraping all content
> following second site -> scraping all content
>
> Or is there some parallelization, like:
> following one site -> scraping all content & following second site ->
> scraping all content
>
> I would like it to work the latter way, if it does not already.
>
> If it does not, how could I do that?
>
> Regards,
> Szymon Roziewski
>
>
>
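
A rough model of what the two quoted rules do, as I understand CrawlSpider:
links are extracted from every response the rules are applied to, a request
created by a rule gets that rule's callback, and the rules are applied again
to its response only if follow is true (when follow is not given, it defaults
to true for rules without a callback and to false otherwise). The sketch below
simulates this on a toy site with a FIFO queue. SITE, its page names and
crawl() are hypothetical, and the real scheduler and downloader are not
modelled.

```python
import re
from collections import deque

# Hypothetical site: page URL -> links found on that page.
SITE = {
    "/start": ["/SearchResults?p=1", "/about"],
    "/SearchResults?p=1": ["/RecordDetails?id=1", "/SearchResults?p=2"],
    "/SearchResults?p=2": ["/RecordDetails?id=2", "/RecordDetails?id=1"],
    "/RecordDetails?id=1": ["/SearchResults?p=1"],
    "/RecordDetails?id=2": [],
}

# (allow pattern, callback name or None, follow) mirrors the two rules.
RULES = [
    ("SearchResults", None, True),
    ("RecordDetails", "found_items", False),
]


def crawl(start):
    """Return the pages handed to a callback, in the order they were visited."""
    seen = {start}
    queue = deque([(start, None, True)])  # (url, callback, apply rules?)
    scraped = []
    while queue:
        url, callback, follow = queue.popleft()
        if callback is not None:
            scraped.append((callback, url))
        if not follow:
            continue  # links on this page are not extracted
        for link in SITE.get(url, []):
            for allow, rule_callback, rule_follow in RULES:
                if re.search(allow, link) and link not in seen:
                    seen.add(link)  # duplicate requests are filtered out
                    queue.append((link, rule_callback, rule_follow))
                    break  # the first matching rule handles the link
    return scraped


result = crawl("/start")
```

In this model the SearchResults pages are only used to discover more links,
and the RecordDetails pages are passed to found_items without being crawled
further.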

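On the sequential-versus-parallel question: as far as I know, Scrapy is built
on an asynchronous networking engine, so several requests are in flight at
once (bounded by settings such as CONCURRENT_REQUESTS) and each callback runs
when its response arrives, not in a fixed follow-then-scrape order. The sketch
below imitates that behaviour with asyncio, which is a swapped-in stand-in and
not what Scrapy uses internally. fetch(), the delays and the page names are
hypothetical.

```python
import asyncio

# Hypothetical pages with simulated download times in seconds.
PAGES = {"page-1": 0.15, "page-2": 0.05, "page-3": 0.10}


async def fetch(name, delay, handled):
    """Pretend to download a page, then run its 'callback'."""
    await asyncio.sleep(delay)  # stands in for network latency
    handled.append(name)        # the callback fires when the response arrives


async def crawl():
    handled = []
    # All downloads start together instead of one after another.
    await asyncio.gather(*(fetch(n, d, handled) for n, d in PAGES.items()))
    return handled


order = asyncio.run(crawl())
```

The pages are handled in the order their responses arrive, so the fastest
download is processed first regardless of the order in which the requests were
issued.
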
-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
