Re: Crawling Every Page of a Website

Felipe Ruhland Mon, 10 Oct 2016 02:46:42 -0700

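Hey, Tim. You have to change your code and find the next-page selector for
the site you want to crawl. The selector in your code (div.prev-post > a) is
written for blog.scrapinghub.com, so on another site it will most likely match
nothing and the crawl stops after the first page.
You can use scrapy shell[1] to search for the next-page selector.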
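
If the goal is every page rather than only a "next" link, the spider has to collect all links on each page and keep the ones that stay on the same site. Here is a minimal sketch of that filtering step using only the Python standard library, not Scrapy's own API. The helper name same_site_links is hypothetical, and the sample hrefs in the usage note are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def same_site_links(page_url, hrefs):
    """Return absolute, de-duplicated links that stay on page_url's host.

    page_url: URL of the page the hrefs were found on.
    hrefs:    raw href attribute values (relative or absolute).
    """
    host = urlparse(page_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative links against the current page and drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.netloc != host:
            continue  # skip links that leave the site
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)  # keep first-seen order
    return result
```

Inside the spider's parse method, the hrefs could come from response.css('a ::attr(href)').extract(), and each returned URL could be yielded as scrapy.Request(url, callback=self.parse), the same pattern your code already uses for next_page.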


I hope this helps.

Good luck.

[1] https://doc.scrapy.org/en/latest/topics/shell.html

On Sun, Oct 9, 2016 at 8:19 AM, Tim Fitzhardinge <[email protected]>
wrote:

> Hi
>
> I'm new to web crawling. I successfully ran the main tutorial as
> myspider.py. Now, how do I crawl every page of a website? I tried
> changing start_urls to the home page of another website, but it
> only crawled one page.
>
> For example, say I want to crawl every page of the
> http://www.asx.com.au website. I believe there will be 10,000+ pages.
> Thank you.
>
> import scrapy
>
>
> class BlogSpider(scrapy.Spider):
>     name = 'blogspider'
>     start_urls = ['https://blog.scrapinghub.com']
>
>     def parse(self, response):
>         for title in response.css('h2.entry-title'):
>             yield {'title': title.css('a ::text').extract_first()}
>
>         next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
>         if next_page:
>             yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>

