Changing it back to something that's contraindicated in the manual isn't going to help.
Why are you using the SgmlLinkExtractor rather than the default LinkExtractor? I can't test in the shell right now, and I don't have time to recreate the codebase, but I'd guess your regex isn't matching the way you think it is. Everything else looked fine to me; see the sketch below the quoted message. GL.

On Tue, Mar 3, 2015 at 9:45 AM, JEBI93 <[email protected]> wrote:
> Again, I don't know how to deal with pagination. Anyway, here's the problem:
>
>     class GumtreespiderSpider(CrawlSpider):
>         name = "gumtreeSpider"
>         allowed_domains = ["gumtree.com.au"]
>         start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']
>
>         rules = (
>             Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
>                  callback='parse', follow=True),
>         )
>
> What I'm trying to do is iterate with \d+ to scrape 100+ pages, but it
> only returns the first one (the start_urls one).
>
> Here's the full script: http://pastebin.com/CYrPvZuc
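P.S. Here's roughly what I had in mind. Untested, and a couple of assumptions: I'm using the Scrapy 0.24-style imports, parse_page is just a placeholder name (the docs tell you not to use parse as a CrawlSpider callback, which is what I meant above), and I'm assuming the pagination links on the page really carry the ad=wanted query string. If they don't, loosen the pattern to just /s-jobs/page-\d+/c9302.

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors import LinkExtractor

    class GumtreespiderSpider(CrawlSpider):
        name = "gumtreeSpider"
        allowed_domains = ["gumtree.com.au"]
        start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']

        rules = (
            # Your pattern '/s-jobs/page-\d+c9302?ad=wanted' is missing the
            # '/' before 'c9302', and '?' is a regex metacharacter, so the
            # rule never matches the pagination URLs. Escape it instead:
            Rule(LinkExtractor(allow=(r'/s-jobs/page-\d+/c9302\?ad=wanted',)),
                 callback='parse_page', follow=True),
        )

        def parse_page(self, response):
            # Renamed from 'parse' (placeholder name): CrawlSpider implements
            # parse() itself, so overriding it breaks the rule-following logic.
            # ... your item extraction goes here ...
            pass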
