Changing it back to something that's contraindicated in the manual isn't going to help.
Why are you using the SgmlLinkExtractor rather than the default LinkExtractor? I can't test in the shell right now, and I don't have time to recreate the codebase, but I'd guess your regex isn't matching the way you think it is. Everything else looked fine to me; see the sketch below the quoted message. GL.

On Tue, Mar 3, 2015 at 9:45 AM, JEBI93 <[email protected]> wrote:
> Again, I don't know how to deal with pagination. Anyway, here's the problem:
>
>     class GumtreespiderSpider(CrawlSpider):
>         name = "gumtreeSpider"
>         allowed_domains = ["gumtree.com.au"]
>         start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']
>
>         rules = (
>             Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
>                  callback='parse', follow=True),
>         )
>
> What I'm trying to do is iterate with \d+ to scrape 100+ pages, but it
> only returns the first one (the start_urls one).
>
> Here's the full script: http://pastebin.com/CYrPvZuc
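P.S. Here's roughly what I had in mind. Untested, and a couple of assumptions: I'm using the Scrapy 0.24-style imports, parse_page is just a placeholder name (the docs tell you not to use parse as a CrawlSpider callback, which is what I meant above), and I'm assuming the pagination links on the page really carry the ad=wanted query string. If they don't, loosen the pattern to just /s-jobs/page-\d+/c9302.

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors import LinkExtractor

    class GumtreespiderSpider(CrawlSpider):
        name = "gumtreeSpider"
        allowed_domains = ["gumtree.com.au"]
        start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']

        rules = (
            # Your pattern '/s-jobs/page-\d+c9302?ad=wanted' is missing the
            # '/' before 'c9302', and '?' is a regex metacharacter, so the
            # rule never matches the pagination URLs. Escape it instead:
            Rule(LinkExtractor(allow=(r'/s-jobs/page-\d+/c9302\?ad=wanted',)),
                 callback='parse_page', follow=True),
        )

        def parse_page(self, response):
            # Renamed from 'parse' (placeholder name): CrawlSpider implements
            # parse() itself, so overriding it breaks the rule-following logic.
            # ... your item extraction goes here ...
            pass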
