Thanks Aaron, I can't believe I didn't think to do something like that.
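
For anyone finding this thread later, here is a self-contained sketch of the whole spider, combining Aaron's seeded start_urls with a corrected rule. It's untested against the live site and assumes a Scrapy 0.24-era install, where SgmlLinkExtractor still lives in scrapy.contrib; parse_item is just a placeholder callback name, not something from my original script. Two fixes beyond the seeding: the allow pattern needs the '/' restored before 'c9302' and the '?' escaped (an unescaped '?' is a regex metacharacter, so the original pattern could never match a real listing URL), and the callback can't be named 'parse', because CrawlSpider uses parse() internally to dispatch its rules.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

# Module-level so the class-body list comprehension can see it.
SEED = 'http://www.gumtree.com.au/s-jobs/page-%d/c9302?ad=wanted'


class GumtreespiderSpider(CrawlSpider):
    name = "gumtreeSpider"
    allowed_domains = ["gumtree.com.au"]

    # Seed the first ten listing pages directly; page numbers are 1-based.
    start_urls = [SEED % i for i in range(1, 11)]

    rules = (
        # Raw string, '/' restored before 'c9302', '?' escaped.
        Rule(SgmlLinkExtractor(allow=(r'/s-jobs/page-\d+/c9302\?ad=wanted',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Placeholder: extract whatever fields the real spider needs here.
        self.log("Scraping %s" % response.url)

With follow=True the rule keeps discovering later pages on its own, so the seeded range(1, 11) is a warm start rather than a hard limit on the 100+ pages.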

On Wednesday, March 4, 2015 at 1:29:31 PM UTC+1, Aaron Tao wrote:
>
>     name = "gumtreeSpider"
>     allowed_domains = ["gumtree.com.au"]
>     seed = 'http://www.gumtree.com.au/s-jobs/page-%d/c9302?ad=wanted'
>     start_urls = [
>         seed % i for i in range(1, 11)
>     ]
>
>     rules = (
>         Rule(SgmlLinkExtractor(allow=(r'/s-jobs/page-\d+/c9302\?ad=wanted',)),
>              callback='parse', follow=True),
>     )
>
> On Wednesday, March 4, 2015 at 1:45:57 AM UTC+8, JEBI93 wrote:
>>
>> Again, I don't know how to deal with pagination. Anyway, here's the problem:
>> class GumtreespiderSpider(CrawlSpider):
>>     name = "gumtreeSpider"
>>     allowed_domains = ["gumtree.com.au"]
>>     start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']
>>
>>     rules = (
>>         Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
>>             callback='parse', follow=True),
>>     )
>>
>> What I'm trying to do is iterate with \d+ to scrape 100+ pages, but it
>> returns only the first one (the start_urls one).
>> Here's full script: http://pastebin.com/CYrPvZuc
>>
>>
