Hey guys, I have a small problem when trying to crawl 10+ pages. Here's the
code:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class ItemspiderSpider(CrawlSpider):
    name = "itemspider"
    allowed_domains = ["openstacksummitnovember2014paris.sched.org"]
    start_urls = [
        'http://openstacksummitnovember2014paris.sched.org/directory/attendees/',
    ]
    rules = (
        Rule(SgmlLinkExtractor(allow=r'/directory/attendees/\d+'),
             callback='parse', follow=True),
    )
The problem is that when I run this code I only get results from the first
page, not the others. I tried modifying start_urls to list the pages
explicitly, like this, and it worked fine:
start_urls = [
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/1',
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/2',
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/3',
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/4',
    # etc.
]
I'm guessing I messed up the allow part; my regex is probably not right.
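As a sanity check on the regex, here is a quick standalone test with Python's re module. The sample URLs are just the ones from the start_urls experiment above, so this is only a guess at what the real pagination links look like:

```python
import re

# The pattern passed to the Rule's allow= argument.
pattern = re.compile(r'/directory/attendees/\d+')

urls = [
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/',
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/2',
    'http://openstacksummitnovember2014paris.sched.org/directory/attendees/10',
]

# The paginated URLs (trailing number) match; the bare directory page does not.
matches = [bool(pattern.search(u)) for u in urls]
print(matches)  # -> [False, True, True]
```

So if the pagination links really do look like the numbered URLs above, the pattern itself should be fine. One other thing worth checking: the Scrapy docs warn against using 'parse' as a Rule callback, because CrawlSpider implements its own parse method to drive the rules. Renaming the callback (e.g. to 'parse_item') and defining that method on the spider may be what actually fixes the pagination.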
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.