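Hi, CrawlSpider and a custom parse() method do not play well together: CrawlSpider uses parse() internally to apply its rules, so overriding it stops the rules from being followed. See the warning a little below this anchor: http://doc.scrapy.org/en/latest/topics/spiders.html#crawling-rules It's easy to miss.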
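
To illustrate why the override hurts, here is a toy model in plain Python. It is not Scrapy's actual code: ToyCrawlSpider, BrokenSpider, FixedSpider and the page dictionary are all made up for this sketch, and the only thing it borrows from Scrapy is the idea that the base class's parse() is what applies the rules.

```python
# Toy model only: these classes imitate the idea, not Scrapy's real code.
class ToyCrawlSpider:
    """Stand-in for CrawlSpider: its own parse() is what applies the rules."""

    rules = ()  # pairs of (link_prefix, callback_name)

    def parse(self, page):
        # The base class uses parse() to follow links matched by the rules.
        results = []
        for prefix, callback_name in self.rules:
            for link in page["links"]:
                if link.startswith(prefix):
                    results.append(getattr(self, callback_name)(link))
        return results


class BrokenSpider(ToyCrawlSpider):
    rules = (("/attendees/", "parse"),)

    def parse(self, page):
        # Overrides the base parse(), so the rules above are never applied.
        return ["item from " + page["url"]]


class FixedSpider(ToyCrawlSpider):
    rules = (("/attendees/", "parse_page"),)

    def parse_page(self, link):
        # A separately named callback leaves the base parse() intact.
        return "item from " + link


start_page = {
    "url": "/attendees/",
    "links": ["/attendees/1", "/attendees/2", "/about"],
}
broken_items = BrokenSpider().parse(start_page)  # only the start page
fixed_items = FixedSpider().parse(start_page)    # one item per matched link
print(broken_items)
print(fixed_items)
```

In this sketch the broken spider only ever sees the start page, which matches the "only results of first page" symptom, while the fixed one gets a callback for every link that matches the rule.
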
Try renaming your parse() method to something like parse_page(), and reference this new callback name in your rule.

On 28 Feb 2015 16:17, "JEBI93" <[email protected]> wrote:

> Hey guys, I have a small problem when trying to crawl 10+ pages. Here's the
> code:
>
> class ItemspiderSpider(CrawlSpider):
>     name = "itemspider"
>     allowed_domains = ["openstacksummitnovember2014paris.sched.org"]
>     start_urls = [
>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/']
>
>     rules = (
>         Rule(SgmlLinkExtractor(allow=r'/directory/attendees/\d+'),
>              callback='parse', follow=True),
>     )
>
> The problem is that when I run this code I only get the results of the first
> page, not the others. I tried to modify start_urls to something like this and
> it worked fine:
>
>     start_urls = [
>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/1',
>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/2',
>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/3',
>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/4',
>         # etc.
>     ]
>
> I'm guessing I messed up the allow part; probably my regex is not right.
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
