Here's the full script: http://pastebin.com/13eNky9W. After I changed the callback from parse to parse_page, I don't get anything scraped.
On Saturday, 28 February 2015 at 16:51:10 UTC+1, Paul Tremberth wrote:
>
> Hi,
>
> CrawlSpider and a custom parse() method do not play well together. See the
> warning a bit below
> http://doc.scrapy.org/en/latest/topics/spiders.html#crawling-rules
> It's easy to miss.
>
> Try renaming your parse() method to something like parse_page(), and
> reference this new callback name in your rule.
>
> On 28 Feb 2015 at 16:17, "JEBI93" <[email protected]> wrote:
>
>> Hey guys, I have a small problem when trying to crawl 10+ pages. Here's
>> the code:
>>
>> class ItemspiderSpider(CrawlSpider):
>>     name = "itemspider"
>>     allowed_domains = ["openstacksummitnovember2014paris.sched.org"]
>>     start_urls = ['http://openstacksummitnovember2014paris.sched.org/directory/attendees/']
>>
>>     rules = (
>>         Rule(SgmlLinkExtractor(allow=r'/directory/attendees/\d+'),
>>              callback='parse', follow=True),
>>     )
>>
>> The problem is that when I run this code I only get the results of the
>> first page, not the others. I tried modifying start_urls to something like
>> this and it worked fine:
>>
>> start_urls = [
>>     'http://openstacksummitnovember2014paris.sched.org/directory/attendees/1',
>>     'http://openstacksummitnovember2014paris.sched.org/directory/attendees/2',
>>     'http://openstacksummitnovember2014paris.sched.org/directory/attendees/3',
>>     'http://openstacksummitnovember2014paris.sched.org/directory/attendees/4',
>>     # etc.
>> ]
>>
>> I'm guessing I messed up the allow part; my regex is probably not right.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
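
On the regex guess in the original question: a quick way to rule the pattern in or out is to test it outside Scrapy. The snippet below is only a minimal sketch using the standard `re` module. It assumes the link extractor applies the `allow` pattern as a regex search against each absolute URL, which is my understanding of how it behaves rather than something confirmed in this thread. The candidate list is made up for illustration, and the `/directory/speakers/3` URL in particular is hypothetical.

```python
import re

# The allow pattern copied from the rule in the quoted spider.
ALLOW = re.compile(r'/directory/attendees/\d+')

BASE = 'http://openstacksummitnovember2014paris.sched.org'

# Illustrative candidates only; the speakers URL is hypothetical.
candidates = [
    BASE + '/directory/attendees/',    # the start URL, no page number
    BASE + '/directory/attendees/1',
    BASE + '/directory/attendees/2',
    BASE + '/directory/attendees/10',
    BASE + '/directory/speakers/3',
]

# Assumption: the pattern is applied with search(), not a full match.
matched = [url for url in candidates if ALLOW.search(url)]

for url in matched:
    print(url)
```

Under that assumption the numbered attendee pages do match, which fits the diagnosis in the reply: the pattern looks fine, and the `parse` callback name is the more likely culprit.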
