There are many ways to solve your problem. First, start_urls is a list, so you can add all of the required pages to start_urls in the spider's constructor (i.e. the __init__ method).
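Here is a minimal sketch of that first approach. The spider name, the page range 12..20, and the URL pattern are assumptions based on the example URLs in this thread:

```python
import scrapy


class PagesSpider(scrapy.Spider):
    name = "pages"  # hypothetical spider name, for illustration only

    def __init__(self, *args, **kwargs):
        super(PagesSpider, self).__init__(*args, **kwargs)
        # Build the full list of start URLs up front (pages 12 through 20).
        self.start_urls = [
            "http://www.example.com/page/%d" % n for n in range(12, 21)
        ]

    def parse(self, response):
        # Extract your items here; the selectors are site-specific.
        pass
```

Since start_urls is just a plain Python list, any code that produces the URLs (a list comprehension, a loop, reading them from a file) works here.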
Second, you can set start_urls to contain only the first link, e.g. http://www.example.com/page/12. Then, in the parse callback, you can do the following:

1. Extract the items.
2. Take the ID of the page, in this case "12".
3. Check whether you want to scrape more pages or not.
4. If you do, build the next URL, e.g. http://www.example.com/page/13.
5. Yield a new request, e.g. yield Request(new_url, meta=..., headers=..., cookies=...).

A sketch of these steps follows the quoted question below.

On Tuesday, April 29, 2014 9:42:57 AM UTC+3, wilby yang wrote:
> I am now using scrapy's CrawlSpider, which requires you to specify a list
> of start_urls.
> I am wondering whether it is possible to specify a range of start_urls using
> a regular expression like 'http://www.example.com/page/[12..20]'
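A minimal sketch of the second approach follows. The spider name, the max_page cutoff, and the item selector are assumptions for illustration; adapt them to the real site:

```python
import re

import scrapy
from scrapy import Request


class IncrementalSpider(scrapy.Spider):
    name = "incremental"  # hypothetical spider name
    start_urls = ["http://www.example.com/page/12"]
    max_page = 20  # assumed stopping point for this sketch

    def parse(self, response):
        # 1. Extract the items (placeholder selector; site-specific).
        for title in response.css("h2::text").extract():
            yield {"title": title}

        # 2. Take the ID of the current page from its URL.
        page = int(re.search(r"/page/(\d+)", response.url).group(1))

        # 3. Check whether you want to scrape more pages.
        if page < self.max_page:
            # 4. Build the next URL and 5. yield a new request.
            next_url = "http://www.example.com/page/%d" % (page + 1)
            yield Request(next_url, callback=self.parse)
```

Because parse yields both items and the next Request, Scrapy interleaves item processing with crawling, so you never need to know the whole URL range up front.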
