There are many ways to solve your problem. First, start_urls is a list, so you can add all of the required pages to start_urls in the spider's constructor (i.e. the __init__ method).
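Here is a minimal sketch of that first approach. The spider name, the page range 12..20, and the URL pattern are assumptions based on the example URLs in this thread:

```python
import scrapy


class PagesSpider(scrapy.Spider):
    name = "pages"  # hypothetical spider name, for illustration only

    def __init__(self, *args, **kwargs):
        super(PagesSpider, self).__init__(*args, **kwargs)
        # Build the full list of start URLs up front (pages 12 through 20).
        self.start_urls = [
            "http://www.example.com/page/%d" % n for n in range(12, 21)
        ]

    def parse(self, response):
        # Extract your items here; the selectors are site-specific.
        pass
```

Since start_urls is just a plain Python list, any code that produces the URLs (a list comprehension, a loop, reading them from a file) works here.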
Second, you can set start_urls to contain only the first link, e.g. http://www.example.com/page/12. Then, in the parse callback, you can do the following:

1. Extract the items.
2. Take the ID of the page, in this case "12".
3. Check whether you want to scrape more pages or not.
4. If you do, build the next URL, e.g. http://www.example.com/page/13.
5. Yield a new request, e.g. yield Request(new_url, meta=..., headers=..., cookies=...).

A sketch of these steps follows the quoted question below.

On Tuesday, April 29, 2014 9:42:57 AM UTC+3, wilby yang wrote:
> I am now using scrapy's CrawlSpider, which requires you to specify a list
> of start_urls.
> I am wondering whether it is possible to specify a range of start_urls using
> a regular expression like 'http://www.example.com/page/[12..20]'
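A minimal sketch of the second approach follows. The spider name, the max_page cutoff, and the item selector are assumptions for illustration; adapt them to the real site:

```python
import re

import scrapy
from scrapy import Request


class IncrementalSpider(scrapy.Spider):
    name = "incremental"  # hypothetical spider name
    start_urls = ["http://www.example.com/page/12"]
    max_page = 20  # assumed stopping point for this sketch

    def parse(self, response):
        # 1. Extract the items (placeholder selector; site-specific).
        for title in response.css("h2::text").extract():
            yield {"title": title}

        # 2. Take the ID of the current page from its URL.
        page = int(re.search(r"/page/(\d+)", response.url).group(1))

        # 3. Check whether you want to scrape more pages.
        if page < self.max_page:
            # 4. Build the next URL and 5. yield a new request.
            next_url = "http://www.example.com/page/%d" % (page + 1)
            yield Request(next_url, callback=self.parse)
```

Because parse yields both items and the next Request, Scrapy interleaves item processing with crawling, so you never need to know the whole URL range up front.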
