Scrapy can certainly deal with this. But since you're a newbie, I'll give
you a very quick-and-dirty answer (which in general is not good practice):
start_urls = ['http://www.example.com/news?count=%d' % i
              for i in range(1, 1001)]
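
For a newbie, here is what that looks like dropped into a minimal complete
spider (the spider name and the CSS selector are placeholders you'd adapt
to the real page):

import scrapy

class NewsSpider(scrapy.Spider):
    name = 'news'
    # Request the first thousand candidate pages up front.
    start_urls = ['http://www.example.com/news?count=%d' % i
                  for i in range(1, 1001)]

    def parse(self, response):
        # Pages past the end match nothing, so they yield no items.
        for article in response.css('div.article'):  # placeholder selector
            yield {'title': article.css('h2::text').extract_first()}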
Use this to download the first thousand pages. If they have data, good. If
not, no problem. There are tons of better solutions, but this one is so
simple that it's very attractive. It also "teaches" you not to overvalue
the "cost" of an extra request, or a thousand: Scrapy is good at making
tons of requests.
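
If you do want the stop-at-the-first-empty-page behaviour you asked for,
one of those better solutions looks roughly like this (a sketch assuming an
exhausted page really returns no matching HTML; the selector and field name
are again placeholders):

import scrapy

class NewsCountSpider(scrapy.Spider):
    name = 'news_count'
    start_urls = ['http://www.example.com/news?count=1']

    def parse(self, response):
        articles = response.css('div.article')  # placeholder selector
        if not articles:
            return  # first empty page: stop following the sequence
        for article in articles:
            yield {'title': article.css('h2::text').extract_first()}
        # Only ask for the next count while pages keep returning data.
        count = int(response.url.rsplit('=', 1)[1])
        yield scrapy.Request(
            'http://www.example.com/news?count=%d' % (count + 1))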
On Sunday, May 15, 2016 at 10:44:53 AM UTC+1, Ahmad AlTwaijiry wrote:
>
> Hello
>
> I'm a newbie here so forgive my question,
>
> so I have a URL (http://example.com/news?count=XX), and I want Scrapy to
> go over all counts (1, 2, 3, 4, 5, ...) until it reaches an empty page
> (no HTML).
>
> My issue is that the total count is unknown, so I'm not sure how I can
> tell Scrapy to work like that:
>
> http://example.com/news?count=1 ===> found data, save it
> http://example.com/news?count=2 ===> found data, save it
> http://example.com/news?count=3 ===> found data, save it
> ....
> ....
> ....
> http://example.com/news?count=X ===> no data found, stop here.
>