Scrapy can certainly deal with this. But since you're a newbie, I'll give
you a very quick-and-dirty answer (which in general is not good practice):
start_urls = ['http://www.example.com/news?count=%d' % i
              for i in range(1, 1001)]
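
For a newbie, here is what that looks like dropped into a minimal complete
spider (the spider name and the CSS selector are placeholders you'd adapt
to the real page):

import scrapy

class NewsSpider(scrapy.Spider):
    name = 'news'
    # Request the first thousand candidate pages up front.
    start_urls = ['http://www.example.com/news?count=%d' % i
                  for i in range(1, 1001)]

    def parse(self, response):
        # Pages past the end match nothing, so they yield no items.
        for article in response.css('div.article'):  # placeholder selector
            yield {'title': article.css('h2::text').extract_first()}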
Use this to download the first thousand pages. If they have data, good. If
not, no problem. There are tons of better solutions, but this one is so
simple that it's very attractive. It also "teaches" you not to overvalue
the "cost" of an extra request, or a thousand: Scrapy is good at making
tons of requests.
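
If you do want the stop-at-the-first-empty-page behaviour you asked for,
one of those better solutions looks roughly like this (a sketch assuming an
exhausted page really returns no matching HTML; the selector and field name
are again placeholders):

import scrapy

class NewsCountSpider(scrapy.Spider):
    name = 'news_count'
    start_urls = ['http://www.example.com/news?count=1']

    def parse(self, response):
        articles = response.css('div.article')  # placeholder selector
        if not articles:
            return  # first empty page: stop following the sequence
        for article in articles:
            yield {'title': article.css('h2::text').extract_first()}
        # Only ask for the next count while pages keep returning data.
        count = int(response.url.rsplit('=', 1)[1])
        yield scrapy.Request(
            'http://www.example.com/news?count=%d' % (count + 1))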
On Sunday, May 15, 2016 at 10:44:53 AM UTC+1, Ahmad AlTwaijiry wrote:
>
> Hello
>
> I'm a newbie here so forgive my question,
>
> so I have a URL (http://example.com/news?count=XX), and I want Scrapy to
> go over all counts (1, 2, 3, 4, 5, ...) until it reaches an empty page
> (no HTML).
>
> My issue is that the total count is unknown, so I'm not sure how I can
> tell Scrapy to work like that:
>
> http://example.com/news?count=1 ===> found data, save it
> http://example.com/news?count=2 ===> found data, save it
> http://example.com/news?count=3 ===> found data, save it
> ....
> ....
> ....
> http://example.com/news?count=X ===> no data found, stop here.
>