exclude particular url from spider.start_urls on the fly?

tim feirg Wed, 20 Aug 2014 05:10:42 -0700

I'm crawling about 20 web pages to keep my database updated. I want my 
spider to ignore a URL completely once it finds that the item it just 
returned already exists in the database, so that it doesn't follow any other 
links from that URL and just moves on to the URLs that still contain new items.


It seems fairly easy, but I haven't found a clean way to do it. Can 
anybody help? Thanks :)
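
For illustration, here is a minimal sketch of the behaviour described above, 
written in plain Python rather than against the Scrapy API. Everything in it 
is a hypothetical stand-in: `pages` plays the role of the website, `known_ids` 
plays the role of the database lookup, and `parse_page` and `crawl` are 
made-up names. The idea it shows is that the callback simply yields nothing 
when the item is already stored, so no links from that page are ever queued.

```python
def parse_page(page, known_ids):
    """Yield the page's item and its links, or nothing if the item is known.

    `page` is a hypothetical dict: {"item_id": ..., "links": [...]}.
    `known_ids` is a hypothetical set standing in for a database lookup.
    """
    if page["item_id"] in known_ids:
        # Item already stored: yield nothing, so no link from this page
        # is followed and the crawl moves on to the next queued URL.
        return
    known_ids.add(page["item_id"])
    yield ("item", page["item_id"])
    for link in page["links"]:
        yield ("follow", link)


def crawl(start_urls, pages, known_ids):
    """Tiny breadth-first crawl loop (an assumption, not Scrapy's scheduler)."""
    queue = list(start_urls)
    seen = set(queue)
    new_items = []
    while queue:
        url = queue.pop(0)
        page = pages.get(url)
        if page is None:
            continue  # unknown URL in this toy site: nothing to parse
        for kind, value in parse_page(page, known_ids):
            if kind == "item":
                new_items.append(value)
            elif value not in seen:
                seen.add(value)
                queue.append(value)
    return new_items


# Toy site: page "a" holds an item that is already stored, page "b" does not.
pages = {
    "a": {"item_id": 1, "links": ["a2"]},
    "a2": {"item_id": 2, "links": []},
    "b": {"item_id": 3, "links": ["b2"]},
    "b2": {"item_id": 4, "links": []},
}
known_ids = {1}
new_items = crawl(["a", "b"], pages, known_ids)
```

In a real spider the same check would presumably live in the parse callback: 
if the callback yields no further requests for a response, nothing else is 
followed from that page.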
