Re: exclude particular url from spider.start_urls on the fly?

lewis Wed, 20 Aug 2014 09:06:42 -0700

You want your spider to remove the current site from the list of pages once
it has found the page you're looking for on that site?


On Wednesday, August 20, 2014 11:50:50 AM UTC+1, tim feirg wrote:
>
> I'm crawling through some 20 webpages to update my database. I want my
> spider to ignore a URL completely once it finds that the item it just
> returned already exists in the database, so that it doesn't follow any
> other links from that URL and just moves on to those that still contain
> new items.
>
> It seems fairly easy, but I haven't found any smart way to do it. Can
> anybody help? Thanks :)
>
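For reference, the bookkeeping being asked about can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a Scrapy feature: the class `ExhaustedSiteFilter` and its methods are hypothetical names, each item is assumed to carry a stable ID that can be checked against the database, and a "site" is taken to mean the URL's host (netloc). The idea is that a site is marked exhausted the first time a parsed item turns out to be already stored, and after that no further links on that site are followed.

```python
from urllib.parse import urlparse


class ExhaustedSiteFilter:
    """Hypothetical helper: tracks which sites have stopped yielding new items.

    A site is identified by the URL's netloc (host[:port]), lowercased.
    Assumes each item has a stable ID that can be compared with the database.
    """

    def __init__(self, known_ids):
        # IDs already in the database (assumed to be loaded before crawling).
        self.known_ids = set(known_ids)
        # Sites on which a previously stored item has been seen.
        self.exhausted = set()

    @staticmethod
    def site_of(url):
        return urlparse(url).netloc.lower()

    def record_item(self, url, item_id):
        """Return True for a new item; otherwise mark the item's site exhausted."""
        if item_id in self.known_ids:
            self.exhausted.add(self.site_of(url))
            return False
        self.known_ids.add(item_id)
        return True

    def should_follow(self, url):
        """Return False for any link on a site that already hit a stored item."""
        return self.site_of(url) not in self.exhausted


# Usage: two sites, one of which runs into an item that is already stored.
f = ExhaustedSiteFilter(known_ids={"a-2"})
print(f.record_item("http://site-a.example/page1", "a-1"))  # True: new item
print(f.record_item("http://site-a.example/page2", "a-2"))  # False: already stored
print(f.should_follow("http://site-a.example/page3"))       # False: site exhausted
print(f.should_follow("http://site-b.example/page1"))       # True: still open
```

In a spider, one way to use this would be to call `record_item` for each item parsed in a callback and only yield follow-up requests whose URL passes `should_follow`. Requests that were already scheduled before the site was marked will still be downloaded unless they are also dropped later, for example in a downloader middleware whose `process_request` raises `scrapy.exceptions.IgnoreRequest` for exhausted sites.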

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
