I have a simple crawler that makes a list of every link on my site. I have been feeding this list into a Perl script that purges the cache for each link, but I would like to have Scrapy do it instead. How can I modify this crawler so that it hits each link a second time?
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.item import Item, Field


class MyItem(Item):
    url = Field()


class MySpider(CrawlSpider):
    name = 'test'
    allowed_domains = ['nydvreports1.example.com']
    start_urls = ['http://nydvreports1.example.com/perl/globe_3xx.pl']

    rules = (Rule(LinkExtractor(), callback='parse_url', follow=True),)

    def parse_url(self, response):
        print("Visiting %s" % response.url)
        item = MyItem()
        item['url'] = response.url
        return item
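
One approach I could imagine (an untested sketch, not something I have running yet): Scrapy's duplicate filter normally drops a URL that has already been scheduled, but a Request created with dont_filter=True bypasses that filter. So parse_url could yield the item and then re-request the same URL once, using a meta flag to stop after the second visit (the 'second_visit' name below is my own, not a Scrapy built-in):

    from scrapy import Request

    def parse_url(self, response):
        print("Visiting %s" % response.url)
        item = MyItem()
        item['url'] = response.url
        yield item
        # Only re-request on the first visit; dont_filter=True lets the
        # scheduler fetch the same URL again instead of discarding it.
        if not response.meta.get('second_visit'):
            yield Request(response.url,
                          callback=self.parse_url,
                          dont_filter=True,
                          meta={'second_visit': True})

If the second hit only needs to purge the cache and not parse the page again, the repeat request could presumably point at a separate callback that ignores the response body. Is something like this the right way to do it, or is there a better mechanism?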
