Hi there, I have some code that jumps into a URL in a search result, which you may be able to re-use or modify.
http://pastebin.com/H7zLw1FK

Have a gander at that. You could maybe change the end of that first function to:

    if url in self.visitedURLs:
        request = Request(url, callback=self.productpage)
        request.meta['item'] = item
        yield request

where 'self.productpage' is the function where you run the same url a second time.

On Mon, Sep 14, 2015 at 12:52 PM, <[email protected]> wrote:
> I have a simple crawler that makes a list of every link on my site. I have
> been feeding this list into a perl script that purges the cache for each
> link, but I would like to tell Scrapy to do it. How can I modify this
> crawler so that it will hit each link a second time?
>
> from scrapy.linkextractors import LinkExtractor
> from scrapy.spiders import CrawlSpider, Rule
> from scrapy.item import Item, Field
>
> class MyItem(Item):
>     url = Field()
>
> class MySpider(CrawlSpider):
>     name = 'test'
>     allowed_domains = ['nydvreports1.example.com']
>     start_urls = ['http://nydvreports1.example.com/perl/globe_3xx.pl']
>     rules = (Rule(LinkExtractor(), callback='parse_url', follow=True), )
>
>     def parse_url(self, response):
>         print "Visiting %s" % response.url
>         item = MyItem()
>         item['url'] = response.url
>         return item
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
