Hi there,

I have some code that follows a URL from a search result; you may be able
to reuse or modify it.

http://pastebin.com/H7zLw1FK

Have a gander at that. You could maybe change the end of that first
function to:


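    if url in self.visitedURLs:
        # dont_filter=True: otherwise Scrapy's duplicate filter drops a
        # repeat request for a URL it has already seen
        request = Request(url, callback=self.productpage, dont_filter=True)
        request.meta['item'] = item
        yield request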


where 'self.productpage' is the callback that handles the second request
for the same URL.



On Mon, Sep 14, 2015 at 12:52 PM, <[email protected]> wrote:

> I have a simple crawler that makes a list of every link on my site. I have
> been feeding this list into a perl script that purges the cache for each
> link, but I would like to tell Scrapy to do it. How can I modify this
> crawler so that it will hit each link a second time?
>
> from scrapy.linkextractors import LinkExtractor
> from scrapy.spiders import CrawlSpider, Rule
> from scrapy.item import Item, Field
>
> class MyItem(Item):
>  url= Field()
>
> class MySpider(CrawlSpider):
>  name = 'test'
>  allowed_domains = ['nydvreports1.example.com']
>  start_urls = ['http://nydvreports1.example.com/perl/globe_3xx.pl']
>  rules = (Rule(LinkExtractor(), callback='parse_url', follow=True), )
>  def parse_url(self, response):
>   print "Visiting %s" % response.url
>   item = MyItem()
>   item['url'] = response.url
>   return item
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
