If it's just for tracking, you could write a simple middleware (either a spider middleware or a downloader middleware) that parses each request URL (using urlparse), extracts the domain, and keeps a count of the number of requests seen per domain. Just a quick idea...
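A rough sketch of that idea, written as a downloader middleware, might look like the following. The class name `DomainCountMiddleware`, the `notify_every` threshold and the `milestones` list are made up for illustration; only the `process_request(request, spider)` hook follows Scrapy's usual downloader middleware convention. Note that this counts domains for which at least one request has been *seen*, which is not the same as domains that have been fully crawled.

```python
from collections import Counter
from urllib.parse import urlparse  # on Python 2 this lived in the `urlparse` module


class DomainCountMiddleware:
    """Sketch of a downloader middleware that counts requests per domain."""

    def __init__(self, notify_every=100):
        # Number of requests seen for each domain.
        self.requests_per_domain = Counter()
        # Hypothetical threshold: record a milestone every N distinct domains.
        self.notify_every = notify_every
        # Distinct-domain counts at which a milestone was reached.
        self.milestones = []

    def process_request(self, request, spider):
        # Extract the host part of the URL, e.g. "example.com".
        domain = urlparse(request.url).hostname or ""
        is_new_domain = domain not in self.requests_per_domain
        self.requests_per_domain[domain] += 1

        domains_seen = len(self.requests_per_domain)
        if is_new_domain and domains_seen % self.notify_every == 0:
            # Placeholder: this is where a task such as sending an email
            # could be triggered.
            self.milestones.append(domains_seen)

        # Returning None lets Scrapy continue handling the request normally.
        return None
```

To try it, the class would have to be enabled through the `DOWNLOADER_MIDDLEWARES` setting of the project, using whatever module path it actually lives under. Since the counter only grows when a domain shows up for the first time, a milestone at 100 means requests to 100 distinct domains have been issued, not that those 100 domains are finished.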
On Thu, Mar 6, 2014 at 5:39 AM, Alok Singh Mahor <[email protected]> wrote:
> Hi everyone,
>
> I am writing Scrapy Spider which will crawl about 1000 domains. I am
> thinking if there is any way to track number of domains crawled. because it
> will take long time to crawl 1000 domains in one using process.
>
> if I could track number of domains process then I can trigger some task
> like sending email after crawling of 100 domains out of 1000.
>
> I tried to find on internet but could not get relevant.
>
> if anyone know someway please tell me. if I would not find any way then I
> have to track number of urls crawled. but it would be good if number of
> domains can be tracked.
>
>
> class MySpider(CrawlSpider):
>     name = 'alok2'
>     # 'list.txt' file have domains which I have to crawl
>     allowed_domains = [i.split('\n')[0] for i in open('list.txt','r').readlines()]
>     start_urls = ['http://'+i.split('\n')[0] for i in open('list.txt','r').readlines()]
>     rules = [Rule(SgmlLinkExtractor(), callback='parse_item',follow=True)]
>
>     def __init__(self,category=None, *args, **kwargs):
>         super(MySpider, self).__init__(*args, **kwargs)
>         self.count=0 #this is to keep track of domains whose all links have been crawled
>
>     def parse_start_url(self, response):
>         self.parse_item(response)
>
>     def parse_item(self, response):
>         #lines
>         #lines
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/groups/opt_out.
>
