If it's just for tracking, you could write a simple middleware (either a
spider or a downloader middleware) that parses each URL (using urlparse),
extracts the domain, and keeps a count of the number of requests seen per
domain. Just a quick idea...
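
A rough sketch of that idea as a downloader middleware, in case it helps. The
class name DomainCountMiddleware, the requests_per_domain attribute, and the
domains_seen() helper are made up for illustration; only the
process_request(request, spider) hook is part of Scrapy's downloader
middleware interface. It uses Python 3's urllib.parse (on Python 2 the same
function lives in the urlparse module). Note that it counts domains that have
been requested at least once, not domains that have been fully crawled.

```python
from collections import Counter
from urllib.parse import urlparse


class DomainCountMiddleware(object):
    """Hypothetical downloader middleware: counts requests seen per domain."""

    def __init__(self):
        # Maps domain -> number of requests that passed through the middleware
        self.requests_per_domain = Counter()

    def process_request(self, request, spider):
        # hostname is the lower-cased host part of the URL, without the port
        domain = urlparse(request.url).hostname
        if domain:
            self.requests_per_domain[domain] += 1
        # Returning None lets Scrapy continue handling the request as usual
        return None

    def domains_seen(self):
        # Number of distinct domains with at least one request so far
        return len(self.requests_per_domain)
```

To try it, the class would need to be listed in the DOWNLOADER_MIDDLEWARES
setting under whatever module path it ends up in. A check such as
domains_seen() reaching 100 could then be the place to trigger the email.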


On Thu, Mar 6, 2014 at 5:39 AM, Alok Singh Mahor <[email protected]> wrote:

> Hi everyone,
>
> I am writing a Scrapy spider that will crawl about 1000 domains. I am
> wondering whether there is any way to track the number of domains crawled,
> because it will take a long time to crawl 1000 domains in a single process.
>
> If I could track the number of domains processed, then I could trigger some
> task, like sending an email after 100 of the 1000 domains have been crawled.
>
> I searched the internet but could not find anything relevant.
>
> If anyone knows a way, please tell me. If I cannot find one, I will have to
> track the number of URLs crawled instead, but it would be better if the
> number of domains could be tracked.
>
>
> from scrapy.contrib.spiders import CrawlSpider, Rule
> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
>
> class MySpider(CrawlSpider):
>     name = 'alok2'
>     # 'list.txt' lists the domains I have to crawl, one per line
>     allowed_domains = [i.split('\n')[0] for i in open('list.txt', 'r').readlines()]
>     start_urls = ['http://' + i.split('\n')[0] for i in open('list.txt', 'r').readlines()]
>     rules = [Rule(SgmlLinkExtractor(), callback='parse_item', follow=True)]
>
>     def __init__(self, category=None, *args, **kwargs):
>         super(MySpider, self).__init__(*args, **kwargs)
>         # this is to keep track of domains whose links have all been crawled
>         self.count = 0
>
>     def parse_start_url(self, response):
>         # return the result so the start page's output is not discarded
>         return self.parse_item(response)
>
>     def parse_item(self, response):
>         #lines
>         #lines
>
>  --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/groups/opt_out.
>
