Thanks Daniel, I put my hooks into http11.py and can now see the calls to the websites in my monitoring tool. The only issue is that the response time it shows seems too quick for some of the sites which are known to be a bit slow. Could this be because of the async mechanism that is used all over the place in Twisted? Where exactly would you say I should put my hooks to capture the time it takes to download/crawl a website?
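To make it a bit more concrete, this is roughly the kind of hook I had in mind as a downloader middleware (just a sketch; the class name and the logging call are placeholders for the actual APM calls I would make):

```python
import logging
import time

logger = logging.getLogger(__name__)


class DownloadTimingMiddleware(object):
    """Rough sketch: measure how long the downloader takes per request.

    The class name and the logging call are placeholders; the real version
    would report the elapsed time to the APM tool instead of logging it.
    """

    def process_request(self, request, spider):
        # Remember when the request was handed over to the downloader.
        request.meta['monitor_start_time'] = time.time()
        # Returning None lets Scrapy continue processing the request normally.
        return None

    def process_response(self, request, response, spider):
        start = request.meta.get('monitor_start_time')
        if start is not None:
            elapsed = time.time() - start
            logger.info("Downloaded %s in %.3f s", request.url, elapsed)
        return response
```

I would then enable it via DOWNLOADER_MIDDLEWARES in settings.py. If I understand the docs correctly, Scrapy itself also records the time spent fetching a response in request.meta['download_latency'] once the download has finished, so maybe that is the number I should be reporting. Does that look like the right place to measure?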
Thanks
Philipp

On Friday, May 22, 2015 at 12:45:17 AM UTC+2, Daniel Fockler wrote:
>
> Yeah, you're right, the HTTP request is happening in the Twisted reactor:
>
> https://github.com/scrapy/scrapy/blob/0.24/scrapy/core/downloader/handlers/http10.py
>
> On Thursday, May 21, 2015 at 2:47:59 PM UTC-7, Philipp Bussche wrote:
>>
>> Thanks Daniel,
>>
>> that sounds like a good idea and I will have a look at that.
>>
>> But I would also be interested to instrument the call that crawls the actual
>> URL so I can put some monitoring code before and after it.
>> Do you know how the actual crawl is being done? Is it done via Twisted?
>> It does not look like httplib is being used for that.
>>
>> Thanks
>> Philipp
>>
>> On Thursday, May 21, 2015 at 10:29:44 PM UTC+2, Daniel Fockler wrote:
>>>
>>> Hey,
>>>
>>> Not sure exactly what you are looking for, but you can implement a
>>> Scrapy downloader middleware with a process_request function; every
>>> request will be passed into that function so you can examine it. Here
>>> are the docs for that:
>>>
>>> http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
>>>
>>> On Thursday, May 21, 2015 at 7:03:35 AM UTC-7, Philipp Bussche wrote:
>>>>
>>>> Hi there,
>>>> I am working on some monitoring for my Python/Scrapy deployment using
>>>> one of the commercial APM tools.
>>>> I was able to instrument the parsing of the response as well as the
>>>> pipeline which pushes the items into an ElasticSearch instance.
>>>> You can see in the attached screenshot how that is visualized in the
>>>> tool.
>>>> I would now also like to see the outgoing calls that Scrapy is making
>>>> through the downloader to actually crawl the HTTP pages (which
>>>> obviously happens before parsing and pipelining).
>>>> But I can't figure out where in the code the actual HTTP call is made
>>>> so that I could put my monitoring hook around it.
>>>> Could you guys please point me to the class that is actually doing the
>>>> HTTP calls?
>>>>
>>>> Thanks
>>>> Philipp
>>>>
