Hi, I'm trying to integrate Scrapy and Selenium to scrape JavaScript-heavy web apps. I've already found a package [1] that provides a `WebdriverRequest`, and it works well in my tests.
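For reference, this is roughly how I'm using it. I'm going from the package's README here, so the import path may not be exact:

    from scrapy import Spider
    from scrapy_webdriver.http import WebdriverRequest  # import path as I recall it from the README


    class ExampleSpider(Spider):
        name = 'example'

        def start_requests(self):
            # The package renders the page in a real browser before the
            # response reaches the callback.
            yield WebdriverRequest('http://example.com',
                                   callback=self.parse_page)

        def parse_page(self, response):
            # By the time we get here, the page's JavaScript has run.
            self.log('got %s (%d bytes)' % (response.url, len(response.body)))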
But I also want to handle not only the requests I explicitly make with `WebdriverRequest`, but also the requests the browser itself makes while navigating to the target page (e.g. JavaScript and CSS files, XHR). My current idea is:

1. Make the browser use a "proxy" that calls an HTTP endpoint to log every request/response pair it sees.
2. Have a Scrapy extension create this HTTP endpoint using `twisted.web.server.Site` and attach it to the existing Twisted reactor, as sketched below. (I have never used Twisted before, so I don't know if this is possible.)
3. Somehow let the spider know that there is data for it to process every time the endpoint is called.
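To make steps 2 and 3 concrete, here is a rough, untested sketch of what I have in mind. Everything in it (the `ProxyLogExtension` name, the `proxy_pair_received` signal, port 8080, and the JSON body I assume the proxy would POST) is a placeholder I made up:

    # proxylog.py - untested sketch; all names, the port and the payload
    # format are placeholders of mine, not from any existing package.
    import json

    from scrapy import signals
    from twisted.internet import reactor
    from twisted.web import resource, server

    # A Scrapy signal is just a unique object; this one would fire for
    # every request/response pair the proxy reports.
    proxy_pair_received = object()


    class ProxyLogResource(resource.Resource):
        """Endpoint the proxy POSTs each request/response pair to."""
        isLeaf = True

        def __init__(self, crawler):
            resource.Resource.__init__(self)
            self.crawler = crawler

        def render_POST(self, request):
            # render_POST runs in the same reactor thread as Scrapy
            # itself, so it should be safe to fire the signal directly.
            pair = json.loads(request.content.read())
            self.crawler.signals.send_catch_log(
                signal=proxy_pair_received, pair=pair)
            return b'OK'


    class ProxyLogExtension(object):
        """Starts the endpoint on the reactor Scrapy already runs."""

        def __init__(self, crawler):
            self.crawler = crawler
            self.port = None
            crawler.signals.connect(self.engine_started,
                                    signals.engine_started)
            crawler.signals.connect(self.engine_stopped,
                                    signals.engine_stopped)

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def engine_started(self):
            # The reactor is already running here, so listenTCP simply
            # adds one more listener to it; no second reactor is needed.
            site = server.Site(ProxyLogResource(self.crawler))
            self.port = reactor.listenTCP(8080, site)

        def engine_stopped(self):
            if self.port is not None:
                self.port.stopListening()

The extension would be enabled in settings.py with something like EXTENSIONS = {'myproject.proxylog.ProxyLogExtension': 500}. For step 3, the spider could then connect a handler to the custom signal (this assumes a Scrapy version where spiders have `from_crawler`):

    from scrapy import Spider

    from myproject.proxylog import proxy_pair_received  # hypothetical module path


    class MySpider(Spider):
        name = 'myspider'

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super(MySpider, cls).from_crawler(
                crawler, *args, **kwargs)
            crawler.signals.connect(spider.on_proxy_pair,
                                    proxy_pair_received)
            return spider

        def on_proxy_pair(self, pair):
            # This is the part I don't know how to do properly: unlike
            # a callback, a signal handler cannot yield items/requests.
            self.log('proxy saw %s' % pair.get('url'))

The part I'm stuck on is exactly that handler: it receives the data, but I don't see how to hand the captured pairs back into the normal scraping flow from there.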
Is that possible? How can I implement the last step? Does anyone have another suggestion?

Thank you,
Paulo

[1] https://github.com/brandicted/scrapy-webdriver