Hi,

I'm trying to integrate Scrapy and Selenium to scrape JavaScript-heavy
web apps. I've already found a package [1] that provides a
`WebdriverRequest`, and it works well in my tests.
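
For reference, a stripped-down version of what already works for me (a
minimal sketch; I'm quoting the `WebdriverRequest` import path from
memory, so it may not be exact):

    from scrapy import Spider
    from scrapy_webdriver.http import WebdriverRequest  # path from memory


    class MySpider(Spider):
        name = 'example'

        def start_requests(self):
            # The page is rendered in a real browser before the
            # response reaches the callback.
            yield WebdriverRequest('http://example.com',
                                   callback=self.parse)

        def parse(self, response):
            # By this point the DOM reflects whatever JavaScript ran.
            pass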

But I want to handle not only the requests I explicitly make using
`WebdriverRequest`, but also the requests the browser itself makes while
loading the target page (e.g. JavaScript and CSS files, XHR calls).

My current idea is:

1. Make the browser use a proxy that calls an HTTP endpoint to log every
request/response pair it sees.
2. Have a Scrapy extension create this HTTP endpoint using
`twisted.web.server.Site` and inject it into the existing Twisted reactor.
(I have never used Twisted before, so I don't know if this is possible; a
rough sketch of what I mean follows this list.)
3. Somehow let the spider know that there is data for it to process every
time this endpoint is called.
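
To make steps 2 and 3 concrete, here is the kind of thing I'm imagining
(completely untested; `ProxyLogExtension`, `ProxyLogResource`, the
`proxy_pair_received` signal and port 8090 are all placeholders I
invented):

    import json

    from scrapy import signals
    from twisted.internet import reactor
    from twisted.web import resource, server

    # Custom signal the spider could connect to (invented name).
    proxy_pair_received = object()


    class ProxyLogResource(resource.Resource):
        """Receives POSTed request/response pairs from the proxy."""
        isLeaf = True

        def __init__(self, crawler):
            resource.Resource.__init__(self)
            self.crawler = crawler

        def render_POST(self, request):
            pair = json.loads(request.content.read())
            # Step 3 (my guess): broadcast the pair through Scrapy's
            # signal manager so the spider can pick it up.
            self.crawler.signals.send_catch_log(proxy_pair_received,
                                                pair=pair)
            return b'OK'


    class ProxyLogExtension(object):
        """Injects the HTTP endpoint into the already-running reactor."""

        def __init__(self, crawler):
            self.crawler = crawler
            self.port = None

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls(crawler)
            crawler.signals.connect(ext.engine_started,
                                    signals.engine_started)
            crawler.signals.connect(ext.engine_stopped,
                                    signals.engine_stopped)
            return ext

        def engine_started(self):
            # Scrapy already drives the Twisted reactor, so I hope it
            # can simply be told to listen on one more port.
            site = server.Site(ProxyLogResource(self.crawler))
            self.port = reactor.listenTCP(8090, site)

        def engine_stopped(self):
            if self.port is not None:
                self.port.stopListening()

The extension would be enabled via the EXTENSIONS setting, and the spider
would connect a callback to `proxy_pair_received` in its `from_crawler`.
What I can't figure out is how to turn whatever that callback receives
into items or requests that the engine will actually process.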

Is that possible? How can I implement the last step? Does anyone have
another suggestion?

Thank you,
Paulo

[1] https://github.com/brandicted/scrapy-webdriver
