I have a Java app that works as an HTTP proxy, and I'm trying to integrate it with Gecko through JavaXPCOM to use its DOM rendering abilities, so I can do server-side DOM rendering/creation/serialization for scraping purposes.
The problem I'm trying to solve is server-side DOM creation. For example, if a page's DOM is built entirely by JavaScript, fetching that page through plain Java APIs gives me the raw markup with the JavaScript still embedded in it. I currently have no way of running/opening/rendering that page server-side so that the JavaScript can build the rest of the page's DOM, which I could then serialize back. The end result would be that the HTTP proxy returns the completely generated DOM after any and all JavaScript on the page has run.

I've looked at Crowbar, which tells me this sort of functionality is possible with XPCOM, but I need to do it in Java rather than JS/XUL. One limitation of Crowbar for my purposes is that it doesn't handle content types at the moment, so non-HTML/XHTML responses from the target sites it proxies to (such as images) aren't dealt with.

Basically, this is what I have right now as part of my Java proxy:

1. Access to the URL I need to hit on the target site
2. All URL parameters I need to send through (proxied to the target site)
3. All HTTP headers I need to send through

At a minimum, I'd like to hand a JavaXPCOM component some HTML markup and have it render it into a complete DOM (so that it runs the JavaScript and builds the DOM), then give me the result back as a String (see the first sketch at the end of this mail). Ultimately, I'd like to use JavaXPCOM to get the Gecko engine to do all the transport-layer work as well (make the HTTP request, handle the response) and give me back the complete DOM along with all the response headers (second sketch below). Note that my app runs as a server-side headless component.

Thanks for any help.
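For concreteness, here is the kind of wiring I'm imagining for the minimal case: initialize XPCOM from Java, parse markup into a DOM, and serialize it back to a String. This is an untested sketch; it assumes the JavaXPCOM jar is on the classpath and a XULRunner/GRE install at a made-up path, and as far as I can tell DOMParser does not execute scripts, which is exactly the gap I'm trying to close (presumably that needs a real docshell/browser window).

import java.io.File;

import org.mozilla.xpcom.Mozilla;
import org.mozilla.interfaces.nsIComponentManager;
import org.mozilla.interfaces.nsIDOMDocument;
import org.mozilla.interfaces.nsIDOMParser;
import org.mozilla.interfaces.nsIDOMSerializer;
import org.mozilla.interfaces.nsIServiceManager;

public class DomRoundTripSketch {
    public static void main(String[] args) {
        // Made-up GRE/XULRunner location -- adjust to the actual install.
        File grePath = new File("/opt/xulrunner");

        Mozilla mozilla = Mozilla.getInstance();
        mozilla.initialize(grePath);
        nsIServiceManager servMgr = mozilla.initXPCOM(grePath, null);
        try {
            nsIComponentManager compMgr = mozilla.getComponentManager();

            // Parse markup into a DOM document. Older DOMParser builds only
            // accept XML content types, and no scripts are executed here --
            // the script-execution step is the part I'm missing.
            nsIDOMParser parser = (nsIDOMParser) compMgr.createInstanceByContractID(
                    "@mozilla.org/xmlextras/domparser;1", null,
                    nsIDOMParser.NS_IDOMPARSER_IID);
            nsIDOMDocument doc = parser.parseFromString(
                    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>hello</p></body></html>",
                    "application/xhtml+xml");

            // Serialize the DOM back to a String.
            nsIDOMSerializer serializer = (nsIDOMSerializer) compMgr.createInstanceByContractID(
                    "@mozilla.org/xmlextras/xmlserializer;1", null,
                    nsIDOMSerializer.NS_IDOMSERIALIZER_IID);
            System.out.println(serializer.serializeToString(doc.getDocumentElement()));
        } finally {
            mozilla.shutdownXPCOM(servMgr);
        }
    }
}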
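And this is roughly what I mean by having Gecko do the transport layer: let Necko make the request with my forwarded headers and hand me back the body plus the response headers. Again an untested sketch under assumptions -- it presumes XPCOM is already initialized as above, the read loop through nsIScriptableInputStream is deliberately naive, and this only fetches bytes and headers; it doesn't build a DOM or run scripts.

import org.mozilla.interfaces.nsIChannel;
import org.mozilla.interfaces.nsIHttpChannel;
import org.mozilla.interfaces.nsIIOService;
import org.mozilla.interfaces.nsIInputStream;
import org.mozilla.interfaces.nsIScriptableInputStream;
import org.mozilla.interfaces.nsIServiceManager;
import org.mozilla.interfaces.nsIURI;
import org.mozilla.xpcom.Mozilla;

public class GeckoFetchSketch {
    // Assumes XPCOM has already been initialized (Mozilla.initialize/initXPCOM).
    public static String fetch(String url) {
        Mozilla mozilla = Mozilla.getInstance();
        nsIServiceManager servMgr = mozilla.getServiceManager();

        nsIIOService io = (nsIIOService) servMgr.getServiceByContractID(
                "@mozilla.org/network/io-service;1", nsIIOService.NS_IIOSERVICE_IID);

        nsIURI uri = io.newURI(url, null, null);
        nsIChannel channel = io.newChannelFromURI(uri);

        // Forward whatever request headers the proxy needs to pass through.
        nsIHttpChannel http = (nsIHttpChannel) channel
                .queryInterface(nsIHttpChannel.NS_IHTTPCHANNEL_IID);
        http.setRequestHeader("User-Agent", "MyProxy/1.0", false);

        // Synchronous open; acceptable for a headless server-side sketch.
        nsIInputStream in = channel.open();
        nsIScriptableInputStream sin = (nsIScriptableInputStream) mozilla
                .getComponentManager().createInstanceByContractID(
                        "@mozilla.org/scriptableinputstream;1", null,
                        nsIScriptableInputStream.NS_ISCRIPTABLEINPUTSTREAM_IID);
        sin.init(in);

        // Naive read loop, just for illustration.
        StringBuilder body = new StringBuilder();
        long avail;
        while ((avail = sin.available()) > 0) {
            body.append(sin.read(avail));
        }

        // Response headers become available once the channel has been opened.
        System.out.println("Content-Type: " + http.getResponseHeader("Content-Type"));

        return body.toString();
    }
}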
