Andrew Pennebaker <[email protected]> writes:

> Tumblr and other websites delay loading some of their content (images)
> through JavaScript events like *onload*. It would be nice if wget supported
> a *-j* flag for executing this, in order to access these dynamically loaded
> resources. Execution may add some time to downloads, but for users that
> really want the content, having the option is better than not.
>
> Possible solutions:
>
> The HtmlUnit <http://htmlunit.sourceforge.net/> library can already do
> this, but it's written in Java and I believe wget is written in C?

Correct, wget is written in C.


> Another consideration for attaching JS execution to wget is
> Node <http://nodejs.org/>, a
> C++ implementation, though we probably only want the core, the
> V8 <https://code.google.com/p/v8/> JavaScript engine itself.
>
> Other possibilities include
> SpiderMonkey <http://en.wikipedia.org/wiki/SpiderMonkey_(JavaScript_engine)>,
> the JS engine for Firefox, or
> JavaScriptCore <http://www.webkit.org/projects/javascript/>,
> Safari's JS engine.

How would you programmatically retrieve these links?  By triggering
"onload" or other events?  I wonder how many of these cases we could
cover by simply parsing patterns like document.location='foo',
without involving any JS engine at all.

Giuseppe
