Andrew Pennebaker <[email protected]> writes:

> Tumblr and other websites delay loading some of their content (images)
> through JavaScript events like *onload*. It would be nice if wget supported
> a *-j* flag for executing this, in order to access these dynamically loaded
> resources. Execution may add some time to downloads, but for users that
> really want the content, having the option is better than not.
>
> Possible solutions:
>
> The HtmlUnit <http://htmlunit.sourceforge.net/> library can already do
> this, but it's written in Java and I believe wget is written in C?

correct, wget is written in C.

> Another consideration for attaching JS execution to wget is
> Node <http://nodejs.org/>, a
> C++ implementation, though we probably only want the core, the
> V8 <https://code.google.com/p/v8/> JavaScript engine itself.
>
> Other possibilities include
> SpiderMonkey <http://en.wikipedia.org/wiki/SpiderMonkey_(JavaScript_engine)>,
> the JS engine for Firefox, or
> JavaScriptCore <http://www.webkit.org/projects/javascript/>,
> Safari's JS engine.

how would you programmatically retrieve these links?  Triggering
"onload" or other events?

I wonder how many of these occurrences we can cover by simply trying to
parse cases like document.location='foo' without involving any JS
engine.

Giuseppe
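
To make the "parse cases like document.location='foo'" idea concrete, here is a minimal sketch in Python rather than wget's C, purely for brevity. The function name `extract_js_redirects`, the regular expression, and the example URLs are all hypothetical and not part of wget. It only catches string literals assigned directly to a location property, so URLs built at run time would still need a real JS engine.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: a quoted string literal assigned to
# document.location, window.location, or their .href property.
# This is an assumption about common cases, not a JavaScript parser.
LOCATION_ASSIGN = re.compile(
    r"""\b(?:document|window)\.location(?:\.href)?\s*=\s*(['"])([^'"]+)\1"""
)


def extract_js_redirects(html, base_url):
    """Return absolute URLs found in simple location assignments."""
    return [
        urljoin(base_url, match.group(2))
        for match in LOCATION_ASSIGN.finditer(html)
    ]


if __name__ == "__main__":
    page = (
        "<body onload=\"document.location='foo'\">"
        "<script>window.location.href = \"/bar/baz.html\";</script>"
        "</body>"
    )
    for url in extract_js_redirects(page, "http://example.com/dir/page.html"):
        print(url)
```

Comparisons such as document.location == 'foo' are not matched, because the pattern requires a quote right after a single "=".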
