"The tools I'm using are insufficient for the task at hand. But I don't
want to use anything else." == Full stop.

I know that's not really how you meant it, but on first read that's what
came across.

What exactly is the JavaScript doing to the page? Is it an SPA? If you're
trying to get data, and the page uses JavaScript with Ajax to fetch it, you
may not need to scrape anything: call the same API the page's JavaScript is
using and grab the data directly.
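
Here is a rough sketch of that idea in Python (the same thing is just as
doable in Perl). The URL and the "items"/"name" layout are made up for
illustration; you'd find the real endpoint and response shape by watching
the network tab in your browser's developer tools while the page loads.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute whatever the page's JavaScript calls.
API_URL = "https://example.com/api/items?page=1"


def fetch_json(url, timeout=10):
    """GET a URL and decode the response body as JSON."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def item_names(payload):
    """Collect the 'name' of each entry under 'items' (assumed layout)."""
    return [item["name"] for item in payload.get("items", [])]
```

Calling item_names(fetch_json(API_URL)) would then give you the data with
no HTML parsing at all, assuming the endpoint doesn't need cookies or
special headers that the browser is sending for you.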

Otherwise it sounds like you may need to add another tool to finish this
task.

Perl gurus may know of a Perl module that could help.

I recommend looking into PhantomJS, a scriptable headless WebKit browser.
It may come very close to solving this problem.
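
As a minimal sketch of what that could look like: the Python wrapper below
writes out a small PhantomJS script, runs it against a URL, and returns the
DOM as it stands after the page's JavaScript has run. It assumes a
"phantomjs" binary on your PATH, and the two-second wait is a guess you'd
tune for the page in question. The names build_command and render are just
mine for illustration.

```python
import os
import subprocess
import tempfile

# PhantomJS script: load the page, let its JavaScript run, print the DOM.
PHANTOM_SCRIPT = r"""
var page = require('webpage').create();
var url = require('system').args[1];
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('failed to load ' + url);
        phantom.exit(1);
        return;
    }
    // Crude fixed wait for Ajax calls to finish (an assumption; tune it).
    setTimeout(function () {
        console.log(page.content);
        phantom.exit(0);
    }, 2000);
});
"""


def build_command(script_path, url, binary="phantomjs"):
    """Build the argv list used to run the PhantomJS script on a URL."""
    return [binary, script_path, url]


def render(url, binary="phantomjs", timeout=60):
    """Return the page's HTML after its JavaScript has run."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as fh:
        fh.write(PHANTOM_SCRIPT)
        script_path = fh.name
    try:
        result = subprocess.run(
            build_command(script_path, url, binary),
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return result.stdout
    finally:
        os.unlink(script_path)  # clean up the temporary script file
```

The string that render() returns can go straight into whatever HTML parser
you're already using, so the rest of your scraper shouldn't need to change.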

You could perhaps also use ELinks or some other text-based browser with
JavaScript support that will let you collect the DOM after the JavaScript
has been applied.

Another option might be to use Konqueror, which has an
archive-for-offline-viewing feature that you can probably invoke via qdbus
to save a page into a .war file and then parse that. It may not even need
a display, but you could use Xvfb if it does. However, this is what
PhantomJS should already allow you to do.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
