On Sat, 17 Dec 2005, Andy Lester wrote:

> On Sat, Dec 17, 2005 at 12:16:29PM -0500, Christopher Hart ([EMAIL PROTECTED]) wrote:
> > There are also JavaScript engines available in C and Java
> > (SpiderMonkey and Rhino, respectively, available on mozilla.org).  You
> > may be able to leverage those.
> 
> I didn't know about SpiderMonkey.  I'm going to have a look at it to see
> if it will fit into WWW::Mechanize.

Hi Andy

As I've posted here a few times before (search Gmane), I actually
did this with my Python port of WWW::Mechanize a few years back, using
SpiderMonkey.  My implementation was a half-baked first cut, but I
did get it working for a few pages.  I decided that was enough excitement
for me ;-)  I know a few people used it for projects of their own and
improved on it a bit, though (e.g. one guy used it in a college project to
make JS-using pages accessible on non-JS devices, by having a proxy
server execute the JS -- nice idea).  The code is still available at
wwwsearch.sf.net.

I made use of the Perl wrapper of SpiderMonkey to write something very
similar for Python.  IIRC, I had to extend it a little beyond what the
Perl wrapper offered.

I used an existing HTML DOM, but had to modify both the DOM and, of
course, the DOM builder (and add event handling and a browser object
model).  This is where the work lies :-)  If you intend to try this, and
you're not intimately familiar with the bizarre ways in which people can
and do use <script> tags, I have some email you may want to read (I
certainly didn't understand the issues, so my published code is wrong; a
contributor provided patches & explanations that I never merged in).

Of course:

1. A good, strict HTML DOM tree builder is not the same as a good browser
DOM builder: the latter must be very lenient.  I'm not up-to-date with
current Perl libraries, but I don't think such a thing exists.  Of course,
lenient tree builders like HTML::TreeBuilder exist, but recall that script
execution takes place during DOM building and that scripts must be able to
access the part-built DOM, so those builders would need to be
're-targeted' (or even dynamically mapped, perhaps) to a 'real' DOM tree.
Actually, just a week ago I was looking at reusing the Mozilla DOM &
builder in a lightweight way (i.e. without a GUI and probably without
Mozilla's URL-fetching code) -- I'd be interested if other people get this
to work (the sum total of the work I've done so far is compiling Firefox
and running some of its tests, so I don't know yet whether it's feasible).

2. A generic HTML DOM is not the same thing as a good browser DOM +
browser object model.  There are many quirks.  And I'm not even sure
there's a good HTML DOM out there for Perl.  Anybody know one?  AFAIK,
there's no good free browser DOM out there in *any* language other than
C++ (in Firefox and KHTML), though I recall Java's HttpUnit does some JS
stuff, using Rhino (dunno how well), so clearly whatever DOM they use is
good enough for at least some JS to work.
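
To make point 1 a bit more concrete, here is a rough Python sketch of a
lenient builder that hands the part-built tree to a callback each time a
<script> element closes.  It is not the code from wwwsearch.sf.net and it
does not involve SpiderMonkey: Node, ScriptAwareBuilder, on_script and demo
are names made up for illustration, the callback just stands in for a real
JS engine, and the leniency rules are far simpler than what a browser does.

```python
from html.parser import HTMLParser

# Elements that never have content; a small illustrative subset only.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM-like node (hypothetical, not any real library's class)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""

    def count(self):
        """Number of nodes in this subtree, including this one."""
        return 1 + sum(child.count() for child in self.children)


class ScriptAwareBuilder(HTMLParser):
    """Builds a tree and calls on_script(source, root) at each </script>."""

    def __init__(self, on_script):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root
        self.on_script = on_script  # stand-in for a real JS engine

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_data(self, data):
        self.current.text += data

    def handle_endtag(self, tag):
        # Lenient: climb to the nearest open element with this tag,
        # implicitly closing anything left open; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is self.root:
            return
        if tag == "script":
            # The script runs now, against the tree as built so far.
            self.on_script(node.text, self.root)
        self.current = node.parent


def demo():
    seen = []

    def on_script(source, root):
        # Record what the "script" could see at the time it ran.
        seen.append((source.strip(), root.count()))

    html = ("<html><body><p>one</p><script>first()</script>"
            "<p>two<script>second()</script></body></html>")
    builder = ScriptAwareBuilder(on_script)
    builder.feed(html)
    builder.close()
    return seen, builder.root.count()
```

Calling demo() shows the first script being handed a smaller tree than the
second one, which is the part a build-the-whole-tree-first parser can't
give you.  A real implementation would also have to cope with scripts that
write new markup into the document while it is being parsed, which this
sketch ignores entirely.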

On the whole, don't underestimate the work, but I think it's not *too*
hard to make something useful, if not perfect.


John
