> I have been using Perl's WWW::Mechanize to scrape a series of web pages.
>
> Unfortunately, the web page now includes some JavaScript, which
> Mechanize does not handle.
>
> All of my code is in shell script and Perl, so I'd like to stick with
> those.
>
> Suggestions?

+1 for Selenium with a real browser. I have used it quite extensively
with Python over the last year, but I imagine the Perl API is
reasonable as well. The Python bindings are more polished, though, and
you could invoke a Python script from Perl. It is somewhat heavyweight
and slow, but you can point the browser's DISPLAY at an Xvfb virtual
display and run it headless, which at the very least saves you from
rendering graphics to a real screen. You can do all kinds of cool
things: wait for the browser to render a certain element, inject your
own JavaScript, take screenshots. You can also combine simple scraping
with Selenium-style scraping: e.g., if you detect something that you
know you cannot handle with a simple HTML tree parser, you just
delegate it to Selenium.
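A rough Python sketch of the "plain fetch first, Selenium only when needed" idea, assuming the `selenium` package (4.x) with Firefox and geckodriver installed, and an Xvfb server already running on display `:99`. The names `fetch`, `needs_browser`, `fetch_with_selenium`, the `target_id` parameter, and the "element missing but the page has scripts" heuristic are all hypothetical; substitute whatever check tells you the plain parser is out of its depth.

```python
import os
import urllib.request
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Notes whether raw HTML has <script> tags and an element with target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.has_script = False
        self.found_target = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.has_script = True
        if dict(attrs).get("id") == self.target_id:
            self.found_target = True


def needs_browser(html, target_id):
    """Hypothetical heuristic: the element we want is missing from the raw
    HTML but the page has scripts, so JavaScript probably builds it."""
    scan = PageScan(target_id)
    scan.feed(html)
    scan.close()
    return scan.has_script and not scan.found_target


def fetch_with_selenium(url, target_id, display=":99", timeout=10):
    # Imported lazily so the plain-HTML path works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumes something like `Xvfb :99 &` is already running; the browser
    # inherits DISPLAY from this process's environment.
    os.environ["DISPLAY"] = display
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until the page's JavaScript has rendered the element we need.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, target_id)))
        # Inject our own JavaScript; here it just counts the links on the page.
        link_count = driver.execute_script("return document.links.length;")
        print(f"{url}: {link_count} links after rendering")
        driver.save_screenshot("page.png")
        return driver.page_source
    finally:
        driver.quit()


def fetch(url, target_id):
    """Try a plain HTTP fetch first; delegate to Selenium only when needed."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    if needs_browser(html, target_id):
        return fetch_with_selenium(url, target_id)
    return html
```

Only pages that trip the heuristic pay the cost of starting a browser; everything else goes through the cheap parser path.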

Another nice thing is that you can add an option to your script to
show you the visuals and pause at certain breakpoints. I do this all
the time when testing and doing web development. For example, I
already have some code that gets me to a certain part of the UI, but
after that I do not know what to do. Instead of having to get there
manually (which means I have to remember how, and then actually do
it), I just have my script take me there and then pause, waiting for a
key press in the terminal. Once it is paused there, I use the DOM
inspector in the browser to figure out the ID or XPath, or maybe even
custom JavaScript, to advance to the next stage, and also what to
validate and how. In fact, nowadays I do most of my web development
with the help of Selenium: this way my brain is relieved of the tedium
of clicking and typing the same thing over and over, and when I am
done I have an automated regression test that ensures I will never
break what I just coded without the test suite raising a red flag.
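One way the "show the visuals and pause at a breakpoint" option could look in Python, as a minimal sketch. The flags `--show` and `--pause-at`, the functions `parse_args`, `choose_display`, and `pause`, and the `:99` Xvfb display are hypothetical choices, not anything Selenium provides.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Scripted walk through the UI")
    parser.add_argument("--show", action="store_true",
                        help="use your real display instead of the Xvfb one")
    parser.add_argument("--pause-at", action="append", default=None,
                        metavar="NAME",
                        help="stop at the named breakpoint (may be repeated)")
    return parser.parse_args(argv)


def choose_display(args, xvfb_display=":99"):
    """DISPLAY for the browser: your own with --show, else the assumed Xvfb one."""
    return os.environ.get("DISPLAY") if args.show else xvfb_display


def pause(name, args, wait=input):
    """If --pause-at NAME was given, block until Enter is pressed.

    Returns True if the script paused. `wait` defaults to input() and is a
    parameter only so the function can be exercised without a terminal.
    """
    if name not in (args.pause_at or []):
        return False
    wait(f"[paused at {name!r}] inspect the page, then press Enter to continue...")
    return True
```

In the test itself you would call `pause("settings-page", args)` right after the Selenium steps that get you there, then run the script with `--show --pause-at settings-page` when you want to poke around in the DOM inspector.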

And, as a bonus, if you want to demonstrate a web UI failure or
feature to a coworker, you can set up a local instance of Selenium on
their desktop and then just give your test an argument that points it
at that instance. Of course, in that case you need to make sure that
unwanted parties cannot reach the Selenium port.
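A small Python sketch of "give your test an argument to point to his instance", assuming the coworker runs a Selenium server (4444 is the standalone server's default port; older servers may need `/wd/hub` appended to the URL). The `--selenium-url` flag and the `parse_args`/`make_driver` helpers are hypothetical; `webdriver.Remote` and `webdriver.Firefox` are the Selenium 4 Python entry points.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="UI test runner")
    parser.add_argument(
        "--selenium-url", default=None,
        help="remote Selenium server to drive, e.g. http://HOST:4444; "
             "omit to launch a browser locally")
    return parser.parse_args(argv)


def make_driver(selenium_url=None):
    """Start a local Firefox, or drive the browser behind a remote Selenium server."""
    from selenium import webdriver  # imported lazily; only needed to build a driver

    options = webdriver.FirefoxOptions()
    if selenium_url:
        # The browser window opens on whatever desktop runs that server.
        return webdriver.Remote(command_executor=selenium_url, options=options)
    return webdriver.Firefox(options=options)
```

The rest of the test stays the same either way; only the driver it is handed changes.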



-- 
Sasha Pachev

Fast Running Blog.
http://fastrunningblog.com
Run. Blog. Improve. Repeat.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
