Some interesting ideas here...
Date: Sun, 20 Mar 2005 07:11:11 +1000
From: Robert Barta <[EMAIL PROTECTED]>
To: [email protected]
Hi all,
I have put WWW::Agent onto CPAN.
http://search.cpan.org/~drrho/WWW-Agent/
We will use it here as a base for the functionality offered by
WWW::Mechanize, WWW::Robot, LWP::RobotUA, and a gazillion
similar packages. From the README:
What is it?
-----------
This suite of packages [ HIGHLY EXPERIMENTAL ] provides the basic
functionality of an 'abstract browser'. The idea is that the
abstract browser is only capable of loading objects (pages) via
HTTP, FTP, ..., but has no other functionality itself.
It is the task of particular plugins to add more specific
functionality, such as 'Link Checking' or 'Spidering' or 'Having
headers like Firefox'.
To make that happen, the abstract browser exposes the phases of a
request to allow plugins (aka modules) to intercept when they feel the
need. [ If you understand Apache's module concept then you immediately
get the idea. ]
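[ Editor's sketch, not from the README: the phase/plugin idea above could
look roughly like this. It is written in Python rather than Perl, and the
phase names, the AbstractAgent class, and the fake_fetch stand-in are all
hypothetical; WWW::Agent's real phases and API may differ. ]

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical phase names for illustration; WWW::Agent's real phases may differ.
PHASES = ("before_request", "after_response")

Context = Dict[str, Any]
Hook = Callable[[Context], None]


class AbstractAgent:
    """Only loads objects; everything else is left to plugins (hooks)."""

    def __init__(self, fetch: Callable[[str, Dict[str, str]], str]) -> None:
        self._fetch = fetch
        self._hooks: Dict[str, List[Hook]] = {phase: [] for phase in PHASES}

    def register(self, phase: str, hook: Hook) -> None:
        """Let a plugin intercept one phase of the request."""
        if phase not in self._hooks:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(hook)

    def get(self, url: str) -> Context:
        ctx: Context = {"url": url, "headers": {}, "body": None, "links": []}
        for hook in self._hooks["before_request"]:
            hook(ctx)  # plugins may adjust the outgoing request
        ctx["body"] = self._fetch(url, ctx["headers"])
        for hook in self._hooks["after_response"]:
            hook(ctx)  # plugins may inspect the loaded object
        return ctx


def firefox_headers(ctx: Context) -> None:
    """'Having headers like Firefox' plugin (header value is illustrative)."""
    ctx["headers"]["User-Agent"] = "Mozilla/5.0 (illustrative) Firefox"


def collect_links(ctx: Context) -> None:
    """First step of a 'Link Checking' plugin: gather href targets."""
    ctx["links"] = re.findall(r'href="([^"]+)"', ctx["body"])


def fake_fetch(url: str, headers: Dict[str, str]) -> str:
    """Stand-in for the network so the sketch runs offline."""
    return '<a href="/a">A</a> <a href="/b">B</a>'


if __name__ == "__main__":
    agent = AbstractAgent(fake_fetch)
    agent.register("before_request", firefox_headers)
    agent.register("after_response", collect_links)
    print(agent.get("http://example.org/")["links"])
```

[ The agent itself never learns what a link or a Firefox header is; it only
runs whatever was registered for each phase, which is the division of
labour the README describes. ]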
To make things interesting, and to allow the agent to be run in
reactive environments, it is built on POE (Perl Object
Environment, or similar). The upside is that your
application is not necessarily blocked while fetching documents off the
network. The downside is that programming is a bit more, well,
interesting.
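[ Editor's sketch, not from the README: the non-blocking point can be
illustrated with Python's asyncio as a rough analogue of a POE event loop.
This is not POE and not WWW::Agent code; fake_fetch and its delay are
made-up stand-ins for a slow network fetch. ]

```python
import asyncio
from typing import Tuple


async def fake_fetch(url: str, delay: float = 0.05) -> str:
    """Stand-in for a slow network fetch (assumption: no real I/O here)."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def main() -> Tuple[str, int]:
    ticks = 0
    # Start the fetch; control returns to us immediately.
    fetch_task = asyncio.create_task(fake_fetch("http://example.org/"))
    # The application keeps doing its own work while the fetch is pending.
    while not fetch_task.done():
        ticks += 1
        await asyncio.sleep(0.01)
    return fetch_task.result(), ticks


def run_demo() -> Tuple[str, int]:
    return asyncio.run(main())


if __name__ == "__main__":
    body, ticks = run_demo()
    print(body, "fetched while the application ticked", ticks, "times")
```

[ The price is the one the README hints at: the program has to be written
as cooperating event handlers rather than as straight-line code. ]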
The only interesting plugin at this stage is the 'Director', which
interprets a textual language (called WeeZL) to visit websites via a
script. Everything is still quite crude, but something like the
following might even work: it logs in to our teaching portal and
scrapes student results from the HTML.
login: {                                            # block to define how to log in
    url m|https?://james.bond.edu.au/.*| or die "there is nothing to log in here"
    <form> and fill uid $username                   # fill out the login form (there is
           and fill pwd $password                   # only one there)
           and click login
    url m|^https://| or die "not using HTTPS"       # now we are using SSL, good
}

logout: {                                           # block how to log out
    <form> and click logout
}

assessment: {                                       # describes how to find the assessment
    url http://james.bond.edu.au/                   # go to the site
    text m|Welcome to Bond University|              # test we are there
    warn $course                                    # only debugging output
    login (username => "XXXX", password => 'XXXX')  # invoke the login sequence
    wait ~ 15 secs                                  # behave like a human
    <a> m|Courses| and click                        # go to the courses overview page
    <table> and                                     # find the first table there
    <td> m|$course| and                             # and a td which's text matches the course title
    <a> m|$course| and click                        # and find the right link and activate
    myscraper                                       # user defined method to analyze current page
    logout()                                        # invoke log out sequence
}

#-- Here the whole starts -----------------------------------------------------
warn "starting"
assessment (course => "System Security")            # invoking above component
warn "stopping"
#------------------------------------------------------------------------------
The language is (roughly) described here:
http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins/Director.pm
--
A tutorial about writing plugins is at
http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins.pm
I would expect that things will shift dramatically in the beginning,
though. Feedback appreciated.
--
\rho
----- End forwarded message -----
--
Andy Lester => [EMAIL PROTECTED] => www.petdance.com => AIM:petdance