Andy Lester wrote:
Some interesting ideas here...
Date: Sun, 20 Mar 2005 07:11:11 +1000
From: Robert Barta <[EMAIL PROTECTED]>
To: libwww@perl.org
Hi all,
I have put WWW::Agent onto CPAN.
http://search.cpan.org/~drrho/WWW-Agent/
We will use it here as a base for the kind of functionality provided by WWW::Mechanize, WWW::Robot, LWP::RobotUA, and a gazillion similar packages. From the README:
What is it?
-----------
This suite of packages [ HIGHLY EXPERIMENTAL ] provides the basic functionality of an 'abstract browser'. The idea is that the abstract browser is only capable of loading objects (pages) via HTTP, FTP, ..., but has no other functionality itself.
It is the task of particular plugins to add more specific functionality, such as 'Link Checking' or 'Spidering' or 'Having headers like Firefox'.
To make that happen, the abstract browser exposes the phases of a request to allow plugins (aka modules) to intercept when they feel the need. [ If you understand Apache's module concept then you immediately get the idea. ]
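As a rough illustration of the phase-hook idea described above, here is a minimal sketch in Python rather than Perl. It is not WWW::Agent's actual API: the phase names, the `AbstractBrowser` class, and the plugin functions are all hypothetical, and the fetch is faked so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical phase names; the real WWW::Agent phases may differ.
PHASES = ("init", "request", "response", "done")


class AbstractBrowser:
    """Loads objects and nothing else; plugins supply all other behaviour."""

    def __init__(self, fetch: Callable[[dict], dict]):
        self._fetch = fetch
        self._hooks: Dict[str, List[Callable[[dict], None]]] = {
            phase: [] for phase in PHASES
        }

    def register(self, phase: str, hook: Callable[[dict], None]) -> None:
        """Let a plugin intercept one phase of every request."""
        if phase not in self._hooks:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(hook)

    def _run(self, phase: str, ctx: dict) -> None:
        for hook in self._hooks[phase]:
            hook(ctx)

    def get(self, url: str) -> dict:
        """Walk one request through all phases, calling hooks in order."""
        ctx = {"url": url, "headers": {}, "response": None}
        self._run("init", ctx)
        self._run("request", ctx)      # plugins may still change the request
        ctx["response"] = self._fetch(ctx)
        self._run("response", ctx)     # plugins may inspect the response
        self._run("done", ctx)
        return ctx


def firefox_headers(ctx: dict) -> None:
    """'Having headers like Firefox' plugin (placeholder User-Agent value)."""
    ctx["headers"]["User-Agent"] = "Mozilla/5.0 (hypothetical Firefox-like UA)"


def make_link_checker(seen: list) -> Callable[[dict], None]:
    """'Link Checking' plugin: records the status of every fetched URL."""
    def check(ctx: dict) -> None:
        seen.append((ctx["url"], ctx["response"]["status"]))
    return check


def fake_fetch(ctx: dict) -> dict:
    """Stand-in for the network; echoes back the headers that were sent."""
    return {"status": 200, "body": "<html></html>",
            "sent_headers": dict(ctx["headers"])}


if __name__ == "__main__":
    seen: list = []
    browser = AbstractBrowser(fake_fetch)
    browser.register("request", firefox_headers)
    browser.register("response", make_link_checker(seen))
    browser.get("http://example.org/")
    print(seen)
```

The browser core stays ignorant of what any plugin does; it only guarantees the order in which the phases fire.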
To make things interesting, and to allow the agent to be run in reactive environments, it is built on POE (Perl Object Environment, or similar). The upside is that your application is not necessarily blocked while fetching documents off the network. The downside is that programming is a bit more, well, interesting.
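To show what "not blocked while fetching" means in an event-driven setting, here is a small sketch using Python's asyncio as a stand-in for POE. It is only an analogy, not how WWW::Agent is implemented: `fetch` simulates network latency with a sleep, and `heartbeat` is a hypothetical placeholder for whatever else the application wants to do in the meantime.

```python
import asyncio


async def fetch(url: str, delay: float) -> str:
    """Simulated fetch: the sleep yields control instead of blocking."""
    await asyncio.sleep(delay)
    return f"fetched {url}"


async def heartbeat(log: list, ticks: int) -> None:
    """Other application work that keeps running during the fetches."""
    for i in range(ticks):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def main():
    log: list = []
    # All three tasks share one event loop; none blocks the others.
    results = await asyncio.gather(
        fetch("http://example.org/a", 0.03),
        fetch("http://example.org/b", 0.02),
        heartbeat(log, 3),
    )
    # gather returns results in argument order, not completion order.
    return results[:2], log


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    pages, log = run_demo()
    print(pages)
    print(log)
```

The price is the one the README hints at: control flow is split across callbacks or coroutines, so even simple "do this, then that" logic takes more care to write.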
The only interesting plugin at this stage is the 'Director', which interprets a textual language (called WeeZL) to visit websites via a script. Everything is still quite crude, but something like the following might even work. It logs in to our teaching portal and scrapes student results from the HTML:
login: {                                      # block to define how to log in
   url m|https?://james.bond.edu.au/.*| or die "there is nothing to log in here"
   <form> and fill uid $username              # fill out the login form (there is
          and fill pwd $password              # only one there)
          and click login
   url m|^https://| or die "not using HTTPS"  # now we are using SSL, good
}

logout: {                                     # block how to log out
   <form> and click logout
}

assessment: {                                 # describes how to find the assessment
   url http://james.bond.edu.au/              # go to the site
   text m|Welcome to Bond University|         # test we are there
   warn $course                               # only debugging output
   login (username => "XXXX", password => 'XXXX') # invoke the login sequence
   wait ~ 15 secs                             # behave like a human
   <a> m|Courses| and click                   # go to the courses overview page
   <table> and                                # find the first table there
   <td> m|$course| and                        # and a td which's text matches the course title
   <a> m|$course| and click                   # and find the right link and activate
   myscraper                                  # user defined method to analyze current page
   logout()                                   # invoke log out sequence
}

#-- Here the whole starts -----------------------------------------------------------
warn "starting"
assessment (course => "System Security")      # invoking above component
warn "stopping"
#------------------------------------------------------------------------------------
The language is (roughly) described here:
http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins/Director.pm
--
A tutorial about writing plugins is at
http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins.pm
I would expect that things will shift dramatically in the beginning, though. Feedback appreciated.
--
\rho
----- End forwarded message -----
Interesting idea, but shouldn't POE-based modules stay in the POE namespace?
-ofer