Andy Lester wrote:

Some interesting ideas here...

Date: Sun, 20 Mar 2005 07:11:11 +1000
From: Robert Barta <[EMAIL PROTECTED]>
To: libwww@perl.org

Hi all,

I have put WWW::Agent onto CPAN.

   http://search.cpan.org/~drrho/WWW-Agent/

We will use it here as the base for the functionality found in
WWW::Mechanize, WWW::Robot, LWP::RobotUA, and a gazillion other
similar packages. From the README:

   What is it?
   -----------

   This suite of packages [ HIGHLY EXPERIMENTAL ] provides the basic
   functionality of an 'abstract browser'. The idea is that this
   abstract browser is only capable of loading objects (pages) via
   HTTP, FTP, ..., but has no other functionality of its own.

   It is the task of particular plugins to add more specific
   functionality, such as 'Link Checking' or 'Spidering' or 'Having
   headers like Firefox'.

   To make that happen, the abstract browser exposes the phases of a
   request so that plugins (aka modules) can intercept them whenever they
   need to. [ If you understand Apache's module concept, you will
   immediately get the idea. ]

   To make things interesting, and to allow the agent to run in
   reactive environments, it is written on top of POE (Perl Object
   Environment, or similar). The upside is that your application is
   not necessarily blocked while fetching documents off the network.
   The downside is that programming is a bit more, well,
   interesting.
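The phase-hook idea in the README can be illustrated outside Perl. What follows is a minimal Python sketch, not WWW::Agent's API: the class `AbstractAgent`, the phase names `before_request` and `after_response`, the two plugins, and the stub loader are all hypothetical. Python's `asyncio` is swapped in for POE only to show how a fetch can yield to an event loop instead of blocking the application, and the loader fakes the network so nothing is actually fetched.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List

# Hypothetical phase names; WWW::Agent's real phases may differ.
PHASES = ("before_request", "after_response")

Handler = Callable[[dict], None]


class AbstractAgent:
    """Only knows how to load a URL; all other behaviour comes from plugins."""

    def __init__(self, loader: Callable[[str], Awaitable[str]]):
        self._loader = loader
        self._hooks: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, phase: str, handler: Handler) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(handler)

    def _run_phase(self, phase: str, ctx: dict) -> None:
        # Plugins run in registration order and may modify the shared context.
        for handler in self._hooks[phase]:
            handler(ctx)

    async def fetch(self, url: str) -> dict:
        ctx = {"url": url, "headers": {}}
        self._run_phase("before_request", ctx)
        ctx["body"] = await self._loader(ctx["url"])  # yields to the event loop
        self._run_phase("after_response", ctx)
        return ctx


async def fake_loader(url: str) -> str:
    """Stub standing in for HTTP/FTP; no real network access."""
    await asyncio.sleep(0.01)
    return f"<html><a href='{url}/next'>next</a></html>"


def firefox_headers(ctx: dict) -> None:
    """Toy plugin: make requests look like they come from Firefox."""
    ctx["headers"]["User-Agent"] = "Mozilla/5.0 (Firefox)"


def link_collector(ctx: dict) -> None:
    """Toy plugin: pull href values out of the loaded page (naively)."""
    ctx["links"] = [part.split("'")[0] for part in ctx["body"].split("href='")[1:]]


async def main() -> List[dict]:
    agent = AbstractAgent(fake_loader)
    agent.register("before_request", firefox_headers)
    agent.register("after_response", link_collector)
    # Both fetches run concurrently; neither blocks the other.
    return await asyncio.gather(
        agent.fetch("http://a.example"), agent.fetch("http://b.example")
    )


if __name__ == "__main__":
    for result in asyncio.run(main()):
        print(result["url"], result["headers"]["User-Agent"], result["links"])
```

Each plugin only touches the shared request context during its own phase, so the core agent knows nothing about headers or links; that is the separation the README describes.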

The only interesting plugin at this stage is the 'Director', which
interprets a textual language (called WeeZL) to visit websites via a
script. Everything is still quite crude, but something like the
following might even work: it logs into our teaching portal and
scrapes student results from the HTML.

   login: {                                              # block to define how to log in
      url m|https?://james.bond.edu.au/.*|      or die "there is nothing to log in here"
      <form> and fill uid $username                      # fill out the login form (there is
             and fill pwd $password                      # only one there)
             and click login
      url m|^https://|                          or die "not using HTTPS"
                                                         # now we are using SSL, good
   }

   logout: {                                             # block how to log out
      <form> and click logout
   }

   assessment: {                                         # describes how to find the assessment url
      http://james.bond.edu.au/                          # go to the site
      text m|Welcome to Bond University|                 # test we are there
      warn $course                                       # only debugging output
      login (username => "XXXX", password => 'XXXX')     # invoke the login sequence
      wait ~ 15 secs                                     # behave like a human
      <a> m|Courses| and click                           # go to the courses overview page
      <table> and                                        # find the first table there
      <td> m|$course| and                                # and a td whose text matches the course title
      <a> m|$course| and click                           # and find the right link and activate it
      myscraper                                          # user-defined method to analyze the current page
      logout()                                           # invoke the log-out sequence
   }


   #-- Here the whole script starts ---------------------------------------------------
   warn "starting"
   assessment (course => "System Security")              # invoke the component above
   warn "stopping"
#------------------------------------------------------------------------------------
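The `... or die "message"` lines in the script read as guards on the current page: if the test fails, the block aborts with that message. The following Python sketch illustrates that reading only; it is not how the Director plugin is implemented. `Page`, `Step`, `ScriptAborted`, `url_matches`, and `run_block` are hypothetical names, and the form-filling and clicking steps are left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical stand-in for the 'current page' a script operates on."""
    url: str
    text: str


class ScriptAborted(Exception):
    """Raised when a guard fails, like `or die "..."` in the script."""


@dataclass
class Step:
    check: Callable[[Page], bool]
    die_message: str


def url_matches(pattern: str) -> Callable[[Page], bool]:
    # Analogous to `url m|...|`: test the current URL against a regex.
    return lambda page: re.search(pattern, page.url) is not None


def text_matches(pattern: str) -> Callable[[Page], bool]:
    # Analogous to `text m|...|`: test the page text against a regex.
    return lambda page: re.search(pattern, page.text) is not None


def run_block(steps: List[Step], page: Page) -> None:
    """Run guards in order; the first failing guard aborts the whole block."""
    for step in steps:
        if not step.check(page):
            raise ScriptAborted(step.die_message)


if __name__ == "__main__":
    login_guards = [
        Step(url_matches(r"https?://james\.bond\.edu\.au/.*"), "there is nothing to log in here"),
        Step(url_matches(r"^https://"), "not using HTTPS"),
    ]
    try:
        run_block(login_guards, Page(url="http://james.bond.edu.au/login", text=""))
    except ScriptAborted as err:
        print("died:", err)  # plain http passes the first guard but fails the second
```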

The language is (roughly) described here:

   http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins/Director.pm

--

A tutorial about writing plugins is at

  http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins.pm

I would expect that things will shift dramatically in the beginning,
though. Feedback appreciated.

--

\rho


----- End forwarded message -----



Interesting idea, but shouldn't POE-based modules stay in the POE namespace?

-ofer
