Some interesting ideas here...

Date: Sun, 20 Mar 2005 07:11:11 +1000
From: Robert Barta <[EMAIL PROTECTED]>
To: libwww@perl.org

Hi all,

I have put WWW::Agent onto CPAN.

    http://search.cpan.org/~drrho/WWW-Agent/

We will use it here as a base for the functionality provided by
WWW::Mechanize, WWW::Robot, LWP::RobotUA, and a gazillion
similar packages. From the README:

    What is it?
    -----------

    This suite of packages [ HIGHLY EXPERIMENTAL ] provides the basic
    functionality of an 'abstract browser'. The idea is that the
    abstract browser is only capable of loading objects (pages) via
    HTTP, FTP, ..., but has no other functionality itself.

    It is the task of particular plugins to add more specific
    functionality, such as 'Link Checking' or 'Spidering' or 'Having
    headers like Firefox'.

    To make that happen, the abstract browser exposes the phases of a
    request to allow plugins (aka modules) to intercept when they feel the
    need. [ If you understand Apache's module concept then you immediately
    get the idea. ]

    To make things interesting, and to allow the agent to be run in
    reactive environments, it is built on POE (Perl Object
    Environment, or similar). The upside is that your application
    is not necessarily blocked while fetching documents off the
    network. The downside is that programming is a bit more, well,
    interesting.
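
To illustrate the phase/plugin idea from the README, here is a minimal
sketch in Python. It is not the WWW::Agent API: the class name, the phase
names ("request", "response"), the context dictionary, and both plugins
are assumptions made up for this example, and the fetcher is a stub so
that nothing touches the network.

```python
import re

# Hypothetical phase names; the real WWW::Agent phases may differ.
PHASES = ("request", "response")


class AbstractAgent:
    """An 'abstract browser': it only loads objects and runs plugin hooks."""

    def __init__(self, fetch):
        # fetch(url, headers) -> body; injected so the sketch needs no network.
        self._fetch = fetch
        self._hooks = {phase: [] for phase in PHASES}

    def register(self, phase, hook):
        """Attach a plugin callback to one phase of the request cycle."""
        self._hooks[phase].append(hook)

    def _run(self, phase, ctx):
        # Call every hook registered for this phase, in registration order.
        for hook in self._hooks[phase]:
            hook(ctx)

    def load(self, url):
        ctx = {"url": url, "headers": {}, "body": None}
        self._run("request", ctx)    # plugins may adjust the outgoing request
        ctx["body"] = self._fetch(ctx["url"], ctx["headers"])
        self._run("response", ctx)   # plugins may inspect the loaded page
        return ctx


def firefox_headers(ctx):
    """Plugin: 'Having headers like Firefox' (illustrative header value)."""
    ctx["headers"]["User-Agent"] = "Mozilla/5.0 (X11; Linux) Firefox"


def collect_links(ctx):
    """Plugin: gather href targets, a first step towards link checking."""
    ctx["links"] = re.findall(r'href="([^"]*)"', ctx["body"])


def fake_fetch(url, headers):
    """Stub fetcher standing in for HTTP, FTP, ..."""
    agent_name = headers.get("User-Agent", "none")
    return '<a href="/courses">Courses</a> fetched as ' + agent_name


def demo():
    agent = AbstractAgent(fake_fetch)
    agent.register("request", firefox_headers)
    agent.register("response", collect_links)
    return agent.load("http://example.org/")
```

Calling demo() returns the context after both phases have run. The agent
itself knows nothing about headers or links; all of that lives in the
plugins, which is the separation the README describes.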

The only interesting plugin at this stage is the 'Director', which
interprets a textual language (called WeeZL) to visit websites via a
script.  Everything is still quite crude, but something like the
following might even work. It logs in to our teaching portal and
scrapes student results from the HTML.

    login: {                                              # block to define how to log in
       url m|https?://james.bond.edu.au/.*|      or die "there is nothing to log in here"
       <form> and fill uid $username                      # fill out the login form (there is
              and fill pwd $password                      # only one there)
              and click login
       url m|^https://|                          or die "not using HTTPS"
                                                          # now we are using SSL, good
    }

    logout: {                                             # block how to log out
       <form> and click logout
    }

    assessment: {                                         # describes how to find the assessment
        url http://james.bond.edu.au/                     # go to the site
        text m|Welcome to Bond University|                # test we are there
        warn $course                                      # only debugging output
        login (username => "XXXX", password => 'XXXX')    # invoke the login sequence
        wait ~ 15 secs                                    # behave like a human
        <a> m|Courses| and click                          # go to the courses overview page
        <table> and                                       # find the first table there
            <td> m|$course| and                           # and a td whose text matches the course title
               <a> m|$course| and click                   # and find the right link and activate
        myscraper                                         # user defined method to analyze current page
        logout()                                          # invoke log out sequence
    }

    #-- Here the whole starts -----------------------------------------------------------
    warn "starting"
    assessment (course => "System Security")              # invoking above component
    warn "stopping"
    
#------------------------------------------------------------------------------------

The language is (roughly) described here:

    http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins/Director.pm

--

A tutorial about writing plugins is at

   http://search.cpan.org/~drrho/WWW-Agent/lib/WWW/Agent/Plugins.pm

I would expect that things will shift dramatically in the beginning,
though. Feedback appreciated.

--

\rho


----- End forwarded message -----

-- 
Andy Lester => [EMAIL PROTECTED] => www.petdance.com => AIM:petdance
