I am trying to write a spider to grab my books off of Safari for a
batch printing job, so that I don't need to go through each chapter
myself and hit the Print button.  I used this script to try to log
myself in to the Safari site:

# BEGIN CODE
#! /usr/bin/perl

use strict;
use warnings;
use LWP;
use LWP::UserAgent;

# variables
my $cookie_jar_file = "./cookies.txt";
my @headers = (
                  'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
                  'Accept'          => 'image/gif, image/x-bitmap, image/jpeg,
                                        image/pjpeg, image/png, */*',
                  'Accept-Charset'  => 'iso-8859-1,*',
                  'Accept-Language' => 'en-US',
                  "catid" => "",
                  "s" => "1",
                  "o" => "1",
                  "b" => "1",
                  "t" => "1",
                  "f" => "1",
                  "c" => "1",
                  "u" => "1",
                  "r" => "",
                  "l" => "1",
                  "g" => "",
                  "usr" => "myemail",
                  "pwd" => "mypassword",
                  "savepwd" => "1",
                 );
# end variables

my $user_agent = LWP::UserAgent->new;
$user_agent->cookie_jar({file => $cookie_jar_file});
my $response = $user_agent->post(
                                 'http://safari.oreilly.com/JVXSL.asp',
                                 @headers,
                                 );
# END CODE

        I know that this is the form I should post to because I
stripped the following form out of the web page (and there is no
JavaScript that modifies it):

<form action="JVXSL.asp" method="post">
<input type="hidden" name="catid" value="">
<input type="hidden" name="s" value="1">
<input type="hidden" name="o" value="1">
<input type="hidden" name="b" value="1">
<input type="hidden" name="t" value="1">
<input type="hidden" name="f" value="1">
<input type="hidden" name="c" value="1">
<input type="hidden" name="u" value="1">
<input type="hidden" name="r" value="">
<input type="hidden" name="l" value="1">
<input type="hidden" name="g" value="">
<input name="usr" type="text" value="" size="12">
<input name="pwd" type="password" value="" size="12">
<input type="checkbox" name="savepwd" value="1">
<input type="image" name="Login" src="images/btn_login.gif" width="40" height="16" border="0" align="absmiddle">
</form>

        When I run the script, there is nothing in
$response->content.  I know that safari.oreilly.com will return a
blank page if it doesn't like the user agent, and that after a
successful sign-in it sends you back to the safari.oreilly.com page
with a very large number of GET variables.  Does anyone know what I
might be doing wrong?
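
        One thing I am unsure about is whether the form fields belong
in the same list as the HTTP headers.  Here is a minimal sketch of the
alternative, which I have not tested against the site: it keeps the
headers and the form fields in separate lists and only builds and
prints the request, without sending anything.  The field names and
values are copied from the form above; the split into @http_headers
and @form_fields is my own guess at what may be needed.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Real HTTP headers only (same values as in my script above).
my @http_headers = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept'          => 'image/gif, image/x-bitmap, image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*',
    'Accept-Language' => 'en-US',
);

# Form fields, in the order they appear in the HTML form.
my @form_fields = (
    catid   => '',
    s       => '1',
    o       => '1',
    b       => '1',
    t       => '1',
    f       => '1',
    c       => '1',
    u       => '1',
    r       => '',
    l       => '1',
    g       => '',
    usr     => 'myemail',
    pwd     => 'mypassword',
    savepwd => '1',
);

# An array reference as the second argument becomes the url-encoded
# request body; the remaining pairs are sent as headers.
my $request = POST(
    'http://safari.oreilly.com/JVXSL.asp',
    \@form_fields,
    @http_headers,
);

# Show exactly what would go over the wire, without sending it.
print $request->as_string;
```

        If the printed request looks right, it could be sent with
$user_agent->request($request).  Two more guesses on my part: a
browser would also submit Login.x and Login.y for the image button,
and LWP::UserAgent does not follow redirects for POST requests unless
'POST' is added to $user_agent->requests_redirectable, so checking
$response->status_line may show more than $response->content does.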

        Also, I figure I'm not the only person who would want to do
this.  Is anyone interested in starting up a SourceForge project with
me and releasing it under the GPL?

-Dan


