Dan Anderson wrote:
>
>         I am trying to create a  spider to grab my books off of Safari
> for a  batch printing job so I  don't need to go  through each chapter
> myself and hit the Print button.  So I used this script to try and log
> myself in to the safari site:
>
> # BEGIN CODE
> #! /usr/bin/perl
>
> use strict;
> use warnings;
> use LWP;
> use LWP::UserAgent;

Use one or the other, but not both. LWP is a module that just 'require's
LWP::UserAgent.
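
As a quick check of that point, here is a minimal sketch, assuming a standard libwww-perl install. It loads only LWP and still constructs a user agent:

```perl
use strict;
use warnings;

use LWP;    # pulls in LWP::UserAgent as a side effect

# No explicit 'use LWP::UserAgent' is needed for this to work.
my $ua = LWP::UserAgent->new;
print ref($ua), "\n";
```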

> # variables
> my $cookie_jar_file = "./cookies.txt";
> my @headers = (
>   'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
>   'Accept'          => 'image/gif, image/x-bitmap, image/jpeg,
>                                         image/pjpeg, image/png, */*',
>   'Accept-Charset'  => 'iso-8859-1,*',
>   'Accept-Language' => 'en-US',
>   "catid" => "",
>   "s" => "1",
>   "o" => "1",
>   "b" => "1",
>   "t" => "1",
>   "f" => "1",
>   "c" => "1",
>   "u" => "1",
>   "r" => "",
>   "l" => "1",
>   "g" => "",
>   "usr" => "myemail",
>   "pwd" => "mypassword",
>   "savepwd" => "1",
> );
> # end variables
>
> my $user_agent = LWP::UserAgent->new;
> $user_agent->cookie_jar({file => $cookie_jar_file});
> my $response = $user_agent->post(
> 'http://safari.oreilly.com/JVXSL.asp',
> @headers,
> );
> # END CODE
>
>         Now I know that this is the form I should post to because
> I stripped the following forms out of the web page (and there is
> no Javascript to modify the forms):
>
> <form action="JVXSL.asp" method="post">
> <input type="hidden" name="catid" value="">
> <input type="hidden" name="s" value="1">
> <input type="hidden" name="o" value="1">
> <input type="hidden" name="b" value="1">
> <input type="hidden" name="t" value="1">
> <input type="hidden" name="f" value="1">
> <input type="hidden" name="c" value="1">
> <input type="hidden" name="u" value="1">
> <input type="hidden" name="r" value="">
> <input type="hidden" name="l" value="1">
> <input type="hidden" name="g" value="">
> <input name="usr" type="text" value="" size="12">
> <input name="pwd" type="password" value="" size="12">
> <input type="checkbox" name="savepwd" value="1">
> <input type="image" name="Login" src="images/btn_login.gif" width="40" height="16" 
> border="0" align="absmiddle">
> </form>
>
>         When I pull up this web page there's nothing in
> $response->content.  I know that safari.oreilly.com will return a
> blank page if it doesn't like the user agent, and upon signing in
> it'll return to the safari.oreilly.com page with a very large number
> of get variables.  Does anyone know what I might be doing wrong?

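You can't put form input into header fields! Everything you pass to
post() after the URL as a flat list is sent as request headers, so your
form values never reach the request body. Use LWP to fetch the Safari
home page and HTML::Form to parse the form and fill in the field
values. None of the 'Accept' headers are necessary. Take a look at this: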


  use strict;
  use warnings;

  use LWP;
  use HTML::Form;

  my $ua = LWP::UserAgent->new(agent => 'Mozilla/4.76 [en] (Win98; U)');
  $ua->cookie_jar({});

  my $resp = $ua->get('http://safari.oreilly.com/');
  die $resp->status_line unless $resp->is_success;

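  # There are two forms on the page. Find the one with an input named 'Login'.
  #
  my $login;

  foreach (HTML::Form->parse($resp)) {
    if ($_->find_input('Login')) {
      $login = $_;
      last;
    }
  }

  # Stop here if the page layout has changed and no login form was found.
  die "Login form not found\n" unless $login;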

  $login->param('usr', '[EMAIL PROTECTED]');
  $login->param('pwd', 'secret');

  $resp = $ua->request($login->click);
  die $resp->status_line unless $resp->is_success;
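
If you want to see why the original post() call sent an empty form, the sketch below may help. It builds two requests offline with HTTP::Request::Common (which, as far as I know, is what $ua->post uses to construct its request) and never contacts the server. The 'myemail' and 'mypassword' values are the placeholders from your script, and the variable names are just for illustration:

```perl
use strict;
use warnings;

use HTTP::Request::Common qw(POST);

my $url = 'http://safari.oreilly.com/JVXSL.asp';

# Flat list after the URL: each key/value pair becomes a request header,
# and the request body stays empty.
my $as_headers = POST $url, usr => 'myemail', pwd => 'mypassword';

# Array reference after the URL: the pairs are URL-encoded into the body,
# which is what a form submission needs.
my $as_form = POST $url, [ usr => 'myemail', pwd => 'mypassword' ];

print "Flat list body: '", $as_headers->content, "'\n";
print "Array ref body: '", $as_form->content, "'\n";
```

So posting the fields by hand would mean passing them as an array or hash reference, but parsing the page with HTML::Form as above saves you from copying every hidden field yourself.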


HTH,

Rob



