Re: Problems with LWP::UserAgent

2003-12-27 Thread Randal L. Schwartz
>>>>> "Dan" == Dan Anderson <[EMAIL PROTECTED]> writes:

Dan> I guess I should stop then, but I was looking at O'Reilly's
Dan> robots.txt file (http://safari.oreilly.com/robots.txt):

Dan> User-Agent: *
Dan> Allow: /

Dan> Which made me think spidering was alright.

That's for spiders on the public content.  Not the content that you're
paying for, which is subject to your agreement.

You can't download an entire book because O'Reilly would have no way
of disabling access to that book when you select a different book for
your bookshelf the following month.  You aren't buying the book.
You're renting access to the book in online form for a fixed period.

Please respect the license agreement.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




Re: Problems with LWP::UserAgent

2003-12-26 Thread Rob Dixon
Dan Anderson wrote:

> I am trying to create a spider to grab my books off of Safari
> for a batch printing job so I don't need to go through each chapter
> myself and hit the Print button.  So I used this script to try and log
> myself in to the safari site:
>
> # BEGIN CODE
> #! /usr/bin/perl
>
> use strict;
> use warnings;
> use LWP;
> use LWP::UserAgent;

Use one or the other, but not both. LWP is a module that just 'require's
LWP::UserAgent.

> # variables
> my $cookie_jar_file = "./cookies.txt";
> my @headers = (
>   'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
>   'Accept'          => 'image/gif, image/x-bitmap, image/jpeg, ' .
>                        'image/pjpeg, image/png, */*',
>   'Accept-Charset'  => 'iso-8859-1,*',
>   'Accept-Language' => 'en-US',
>   catid   => "",
>   s       => 1,
>   o       => 1,
>   b       => 1,
>   t       => 1,
>   f       => 1,
>   c       => 1,
>   u       => 1,
>   r       => "",
>   l       => 1,
>   g       => "",
>   usr     => "myemail",
>   pwd     => "mypassword",
>   savepwd => 1,
> );
> # end variables
>
> my $user_agent = LWP::UserAgent->new;
> $user_agent->cookie_jar({file => $cookie_jar_file});
> my $response = $user_agent->post(
>   'http://safari.oreilly.com/JVXSL.asp',
>   @headers,
> );
> # END CODE
>
> Now I know that this is the form I should post to because
> I stripped the following forms out of the web page (and there is
> no Javascript to modify the forms):
>
> <form action="JVXSL.asp" method="post">
> <input type="hidden" name="catid" value="">
> <input type="hidden" name="s" value="1">
> <input type="hidden" name="o" value="1">
> <input type="hidden" name="b" value="1">
> <input type="hidden" name="t" value="1">
> <input type="hidden" name="f" value="1">
> <input type="hidden" name="c" value="1">
> <input type="hidden" name="u" value="1">
> <input type="hidden" name="r" value="">
> <input type="hidden" name="l" value="1">
> <input type="hidden" name="g" value="">
> <input name="usr" type="text" value="" size="12">
> <input name="pwd" type="password" value="" size="12">
> <input type="checkbox" name="savepwd" value="1">
> <input type="image" name="Login" src="images/btn_login.gif" width="40" height="16"
>  border="0" align="absmiddle">
> </form>
>
> When I pull up this web page there's nothing in
> $response->content.  I know that safari.oreilly.com will return a
> blank page if it doesn't like the user agent, and upon signing in
> it'll return to the safari.oreilly.com page with a very large number
> of get variables.  Does anyone know what I might be doing wrong?

You can't put form input into header fields! Use LWP to fetch the
Safari home page and HTML::Form to parse the form and enter
field values. None of the 'Accept' headers are necessary. Take a look
at this:


  use strict;
  use warnings;

  use LWP;
  use HTML::Form;

  my $ua = new LWP::UserAgent(agent => 'Mozilla/4.76 [en] (Win98; U)');
  $ua->cookie_jar({});

  my $resp = $ua->get('http://safari.oreilly.com/');
  die $resp->status_line unless $resp->is_success;

  # There are two forms on the page. Find the one with an input named 'Login'.
  #
  my $login;

  foreach (HTML::Form->parse($resp)) {
    if ($_->find_input('Login')) {
      $login = $_;
      last;
    }
  }

  $login->param('usr', '[EMAIL PROTECTED]');
  $login->param('pwd', 'secret');

  $resp = $ua->request($login->click);
  die $resp->status_line unless $resp->is_success;
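
To make the header-versus-body point concrete, here is a minimal offline
sketch (nothing is sent over the network). It assumes HTTP::Request::Common
is installed; the URL and the field values are placeholders of mine, not
Safari's real ones. Form fields go in an array reference and end up in the
request body, while genuine headers follow as plain key/value pairs:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical endpoint and credentials, for illustration only.
my $url = 'http://www.example.com/login';

# Form fields go in an array ref (second argument) and become the body;
# real HTTP headers follow as key/value pairs.
my $req = POST $url,
    [ usr => 'dan', pwd => 'secret', savepwd => 1 ],
    'Accept-Language' => 'en-US';

# Inspect the request that was built; it is not sent anywhere.
print $req->method, ' ', $req->uri, "\n";
print 'Content-Type: ', $req->header('Content-Type'), "\n";
print 'Body: ', $req->content, "\n";
```

Passing the same array reference as the second argument to
$ua->post($url, \@fields) should build an equivalent request and send it.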


HTH,

Rob








Re: Problems with LWP::UserAgent

2003-12-26 Thread R. Joseph Newton
zentara wrote:

> On 24 Dec 2003 16:05:16 -0500, [EMAIL PROTECTED] (Dan Anderson) wrote:
>
> > I am trying to create a spider to grab my books off of Safari
> > for a batch printing job so I don't need to go through each chapter
> > myself and hit the Print button.  So I used this script to try and log
> > myself in to the safari site:
>
> Watch out, Safari monitors for this, and I believe it's in their EULA.
> I was warned for surfing too fast, and wasn't even using a script.
>
> You should slow down your script, and randomize times, maybe
> spread it out over the whole day too.

Either that, or just respect their intent.  The open-source world is made of
balances.  One of them is the willingness of authors to make materials
available online, under conditions that still encourage people to buy the
books or materials.  It doesn't seem unreasonable at all to ask that people
at least look at the page they are downloading.

Call me an old fogy, but I think that some of the mechanization of Web
communications has gone too far.  Providing interactive features in the CGI
is one thing.  It provides services for both sides of any transaction
involved.  Batch harvesting of pages meant for human perusal, like batch
dialing of people's homes at mealtimes, strays across a line into misuse of
technology, IMHO.  Apparently, the folks at O'Reilly agree.  Since some of
them, at least, have been around the CGI since its inception, you might have
a bit of a challenge in thwarting their intended use of their site.

Joseph






Re: Problems with LWP::UserAgent

2003-12-26 Thread Dan Anderson
> Call me an old fogy, but I think that some of the mechanization of Web
> communications has gone too far.  Providing interactive features in the CGI
> is one thing.  It provides services for both sides of any transaction
> involved.  Batch harvesting of pages meant for human perusal, like batch
> dialing of people's homes at mealtimes, strays across a line into misuse of
> technology, IMHO.  Apparently, the folks at O'Reilly agree.  Since some of
> them, at least, have been around the CGI since its inception, you might have
> a bit of a challenge in thwarting their intended use of their site.

Well, Safari *does* provide for printing pages from a book
and e-mailing copies of them to other people.  My intention is not to
thwart them, but -- for instance -- when I go on a trip for Christmas,
instead of having to print out each and every chapter of the Perl
Cookbook I can just send a script to do it.  IMHO not a violation of
the Safari terms of service.

Not only that, Safari has a number of features in place that I
couldn't get around if I wanted to.  For instance, all books must be
kept on the bookshelf for at least 30 days -- which (short of hacking
their server) is not going to be circumvented.

So, all in all, I think that my usage falls under the term
fair use.  I have no desire to circumvent Safari's security -- I'm
just looking to speed up something I do which conforms to the TOS of
the web site.  :-D

-Dan






Re: Problems with LWP::UserAgent

2003-12-26 Thread Steve Grazzini
On Fri, Dec 26, 2003 at 12:52:06PM -0500, Dan Anderson wrote:
> So, all in all, I think that my usage falls under the term fair use.
> I have no desire to circumvent Safari's security -- I'm just looking
> to speed up something I do which conforms to the TOS of the web site.

<off-topic and grinchy>

Fair use is copyright law -- I don't know whether you're infringing
anybody's copyright, but you're certainly violating O'Reilly's Terms of
Service, which requires that you agree:

not to use Web spiders or any other automated retrieval
mechanisms when using the Service other than what is provided
by the Service

-- 
Steve





Re: Problems with LWP::UserAgent

2003-12-26 Thread Dan Anderson
> Fair use is copyright law -- I don't know whether you're infringing
> anybody's copyright, but you're certainly violating O'Reilly's Terms of
> Service, which requires that you agree:
>
>     not to use Web spiders or any other automated retrieval
>     mechanisms when using the Service other than what is provided
>     by the Service
 
I guess I should stop then, but I was looking at O'Reilly's
robots.txt file (http://safari.oreilly.com/robots.txt):

User-Agent: *
Allow: /

Which made me think spidering was alright.

-Dan






Problems with LWP::UserAgent

2003-12-24 Thread Dan Anderson

I am trying to create a spider to grab my books off of Safari
for a batch printing job so I don't need to go through each chapter
myself and hit the Print button.  So I used this script to try and log
myself in to the safari site:

# BEGIN CODE
#! /usr/bin/perl

use strict;
use warnings;
use LWP;
use LWP::UserAgent;

# variables
my $cookie_jar_file = "./cookies.txt";
my @headers = (
  'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
  'Accept'          => 'image/gif, image/x-bitmap, image/jpeg, ' .
                       'image/pjpeg, image/png, */*',
  'Accept-Charset'  => 'iso-8859-1,*',
  'Accept-Language' => 'en-US',
  catid   => "",
  s       => 1,
  o       => 1,
  b       => 1,
  t       => 1,
  f       => 1,
  c       => 1,
  u       => 1,
  r       => "",
  l       => 1,
  g       => "",
  usr     => "myemail",
  pwd     => "mypassword",
  savepwd => 1,
);
# end variables

my $user_agent = LWP::UserAgent->new;
$user_agent->cookie_jar({file => $cookie_jar_file});
my $response = $user_agent->post(
  'http://safari.oreilly.com/JVXSL.asp',
  @headers,
);
# END CODE

Now I know that this is the form I should post to because
I stripped the following forms out of the web page (and there is
no Javascript to modify the forms):

<form action="JVXSL.asp" method="post">
<input type="hidden" name="catid" value="">
<input type="hidden" name="s" value="1">
<input type="hidden" name="o" value="1">
<input type="hidden" name="b" value="1">
<input type="hidden" name="t" value="1">
<input type="hidden" name="f" value="1">
<input type="hidden" name="c" value="1">
<input type="hidden" name="u" value="1">
<input type="hidden" name="r" value="">
<input type="hidden" name="l" value="1">
<input type="hidden" name="g" value="">
<input name="usr" type="text" value="" size="12">
<input name="pwd" type="password" value="" size="12">
<input type="checkbox" name="savepwd" value="1">
<input type="image" name="Login" src="images/btn_login.gif" width="40" height="16"
 border="0" align="absmiddle">
</form>

When I pull up this web page there's nothing in
$response->content.  I know that safari.oreilly.com will return a
blank page if it doesn't like the user agent, and upon signing in
it'll return to the safari.oreilly.com page with a very large number
of get variables.  Does anyone know what I might be doing wrong?

Also, I figured I'm not the only person who would want to do
this.  Anyone interested in starting up a Sourceforge project with me
and releasing it under the GPL?

-Dan







Re: Problems with LWP::UserAgent and HTTP::Response

2001-06-18 Thread Tim Keefer

Hi Ela,
The documentation for the Perl LWP user agent seems sparse. I had a difficult
time figuring out how to send multipart form-data. I'll share the code with you
that someone shared with me. Hope it helps.



require LWP;
use LWP::UserAgent;
use HTTP::Request::Common;

# Create a user agent object

  $ua = new LWP::UserAgent;
  $ua->agent("AgentName/0.1 " . $ua->agent);

# Pass request to the user agent and get a response back

  my $res = $ua->request(POST $URL, Content_Type => 'form-data', Content => [
      login_id     => $Username,
      login_passwd => $Password,
      name_auth    => $Prefix,
      fname        => ["$XML_Dir\\$XML_File"],   # array ref means: upload this file
      operation    => 'Submit Batch File',
  ]);

# Check the outcome of the response - I guess we just file away
  if ($res->is_success) {
      print "success!\n";
      print $res->content;
      if ( $res->content =~ /\Q<H2>SUCCESS<\/H2>\E/i ) {
          print "Deposit successful\n";
      } else {
          print POSTLOG "Deposit FAILED.\n";
      }
  } else {
      print "failed!\n";
  }
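
If you want to see what such a multipart request looks like before pointing
it at a real server, the sketch below builds one offline and prints its
headers. It is only an illustration under my own assumptions: the URL, the
field values and the temporary XML file are made up, and nothing is sent.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request::Common qw(POST);

# Create a small throwaway XML file to stand in for the batch file.
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} "<batch><item>1</item></batch>\n";
close $fh or die "Can't close $path: $!";

# Hypothetical endpoint; an array ref as a field value means "upload this file".
my $req = POST 'http://www.example.com/deposit',
    Content_Type => 'form-data',
    Content      => [
        login_id  => 'someuser',
        fname     => [$path],
        operation => 'Submit Batch File',
    ];

# The file has already been read into the request body at this point.
print $req->headers->as_string, "\n";
print 'Body length: ', length($req->content), "\n";
```

Printing $req->as_string instead shows the complete body, including the
part boundaries and the file contents.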


Ela Jarecka wrote:

> Hi,
> I am using the following code to send an XML document ( output.xml ) to a
> remote server:
>
> use strict;
> use LWP::Debug qw(+);
> use LWP::UserAgent;
> use IO;
>
> my $resp;
> $resp = 'response.xml';
> my $FILEH;
> open (FILEH, "output.xml") or die "Can't open file output.xml!\n";
>
> my $ua = LWP::UserAgent->new;
>
> #another version that i've tried...
> #my $h = new HTTP::Headers Date => '2001-05-18';
> #my $req = HTTP::Request->new('POST','http://195.252.142.171:8008',$h,$FILEH);
>
> my $req = HTTP::Request->new(POST => 'http://195.252.142.171:8008');
>
> #$req->content_type('text/xml');
> $req->content($FILEH);
>
> # here I've also tried plain request($req) but the result is the same
> my $res = $ua->request($req,$resp);
> if ( $res->is_success) {
>     print "OK!\n";
>     #print $res->as_string;
> } else {
>     print "Failed: ", $res->status_line, "\n";
> }
>
> And that's what I get:
>
> LWP::UserAgent::new: ()
> LWP::UserAgent::request: ()
> LWP::UserAgent::simple_request: POST http://195.252.142.171:8008/
> LWP::UserAgent::_need_proxy: (http://195.252.142.171:8008/)
> LWP::UserAgent::_need_proxy: Not proxied
> LWP::Protocol::http::request: ()
> LWP::Protocol::http::request: POST / HTTP/1.0
> Host: 195.252.142.171:8008
> User-Agent: libwww-perl/5.21
>
> LWP::Protocol::http::request: reading response
> LWP::UserAgent::request: Simple result: Internal Server Error
> Failed: 500 read timeout
>
> ###
> Could anyone please help me? The problem is that I am not too sure whether
> my request is correct in the first place.
> In the manuals, $content is described as 'an arbitrary amount of data'.. Is
> my filehandle properly interpreted? I've tried
> using only the name of the file, but obviously it didn't work, being
> interpreted as a 10 chars long string...
>
> Thanks in advance,
> Ela
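
On the filehandle question in the quoted code: content() expects the data
itself as a string, not a filehandle, and the lexical $FILEH is never
assigned because open() was given the bareword FILEH. A minimal offline
sketch of one way around that, assuming the server wants the raw XML as the
request body (the URL and the sample document are placeholders of mine, and
the request is only built, not sent):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request;

# Write a small sample document so the sketch is self-contained.
my ($tmp, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$tmp} qq{<?xml version="1.0"?>\n<order id="1"/>\n};
close $tmp or die "Can't close $path: $!";

# Slurp the whole file into a scalar.
open my $in, '<', $path or die "Can't open $path: $!";
my $xml = do { local $/; <$in> };
close $in;

# Hypothetical endpoint; the string goes in as the body.
my $req = HTTP::Request->new(POST => 'http://www.example.com:8008/');
$req->content_type('text/xml');
$req->header('Content-Length' => length $xml);
$req->content($xml);

print $req->as_string;
```

Whether the read timeout goes away still depends on what the server
expects, so treat this as a starting point rather than a diagnosis.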




AW: Problems with LWP::UserAgent and HTTP::Response

2001-06-18 Thread Ela Jarecka

Thanks, at least I know that I am sending my XML properly.. But I still get
the same error message, so if anyone has more suggestions
please write..

Ela




Re: AW: Problems with LWP::UserAgent and HTTP::Response

2001-06-18 Thread Jos Boumans

I'm being a bit lazy and just showing you a bit of code I wrote to fetch all
film info from imdb.com, with a few comments to explain what goes on:

### config hash ###
my $href = {
    base  => 'http://www.imdb.com/',
    spage => 'Find',
    ua    => 'Mozilla/4.74 [en] (Win98; U)',
    form  => 'select=All&for=',
};

### Set up the content ###
my $content = $href->{form} . $film;

# here, $film is the user input

### Set up the useragent ###
my $ua = new LWP::UserAgent;
$ua->agent( $href->{ua} );

### Set up the headers ###
my $header = new HTTP::Headers(
    'Accept'         => 'text/html',
    'content-length' => length($content),
    'content-type'   => 'application/x-www-form-urlencoded',
);

### do the request, get the response ###
my $req = new HTTP::Request('POST', $url, $header, $content);
my $res = $ua->request($req);

If you now print $res->as_string; you'll find that it holds the entire reply
from the server...

In short, you set up your content as follows (you can try it if you like by
changing a 'post' to a 'get' on some page and seeing what is displayed in the
address bar):
thing1=foo&thing2=bar&thing3=quux etc etc
Be sure to define the header properly, as well as the useragent, which the
above snippet shows you how to do...
and then it's as simple as doing the last step: do the request, get the
response...
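
One caveat with gluing the content string together by hand: user input with
spaces or an '&' in it has to be escaped. A small sketch of letting the URI
module build the string instead; the field names follow the config hash
above, and the film title is just an example value of mine:

```perl
use strict;
use warnings;
use URI;

# Example user input containing a space.
my $film = 'Blade Runner';

# query_form() escapes each value and joins the pairs with '&'.
my $uri = URI->new('http:');
$uri->query_form('select' => 'All', 'for' => $film);
my $content = $uri->query;

print "$content\n";
```

The resulting string can be used as the $content argument in the request
shown earlier.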

I hope this example shows you The Path To The Dark Side ;-)

Jos Boumans


