Re: Problems with LWP::UserAgent
Dan == Dan Anderson [EMAIL PROTECTED] writes:

Dan> I guess I should stop then, but I was looking at O'Reilly's
Dan> robots.txt file (http://safari.oreilly.com/robots.txt):

Dan> User-Agent: *
Dan> Allow: /

Dan> Which made me think spidering was alright.

That's for spiders on the public content.  Not the content that you're
paying for, which is subject to your agreement.  You can't download an
entire book because O'Reilly would have no way of disabling access to
that book when you select a different book for your bookshelf the
following month.  You aren't buying the book.  You're renting access to
the book in online form for a fixed period.

Please respect the license agreement.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response
Re: Problems with LWP::UserAgent
Dan Anderson wrote:
> I am trying to create a spider to grab my books off of Safari for a
> batch printing job so I don't need to go through each chapter myself and
> hit the Print button.  So I used this script to try and log myself in to
> the Safari site:
>
> # BEGIN CODE
> #! /usr/bin/perl
>
> use strict;
> use warnings;
> use LWP;
> use LWP::UserAgent;

Use one or the other, but not both. LWP is a module that just 'require's
LWP::UserAgent.

> # variables
> my $cookie_jar_file = "./cookies.txt";
> my @headers = (
>     'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
>     'Accept'          => 'image/gif, image/x-bitmap, image/jpeg, image/pjpeg, image/png, */*',
>     'Accept-Charset'  => 'iso-8859-1,*',
>     'Accept-Language' => 'en-US',
>     catid   => "",
>     s       => 1,
>     o       => 1,
>     b       => 1,
>     t       => 1,
>     f       => 1,
>     c       => 1,
>     u       => 1,
>     r       => "",
>     l       => 1,
>     g       => "",
>     usr     => "myemail",
>     pwd     => "mypassword",
>     savepwd => 1,
> );
> # end variables
>
> my $user_agent = LWP::UserAgent->new;
> $user_agent->cookie_jar({file => $cookie_jar_file});
> my $response = $user_agent->post(
>     'http://safari.oreilly.com/JVXSL.asp',
>     @headers,
> );
> # END CODE
>
> Now I know that this is the form I should post to because I stripped the
> following form out of the web page (and there is no JavaScript to modify
> the forms):
>
> <form action="JVXSL.asp" method="post">
> <input type="hidden" name="catid" value="">
> <input type="hidden" name="s" value="1">
> <input type="hidden" name="o" value="1">
> <input type="hidden" name="b" value="1">
> <input type="hidden" name="t" value="1">
> <input type="hidden" name="f" value="1">
> <input type="hidden" name="c" value="1">
> <input type="hidden" name="u" value="1">
> <input type="hidden" name="r" value="">
> <input type="hidden" name="l" value="1">
> <input type="hidden" name="g" value="">
> <input name="usr" type="text" value="" size="12">
> <input name="pwd" type="password" value="" size="12">
> <input type="checkbox" name="savepwd" value="1">
> <input type="image" name="Login" src="images/btn_login.gif" width="40" height="16" border="0" align="absmiddle">
> </form>
>
> When I pull up this web page there's nothing in $response->content.  I
> know that safari.oreilly.com will return a blank page if it doesn't like
> the user agent, and upon signing in it'll return to the
> safari.oreilly.com page with a very large number of GET variables.  Does
> anyone know what I might be doing wrong?

You can't put form input into header fields! Use LWP to fetch the Safari
home page and HTML::Form to parse the form and fill in the field values.
None of the 'Accept' headers are necessary. Take a look at this:

use strict;
use warnings;

use LWP;
use HTML::Form;

my $ua = LWP::UserAgent->new(agent => 'Mozilla/4.76 [en] (Win98; U)');
$ua->cookie_jar({});

my $resp = $ua->get('http://safari.oreilly.com/');
die $resp->status_line unless $resp->is_success;

# There are two forms on the page. Find the one with an input named 'Login'.
#
my $login;
foreach (HTML::Form->parse($resp)) {
    if ($_->find_input('Login')) {
        $login = $_;
        last;
    }
}
die "Login form not found\n" unless $login;

# Fill in the visible fields; the hidden inputs keep the values from the page.
$login->param('usr', '[EMAIL PROTECTED]');
$login->param('pwd', 'secret');

# click() builds the HTTP::Request that submitting the form would send.
$resp = $ua->request($login->click);
die $resp->status_line unless $resp->is_success;

HTH,

Rob
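To see Rob's point without touching the network: according to the LWP::UserAgent documentation, `$ua->post` builds its request with HTTP::Request::Common's `POST` function, which takes form fields as an array (or hash) reference and treats any key/value pairs that follow as request headers. The sketch below builds such a request offline and inspects it; the URL and field values are placeholders, not the real Safari form.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Placeholder URL and credentials, for illustration only.
my $url = 'http://www.example.com/login';

# Form fields go in an array reference (order is preserved);
# the pairs after it are treated as request headers.
my $req = POST $url,
    [ usr => 'dan', pwd => 'secret', savepwd => 1 ],
    'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)';

# The fields travel in the urlencoded request body...
print $req->content, "\n";
# ...while only the real header fields appear in the header block.
print $req->headers->as_string;
```

With the flat `@headers` list from the original script there is no reference after the URL, so every pair, including `usr` and `pwd`, is sent as a header line and the body stays empty.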
Re: Problems with LWP::UserAgent
zentara wrote:
> On 24 Dec 2003 16:05:16 -0500, [EMAIL PROTECTED] (Dan Anderson) wrote:
>
>> I am trying to create a spider to grab my books off of Safari for a
>> batch printing job so I don't need to go through each chapter myself and
>> hit the Print button.  So I used this script to try and log myself in to
>> the Safari site:
>
> Watch out, Safari monitors for this, and I believe it's in their EULA.
> I was warned for surfing too fast, and wasn't even using a script.
> You should slow down your script, and randomize times, maybe spread it
> out over the whole day too.

Either that, or just respect their intent.  The open-source world is made
of balances.  One of them is the willingness of authors to make materials
available online, under conditions that still encourage people to buy the
books or materials.  It doesn't seem unreasonable at all to ask that
people at least look at the page they are downloading.

Call me an old fogy, but I think that some of the mechanization of Web
communications has gone too far.  Providing interactive features in the
CGI is one thing.  It provides services for both sides of any transaction
involved.  Batch harvesting of pages meant for human perusal, like batch
dialing of people's homes at mealtimes, strays across a line into misuse
of technology, IMHO.

Apparently, the folks at O'Reilly agree.  Since at least some of them have
been around the CGI since its inception, you might have a bit of a
challenge in thwarting their intended use of their site.

Joseph
Re: Problems with LWP::UserAgent
> Call me an old fogy, but I think that some of the mechanization of Web
> communications has gone too far.  Providing interactive features in the
> CGI is one thing.  It provides services for both sides of any transaction
> involved.  Batch harvesting of pages meant for human perusal, like batch
> dialing of persons homes at mealtimes, strays across a line into misuse
> of technology, IMHO.
>
> Apparently, the folks at O'Reilly agree.  Since some of them at least,
> have been around the CGI since its inception, you might have a bit of a
> challenge in thwarting their intended use of their site.

Well, Safari *does* provide for printing of pages from a book and e-mailing
copies of them to other people.  My intention is not to thwart them, but
-- for instance -- when I go on a trip for Christmas, instead of having to
print out each and every chapter of the Perl Cookbook I can just send a
script to do it.  IMHO that's not a violation of the Safari terms of
service.

Not only that, Safari has a number of features in place that I couldn't
get around if I wanted to.  For instance, all books must be kept on the
bookshelf for at least 30 days -- which (short of hacking their server) is
not going to be circumvented.

So, all in all, I think that my usage falls under the term "fair use".  I
have no desire to circumvent Safari's security -- I'm just looking to
speed up something I do which conforms to the TOS of the web site.  :-D

-Dan
Re: Problems with LWP::UserAgent
On Fri, Dec 26, 2003 at 12:52:06PM -0500, Dan Anderson wrote:
> So, all in all, I think that my usage falls under the term fair use.  I
> have no desire to circumvent Safari's security -- I'm just looking to
> speed up something I do which conforms to the TOS of the web site.

<off-topic and grinchy>

Fair use is copyright law -- I don't know whether you're infringing
anybody's copyright, but you're certainly violating O'Reilly's Terms of
Service, which require that you agree:

    not to use Web spiders or any other automated retrieval mechanisms
    when using the Service other than what is provided by the Service

--
Steve
Re: Problems with LWP::UserAgent
> Fair use is copyright law -- I don't know whether you're infringing
> anybody's copyright, but you're certainly violating O'Reilly's Terms of
> Service, which requires that you agree:
>
>     not to use Web spiders or any other automated retrieval mechanisms
>     when using the Service other than what is provided by the Service

I guess I should stop then, but I was looking at O'Reilly's robots.txt
file (http://safari.oreilly.com/robots.txt):

User-Agent: *
Allow: /

Which made me think spidering was alright.

-Dan
Problems with LWP::UserAgent
I am trying to create a spider to grab my books off of Safari for a batch
printing job so I don't need to go through each chapter myself and hit the
Print button.  So I used this script to try and log myself in to the safari
site:

# BEGIN CODE
#! /usr/bin/perl

use strict;
use warnings;
use LWP;
use LWP::UserAgent;

# variables
my $cookie_jar_file = "./cookies.txt";
my @headers = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept'          => 'image/gif, image/x-bitmap, image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*',
    'Accept-Language' => 'en-US',
    catid   => "",
    s       => 1,
    o       => 1,
    b       => 1,
    t       => 1,
    f       => 1,
    c       => 1,
    u       => 1,
    r       => "",
    l       => 1,
    g       => "",
    usr     => "myemail",
    pwd     => "mypassword",
    savepwd => 1,
);
# end variables

my $user_agent = LWP::UserAgent->new;
$user_agent->cookie_jar({file => $cookie_jar_file});
my $response = $user_agent->post(
    'http://safari.oreilly.com/JVXSL.asp',
    @headers,
);
# END CODE

Now I know that this is the form I should post to because I stripped the
following forms out of the web page (and there is no Javascript to modify
the forms):

<form action="JVXSL.asp" method="post">
<input type="hidden" name="catid" value="">
<input type="hidden" name="s" value="1">
<input type="hidden" name="o" value="1">
<input type="hidden" name="b" value="1">
<input type="hidden" name="t" value="1">
<input type="hidden" name="f" value="1">
<input type="hidden" name="c" value="1">
<input type="hidden" name="u" value="1">
<input type="hidden" name="r" value="">
<input type="hidden" name="l" value="1">
<input type="hidden" name="g" value="">
<input name="usr" type="text" value="" size="12">
<input name="pwd" type="password" value="" size="12">
<input type="checkbox" name="savepwd" value="1">
<input type="image" name="Login" src="images/btn_login.gif" width="40" height="16" border="0" align="absmiddle">
</form>

When I pull up this web page there's nothing in $response->content.  I
know that safari.oreilly.com will return a blank page if it doesn't like
the user agent, and upon signing in it'll return to the safari.oreilly.com
page with a very large number of get variables.  Does anyone know what I
might be doing wrong?
Also, I figured I'm not the only person who would want to do this.  Anyone
interested in starting up a Sourceforge project with me and releasing it
under the GPL?

-Dan
Re: Problems with LWP::UserAgent and HTTP::Response
Hi Ela,

The documentation for the Perl LWP user agent seems sparse.  I had a
difficult time figuring out how to send multipart form-data.  I'll share
with you the code that someone shared with me.  Hope it helps.

require LWP;
use LWP::UserAgent;
use HTTP::Request::Common;

# Create a user agent object
my $ua = LWP::UserAgent->new;
$ua->agent("AgentName/0.1 " . $ua->agent);

# Pass request to the user agent and get a response back
my $res = $ua->request(POST $URL,
    Content_Type => 'form-data',
    Content      => [ login_id     => $Username,
                      login_passwd => $Password,
                      name_auth    => $Prefix,
                      fname        => ["$XML_Dir\\$XML_File"],
                      operation    => 'Submit Batch File',
                    ]);

# Check the outcome of the response - I guess we just file away
if ($res->is_success) {
    print "success!\n";
    print $res->content;
    if ($res->content =~ m{<H2>SUCCESS</H2>}i) {
        print "Deposit successful\n";
    } else {
        print POSTLOG "Deposit FAILED.\n";
    }
} else {
    print "failed!\n";
}

Ela Jarecka wrote:
> Hi,
>
> I am using the following code to send an XML document ( output.xml ) to
> a remote server:
>
> use strict;
> use LWP::Debug qw(+);
> use LWP::UserAgent;
> use IO;
>
> my $resp;
> $resp = 'response.xml';
> my $FILEH;
> open (FILEH, "output.xml") or die "Can't open file output.xml!\n";
> my $ua = LWP::UserAgent->new;
>
> #another version that i've tried...
> #my $h = new HTTP::Headers Date => '2001-05-18';
> #my $req = HTTP::Request->new('POST','http://195.252.142.171:8008',$h,$FILEH);
>
> my $req = HTTP::Request->new(POST => 'http://195.252.142.171:8008');
> #$req->content_type('text/xml');
> $req->content($FILEH);
>
> my $res = $ua->request($req,$resp);   # here I've also tried plain request($req), but the result is the same
>
> if ( $res->is_success) {
>     print "OK!\n";
>     #print $res->as_string;
> } else {
>     print "Failed: ", $res->status_line, "\n";
> }
>
> And that's what I get:
>
> LWP::UserAgent::new: ()
> LWP::UserAgent::request: ()
> LWP::UserAgent::simple_request: POST http://195.252.142.171:8008/
> LWP::UserAgent::_need_proxy: (http://195.252.142.171:8008/)
> LWP::UserAgent::_need_proxy: Not proxied
> LWP::Protocol::http::request: ()
> LWP::Protocol::http::request: POST / HTTP/1.0
> Host: 195.252.142.171:8008
> User-Agent: libwww-perl/5.21
>
> LWP::Protocol::http::request: reading response
> LWP::UserAgent::request: Simple result: Internal Server Error
> Failed: 500 read timeout
> ###
>
> Could anyone please help me?  The problem is that I am not too sure
> whether my request is correct in the first place.  In the manuals,
> $content is described as 'an arbitrary amount of data'.  Is my filehandle
> properly interpreted?  I've tried using only the name of the file, but
> obviously it didn't work, being interpreted as a 10-character string...
>
> Thanks in advance,
> Ela
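A minimal sketch, not taken from the thread, of one way to send a file's contents as a raw request body: `HTTP::Request`'s `content` method expects the data itself as a string, so the file is read into a scalar first. In the quoted script, `$FILEH` is a lexical that is never assigned (the file was opened on the bareword handle `FILEH`), so `content` is handed an undefined value rather than the XML. The temporary file and the example.com URL below are placeholders for output.xml and the real server, and nothing is sent over the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request;

# Stand-in for output.xml: write a small XML document to a temp file.
my ($tmp_fh, $xml_file) = tempfile(UNLINK => 1);
print {$tmp_fh} qq{<?xml version="1.0"?>\n<order id="1"/>\n};
close $tmp_fh or die "Can't close $xml_file: $!\n";

# Slurp the whole file as raw bytes into a scalar.
open my $in, '<:raw', $xml_file or die "Can't open $xml_file: $!\n";
my $xml = do { local $/; <$in> };
close $in;

# Pass the string, not a filehandle, as the request body.
my $req = HTTP::Request->new(POST => 'http://www.example.com/xml');
$req->content_type('text/xml');
$req->content($xml);
$req->header('Content-Length' => length $xml);

print $req->as_string;
```

The request could then be sent with `$ua->request($req, 'response.xml')` exactly as in Ela's script; the second argument only tells LWP where to save the response body.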
AW: Problems with LWP::UserAgent and HTTP::Response
Thanks, at least I know that I am sending my XML properly.  But I still
get the same error message, so if anyone has more suggestions, please
write.

Ela

-----Original Message-----
From: Tim Keefer [mailto:[EMAIL PROTECTED]]
Sent: Monday, 18 June 2001 15:46
To: Ela Jarecka; Beginners list (E-Mail)
Subject: Re: Problems with LWP::UserAgent and HTTP::Response

> Hi Ela,
> [...]
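Tim's upload from earlier in the thread relies on HTTP::Request::Common's `form-data` support: a `Content_Type => 'form-data'` header switches `POST` to multipart encoding, and a field whose value is an array reference is treated as a file that is read from disk when the request is built. The sketch below builds such a request offline so its parts can be inspected; the field names echo Tim's script, while the URL, login value and temporary file are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request::Common qw(POST);

# Placeholder file standing in for "$XML_Dir\\$XML_File".
my ($fh, $xml_file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} qq{<?xml version="1.0"?>\n<batch/>\n};
close $fh or die "Can't close $xml_file: $!\n";

my $req = POST 'http://www.example.com/upload',
    Content_Type => 'form-data',
    Content      => [
        login_id  => 'dan',
        fname     => [$xml_file],   # array ref: upload this file's contents
        operation => 'Submit Batch File',
    ];

print $req->content_type, "\n";   # multipart/form-data
print $req->as_string;
```

By default the whole file is read into memory when `POST` is called; for large uploads, HTTP::Request::Common's `$DYNAMIC_FILE_UPLOAD` variable makes it stream the file instead.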
Re: AW: Problems with LWP::UserAgent and HTTP::Response
I'm being a bit lazy and just showing you a bit of code I wrote to fetch
all film info from imdb.com, with comments to explain what goes on:

use LWP::UserAgent;
use HTTP::Headers;
use HTTP::Request;
use URI::Escape qw(uri_escape);

### config hash ###
my $href = {
    base  => 'http://www.imdb.com/',
    spage => 'Find',
    ua    => 'Mozilla/4.74 [en] (Win98; U)',
    form  => 'select=All&for=',
};

### Set up the content ###
# here, $film is the user input; escape it so that spaces, '&' or '='
# in a title can't break the query string
my $content = $href->{form} . uri_escape($film);

### Set up the useragent ###
my $ua = LWP::UserAgent->new;
$ua->agent( $href->{ua} );

### Set up the headers ###
my $header = HTTP::Headers->new(
    'Accept'         => 'text/html',
    'content-length' => length($content),
    'content-type'   => 'application/x-www-form-urlencoded',
);

### do the request, get the response ###
my $req = HTTP::Request->new('POST', $url, $header, $content);
my $res = $ua->request($req);

If you now

print $res->as_string;

you'll find that it holds the entire reply from the server...

In short, you set up your content as follows (and you can try it if you
like by changing a 'post' to a 'get' on some page and seeing what is
displayed in the address bar):

thing1=foo&thing2=bar&thing3=quux

etc. etc.  Be sure to define the header properly, as well as the user
agent, which the snippet above shows you how to do... and then it's as
simple as doing the last step: do the request, get the response...

I hope this example shows you The Path To The Dark Side ;-)

Jos Boumans

Ela Jarecka wrote:
> Thanks, at least I know that I am sending my XML properly.. But I still
> get the same error message, so if anyone has more suggestions please
> write..
> Ela
> [...]
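Jos's body string uses the application/x-www-form-urlencoded format (`thing1=foo&thing2=bar`), so any user-supplied value has to be escaped before it is joined in; otherwise a title containing `&` or `=` would split into extra fields. A minimal core-Perl sketch of that encoding follows. The helper name `form_urlencode` is hypothetical and not part of LWP; in practice HTTP::Request::Common's `POST` performs this encoding (and sets Content-Type and Content-Length) for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: encode key/value pairs as
# application/x-www-form-urlencoded. Spaces become '+', and every
# byte outside A-Z a-z 0-9 - . _ ~ becomes %XX.
# Assumes byte strings; encode Unicode text (e.g. to UTF-8) first.
sub form_urlencode {
    my @pairs = @_;
    my @out;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        for ($key, $value) {
            $_ = '' unless defined;
            s/([^A-Za-z0-9\-._~ ])/sprintf '%%%02X', ord $1/ge;
            tr/ /+/;
        }
        push @out, "$key=$value";
    }
    return join '&', @out;
}

my $film = 'Tom & Jerry';
print form_urlencode(select => 'All', for => $film), "\n";
# prints: select=All&for=Tom+%26+Jerry
```

Without the escaping, the same input would produce `select=All&for=Tom & Jerry`, which a server parses as a `for` field of `Tom ` plus a stray field named ` Jerry`.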