Re: Mechanize: first attempt at scraping (should be something trivial)
On Mar 13, 2013, at 12:09 PM, G M wrote:

> Hi all,
>
> I'm making an attempt at my first screen scraping script. For some reason the script doesn't continue after the invocation of the get method on the last line:
>
>     use strict;
>     use WWW::Mechanize;
>     use HTML::TokeParser;
>     use Data::Dumper;
>
>     print "Content-type: text/html\n\n";
>     print "setting up mech<br />";
>     my $agent = WWW::Mechanize->new();
>     $agent->agent_alias('Windows Mozilla');
>     print "mech setup";
>     $agent->get('http://www.easyjet.com/en/');
>
> Can anyone see anything wrong with this? I've tried double quotes and different urls but it doesn't attempt to get the page.

I can see that you are not saving the return result from $agent->get().

What do you mean by "the script doesn't continue"? Does the script hang or does it terminate? Is there any error message? Is there more to the script than you are showing?

How do you know that it doesn't attempt to 'get' the page? Maybe it attempts to get the page and fails.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: Mechanize: first attempt at scraping (should be something trivial)
On Wed, Mar 13, 2013 at 2:09 PM, G M <iamnotregiste...@hotmail.com> wrote:

> I'm making an attempt at my first screen scraping script.

Works here:

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;
    use HTML::TokeParser;
    use Data::Dumper;
    print "Content-type: text/html\n\n";
    print "setting up mech<br />";
    my $agent = WWW::Mechanize->new();
    $agent->agent_alias('Windows Mozilla');
    print "mech setup";
    my $page = $agent->get('http://www.easyjet.com/en/');
    print "mech ran", ref $agent, "\n";
    # print "agent: ", Dumper(\$agent), "\n";
    # print "page: ", Dumper(\$page), "\n";
    print "page ran", ref $page, "\n";
    if ($page->is_success) {
        print "page content", "\n";
        print $page->decoded_content;
    }
    else {
        print STDERR $page->status_line, "\n";
    }

$page is an HTTP::Response object. I get decoded_content(), but with the warning:

    Wide character in print at /usr/local/bin/mech_test.pl line 18.

That's the "print page" line.

-- 
a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
Re: Mechanize: first attempt at scraping (should be something trivial)
On 03/13/2013 01:46 PM, Andy Bach wrote:

> Wide character in print at /usr/local/bin/mech_test.pl line 18.
>
> that's the print page.

By the way, you can eliminate the wide-char warning by telling perl that your terminal can eat UTF-8 encoded Unicode:

    binmode STDOUT, ':utf8';
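The binmode advice above can be tried without any network access. The following is only a minimal sketch: the sample string is made up for illustration, and it uses the stricter `:encoding(UTF-8)` layer, which is a swapped-in variant of the `:utf8` layer named in the message, not what the poster wrote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up string containing a character above U+00FF (a "wide" character).
my $text = "price: \x{20AC}42";    # U+20AC is the euro sign

# Without an encoding layer on STDOUT, printing $text triggers the
# "Wide character in print" warning. Declare the output encoding first.
binmode STDOUT, ':encoding(UTF-8)';

print $text, "\n";
```

If the binmode line is commented out, the same print should produce the warning Andy reported, which makes it easy to confirm the cause.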
Re: Mechanize: first attempt at scraping (should be something trivial)
On Wed, Mar 13, 2013 at 12:09 PM, G M <iamnotregiste...@hotmail.com> wrote:

> For some reason the script doesn't continue after the invocation of the get method on the last line:
> [...]
> Can anyone see anything wrong with this? I've tried double quotes and different urls but it doesn't attempt to get the page.

Hm, what code follows the get? It's always a good idea to check for errors, in case of a site outage for instance. However, this worked for me a few moments ago when I tried:

    $agent->get(...);
    die $agent->status unless $agent->success;
    print "content: ", $agent->content;

-- 
Charles DeRykus
RE: Mechanize: first attempt at scraping (should be something trivial)
Hi,

Yeah, I tried putting a die line in after doing a bit of googling. I've got a print "mech ran" line where you've got die, but it doesn't print anything out though :(

Cheers,
G
Re: Mechanize: first attempt at scraping (should be something trivial)
On Wed, Mar 13, 2013 at 1:08 PM, G M <iamnotregiste...@hotmail.com> wrote:

> Yeah I tried putting a die line in after doing a bit of googling, I've got a print mech ran line where you've got die, doesn't print anything out though :(

Hm, the problem is that Mech throws fatal errors by default (autocheck is on), so if it can't fetch the content, your program dies inside get() before "mech ran" is ever printed. Only if you construct the agent with autocheck => 0 would you see it.

You can see the differing output by toggling autocheck here:

    my $agent = WWW::Mechanize->new(autocheck => 0);   # toggle 0/1
    $agent->get("http://nowhere.com/nono");
    print "mech ran";

-- 
Charles DeRykus
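The two autocheck behaviours described above can be compared side by side. This is a rough sketch, not code from the thread: the helper name try_get and the file:// URL are hypothetical, and the URL is an assumption chosen only so the failing request happens locally instead of against a real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical URL that cannot be fetched; file:// keeps the sketch offline.
my $bad_url = 'file:///no/such/dir/no-such-page.html';

# Hypothetical helper: describe what get() does for a given autocheck value.
sub try_get {
    my ($autocheck) = @_;
    my $agent = WWW::Mechanize->new( autocheck => $autocheck );

    # With autocheck on, a failed get() dies, so trap it with eval.
    my $response = eval { $agent->get($bad_url) };
    return "died: $@" unless defined $response;

    # With autocheck off, get() returns and the caller has to check.
    return $agent->success
        ? 'fetched'
        : 'returned, status ' . $agent->status;
}

print "autocheck => 1: ", try_get(1), "\n";
print "autocheck => 0: ", try_get(0), "\n";
```

Wrapping get() in eval is one way to keep autocheck on and still reach the code after it; turning autocheck off and testing success() is the other.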
RE: Mechanize: first attempt at scraping (should be something trivial)
Definitely appears to be network related, as I'm getting this when using warnings/fatalsToBrowser:

    Error GETing http://www.easyjet.com/en/: Can't connect to www.easyjet.com:80 (connect: Connection refused)
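A "Connection refused" like the one above can be checked independently of Mechanize with a plain TCP connect. This is a minimal sketch under stated assumptions: the helper name can_connect, the script name in the usage comment, and the 5-second default timeout are all hypothetical and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: try a plain TCP connection.
# Returns (1, '') on success or (0, reason) on failure.
sub can_connect {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout // 5,    # assumed default, in seconds
    );
    return (0, "$@") unless $sock;    # IO::Socket::INET leaves its error in $@
    close $sock;
    return (1, '');
}

# Usage (hypothetical script name): perl check_connect.pl www.easyjet.com 80
if (@ARGV == 2) {
    my ($ok, $why) = can_connect(@ARGV);
    print $ok ? "TCP connect succeeded\n" : "TCP connect failed: $why\n";
}
```

If this also fails from the web server's host but works from a desktop machine, the cause is probably a firewall or proxy on the server side, not the script.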