Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread G M

Hi all,

I'm making an attempt at my first screen scraping script.

For some reason the script doesn't continue after the invocation of the get 
method on the last line:

use strict;
use WWW::Mechanize;
use HTML::TokeParser;
use Data::Dumper;
print "Content-type: text/html\n\n";
print "setting up mech<br />";
my $agent = WWW::Mechanize->new();
$agent->agent_alias('Windows Mozilla');
print "mech setup";
$agent->get('http://www.easyjet.com/en/');


Can anyone see anything wrong with this?  I've tried double quotes and 
different urls but it doesn't attempt to get the page.


Thanks in advance for any advice on this.

G :)
  

Re: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread Jim Gibson

On Mar 13, 2013, at 12:09 PM, G M wrote:

 
> Hi all,
>
> I'm making an attempt at my first screen scraping script.
>
> For some reason the script doesn't continue after the invocation of the get
> method on the last line:
>
> use strict;
> use WWW::Mechanize;
> use HTML::TokeParser;
> use Data::Dumper;
> print "Content-type: text/html\n\n";
> print "setting up mech<br />";
> my $agent = WWW::Mechanize->new();
> $agent->agent_alias('Windows Mozilla');
> print "mech setup";
> $agent->get('http://www.easyjet.com/en/');
>
> Can anyone see anything wrong with this?  I've tried double quotes and
> different urls but it doesn't attempt to get the page.

I can see that you are not saving the return value from $agent->get().

What do you mean by "the script doesn't continue"? Does the script hang or does
it terminate? Is there any error message?

Is there more to the script than you are showing?

How do you know that it doesn't attempt to 'get' the page? Maybe it attempts
to get the page and fails.


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread Andy Bach
On Wed, Mar 13, 2013 at 2:09 PM, G M <iamnotregiste...@hotmail.com> wrote:

> I'm making an attempt at my first screen scraping script.


Works here:
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use HTML::TokeParser;
use Data::Dumper;
print "Content-type: text/html\n\n";
print "setting up mech<br />";
my $agent = WWW::Mechanize->new();
$agent->agent_alias('Windows Mozilla');
print "mech setup";
my $page = $agent->get('http://www.easyjet.com/en/');
print "mech ran", ref $agent, "\n";
# print "agent: ", Dumper(\$agent), "\n";
# print "page: ", Dumper(\$page), "\n";
print "page ran", ref $page, "\n";
if ($page->is_success) {
    print "page content", "\n";
    print $page->decoded_content;
}
else {
    print STDERR $page->status_line, "\n";
}


$page is an HTTP::Response object. I get the decoded_content(), but also the
warning:
Wide character in print at /usr/local/bin/mech_test.pl line 18.

That's the line that prints the page.

-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk


Re: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread Lawrence Statton

On 03/13/2013 01:46 PM, Andy Bach wrote:

> Wide character in print at /usr/local/bin/mech_test.pl line 18.
>
> that's the print page.


By the way, you can eliminate the wide-character warning by telling Perl
that your terminal can accept UTF-8 encoded Unicode:


binmode STDOUT, ':utf8';
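
A minimal sketch of the effect, using only core Perl. The sample string is made up for illustration, and the stricter ':encoding(UTF-8)' layer is swapped in here as an assumption; the ':utf8' layer suggested above is applied the same way.

```perl
use strict;
use warnings;

# Encode everything written to STDOUT as UTF-8 from here on.
# (':encoding(UTF-8)' is an illustrative choice; ':utf8' is set the same way.)
binmode STDOUT, ':encoding(UTF-8)';

# An illustrative string holding one character above 0xFF (U+263A).
my $text = "wide character: \x{263A}";

# Without the binmode call above, this print would trigger the
# "Wide character in print" warning.
print $text, "\n";
```

The binmode call should run once, before the first print to the handle; calling it again with an encoding layer stacks a second layer on the same handle.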







Re: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread Charles DeRykus
On Wed, Mar 13, 2013 at 12:09 PM, G M <iamnotregiste...@hotmail.com> wrote:

> Hi all,
>
> I'm making an attempt at my first screen scraping script.
>
> For some reason the script doesn't continue after the invocation of the get
> method on the last line:
>
> use strict;
> use WWW::Mechanize;
> use HTML::TokeParser;
> use Data::Dumper;
> print "Content-type: text/html\n\n";
> print "setting up mech<br />";
> my $agent = WWW::Mechanize->new();
> $agent->agent_alias('Windows Mozilla');
> print "mech setup";
> $agent->get('http://www.easyjet.com/en/');
>
> Can anyone see anything wrong with this?  I've tried double quotes and
> different urls but it doesn't attempt to get the page.


Hm, what code follows the get?

It's always a good idea to check for errors, in case of a site
outage for instance. However, this worked for me a few
moments ago when I tried:

  $agent->get(...);
  die $agent->status unless $agent->success;
  print "content: ", $agent->content;

-- 
Charles DeRykus





RE: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread G M

Hi,

Yeah, I tried putting a die line in after doing a bit of googling. I've got a
print "mech ran" line where you've got die, but it doesn't print anything out
:(



Cheers,

G 

Re: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread Charles DeRykus
On Wed, Mar 13, 2013 at 1:08 PM, G M <iamnotregiste...@hotmail.com> wrote:

> Hi,
>
> Yeah I tried putting a die line in after doing a bit of googling, I've got a
> print "mech ran" line where you've got die, doesn't print anything out
> though :(




Hm, the problem is that Mech throws fatal errors by default, so if
it couldn't fetch the content, your program dies before "mech ran"
is printed. Only if you had said WWW::Mechanize->new(autocheck => 0)
would you see it.

You can see the differing output in these:

my $agent = WWW::Mechanize->new(autocheck => 0);  # toggle 0/1
$agent->get("http://nowhere.com/nono");
print "mech ran";
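
As a rough sketch of the autocheck difference described above, the two modes can be run side by side. The loopback URL on port 1 is an assumption, chosen only because nothing is expected to listen there, and the variable names are illustrative.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Assumed test URL: nothing should be listening on port 1 of the
# local machine, so the connection attempt fails quickly.
my $url = 'http://127.0.0.1:1/';

# autocheck => 1 (the default): a failed get() dies, so trap it with eval.
my $strict = WWW::Mechanize->new(autocheck => 1);
my $ok = eval { $strict->get($url); 1 };
print "autocheck on:  ", ($ok ? "fetched\n" : "died: $@");

# autocheck => 0: get() returns normally and the caller must check success().
my $lenient = WWW::Mechanize->new(autocheck => 0);
$lenient->get($url);
print "autocheck off: ",
    ($lenient->success ? "fetched" : "failed with status " . $lenient->status),
    "\n";
```

With autocheck left on, wrapping get() in eval keeps the script alive long enough to report the error itself; with autocheck off, nothing is reported unless the script checks success() or status().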

-- 
Charles DeRykus





RE: Mechanize: first attempt at scraping (should be something trivial)

2013-03-13 Thread G M

It definitely appears to be network-related, as I'm getting this when using
warnings and fatalsToBrowser:

Error GETing http://www.easyjet.com/en/: Can't connect to www.easyjet.com:80
(connect: Connection refused)

