On Sun, 2011-01-16 at 04:48 -0800, Carl Wells wrote:
> Hi,
> 
> I hope you don't mind my newbie question.  I'm new to web-programming (and 
> indeed am somewhat rusty with programming in general).  I'm out of work and 
> trying to teach myself C++, PERL, SQL and other skills and in order to do 
> this I've set myself a project.  As part of this project I need to access 
> data from this URL:
> 
> http://www.reuters.com/finance/stocks/incomeStatement/detail?perType=ANN&symbol=BATS.L
> 
> the problem I'm having is that this redirects to the reuters.com login page.  
> I've tried to use both existing cookie files from internet explorer (I had to 
> rename these because the name of the cookie involved my user name which 
> incorporates a space and an @ e.g. fred bumble...@honeypot.org and Perl 
> didn't seem to like that/my syntax was wrong) and setting up perl to receive 
> a new cookie from the site.  Neither has worked for me.  I've spent the past 
> 3 days trying to glue bits of code together from various googles and the cpan 
> module descriptions for LWP and Mechanize.  An example of code that's not 
> working for me is as below:
> 
> #!/usr/local/bin/perl -w
> use strict;
> use Crypt::SSLeay;
> use LWP::UserAgent;
> use LWP::Simple;
> use HTTP::Request::Common qw(POST);
> use HTTP::Cookies;
> 
> my $ua = LWP::UserAgent->new;
> my $cookie_jar = HTTP::Cookies->new(file => "lwpcookies2.txt",
>     autosave => 1);
> $ua->cookie_jar( $cookie_jar);
> $ua->agent('Mozilla/5.0');
> my $url = 'https://commerce.us.reuters.com/login/pages/login/login.do';
> my $req = POST $url, ['login' => 'Fredbumblebee', 'password' => 'BzzZZZ!'];
> my $res = $ua->request($req);
> $cookie_jar->extract_cookies($res);
> 
> if ($res->is_success) {
>     # print out result to look at headers
>     print $res->as_string;
> 
>     # access page with cookie secured after logged in
>     my $req = HTTP::Request->new(GET =>
>         'http://www.reuters.com/finance/stocks/incomeStatement/detail?perType=ANN&symbol=BATS.L');
>     $cookie_jar->add_cookie_header($req);
>     $res = $ua->request($req);
>     #print $res->as_string;
> } else {
>     print "Failed: ", $res->status_line, "\n";
> }
> 
> The cookie file only contains #LWP-Cookies-1.0.  I'm currently trying to use 
> the Live HTTP Headers addon in firefox to figure out what is being passed to 
> and from the web server but I am a bit out of my depth :(.
> 
> Once I've done this for BATS I'm planning to get a few more pages for other 
> stocks so I'm guessing I'll want to create a session, not create a new 
> cookie/log in again for each page request!  I also don't want to hammer their 
> site, I gather one can use a 'sleep' command, do you have any advice on this?
> 
> I've managed to use HTML::TableExtract to get tables I want from other 
> reuters.com pages which didn't require the free logon but no joy here!  I 
> started using C++/CURL/tidylib/tinyxml but moved to PERL as its so much 
> easier to use!  Once I have done this I'll want to call PERL from C++ so that 
> I can pass my data into C++ objects; I've already looked into this and am 
> finding it tricky (running a simple perl script from C++ is fine but calling 
> PERL with modules such as LWP has not worked for me yet; I've read the docs 
> but not managed to get the XS thing to run, Perl was saying it couldn't run 
> dynamic code in this way; does anyone know a good, easy to use Perl Wrapper 
> for C++?? there are several but they all seem to be from 2003!! and not sure 
> they will work)
> 
> If some kind soul would help me out or even suggest what I might need to read 
> to find my solution that would be very much appreciated!!
> 
> Thanks,
> 
> Carl
> 
Hi Carl, if I read your post correctly, you're trying to scrape some data
from a website using the Perl LWP modules. That's a common task for Perl,
so may I suggest you do some research on scraping with Perl; you'll find
there are several approaches to navigating a target site. Your user agent
should be able to respond to a login request from the target site, proceed
to the next page the site presents, make selections from drop-down boxes,
fill in text-entry fields, and press the submit buttons. Check perl.com
for some tutorials. WWW::Mechanize may be the module you're looking for,
or a combination of LWP::UserAgent and Expect.pm could be hacked together.
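As a concrete starting point, here is a minimal WWW::Mechanize sketch of the login-then-fetch flow. The field names ('login' and 'password') are guesses lifted from your POST code, and I'm assuming the login form is on the page that URL returns; verify both against the page source or your Live HTTP Headers capture before trusting it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# WWW::Mechanize keeps an in-memory cookie jar of its own, so one
# object gives you a persistent logged-in session across requests.
my $mech = WWW::Mechanize->new(
    agent     => 'Mozilla/5.0',
    autocheck => 1,    # die on any failed request instead of checking is_success by hand
);

# Fetch the login page first so Mechanize can parse the form,
# then submit it by naming the fields we want filled in.
$mech->get('https://commerce.us.reuters.com/login/pages/login/login.do');
$mech->submit_form(
    with_fields => {
        login    => 'Fredbumblebee',   # assumed field names -- check the real form
        password => 'BzzZZZ!',
    },
);

# The session cookie is now in the jar, so this request arrives logged in.
$mech->get('http://www.reuters.com/finance/stocks/incomeStatement/detail?perType=ANN&symbol=BATS.L');
print $mech->content;
```

Note the same $mech object is reused for every request; that is what carries the session, so there's no need to log in again per page.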

I have found, when I last wrote a scraping script, that it helped to
manually walk through each and every step: look at the source of each
page, record the names of the form widgets and what they were supposed
to contain, and then reproduce the same experience programmatically in
the script.
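On your question about not hammering the site: yes, a plain sleep between requests is the usual polite approach, and because you reuse the same logged-in object, you only authenticate once for the whole run. A sketch, with made-up ticker symbols and an arbitrary delay:

```perl
use strict;
use warnings;

# Build the income-statement URL for a given ticker symbol.
sub statement_url {
    my ($symbol) = @_;
    return 'http://www.reuters.com/finance/stocks/incomeStatement/detail'
         . "?perType=ANN&symbol=$symbol";
}

# Walk a list of tickers, pausing between requests so we don't
# hammer the site.  The 5-second delay is an arbitrary polite choice.
my @symbols = qw(BATS.L VOD.L BP.L);    # example symbols, not a real list
for my $i (0 .. $#symbols) {
    my $url = statement_url($symbols[$i]);
    # $mech->get($url);    # reuse the one logged-in Mechanize object here
    print "would fetch $url\n";
    sleep 5 if $i < $#symbols;    # no need to wait after the last one
}
```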

Hope this helps,

Greg

--
To unsubscribe, e-mail: beginners-cgi-unsubscr...@perl.org
For additional commands, e-mail: beginners-cgi-h...@perl.org
http://learn.perl.org/

