On Mon, 15 Jan 2007 12:17:22 -0600, William Jones wrote:
> I could use some help with Mechanize, and Andy Lester recommended I post an
> email on the libwww mailing list.  I am trying to do what should be a
> simple scrape of the US Patent and Trademark Office website for the
> bibliographic info that it posts for all patents.  Unfortunately I keep
> getting re-routed to a page that says
>   
> Basically, I am wondering how the website could know that I am using
> mechanize and not internet explorer to enter the info into the fields and
> click "submit."

You could change the User-Agent string Mechanize sends, but see below.
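For reference, here is a minimal sketch of changing it, assuming a reasonably recent WWW::Mechanize. agent() is inherited from LWP::UserAgent; the IE 6 string is just an illustrative value, not something the USPTO site is known to check. Mechanize also has agent_alias(), which takes one of the names returned by known_agent_aliases().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# By default Mechanize identifies itself as "WWW-Mechanize/<version>".
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Illustrative browser-like string (an assumption, not something the
# site is known to require). agent() comes from LWP::UserAgent.
my $ua_string = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
$mech->agent($ua_string);

# This is the User-Agent value that subsequent requests will send.
print $mech->agent, "\n";
```

Since the default agent worked fine for me, I doubt this is what the site is reacting to.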

> Here is my Perl code.  Thanks.
>  
>  
> #!/usr/local/bin/perl -w
> print "Content-type: text/html\n\n";
> use strict;
> use WWW::Mechanize;
> use Crypt::SSLeay;
> my $url = "https://ramps.uspto.gov/eram/";
> my $maintenancepatent = "5771669";
> my $maintenanceapp = "08672157";
> my $outfile = "out.htm";
> my $mech = WWW::Mechanize->new( autocheck => 1);
> $mech->proxy(['https'], '');
> $mech->get($url);
> $mech->follow_link(text => "Pay or Look up Patent Maintenance Fees", n => 1);
> $mech->form_name('mfInputForm');
> $mech->field(patentNum => "$maintenancepatent");
> $mech->field(applicationNum => "$maintenanceapp");
> $mech->add_header( Referer => $url );
> $mech->click_button (number => 2);
> open(OUTFILE, ">$outfile");
> my $output_page = $mech->content();
> print OUTFILE "$output_page";
> close(OUTFILE);
> print "done";

I'd guess one of two things: either (a) you've made more requests than
their terms of service permit and your IP address has been blacklisted, or
(b) you've got something unnecessary above, because when I try it with
less code than you have, it works:

$ perl -MWWW::Mechanize -de '$m = WWW::Mechanize->new; 1'

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):   $m = WWW::Mechanize->new; 1
  DB<1> n
main::(-e:1):   $m = WWW::Mechanize->new; 1
  DB<1> $m->get("https://ramps.uspto.gov/eram/")

  DB<2> $m->follow_link(text => "Pay or Look up Patent Maintenance Fees", n => 1) or die

  DB<3> $m->form_name('mfInputForm')

  DB<4> $m->field(patentNum => "5771669")

  DB<5> $m->field(applicationNum => "08672157")

  DB<6> $m->click_button(number=>2)

  DB<7> p $m->content(format=>'text')
 USPTO - Patent Bibliographic Data (Patent Number: 5771669)  Patent
 Bibliographic Data01/16/2007 09:46 AMPatent Number:5771669Application
 Number:08672157Issue Date:06/30/1998Filing Date:06/27/1996Title:METHOD
 AND APPARATUS FOR MOWING IRREGULAR TURF AREAS[...]
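If you want the same steps as a standalone script rather than a debugger session, something like the following should be equivalent. It's a sketch: the URL, link text, form name and field names are the ones from this thread and will stop working if USPTO changes the site, HTTPS needs an SSL backend for LWP (such as the Crypt::SSLeay you're already loading), and content(format => 'text') needs HTML::TreeBuilder. The helper name lookup_maintenance is my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Same steps as the debugger session: no proxy(), add_header() or CGI
# Content-type header. lookup_maintenance() is a hypothetical helper name.
sub lookup_maintenance {
    my ( $patent, $application ) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on failed requests
    $mech->get('https://ramps.uspto.gov/eram/');
    $mech->follow_link( text => 'Pay or Look up Patent Maintenance Fees' );
    $mech->form_name('mfInputForm');
    $mech->field( patentNum      => $patent );
    $mech->field( applicationNum => $application );
    $mech->click_button( number => 2 );
    return $mech->content( format => 'text' );   # page text with tags stripped
}

# Usage: perl lookup.pl 5771669 08672157
print lookup_maintenance(@ARGV), "\n" if @ARGV == 2;
```

Passing the numbers as strings keeps the leading zero in the application number.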

-- 
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/
