On Mon, 15 Jan 2007 12:17:22 -0600, William Jones wrote:

> I could use some help with Mechanize and Andy Lester recommended I post an
> email on the libwww mailing list. I am trying to do what should be a
> simple scrape of the us patent and trademark website for bibliographic
> info that they post for all patents. Unfortunately I keep getting
> re-routed to a page that says
>
> Basically, I am wondering how the website could know that I am using
> mechanize and not internet explorer to enter the info into the fields and
> click "submit."
You could set the user_agent, but see below.

> Here is my perl code. Thanks.
>
>
> #!/usr/local/bin/perl -w
> print "Content-type: text/html\n\n";
> use strict;
> use WWW::Mechanize;
> use Crypt::SSLeay;
> my $url = "https://ramps.uspto.gov/eram/";
> my $maintenancepatent = "5771669";
> my $maintenanceapp = "08672157";
> my $outfile = "out.htm";
> my $mech = WWW::Mechanize->new( autocheck => 1);
> $mech->proxy(['https'], '');
> $mech->get($url);
> $mech->follow_link(text => "Pay or Look up Patent Maintenance Fees", n => 1);
> $mech->form_name('mfInputForm');
> $mech->field(patentNum => "$maintenancepatent");
> $mech->field(applicationNum => "$maintenanceapp");
> $mech->add_header( Referer => $url );
> $mech->click_button (number => 2);
> open(OUTFILE, ">$outfile");
> my $output_page = $mech->content();
> print OUTFILE "$output_page";
> close(OUTFILE);
> print "done";

I would say one of two things: either (a) you've made more requests than
their terms of service permit and your IP is blacklisted, or (b) you've got
something unnecessary above, because when I try it with less code than
you've got, it works:

$ perl -MWWW::Mechanize -de '$m = WWW::Mechanize->new; 1'

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):   $m = WWW::Mechanize->new; 1
  DB<1> n
main::(-e:1):   $m = WWW::Mechanize->new; 1
  DB<1> $m->get("https://ramps.uspto.gov/eram/")

  DB<2> $m->follow_link(text => "Pay or Look up Patent Maintenance Fees", n =>1) or die

  DB<3> $m->form_name('mfInputForm')

  DB<4> $m->field(patentNum => "5771669")

  DB<5> $m->field(applicationNum => "08672157")

  DB<6> $m->click_button(number=>2)

  DB<7> p $m->content(format=>'text')
USPTO - Patent Bibliographic Data (Patent Number: 5771669) Patent Bibliographic Data01/16/2007 09:46 AMPatent Number:5771669Application Number:08672157Issue Date:06/30/1998Filing Date:06/27/1996Title:METHOD AND APPARATUS FOR MOWING IRREGULAR TURF AREAS[...]
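For reference, here is roughly the same debugger sequence as a standalone script. Treat it as a minimal sketch, not a tested tool. It assumes a reasonably current WWW::Mechanize with HTTPS support (LWP::Protocol::https) and HTML::TreeBuilder installed. The subroutine names build_mech and fetch_biblio are hypothetical. Unlike the original script, it leaves out the proxy() call, the Referer header, the explicit Crypt::SSLeay load and the CGI header. The agent_alias('Windows IE 6') line follows up on the user_agent idea and replaces Mechanize's default "WWW-Mechanize/x.xx" User-Agent with a canned IE string. Whether the USPTO site actually looks at that header is an assumption, not something established here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: a Mechanize object set up like the debugger session,
# plus an optional browser-like User-Agent.
sub build_mech {
    # autocheck => 1 makes failed requests die instead of silently
    # carrying on with whatever page came back.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # Swap the default "WWW-Mechanize/x.xx" agent string for a canned
    # IE 6 one. Assumption: the site may (or may not) key on this header.
    $mech->agent_alias('Windows IE 6');
    return $mech;
}

# Hypothetical helper: the same steps as DB<1> through DB<7> above.
sub fetch_biblio {
    my ( $mech, $patent, $application ) = @_;
    $mech->get('https://ramps.uspto.gov/eram/');
    $mech->follow_link(
        text => 'Pay or Look up Patent Maintenance Fees',
        n    => 1,
    );
    $mech->form_name('mfInputForm');
    $mech->field( patentNum      => $patent );
    $mech->field( applicationNum => $application );
    $mech->click_button( number => 2 );

    # Plain-text rendering of the result page (uses HTML::TreeBuilder).
    return $mech->content( format => 'text' );
}

# Only touch the network when given both numbers on the command line,
# e.g.  perl uspto.pl 5771669 08672157
if ( @ARGV == 2 ) {
    my ( $patent, $application ) = @ARGV;
    print fetch_biblio( build_mech(), $patent, $application ), "\n";
}
```

If the canned agent string makes no difference, remove the agent_alias line. In that case the blacklisting explanation becomes more likely.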
-- 
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/