Dear web perlers,
I have encountered a web page that consistently hangs, no matter which user
agent is used: IE, Firefox and (this is my problem) LWP::UserAgent ("ua").
The page is reached from
http://www.tase.co.il/TASEEng/MarketData/Indices/Additional/IndexHistoryData.htm?Action=1&addTab=&IndexId=166
by clicking the "Intra-Day Transaction Data" link and then the "Display
data" button.
The problem is that when I use ua it hangs too, forever, and never times
out! A ^C is then required to abort it.
Here is my environment:
* OS: x64 Windows 7 Pro on an Intel x64 processor
* Perl: ActiveState Perl v5.14.2 built for MSWin32-x64-multi-thread
  (with 1 registered patch, see perl -V for more detail)
* WWW::Mechanize: version 1.72
* LWP (including LWP::UserAgent and LWP::Protocol::http): version 6.02
I am including a demonstration program that can show both a good page and
the bad one above. If it turns out to be too long to be accepted by the
list server, I will send it to anybody on request.
This program uses WWW::Mechanize, but I traced the hanging point to the
"$protocol->request" call at line 193 of LWP::UserAgent, reached from the
WWW::Mechanize 'submit_form' method. There the $protocol object is of the
'LWP::Protocol::http' class. At this point I got lost. I am too much of a
newbie to understand what is going on there.
Can anybody show me what to do to further trace the problem?
Regards,
Meir
============================
#!/usr/bin/perl
# Copyright Juan Pedro Paredes Caballero <[email protected]>
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::ConnCache;
# This URL points to a HANGING page. A request to it always hangs, and
# causes UserAgent to hang too:
my $urlbase =
    "http://www.tase.co.il/TASEEng/MarketData/Indices/Additional/IndexHistoryData.htm?Action=2&IndexId=166&subDataType=0";
# For comparison, this URL directs to a good page that does respond well:
#my $urlbase =
#    "http://www.tase.co.il/TASEEng/MarketData/Indices/MarketCap/IndexHistoryData.htm?Action=1&addTab=&IndexId=142";
# And this is the second URL required to complete the query:
my $urltsv =
    "http://www.tase.co.il/TASE/Pages/Export.aspx?tbl=0&Columns=AddColColumnsHistory&Titles=AddColTitlesHistory&sn=dsHistory&enumTblType=GridHistoryinner&ExportType=4";
# Session cache
my $conn_cache = LWP::ConnCache->new;
# Cookies (cookie store). This is the key to a successful query: we must
# obtain a query cookie, store it in the cookie jar and keep it across requests
my $cookie_jar = HTTP::Cookies->new;
# Create a Mechanize session with our session cache and cookie jar
my $mech = WWW::Mechanize->new(
    conn_cache => $conn_cache,
    cookie_jar => $cookie_jar,
);
# Some headers to emulate a Firefox browser.
$mech->add_header('User-Agent','Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
$mech->add_header('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
$mech->add_header('Accept-Language','en-us;q=0.8,en;q=0.3');
$mech->add_header('Accept-Encoding','gzip,deflate');
$mech->add_header('Accept-Charset','utf-8;q=0.7,*;q=0.7');
# We get the content of the base URL
#print "URL:$urlbase\n";
$mech->get($urlbase);
my $out = $mech->content();
# To obtain the final TSV one must bypass/emulate some changes usually done
# by JavaScript. The form action is not correct, the submit button is not a
# submit one, and the hiddenID unlocks the request and indicates the kind of
# request to the server. We modify the form action to bypass a JavaScript
# submit:
my($base) = $out =~ /base href="(.*?)"/;
$out =~ s/action=".*?" /action="$base" /;
# We modify the button to bypass the JavaScript submit
$out =~ s{<input type="button" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit\('1'\)" >}
         {<input type="submit" name="Display Data" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit('1')" >};
# We update the html page with our non javascript submit:
$mech->update_html($out);
# We then submit the form, activating the hiddenID lock (another JavaScript
# bypass)
# IT IS IN THIS WWW::Mechanize 'submit_form' METHOD THAT THE LWP::UserAgent
# METHOD "$protocol->request" (LINE 193) HANGS ON A HUNG 'TASE' SERVER:
$mech->submit_form(
    form_name => "Form1",
    fields    => {
        'HistoryData1$hiddenID' => "1"
    },
    button    => "Display Data"
);
# And this is the last stage of the download request:
$mech->get($urltsv);
#print "URL:$urltsv\n";
$out = $mech->content();
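# We format $out to collapse runs of line-feed characters
$out =~ s/[\r\n]+/\n/g;
my $response = $mech->response;
my $filename = $response->filename;
$filename = "TSV2_$filename";
# Use a lexical filehandle and report failures instead of ignoring them
open(my $tsv, ">:encoding(utf8)", $filename)
    or die "Cannot open $filename for writing: $!";
print {$tsv} $out;
close($tsv) or die "Cannot close $filename: $!";
print "TSV saved to $filename\n";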
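============================
P.S. The two JavaScript bypasses in the program can be tried offline,
without touching the TASE server. What follows is only a minimal sketch:
the HTML fragment and the example.com address are made up for illustration
(they are not copied from the real page), and only the two substitutions
are taken from the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical HTML fragment, invented for this sketch; the real page is
# much larger and may differ in detail.
my $out = <<'HTML';
<base href="http://www.example.com/page.htm">
<form name="Form1" action="javascript:void(0)" method="post">
<input type="button" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit('1')" >
</form>
HTML

# Point the form action at the page's base href
my ($base) = $out =~ /base href="(.*?)"/;
$out =~ s/action=".*?" /action="$base" /;

# Turn the JavaScript-driven button into a real submit button
$out =~ s{<input type="button" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit\('1'\)" >}
         {<input type="submit" name="Display Data" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit('1')" >};

# Show the rewritten fragment
print $out;
```

If the printed fragment shows the rewritten action and a type="submit"
button, the substitutions themselves work on that fragment, which would
suggest looking for the hang in the request rather than in the rewrite.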