Hi all,
I wonder if any of you have experienced the following phenomenon:
I am trying to download TSV (tab-separated values) content from an HTTP
server using the WWW::Mechanize package. On the server the encoding is
UTF-8.
When I capture the exchange between the script and the server using
Wireshark, it is definitely the Windows end-of-line '\r\n' (0x0d 0x0a)
which flows across the interface.
In the script I also have:
<$mech->add_header('Accept-Charset','utf-8;q=0.7,*;q=0.7');>
Unfortunately, using for example $mech->content(...) I get content with a
three-byte end-of-line '\r\r\n' (0x0d 0x0d 0x0a) for each '\r\n' in the TSV.
I get the same result when I use the <$mech->save_content($filename);>
method.
However, when I use <$mech->get($url, ':content_file' => $filename);> I get
a file with the *correct* end-of-line!
Yes, I know! It is not difficult to fix the content with a regex. But
still, maybe there is a bug lurking around here...
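For reference, the same three-byte sequence can be produced without any
network access. This is only a sketch of one possible mechanism: I have not
verified that WWW::Mechanize writes through a text-mode handle. The helper
name bytes_written and the sample record are made up for illustration, and
the ':crlf' PerlIO layer stands in for the text mode that Perl applies to
file handles on Windows unless binmode is called.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# One made-up TSV record ending in the CRLF that the server sends.
my $tsv = "id\tname\r\n";

# Hypothetical helper: write $data to a temp file through the given
# PerlIO layer, then read the raw bytes back unchanged.
sub bytes_written {
    my ($layer, $data) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, ">$layer", $path or die "open $path: $!";
    print {$out} $data;
    close $out or die "close $path: $!";

    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $text_mode = bytes_written(':crlf', $tsv);  # text mode: "\n" becomes "\r\n"
my $raw_mode  = bytes_written(':raw',  $tsv);  # binary mode: bytes untouched

# The regex repair: collapse the doubled carriage return.
(my $repaired = $text_mode) =~ s/\r\r\n/\r\n/g;

# Print each variant as hex so the line endings are visible.
printf "%-9s %s\n", $_->[0], unpack('H*', $_->[1])
    for [text => $text_mode], [raw => $raw_mode], [repaired => $repaired];
```

In this sketch the text-mode variant ends in 0d 0d 0a, while the raw and
repaired variants end in 0d 0a. If that is what is going on, writing the
content through a handle on which binmode has been called should avoid the
extra byte without any regex.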
Anyone care to elucidate?
Regards,
Meir