As far as I know, LWP (like most modules) does not do any data
conversion without explicitly saying so in its documentation, except
perhaps of line terminators (CR vs. LF vs. CRLF). You said you looked at
the source and the second column lined up visually, but did you check
whether it was lined up with spaces or with tabs?
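
If you would rather check the bytes in Perl than trust the on-screen
alignment, a minimal sketch along these lines counts literal tabs (0x09)
against runs of two or more spaces (0x20). The sub name whitespace_report
is my own invention for illustration; you would pass it
$response->content instead of the sample string.

```perl
use strict;
use warnings;

# Count literal tab characters (0x09) and runs of two or more spaces
# (0x20) in a string, e.g. the body returned by $response->content.
sub whitespace_report {
    my ($text) = @_;
    my $tabs       = () = $text =~ /\t/g;    # number of tab matches
    my $space_runs = () = $text =~ / {2,}/g; # number of multi-space runs
    return ($tabs, $space_runs);
}

# Example on a literal string; with LWP, pass $response->content instead.
my ($tabs, $runs) = whitespace_report("name\tvalue\nfoo   bar\n");
print "tabs: $tabs, runs of 2+ spaces: $runs\n";
```

Lots of space runs and zero tabs in what LWP hands you would point at the
data rather than at how you are searching it.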

Open the page you want in your web browser and try to save it to your
local computer. Then open the saved file in MS-Word or some other text
editor or word processor that has some way to check, display, and/or
convert tabs and spaces. (I use EditPad Pro, which can be set to
highlight spaces vs. tabs, as well as line endings, etc.)
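
If you don't have an editor that highlights whitespace, a few lines of
Perl can do roughly the same job on the saved file. This is only a
sketch: the script name showws.pl and the choice of "." for a space and
a literal \t for a tab are arbitrary choices of mine, not any standard.

```perl
use strict;
use warnings;

# Return a copy of a line with tabs shown as "\t", spaces shown as ".",
# and carriage returns shown as "\r", so they can be told apart at a glance.
sub show_whitespace {
    my ($line) = @_;
    (my $shown = $line) =~ s/\t/\\t/g;
    $shown =~ s/ /./g;
    $shown =~ s/\r/\\r/g;   # exposes CRLF line endings left after chomp
    return $shown;
}

# Usage (hypothetical file names):  perl showws.pl saved_page.html
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        chomp $line;
        print show_whitespace($line), "\n";
    }
    close $fh;
}
```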

Another thing you can do, since you're not 100% certain the page you're
trying to get has data in the format you want, is create a web page that
you are 100% certain contains tabs, upload it somewhere, then fetch it
with LWP and check whether you get back exactly the same bytes. Then you
can at least be sure whether or not LWP is the problem. One of the first
things I was taught in computer science: garbage in, garbage out.
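
To avoid comparing by eye, a sketch like this reports the offset of the
first byte where your known-good file and the fetched body differ. The
sub first_difference, the file name tabs_test.txt, and the example.com
URL are placeholders of mine; substitute your own test page. The
commented usage reads $response->content, which gives the body as LWP
received it, without charset decoding.

```perl
use strict;
use warnings;

# Return the 0-based offset of the first byte where two strings differ,
# or -1 if they are identical.
sub first_difference {
    my ($expected, $got) = @_;
    my $len = length($expected) < length($got) ? length($expected) : length($got);
    for my $i (0 .. $len - 1) {
        return $i if substr($expected, $i, 1) ne substr($got, $i, 1);
    }
    # One string is a prefix of the other, or they are equal.
    return length($expected) == length($got) ? -1 : $len;
}

# Typical use (URL and file name are placeholders):
#   use LWP::UserAgent;
#   my $response = LWP::UserAgent->new->get('http://example.com/tabs_test.txt');
#   die $response->status_line unless $response->is_success;
#   open my $fh, '<:raw', 'tabs_test.txt' or die $!;
#   my $expected = do { local $/; <$fh> };
#   my $pos = first_difference($expected, $response->content);
#   print $pos < 0 ? "identical\n" : "first difference at byte $pos\n";
```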

Good luck,
C. M.

PS: I uploaded several test pages myself and LWP returned them all for
me, byte for byte, with no conversions. This further leads me to suspect
the input data is bad.


On 15 Sep 2008 at 14:34, Wayne Simmons wrote:

From:                   "Wayne Simmons" <[EMAIL PROTECTED]>
To:                     <[email protected]>
Subject:                LWP dropping tabs?
Date sent:              Mon, 15 Sep 2008 14:34:15 -0600

> All,
> 
> I didn't see an LWP-specific list, so I hope someone here knows about
> this. I'm using an LWP user agent to submit an HTTP::Request for a URL
> that returns a text/html; charset=UTF-8 content type. The data is
> supposed to be tab-delimited, but by the time $response->content gets
> it the tabs are gone and they seem to have been expanded to spaces!
> 
> I can see the data from the website through a regular browser, and in
> Mozilla's source view it looks tab-delimited (i.e. the second column is
> all lined up), but when I analyze the data within Perl I'm getting
> multiple 0x20 bytes instead of a single 0x09.
> 
> Is it possible the source is wrong?  Unfortunately I don't have
> control over the source, and I can't post the link to the data (as it
> requires a user/password to access). The code I use is in essence:
> 
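> use LWP::UserAgent;
> 
> my $browser = LWP::UserAgent->new;
> my $response = $browser->get( $URL );
> die $response->status_line unless $response->is_success;
> 
> my $foo = pack("C", 9);   # chr(9), a single tab byte (same as "\t")
> if ( $response->content =~ /$foo/ ) {
>     print "found tab!\n";
> } else {
>     print "no tabs!\n";
> }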
> 
> And I get no tabs. Can anyone think of a way to verify that the source
> data is correct, and/or does anyone know of an LWP or HTML header I
> should be setting to prevent tabs being expanded to spaces (if that's
> what's happening)?
> 
> -Wayne Simmons
> 
> 
> --
> Software Engineer
> InterSystems USA, Inc.
> 303-858-1000 
> 
> 
> 
> _______________________________________________
> ActivePerl mailing list
> [email protected]
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

