Weird (or eof?) characters screwing up my file read?

Gary Hawkins Mon, 17 Dec 2001 18:01:53 -0800

I'm reading a 25 MB file that contains the content of web pages and their
URLs.  The entire content of the file is read into an array and then
worked over from there.  (I know there are faster ways, but I ran into a loop
snag when redefining $/.)


The problem is that when I run the script the array only contains 590 elements,
when it should have 902.  On page 590 there are some vertical black bars when I
look at it in Notepad.  I tried the usual trick of opening and saving it in
Word, but Word doesn't display them at all, not even with a placeholder, and
saving doesn't solve it.

So my theory is that those characters are causing only part of the file to be
read into the array.

Is there some way to find out what their ASCII or ANSI value is?
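
One way to check, as a minimal sketch: open the file in binary mode, so that
no byte gets special treatment, and count every control character found.  On
Windows, a read in text mode can stop at a Ctrl-Z (byte 26, 0x1A), which would
fit the symptom, but that is only a guess until the bytes are listed.  The
helper name count_control_bytes and the file name passed on the command line
are hypothetical.

```perl
use strict;
use warnings;

# Count every control byte (other than tab, LF and CR) in a file.
# Returns a hash ref mapping byte value => number of occurrences.
sub count_control_bytes {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # raw bytes: no newline translation, no special EOF byte
    my %seen;
    while (read($fh, my $buf, 65536)) {
        # Each match is a single byte, so chunk boundaries do not matter.
        $seen{ord $1}++ while $buf =~ /([\x00-\x08\x0B\x0C\x0E-\x1F\x7F])/g;
    }
    close($fh);
    return \%seen;
}

# Hypothetical usage: perl findbytes.pl pages.txt
if (@ARGV) {
    my $seen = count_control_bytes($ARGV[0]);
    printf "byte %3d (0x%02X): %d occurrence(s)\n", $_, $_, $seen->{$_}
        for sort { $a <=> $b } keys %$seen;
}
```

Reading in fixed-size chunks keeps memory use small even for a 25 MB file.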

If identifiable, how can they be replaced with carriage returns in a script?
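
A possible approach, sketched under the assumption that the offending byte
values are already known: copy the file in binary mode and substitute a
newline for each of them.  The function name replace_bytes_with_newline, the
file names, and the value 0x1A in the usage line are illustrative assumptions,
not something confirmed for this file.

```perl
use strict;
use warnings;

# Copy $in to $out, turning every byte listed in @bytes into a newline.
sub replace_bytes_with_newline {
    my ($in, $out, @bytes) = @_;
    die "No byte values given\n" unless @bytes;

    # Build a character class such as [\x1A] from the numeric byte values.
    my $class = join '', map { sprintf '\\x%02X', $_ } @bytes;
    my $re    = qr/[$class]/;

    open(my $ifh, '<', $in)  or die "Cannot open $in: $!";
    open(my $ofh, '>', $out) or die "Cannot open $out: $!";
    binmode($ifh);    # read raw bytes so nothing ends the read early
    binmode($ofh);    # write raw bytes; inserted newlines are bare LF

    while (read($ifh, my $buf, 65536)) {
        $buf =~ s/$re/\n/g;
        print $ofh $buf;
    }
    close($ifh);
    close($ofh) or die "Cannot close $out: $!";
}

# Hypothetical usage: perl fixbytes.pl pages.txt pages_fixed.txt
if (@ARGV == 2) {
    replace_bytes_with_newline(@ARGV, 0x1A);    # 0x1A (Ctrl-Z) is an assumption
}
```

Because the output is written in binary mode, the inserted newlines are bare
LF characters, so a file that otherwise uses CR/LF will end up with mixed line
endings.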

I had the same problem on page 411 and replaced those characters manually with
<enter>, but there are more, and with such a large file that's awfully slow
going, mostly just waiting for the refresh.  Another attempt, FTPing the file
to the server and back in ASCII mode, is going to take 40 minutes.

Also tried these:

$content =~ s#\r#\n#g;
$content =~ s#\cM#\n#g;

/g

