On Jun 13, 2006, at 20:26, Anthony Ettinger wrote:

I have to write a simple function which strips out the various
newlines on text files, and replaces them with the standard unix
newline \n

In Perl, "\n" depends on the system: it is eq "\012" everywhere except on pre-X Mac OS, where it is "\015". The standard Unix newline is "\012".

In text-mode, I open the file, and do the following:

               while (defined(my $line = <INFILE>))
               {
                       my $outline;
                       if ($line =~ m/\cM\cJ/)
                       {
                               print "dos\n";
                               ($outline = $line) =~ s/\cM\cJ/\cJ/; #win32
                       } elsif ($line =~ m/\cM(?!\cJ)/) {
                               print "mac\n";
                               ($outline = $line) =~ s/\cM/\cJ/g; #mac
                       } else {
                               print "other\n";
                               $outline = $line; #default
                       }

                       print OUTFILE $outline;
               }

It works fine on unix when I run the unit tests on old mac files, win,
and unix files and do a hexdump -C on them....however, when I run it
on win32 perl 5.6.1, it is not doing any replacement. The lines remain
unchanged.

My understanding is that \n is a reference (depending on which OS your
perl is running on) to CR (mac), CRLF (dos), and LF (unix) in
text-mode STDIO.

That is a common misconception. The string "\n" has length 1 always, everywhere. It is not CRLF on Windows.
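A quick way to check this on any given perl, as a sketch. The describe() helper is made up for illustration; it only reports a string's length and the octal code of each character.

```perl
use strict;
use warnings;

# Hypothetical helper: report the length of a string and the octal code of
# each of its characters.
sub describe {
    my ($str) = @_;
    my @codes = map { sprintf('\\%03o', ord($_)) } split(//, $str);
    return sprintf('%d char(s): %s', length($str), join(' ', @codes));
}

print describe("\n"), "\n";         # the logical newline: always one character
print describe("\015\012"), "\n";   # an explicit CRLF pair: two characters
```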

To explain this properly I'd need to reproduce here an article I've written for Perl.com that is not yet published. To address the problem, though, it is enough to say that in text mode there is an I/O layer (PerlIO) that translates back and forth between "\n" and the native newline convention. On Windows that layer has already turned each CRLF into "\n" by the time your regexp sees the line, so \cM\cJ never matches. That's the way portability is accomplished, inherited from C.
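The effect of that translation can be observed on any platform by asking for the layer explicitly, roughly as below. This is only a sketch: it assumes a perl with PerlIO layers (5.8 or later, so not the 5.6.1 mentioned above), and first_line_via() is a made-up helper. The ":crlf" layer is what a Windows perl applies to text-mode handles by default.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $bytes to a temporary file untouched, then read
# the first line back through the given PerlIO layer (":raw" or ":crlf").
sub first_line_via {
    my ($layer, $bytes) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    binmode $tmp_fh;                          # store the bytes exactly as given
    print {$tmp_fh} $bytes or die "Cannot write $path: $!";
    close $tmp_fh or die "Cannot close $path: $!";

    open my $fh, "<$layer", $path or die "Cannot read $path: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

my $dos_text = "a\015\012b\015\012";

# Without translation the CR is still part of the line; with the :crlf
# layer the program only ever sees "\n".
printf "raw:  %d chars\n", length first_line_via(':raw',  $dos_text);
printf "crlf: %d chars\n", length first_line_via(':crlf', $dos_text);
```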

To deal with any newline convention portably, you disable that magic by calling binmode on the filehandle. The easiest solution is to slurp the text and s/// it like this (written inline):

  binmode $in_fh;
  my $raw_text = do { local $/; <$in_fh> };

  # order matters
  $raw_text =~ s/\015\012/\n/g;
  $raw_text =~ s/\012/\n/g unless "\n" eq "\012";
  $raw_text =~ s/\015/\n/g unless "\n" eq "\015";

  # $raw_text is normalized here
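A possible file-to-file wrapper around the same idea, as a sketch. The name convert_file_to_lf and its arguments are made up. It assumes the goal is the Unix newline in the output file on every platform, so it writes "\012" through a binmode'd handle rather than "\n"; printing "\n" in text mode on Windows would produce CRLF again.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, turning every CRLF and
# every lone CR into a single LF ("\012"). Existing LFs are left alone.
sub convert_file_to_lf {
    my ($in_path, $out_path) = @_;

    open my $in_fh, '<', $in_path or die "Cannot read $in_path: $!";
    binmode $in_fh;                            # no newline translation on input
    my $raw_text = do { local $/; <$in_fh> };  # slurp the whole file
    close $in_fh;
    $raw_text = '' unless defined $raw_text;

    # The alternation tries CRLF first, so a pair is never counted twice.
    $raw_text =~ s/\015\012|\015/\012/g;

    open my $out_fh, '>', $out_path or die "Cannot write $out_path: $!";
    binmode $out_fh;                           # no newline translation on output
    print {$out_fh} $raw_text or die "Cannot write $out_path: $!";
    close $out_fh or die "Cannot close $out_path: $!";
    return;
}

# Usage: perl script.pl input.txt output.txt
convert_file_to_lf(@ARGV) if @ARGV == 2;
```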

Since the newline convention of the file is not necessarily the one of the runtime platform, you cannot write a line-oriented script. If files are too big to slurp, you'd work on chunks, but you need to check by hand whether a CRLF has been cut in the middle.
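One way the chunked variant could look, as a sketch. The name normalize_stream and the default chunk size are made up. It holds back a CR found at the very end of a chunk until the next chunk shows whether an LF follows, and it writes "\012" to a binmode'd output handle. The demo at the end uses in-memory filehandles, which need perl 5.8 or later; with real files the sub itself only needs read() and binmode.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_fh to $out_fh in chunks of $chunk_size bytes,
# turning CRLF and lone CR into LF ("\012").
sub normalize_stream {
    my ($in_fh, $out_fh, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;
    binmode $in_fh;
    binmode $out_fh;

    my $pending_cr = 0;    # true when the previous chunk ended in a CR
    while (1) {
        my $got = read($in_fh, my $chunk, $chunk_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;

        # Put back the CR that was held over from the previous chunk.
        $chunk = "\015" . $chunk if $pending_cr;

        # A CR at the very end may be the first half of a CRLF that was cut
        # in the middle, so keep it for the next round.
        $pending_cr = ($chunk =~ s/\015\z//) ? 1 : 0;

        $chunk =~ s/\015\012|\015/\012/g;
        print {$out_fh} $chunk or die "write failed: $!";
    }

    # A CR left over at end of input was a lone CR after all.
    if ($pending_cr) {
        print {$out_fh} "\012" or die "write failed: $!";
    }
    return;
}

# Demo with a tiny chunk size so that the CRLF pair gets split.
my $mixed  = "dos\015\012mac\015unix\012";
my $result = '';
open my $demo_in,  '<', \$mixed  or die "Cannot open in-memory input: $!";
open my $demo_out, '>', \$result or die "Cannot open in-memory output: $!";
normalize_stream($demo_in, $demo_out, 4);
close $demo_out;
print $result;
```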

-- fxn

