At 16:30 -0800 19/11/02, Heather Madrone wrote:
I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), but shell programs such as vi do not handle Mac-style text files gracefully.

Is perl on the Mac going to care whether source files are Mac-style or Unix-style?
Is it going to have difficulty reading and operating on either kind of file? What kind of text files will it write?

Thanks in advance for any illumination.
Definitely read the perlport section of the documentation at:

http://www.perldoc.com/perl5.6.1/pod/perlport.html

Traditionally on Mac OS, line endings have been carriage return (cr) only.

Unix uses linefeed (lf) line endings only.

DOS/Windows uses carriage return plus linefeed (crlf) line endings.

Mac OS X is quite schizophrenic about this: some applications will handle only Mac line endings, some will handle only Unix line endings, and some will handle Unix or Mac (or even DOS) line endings.

Ignoring MacPerl (running under Mac OS X), and looking only at Mac OS X's /usr/bin/perl (or wherever you've installed perl), which is a Unix perl, not a Mac perl, we have:

Perl source files must have Unix line endings (lf only). If the source file has Mac line endings, it will usually run and do absolutely nothing if you run it as "perl script.pl", or the shell will complain "script.pl: Command not found." if you run it as ./script.pl. This is because the first line is #!/usr/bin/perl, but after that the cr is not a line ending, so the entire source file appears as a single line. If you run it with perl, perl ignores the entire file as a comment. If you run it directly, the system tries to use the entire file as the command and won't be able to find "/usr/bin/perl<cr><cr>use" (for example) as a command to run.
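One way to repair such a script is to rewrite its line endings as lf before running it. The following is only a minimal sketch: the sub name to_unix_eol is made up for illustration, and it rewrites the file in place, so try it on a copy first.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a file in place with Unix (lf) line endings.
sub to_unix_eol {
    my ($path) = @_;

    open( my $in, '<', $path ) or die "Cannot read $path: $!";
    binmode($in);
    my $text = do { local $/; <$in> };    # slurp the whole file
    close($in);

    # Both crlf and a lone cr become a single lf.
    $text =~ s/\015\012|\015/\012/g;

    open( my $out, '>', $path ) or die "Cannot write $path: $!";
    binmode($out);
    print $out $text;
    close($out) or die "Cannot close $path: $!";
    return;
}
```

Calling to_unix_eol("script.pl") should leave a file that both "perl script.pl" and ./script.pl can handle.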

By default, Perl will read and write files with Unix line endings. You can change the input separator with $/ = "\r" for Mac line endings or "\r\n" for DOS line endings (and back to "\n" for Unix, although saving and restoring the old value is better practice). You can change the output by just printing the appropriate line ending. In this case, a nice practice might be to do:

our $eol = "\015\012"; # Windows line ending

print "First Line$eol";
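The saving and restoring mentioned above can be left to local, which puts the old value of $/ back when the enclosing block exits. A minimal sketch, assuming a file with Mac (cr only) line endings; the sub name read_mac_lines is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file that uses Mac (cr only) line endings.
sub read_mac_lines {
    my ($path) = @_;

    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    local $/ = "\015";    # cr is the input separator inside this sub only
    my @lines = <$fh>;
    chomp(@lines);        # chomp removes the current $/, here the cr
    close($fh);

    return @lines;        # $/ reverts to its previous value on return
}
```

Code outside the sub keeps reading with whatever separator it had before the call.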

My suggestion for Mac OS X users is to switch to using Unix line endings as soon as possible, and wherever possible support reading files with any line ending. One simple thing I almost always do is:

while (<>) {
s/\015?\012$//; # instead of chomp
}

Yes, chomp is probably faster, but most of the time it makes no difference. Note that the above code will not help you with Mac files, because they contain no lf and so the <> will read the entire file in one go :-(

It's really unfortunate that there is no special case value for $/ (like "" perhaps) that handles \015\012|\015|\012 as a line ending. There is talk of making $/ a regex which would allow that, but that's huge overkill just to handle this one particular very special case.

An alternative is to read the entire file in (undef $/) and then split it:

local( $/ ) = undef;
my $file = <>; # read in entire file
my @lines = split( /\015\012|\015|\012/, $file );
foreach my $line (@lines) {
print "'$line'\n";
}

Which is ok, but not great for big files.
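For big files, one possible alternative is to read fixed-size blocks (setting $/ to a reference to a number does that) and split each block, carrying any incomplete line over to the next block. This is only a sketch under those assumptions: the sub name each_line_any_eol, its callback interface, and the default block size are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per line, whatever the line
# ending, reading the file in fixed-size blocks instead of all at once.
sub each_line_any_eol {
    my ( $fh, $callback, $block_size ) = @_;
    $block_size ||= 65536;    # assumed default, tune as needed
    binmode($fh);
    local $/ = \$block_size;  # reference to a number: fixed-size reads

    my $pending = '';
    while ( defined( my $block = <$fh> ) ) {
        $pending .= $block;

        # Hold back a trailing cr: the next block may start with the
        # lf that completes a crlf pair.
        my $held = $pending =~ s/\015\z// ? "\015" : '';

        my @parts = split( /\015\012|\015|\012/, $pending, -1 );
        my $last = @parts ? pop(@parts) : '';    # possibly incomplete line
        $pending = $last . $held;
        $callback->($_) for @parts;
    }

    $pending =~ s/\015\z//;    # a final lone cr just ends the last line
    $callback->($pending) if length $pending;
    return;
}
```

Usage would look like each_line_any_eol( $fh, sub { print "'$_[0]'\n" } ). Memory use then depends on the block size and the longest line rather than on the size of the file.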

Enjoy,
Peter.


--
<http://www.interarchy.com/> <http://download.interarchy.com/>

