Re: Regex and Mac vs UNIX line endings

Bruce Van Allen Thu, 20 Jul 2006 09:21:31 -0700

Peter gave some good examples, so I shortened this to supplement his
suggestions.


I prefer to determine what the end-of-line (eol) "character" is using
something less slippery than \r and \n. In Perl, \n is the native eol
for the OS that Perl is executing under, so it could be any of the \n,
\r, \r\n, etc., constructs.

Instead, use the octal characters, which for this are:

Mac (OS 9 and before)   CR (Carriage Return)  "\015"
UNIX, Linux, VMS        LF (Line Feed)        "\012"
Win                     CRLF                  "\015\012"
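
To make the table concrete, here is a small sketch that prints the bytes behind each construct, along with whatever "\n" means to the Perl running it. The %EOL hash and the as_octal helper are names made up for this illustration, not anything from Peter's or Doug's posts.

```perl
use strict;
use warnings;

# The three eol constructs from the table, keyed by their usual names.
my %EOL = (
    CR   => "\015",        # Mac OS 9 and before
    LF   => "\012",        # UNIX, Linux
    CRLF => "\015\012",    # Windows
);

# Hypothetical helper: show each character of a string as an octal escape.
sub as_octal {
    my ($str) = @_;
    return join '', map { sprintf('\\%03o', ord($_)) } split //, $str;
}

print "$_ = ", as_octal($EOL{$_}), "\n" for sort keys %EOL;

# What this particular Perl uses for its logical newline.
print 'native \n = ', as_octal("\n"), "\n";
```

The last line is the one that can differ from platform to platform, which is the reason for spelling the constructs out in octal.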

BTW, many apps in Mac OS X (Excel, Filemaker Pro) continue to use the
eol used in OS 9 and before (CR), not the UNIX eol (LF).

Here's my favorite way to get the eol and convert it to native, no
matter what's in the original file (at least in the popular OSes):

    $text       =~ s/(\015?\012|\015)/\n/gs;

You could also specify what you want, if that isn't simply the native
eol:

    my $new_eol     = "\015";  # or "\012" or "\015\012"
    $text           =~ s/(\015?\012|\015)/$new_eol/gs;
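
As a quick check of the substitution above, here is a minimal sketch that wraps it in a sub and runs it on a string with all three eols mixed together. The name normalize_eol and the default of LF are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every CRLF, lone LF, or lone CR in $text
# as $new_eol. Defaults to LF, written in octal to stay unambiguous.
sub normalize_eol {
    my ($text, $new_eol) = @_;
    $new_eol = "\012" unless defined $new_eol;
    # \015?\012 comes first so a CRLF pair is consumed as one eol
    # instead of being counted as a CR followed by an LF.
    $text =~ s/\015?\012|\015/$new_eol/g;
    return $text;
}

my $mixed = "one\015\012two\015three\012four";
my $unix  = normalize_eol($mixed);             # LF everywhere
my $mac   = normalize_eol($mixed, "\015");     # CR everywhere

# tr/// with an empty replacement just counts the matching characters.
printf "LF count: %d, CR count: %d\n",
    ($unix =~ tr/\012//), ($unix =~ tr/\015//);
```

The order of the alternatives matters: with the lone \015 listed first, each CRLF would turn into two eols.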
    
If the file is large, you may need a heuristic, as Doug suggests: test
the first x characters of the file for one of the above eol constructs,
check whether it shows up again, and then back up and process the whole
file with that eol. Or use the look-ahead/behind approaches that Peter
suggests.
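
One way the sampling heuristic might look, as a rough sketch and not Doug's actual code: guess_eol, each_line, and the 4096-character sample size are all assumptions made up here. It takes the first eol construct in the sample, accepts it only if it shows up again and no other construct does, then rewinds and reads the file one line at a time with $/ set to that eol.

```perl
use strict;
use warnings;

# Hypothetical helper: return the eol construct found in $sample, or
# undef if the sample is inconclusive (no eol, only one, or a mix).
sub guess_eol {
    my ($sample) = @_;
    # A CR at the very end may be the first half of a CRLF cut in two.
    $sample =~ s/\015\z//;
    my ($first, %seen);
    while ($sample =~ /(\015\012|\012|\015)/g) {
        $first = $1 unless defined $first;
        $seen{$1}++;
    }
    return undef unless defined $first;
    return ($seen{$first} > 1 && keys(%seen) == 1) ? $first : undef;
}

# Hypothetical helper: sample the file, back up, then hand each line
# (eol already removed) to $callback without slurping the whole file.
sub each_line {
    my ($path, $callback, $sample_size) = @_;
    $sample_size ||= 4096;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                        # no eol translation by the I/O layer
    my $sample = '';
    read $fh, $sample, $sample_size;
    my $eol = guess_eol($sample);
    defined $eol or die "Could not determine the eol of $path";
    seek $fh, 0, 0;                     # back up to the start
    local $/ = $eol;                    # read "lines" ending in that eol
    while (my $line = <$fh>) {
        chomp $line;                    # chomp removes whatever $/ holds
        $callback->($line);
    }
    close $fh;
    return $eol;
}
```

Usage would be something like each_line($file, sub { print "$_[0]\n" }), which also converts to the native eol on the way out. A file whose sample mixes eols makes each_line die, and that case is better handled by the whole-text substitution.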

1;


- Bruce

__bruce__van_allen__santa_cruz__ca__
