Detecting file's line endings

2005-12-22 Thread James Harvard
I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac and \n for 
Unix as I'm sure y'all know).

Is there any easy way to do this?

I don't want to slurp the whole file, because it could be 14 MB or more, so I 
wanted to read in chunks until I got to a line break. However I can see a 
potential problem ending a chunk half way through a DOS \r\n, so then you just 
get \r which makes it look like a Mac formatted file.

Anyway, I started to roll my own code for it, and because I'm new to Perl I 
hoped that one of you kind souls would have a quick look (below) to check that 
I've got the right idea of how to do this sort of thing with Perl. (It seems to 
work with my tests, but that doesn't necessarily mean that it is a robust 
method!)

Also, I assume that one can pass a file handle to a sub-routine?
$/ = sniff_line_endings(INFILE) ;

Many thanks,
James Harvard

open (INFILE,$filename) or die "Couldn't open" ;
$/ = \50 ; # read in fixed 50-byte records
my $taste = '' ;
my $lb = undef ;
until ($lb) {
$taste .= <INFILE> ;
if ($taste =~ /\r\n/) {
$lb = "\r\n" ;
# DOS line endings
} elsif ($taste =~ /\r(?!$)/) {
$lb = "\r" ;
# Mac line endings
} elsif ($taste =~ /\n/) {
$lb = "\n" ;
# Unix line endings
}
}
$/ = $lb ;
seek INFILE, 0, 0 ; # reset the file read pointer
# do while(<INFILE>) stuff
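
The chunked approach above can be wrapped in a subroutine that takes a filehandle. What follows is only a sketch under a few assumptions: the name sniff_line_endings comes from the question, but the $chunk_size parameter, its 4096 default, and the undef return for "no line break found" are illustrative choices. It holds back a trailing \r until the next chunk arrives, so a \r\n split across two reads is not mistaken for a Mac ending, and it stops at end of file rather than looping.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "\r\n", "\r", "\n", or undef when the
# file contains no line break at all. Expects a readable filehandle.
sub sniff_line_endings {
    my ($fh, $chunk_size) = @_;
    $chunk_size ||= 4096;            # assumed default, tune as needed
    binmode $fh;                     # no CRLF translation on Windows
    my ($lb, $carry) = (undef, '');
    while (read($fh, my $chunk, $chunk_size)) {
        my $taste = $carry . $chunk;
        # A "\r" at the very end of the buffer may be the first half of
        # "\r\n", so it only counts when another character follows it.
        if ($taste =~ /(\r\n|\r(?=.)|\n)/s) {
            $lb = $1;
            last;
        }
        # Nothing found yet: remember only a trailing "\r", if any.
        $carry = $taste =~ /\r\z/ ? "\r" : '';
    }
    # At end of file a held-back "\r" can be trusted as a Mac ending.
    $lb = "\r" if !defined $lb && $carry eq "\r";
    seek $fh, 0, 0;                  # rewind for the caller
    return $lb;
}
```

With a bareword handle such as INFILE, a glob reference can be passed: $/ = sniff_line_endings(\*INFILE) || "\n" ; a lexical handle opened with open(my $fh, ...) can be passed directly.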


Re: Detecting file's line endings

2005-12-22 Thread John Delacour

At 3:15 pm + 22/12/05, James Harvard wrote:

I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac 
and \n for Unix as I'm sure y'all know).


Is there any easy way to do this?


At 10:45 am +0800 21/11/02, Peter N Lewis wrote:


At 13:22 + 20/11/02, John Delacour wrote:


 if (/\015\012/) {
  $/ = "\015\012" ;
 } elsif (/\015/) {
   $/ = "\015" ;
 } else {
   $/ = "\012" ;
 }


You can do this with one regular expression which will pick up the 
first line ending:


 $/ = /(\015\012|\015|\012)/ ? $1 : "\n";

Note that because Perl picks the leftmost match location, and at that 
location tries the alternatives of an or | set from left to right, it 
will find the first line ending in the text, and will match \015\012 
if it is there in preference to the \015 by itself.


Enjoy,
   Peter.
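
To see the ordering rule in action, the one-liner can be tried on a few strings. This is a small sketch: first_line_ending is a hypothetical wrapper name, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper so the one-liner can be applied to any string.
sub first_line_ending {
    my ($text) = @_;
    return $text =~ /(\015\012|\015|\012)/ ? $1 : "\n";
}

for my $sample ("a\015\012b", "a\015b\012c", "a\012b\015c", "no breaks") {
    my $le = first_line_ending($sample);
    # Show the control characters as octal escapes so they are visible.
    (my $shown = $le) =~ s/([\015\012])/sprintf '\\%03o', ord $1/ge;
    print "$shown\n";
}
```

The second sample contains both a \015 and a later \012; the earlier position wins, so \015 is reported. If the alternatives were listed as \015|\015\012 instead, a DOS file would be reported as \015, because the shorter alternative would be tried first at the same position.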


Re: Detecting file's line endings

2005-12-22 Thread John Delacour

At 3:15 pm + 22/12/05, James Harvard wrote:


Is there any easy way to do this?


PS.  The whole script, from which Peter quoted only the last bit in 
providing his genial one-liner, was as follows:




#!/usr/bin/perl
use Fcntl ; # provides O_RDONLY
$f = "$ENV{HOME}/Documents/Eudora Folder/Mail Folder/Manningham" ;
sysopen F, $f, O_RDONLY ;
sysread F, $_, 1000 ; # sample the first 1000 bytes
if (/\015\012/) {
  $/ = "\015\012" ;
 } elsif (/\015/) {
   $/ = "\015" ;
 } else {
   $/ = "\012" ;
 }
 open F, $f ;
 for (<F>) {
   /^From: / and chomp and print "$_\n"
 }


At 10:45 am +0800 21/11/02, Peter N Lewis wrote:

You can do this with one regular expression which will pick up the 
first line ending:


 $/ = /(\015\012|\015|\012)/ ? $1 : "\n";

   Peter.


Re: Detecting file's line endings

2005-12-22 Thread Doug McNutt

At 15:15 + 12/22/05, James Harvard wrote:
I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac 
and \n for Unix as I'm sure y'all know).


ftp://ftp.macnauchtan.com/Software/LineEnds/FixEndsFolder.sit  52 kB
ftp://ftp.macnauchtan.com/Software/LineEnds/ReadMe_fixends.txt  4 kB

I have trouble with files that contain multiple types of line ends. 
The result was these drag and drop AppleScripts that might help. They 
do look at the whole file but the underlying code (included) is in C 
and pretty fast and not memory intensive. You can change or just test 
for line endings but they don't (yet) handle the two newer 16-bit 
Unicode line ends.
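
For files with mixed line ends, a rough Perl equivalent of the "test" step is to count each type while streaming through the file. This is a sketch only, not the C code in the linked archive: count_line_endings, its $chunk_size parameter, and the hash keys are hypothetical names, and like the scripts above it ignores the Unicode line ends.

```perl
use strict;
use warnings;

# Hypothetical helper: counts CRLF, lone CR, and lone LF in a filehandle,
# reading fixed-size chunks so memory use stays small.
sub count_line_endings {
    my ($fh, $chunk_size) = @_;
    $chunk_size ||= 65536;           # assumed default, tune as needed
    binmode $fh;
    my %count = (crlf => 0, cr => 0, lf => 0);
    my $carry = '';
    while (read($fh, my $chunk, $chunk_size)) {
        my $buf = $carry . $chunk;
        # Hold back a trailing CR: it may pair with an LF in the next chunk.
        $carry = $buf =~ s/\015\z// ? "\015" : '';
        while ($buf =~ /(\015\012|\015|\012)/g) {
            $count{ $1 eq "\015\012" ? 'crlf' : $1 eq "\015" ? 'cr' : 'lf' }++;
        }
    }
    $count{cr}++ if $carry eq "\015";    # file ended on a lone CR
    return \%count;
}
```

A file is mixed when more than one of the three counts is non-zero.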


--

Applescript syntax is like English spelling:
Roughly, but not thoroughly, thought through.


Re: Detecting file's line endings

2005-12-22 Thread Peter N Lewis

At 15:15 + 22/12/05, James Harvard wrote:
I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac 
and \n for Unix as I'm sure y'all know).


Is there any easy way to do this?


use Fcntl;

sub get_line_ending_for_file {
  my( $file ) = @_;

  my $fh;
  sysopen( $fh, $file, O_RDONLY );
  sysread( $fh, $_, 33000 );
  close( $fh );

  return /(\015\012|\015|\012)/ ? $1 : "\n";
}

Adjust the 33000 number to whatever maximum line size you think might 
be appropriate.
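
One way the routine might be used is to set $/ from it before reading the file record by record. The sketch below is a variant, not the code as posted: it reads into a lexical buffer instead of $_, adds an error check, and takes the read limit as an optional second argument; each_line and its callback are hypothetical names.

```perl
use strict;
use warnings;
use Fcntl;

# Variant of the routine above with a lexical buffer and an error check.
sub get_line_ending_for_file {
    my ($file, $max) = @_;
    $max ||= 33000;                  # same default as in the post
    sysopen(my $fh, $file, O_RDONLY) or die "Can't open $file: $!";
    binmode $fh;
    my $buf = '';
    sysread($fh, $buf, $max);
    close $fh;
    return $buf =~ /(\015\012|\015|\012)/ ? $1 : "\n";
}

# Hypothetical caller: hands each line, ending removed, to a callback.
sub each_line {
    my ($file, $callback) = @_;
    local $/ = get_line_ending_for_file($file);
    open(my $in, '<', $file) or die "Can't open $file: $!";
    binmode $in;
    while (my $line = <$in>) {
        chomp $line;                 # chomp removes whatever $/ holds
        $callback->($line);
    }
    close $in;
}
```

For example, each_line($file, sub { print "$_[0]\n" }) would print the file with the local line ending. If the first line break is a \015\012 that straddles the read limit exactly, the sample ends on \015 and the file is reported as Mac, so the limit should sit comfortably above the longest expected first line.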


Enjoy,
   Peter.




--
http://www.stairways.com/  http://download.stairways.com/