Re: Detecting file's line endings
On 1/3/06 1:55 am, Peter N Lewis [EMAIL PROTECTED] wrote:

> At 17:25 + 28/2/06, Adam Witney wrote:
>
>> Does this work on all platforms? When I try it, it works fine on OSX/Linux with MAC/DOS/UNIX line endings, but fails (reads the whole file) when reading DOS line endings on WinXP...
>>
>> Here is my script
>>
>>     use Fcntl;
>>     my $file = $ARGV[0];
>>     open(INFILE, $file) || die "cannot open $file: $!\n";
>>     {
>>         local $/ = get_line_ending_for_file($file);
>
> Try reading the line ending before opening the file, ie:
>
>     my $temp_line_ending = get_line_ending_for_file($file);
>     open(INFILE, $file) || die "cannot open $file: $!\n";
>     {
>         local $/ = $temp_line_ending;
>
> It may be that WinXP is getting confused by opening the file, and then sysopen/closing the file in get_line_ending_for_file, and then expecting to be able to read from the file. Not all platforms allow you to open the same file multiple times and have independent access to it - not that I know anything about WinXP, but old Classic Mac OS would quite probably have had problems with this.

Hi Peter,

Unfortunately this doesn't work either. The only way to get it to read the DOS file properly on WinXP is not to set $/ at all, but of course this breaks the other platforms.

Anyway, I have rewritten it to use sysread to read a chunk at the top of the file (I only need the header of the file) and process that with a split and foreach, which seems to be working fine!

Thanks for your help

Adam

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
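Adam does not post the rewritten version, so the following is only a minimal sketch of the approach he describes (sysread a chunk from the top of the file, then split it into lines), not his actual code. The sub name read_header_lines and the default of 4096 bytes are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl;    # exports O_RDONLY

# Hypothetical helper: return the lines found in the first $max_bytes
# bytes of $file, whichever line-ending style the file uses.
sub read_header_lines {
    my ($file, $max_bytes) = @_;
    $max_bytes ||= 4096;    # assumed upper bound on the header size

    sysopen(my $fh, $file, O_RDONLY) or die "cannot open $file: $!\n";
    binmode($fh);           # ask for the bytes exactly as stored on disk
    my $chunk = '';
    my $got = sysread($fh, $chunk, $max_bytes);
    die "cannot read $file: $!\n" unless defined $got;
    close($fh);

    # CRLF is listed first so that a DOS ending counts as one break, not two.
    # If the file is longer than $max_bytes, the last element may be a
    # partial line.
    return split /\015\012|\015|\012/, $chunk;
}
```

Because the chunk is split on all three endings at once, $/ never needs to be set. A caller would loop over the result with something like foreach my $line (read_header_lines($file)) { ... }.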
Re: Detecting file's line endings
> At 15:15 + 22/12/05, James Harvard wrote:
>
>> I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac and \n for Unix as I'm sure y'all know). Is there any easy way to do this?
>
>     use Fcntl;
>
>     sub get_line_ending_for_file {
>         my( $file ) = @_;
>
>         my $fh;
>         sysopen( $fh, $file, O_RDONLY );
>         sysread( $fh, $_, 33000 );
>         close( $fh );
>         return /(\015\012|\015|\012)/ ? $1 : "\n";
>     }

Does this work on all platforms? When I try it, it works fine on OSX/Linux with MAC/DOS/UNIX line endings, but fails (reads the whole file) when reading DOS line endings on WinXP...

Here is my script

    use Fcntl;

    my $file = $ARGV[0];

    open(INFILE, $file) || die "cannot open $file: $!\n";

    {
        local $/ = get_line_ending_for_file($file);

        while(<INFILE>)
        {
            my $line = $_;
            chomp $line;

            print "\n\n".length($line)."\n\n";
            last;
        }
    }

    sub get_line_ending_for_file {
        my($file) = @_;

        my $fh;
        sysopen( $fh, $file, O_RDONLY );
        sysread( $fh, $_, 200 );
        close( $fh );
        return /(\015\012|\015|\012)/ ? $1 : "\n";
    }

Thanks for any help

Adam
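One possible explanation, which nobody in this thread confirms: on Windows, perl normally reads files opened with open() through a text layer that turns each \015\012 into a single \012, while sysread returns the bytes as stored. The detector would then report "\015\012", but the lines coming from INFILE would only ever end in "\012", so the separator never matches and the first read returns the whole file. Under that assumption, putting the handle into binary mode before reading should make both views agree. The sub name first_line_raw is hypothetical, and this is a sketch rather than a tested fix for the WinXP setup described above.

```perl
use strict;
use warnings;

# Hypothetical helper: detect the line ending of $file and return
# (ending, first line without its ending), reading raw bytes throughout.
sub first_line_raw {
    my ($file) = @_;

    open(my $in, '<', $file) or die "cannot open $file: $!\n";
    binmode($in);    # no CRLF translation, so "\015\012" stays two bytes

    # Peek at the start of the file to find the first line break.
    my $head = '';
    my $n = read($in, $head, 33000);
    die "cannot read $file: $!\n" unless defined $n;
    my $ending = $head =~ /(\015\012|\015|\012)/ ? $1 : "\n";

    # Go back to the start and read one record using the detected ending.
    seek($in, 0, 0) or die "cannot rewind $file: $!\n";
    local $/ = $ending;
    my $line = <$in>;
    close($in);

    chomp $line if defined $line;    # chomp removes whatever $/ holds
    return ($ending, $line);
}
```

Using a single handle for both the peek and the real read also avoids opening the file twice.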
Detecting file's line endings
I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac and \n for Unix as I'm sure y'all know). Is there any easy way to do this?

I don't want to slurp the whole file, because it could be 14 MB or more, so I wanted to read in chunks until I got to a line break. However, I can see a potential problem with ending a chunk halfway through a DOS \r\n, so then you just get \r, which makes it look like a Mac-formatted file.

Anyway, I started to roll my own code for it, and because I'm new to Perl I hoped that one of you kind souls would have a quick look (below) to check that I've got the right idea of how to do this sort of thing with Perl. (It seems to work with my tests, but that doesn't necessarily mean that it is a robust method!)

Also, I assume that one can pass a file handle to a sub-routine?

    $/ = sniff_line_endings(INFILE) ;

Many thanks,
James Harvard

    open (INFILE,$filename) or die "Couldn't open" ;
    $/ = \50 ;
    my $taste = '' ;
    my $lb = undef ;
    until ($lb) {
        $taste .= <INFILE> ;
        if ($taste =~ /\r\n/) {
            $lb = "\r\n" ;    # DOS line endings
        } elsif ($taste =~ /\r(?!$)/) {
            $lb = "\r" ;      # Mac line endings
        } elsif ($taste =~ /\n/) {
            $lb = "\n" ;      # Unix line endings
        }
    }
    $/ = $lb ;
    seek INFILE, 0, 0 ;    # reset the file read pointer
    # do while(<INFILE>) stuff
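As one way of handling the two concerns raised here (a chunk ending halfway through \r\n, and passing a file handle to a sub-routine), the sketch below reads fixed-size chunks from a lexical filehandle and only accepts a lone \015 at the very end of the buffer once more data, or end of file, shows that no \012 follows. The sub name comes from the question, but the body, the chunk-size parameter and the "\n" fallback are assumptions, and it expects a freshly opened, seekable handle.

```perl
use strict;
use warnings;

# Sketch: return "\015\012", "\015" or "\012" for the first line break
# readable from $fh, or "\n" if the file contains no line break at all.
sub sniff_line_endings {
    my ($fh, $chunk_size) = @_;
    $chunk_size ||= 50;
    binmode($fh);    # look at raw bytes

    my $taste = '';
    my $ending;
    while (!defined $ending) {
        # Append the next chunk to whatever is still held in $taste.
        my $n = read($fh, $taste, $chunk_size, length $taste);
        die "read failed: $!\n" unless defined $n;
        my $eof = ($n == 0);

        if ($taste =~ /(\015\012|\015|\012)/) {
            # A lone CR at the very end of the buffer may be the first
            # half of a CRLF, so wait for more data before deciding.
            my $cr_at_end = ($1 eq "\015" && $+[0] == length $taste);
            $ending = $1 unless $cr_at_end && !$eof;
        }
        elsif ($eof) {
            $ending = "\n";    # no line break anywhere in the file
        }
        else {
            $taste = '';       # no CR or LF seen yet, nothing worth keeping
        }
    }

    seek($fh, 0, 0) or die "cannot rewind: $!\n";    # reset the read pointer
    return $ending;
}
```

With a lexical handle the call looks like open(my $in, '<', $filename) or die "Couldn't open"; local $/ = sniff_line_endings($in); followed by the usual while (<$in>) loop. A bareword handle such as INFILE can be passed as \*INFILE.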
Re: Detecting file's line endings
At 3:15 pm + 22/12/05, James Harvard wrote:

> I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac and \n for Unix as I'm sure y'all know). Is there any easy way to do this?

At 10:45 am +0800 21/11/02, Peter N Lewis wrote:

> At 13:22 + 20/11/02, John Delacour wrote:
>
>>     if (/\015\012/) {
>>         $/ = "\015\012" ;
>>     } elsif (/\015/) {
>>         $/ = "\015" ;
>>     } else {
>>         $/ = "\012" ;
>>     }
>
> You can do this with one regular expression which will pick up the first line ending:
>
>     $/ = /(\015\012|\015|\012)/ ? $1 : "\n";
>
> Note that because Perl picks the first match location, and after that picks the first of an or | set, it will find the first location, and will find the \015\012 if it is there in preference to the \015 by itself.
>
> Enjoy,
>    Peter.
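The matching rule described in the note above (leftmost position first, then the first alternative that fits at that position) can be checked with a few short strings. This is only an illustration; the helper names first_line_ending and ending_name are made up for the example.

```perl
use strict;
use warnings;

# Illustrative helper: the one-liner wrapped in a sub that works on a string.
sub first_line_ending {
    my ($text) = @_;
    return $text =~ /(\015\012|\015|\012)/ ? $1 : "\n";
}

# Hypothetical helper that turns an ending into a printable name.
sub ending_name {
    my ($e) = @_;
    return $e eq "\015\012" ? 'CRLF' : $e eq "\015" ? 'CR' : 'LF';
}

# CRLF comes first in the text, and is preferred over the bare CR inside it.
print ending_name(first_line_ending("one\015\012two\012")), "\n";    # CRLF

# A bare CR comes first; the later CRLF is never reached.
print ending_name(first_line_ending("one\015two\015\012")), "\n";    # CR

# LF comes first.
print ending_name(first_line_ending("one\012two\015")), "\n";        # LF

# No line break at all: the "\n" fallback is returned.
print ending_name(first_line_ending("no breaks here")), "\n";        # LF
```

The order of the alternatives matters: with \015 listed before \015\012, a DOS file would be reported as having Mac line endings.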
Re: Detecting file's line endings
At 3:15 pm + 22/12/05, James Harvard wrote:

> Is there any easy way to do this?

PS. The whole script, from which Peter quoted only the last bit in providing his genial one-liner, was as follows:

    #!/usr/bin/perl
    $f = "$ENV{HOME}/Documents/Eudora Folder/Mail Folder/Manningham" ;
    sysopen F, $f, O_RDONLY ;
    sysread F, $_, 1000 ;
    if (/\015\012/) {
        $/ = "\015\012" ;
    } elsif (/\015/) {
        $/ = "\015" ;
    } else {
        $/ = "\012" ;
    }
    open F, $f ;
    for (<F>) {
        /^From: / and chomp and print "$_\n"
    }

At 10:45 am +0800 21/11/02, Peter N Lewis wrote:

> You can do this with one regular expression which will pick up the first line ending:
>
>     $/ = /(\015\012|\015|\012)/ ? $1 : "\n";
>
> Peter.
Re: Detecting file's line endings
At 15:15 + 12/22/05, James Harvard wrote:

> I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac and \n for Unix as I'm sure y'all know).

ftp://ftp.macnauchtan.com/Software/LineEnds/FixEndsFolder.sit  52 kB
ftp://ftp.macnauchtan.com/Software/LineEnds/ReadMe_fixends.txt  4 kB

I have trouble with files that contain multiple types of line ends. The result was these drag-and-drop AppleScripts, which might help. They do look at the whole file, but the underlying code (included) is in C and is pretty fast and not memory intensive. You can change line endings or just test for them, but the scripts don't (yet) handle the two newer 16-bit Unicode line ends.

-- 
Applescript syntax is like English spelling:
Roughly, but not thoroughly, thought through.
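For the mixed-line-end case in Perl rather than C, the sketch below counts how many of each ending a file contains, reading it in fixed-size blocks so memory use stays small. It is not a port of the FixEnds code, which is not shown in this thread. The sub name count_line_endings and the 64 kB block size are assumptions, and like the scripts above it ignores the Unicode line ends.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref with the number of CRLF, CR and
# LF line ends found in $file.
sub count_line_endings {
    my ($file) = @_;

    open(my $fh, '<', $file) or die "cannot open $file: $!\n";
    binmode($fh);    # count the bytes as stored

    my %count = (CRLF => 0, CR => 0, LF => 0);
    my $carry = '';    # a CR held back from the end of the previous block
    while (1) {
        my $buf = '';
        my $n = read($fh, $buf, 65536);
        die "cannot read $file: $!\n" unless defined $n;
        last if $n == 0;

        $buf   = $carry . $buf;
        $carry = '';
        # A CR at the very end of a block might be half of a CRLF, so hold
        # it back until the next block (or end of file) settles the matter.
        $carry = "\015" if $buf =~ s/\015\z//;

        while ($buf =~ /(\015\012|\015|\012)/g) {
            my $kind = $1 eq "\015\012" ? 'CRLF' : $1 eq "\015" ? 'CR' : 'LF';
            $count{$kind}++;
        }
    }
    $count{CR}++ if $carry ne '';    # the file ended with a bare CR
    close($fh);

    return \%count;
}
```

A file is mixed when more than one of the three counts is non-zero. The most frequent ending is then a reasonable guess for $/, although that choice is a judgment call rather than anything the thread prescribes.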
Re: Detecting file's line endings
At 15:15 + 22/12/05, James Harvard wrote:

> I'm trying to detect a file's line endings (\r\n for DOS, \r for Mac and \n for Unix as I'm sure y'all know). Is there any easy way to do this?

    use Fcntl;

    sub get_line_ending_for_file {
        my( $file ) = @_;

        my $fh;
        sysopen( $fh, $file, O_RDONLY );
        sysread( $fh, $_, 33000 );
        close( $fh );
        return /(\015\012|\015|\012)/ ? $1 : "\n";
    }

Adjust the 33000 number to whatever maximum line size you think might be appropriate.

Enjoy,
   Peter.

> I don't want to slurp the whole file, because it could be 14 MB or more, so I wanted to read in chunks until I got to a line break. However, I can see a potential problem with ending a chunk halfway through a DOS \r\n, so then you just get \r, which makes it look like a Mac-formatted file.
>
> Anyway, I started to roll my own code for it, and because I'm new to Perl I hoped that one of you kind souls would have a quick look (below) to check that I've got the right idea of how to do this sort of thing with Perl. (It seems to work with my tests, but that doesn't necessarily mean that it is a robust method!)
>
> Also, I assume that one can pass a file handle to a sub-routine?
>
>     $/ = sniff_line_endings(INFILE) ;
>
> Many thanks,
> James Harvard
>
>     open (INFILE,$filename) or die "Couldn't open" ;
>     $/ = \50 ;
>     my $taste = '' ;
>     my $lb = undef ;
>     until ($lb) {
>         $taste .= <INFILE> ;
>         if ($taste =~ /\r\n/) {
>             $lb = "\r\n" ;    # DOS line endings
>         } elsif ($taste =~ /\r(?!$)/) {
>             $lb = "\r" ;      # Mac line endings
>         } elsif ($taste =~ /\n/) {
>             $lb = "\n" ;      # Unix line endings
>         }
>     }
>     $/ = $lb ;
>     seek INFILE, 0, 0 ;    # reset the file read pointer
>     # do while(<INFILE>) stuff

-- 
http://www.stairways.com/
http://download.stairways.com/