Re: Strange behavior with print
Chris Wagner wrote:
> What I was specifically thinking of was handling DOS text files on Unix. ActiveState on Unix has no conception of CRLF as a newline sequence out of the box. This is a case where you have to manually inspect the file and either set $/ (and reread the file) or strip the CRs. Having a :text IO discipline would provide portability. What I did was create a chomp2 function that does s/[\cM]?\cJ$//.

You might want to take a look at PerlIO::eol, an IO layer which does basically what you're asking for. From the docs:

  This layer normalizes any of CR, LF, CRLF and Native into the designated line ending. It works for both input and output handles.

http://search.cpan.org/~audreyt/PerlIO-eol-0.14/eol.pm

> I have an app that spits out CR newline sequences, the output of which I parse with a Perl script. Normally I just slurp smaller files into an array, but in this case I had to add a split /\r/ to get the lines from the file handle.

    open my $crfile, '<', $filename or die("Couldn't open $filename: $!");
    my @lines;
    {
        local $/ = "\r";
        @lines = <$crfile>;
    }

> open FILE, '<:text', 'obnoxious.txt';
> $LE = textmode(FILE);   # $LE will be a discipline ref suitable for open()
> open OUTPUT, ">:$LE", 'newfile.txt';   # OUTPUT will have the same NLS as FILE

PerlIO::eol operates on the IO stream and silently translates line endings into whatever you request, but it doesn't have a concept of what the default line ending for a file is, so you still couldn't do the above; you'd have to handle it yourself. I would guess what you're asking for isn't implemented because it breaks the stream model, even though it is usually valid to assume the first line ending is the way it will be the rest of the way through. In any case, it's pretty easy to do yourself, and I don't think there's much demand for it.

Hope the above helps.

-- 
Mike Gillis
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
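Here is a minimal runnable sketch of the `local $/ = "\r"` approach above. The helper name `read_cr_lines` and the sample data are illustrative assumptions. The sample file is written after `binmode` so the bare CRs reach disk unchanged. Because `chomp` removes whatever `$/` currently holds, calling it inside the same block also strips the CRs.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a file whose records end in a bare CR.
sub read_cr_lines {
    my ($filename) = @_;
    open my $crfile, '<', $filename or die "Couldn't open $filename: $!";
    my @lines;
    {
        local $/ = "\r";       # record separator is CR only inside this block
        @lines = <$crfile>;    # list context: one element per CR-terminated record
        chomp @lines;          # chomp removes the current $/, i.e. the CR
    }
    close $crfile;
    return @lines;
}

# Illustrative sample: three records terminated by a bare CR (old Mac style).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                   # write the bytes exactly as given
print {$fh} "alpha\rbeta\rgamma\r";
close $fh;

my @lines = read_cr_lines($path);
print scalar(@lines), " records: @lines\n";   # 3 records: alpha beta gamma
```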
Re: Strange behavior with print
At 10:17 AM 1/3/2008 -0800, Mike Gillis wrote:
> You might want to take a look at PerlIO::eol, an IO layer which does basically what you're asking for. From the docs:
>
>   This layer normalizes any of CR, LF, CRLF and Native into the designated line ending. It works for both input and output handles.

Cool, I'll check that out. It would be nice if this kind of functionality, as well as the ability to return the NLS, could be incorporated into the PerlIO core.

> open my $crfile, '<', $filename or die("Couldn't open $filename: $!");
> my @lines;
> {
>     local $/ = "\r";
>     @lines = <$crfile>;
> }

Right. I just felt like doing split there for some reason. ;)

-- 
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis 0100
RE: Strange behavior with print
At 05:27 PM 12/28/2007 -0800, Jan Dubois wrote:
> I'm not sure what you are trying to say here. While reading a file in text mode on Windows, it does not really matter if the lines are terminated by CRLF or LF alone; the CR will be stripped and at the Perl level you always have just a LF, just like on Unix.

What I was specifically thinking of was handling DOS text files on Unix. ActiveState on Unix has no conception of CRLF as a newline sequence out of the box. This is a case where you have to manually inspect the file and either set $/ (and reread the file) or strip the CRs. Having a :text IO discipline would provide portability. What I did was create a chomp2 function that does s/[\cM]?\cJ$//.

> Files with CR line endings are extremely rare nowadays; as far as I

Rare, true, but irrelevant. I have an app that spits out CR newline sequences, the output of which I parse with a Perl script. Normally I just slurp smaller files into an array, but in this case I had to add a split /\r/ to get the lines from the file handle. Not a burden at all, but inelegant.

> And anyways, you cannot know the line endings of a file until you have read all of it, which is not really practical for a streaming IO layer.

Eh, not really. If you have something that you know is a text file, you can be pretty sure of the line ending after hitting the first one. A :text discipline could function by reading in bytes until it hits a CRLF, LF, or CR. At that point it would set that as $/ for that handle. It would be nice if there were a method for getting $/ on a per-handle basis. What if you need to read in two files with different line endings at the same time and write them out to new files with matching endings? How do you set $/ in that case? AFAIK $/ is global and can't be set per handle.

Pseudocode:

    open FILE, '<:text', 'obnoxious.txt';
    $LE = textmode(FILE);   # $LE will be a discipline ref suitable for open()
    open OUTPUT, ">:$LE", 'newfile.txt';   # OUTPUT will have the same NLS as FILE

Portable and elegant.
:)
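You can approximate the detect-on-first-terminator idea in plain Perl without a new layer. In this sketch, `detect_eol` and `copy_with_same_eol` are hypothetical helpers, not an existing PerlIO or CPAN API. The first reads raw bytes until it sees a CRLF, LF, or CR. The second uses the result as `$/` for reading and as the terminator for writing. Because `$/` is a single global, `local` only limits the setting to one call. To read two such files in parallel, you would have to set `$/` around each individual read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not an existing layer or API): guess a file's line
# ending from the first terminator it contains.
# Returns "\r\n", "\n", "\r", or undef if the file has no terminator at all.
sub detect_eol {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Couldn't open $path: $!";
    my $buf = '';
    while (read($fh, my $chunk, 4096)) {
        $buf .= $chunk;
        # A CR at the very end of the buffer may be the first half of a CRLF,
        # so only accept a bare CR once the byte after it has been seen.
        if ($buf =~ /(\r\n|\n|\r(?!\z))/) {
            my $found = $1;
            close $fh;
            return $found;
        }
    }
    close $fh;
    return $buf =~ /\r\z/ ? "\r" : undef;    # a lone CR right before EOF
}

# Hypothetical usage: read with the detected separator and write the output
# with the same one. local() scopes $/ to this call only.
sub copy_with_same_eol {
    my ($in_path, $out_path) = @_;
    my $eol = detect_eol($in_path);
    $eol = "\n" unless defined $eol;         # fallback for single-line files
    open my $in,  '<:raw', $in_path  or die "Couldn't open $in_path: $!";
    open my $out, '>:raw', $out_path or die "Couldn't open $out_path: $!";
    local $/ = $eol;
    while (my $line = <$in>) {
        chomp $line;                         # removes $eol, whatever it is
        print {$out} uc($line), $eol;        # illustrative processing step
    }
    close $in;
    close $out;
    return $eol;
}

# Illustrative DOS-style input file.
my ($tmp, $in_path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ntwo\r\n";
close $tmp;

my (undef, $out_path) = tempfile(UNLINK => 1);
my $eol = copy_with_same_eol($in_path, $out_path);
```

Looking ahead only to the first terminator keeps this compatible with streaming. As Mike notes, it rests on the assumption that the first line ending holds for the rest of the file.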
RE: Strange behavior with print
I see what you're saying and actually agree with you. But PerlIO automatically (this is one automatic thing I don't like) translates between CRLF and LF on Windows systems. As Jan said, setting binmode() will solve your problem. I personally feel that binmode should be the default everywhere and that it should be the programmer's responsibility to decide how to interpret newline sequences. FYI, there are three possible newline sequences, CR, CRLF, and LF, depending on what system you're on. In this day and age it's no longer useful to assume what kind of newline sequence you have based on your system type. You might very well have to deal with all three at the same time, or with a type other than your system's native one. It would be nice if there were a :text IO layer that would automatically determine the newline sequence of a particular text file and do the translation, or not, automatically. Currently you have to do an obnoxious manual inspection of the file to know what $/ should be. *cough*Jan*cough* ;)

HTH

At 11:24 AM 12/27/2007 -0500, Bullock, Howard A. wrote:
> It appears that perl's print command, when using a file handle, is changing each occurrence of chr(10) to chr(13) chr(10). This does not seem to be a helpful or proper action when working with defined literal strings. I can make some sense out of file:///C:/Perl/html/lib/Pod/perlport.html#newlines but don't see how that is relevant to printing literal strings.
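When all three sequences can turn up in one input, one workaround is to read the data raw and split on any of them. This sketch assumes the text fits in memory. `split_any_eol` and `count_eols` are hypothetical names. The alternation lists `\r\n` first so that a CRLF pair counts as one terminator, not as a CR followed by an LF.

```perl
use strict;
use warnings;

# Split a buffer whose lines may end in CRLF, LF, or bare CR, in any mixture.
# Trailing empty fields are dropped by split, so a final terminator is harmless.
sub split_any_eol {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

# Tally which newline sequences occur, e.g. to decide how to write output.
sub count_eols {
    my ($text) = @_;
    my %count = (CRLF => 0, LF => 0, CR => 0);
    while ($text =~ /(\r\n|\r|\n)/g) {
        my $kind = $1 eq "\r\n" ? 'CRLF' : $1 eq "\r" ? 'CR' : 'LF';
        $count{$kind}++;
    }
    return \%count;
}

my $mixed  = "dos\r\nunix\nmac\rlast";   # illustrative mixed input
my @lines  = split_any_eol($mixed);
my $counts = count_eols($mixed);
print join('|', @lines), "\n";                                         # dos|unix|mac|last
print join(' ', map { "$_=$counts->{$_}" } sort keys %$counts), "\n";  # CR=1 CRLF=1 LF=1
```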
RE: Strange behavior with print
On Fri, 28 Dec 2007, Chris Wagner wrote:
> It would be nice if there were a :text IO layer that would automatically determine the newline sequence of a particular text file and do the translation, or not, automatically. Currently you have to do an obnoxious manual inspection of the file to know what $/ is. *cough*Jan*cough* ;)

I'm not sure what you are trying to say here. While reading a file in text mode on Windows, it does not really matter whether the lines are terminated by CRLF or LF alone; the CR will be stripped and at the Perl level you always have just a LF, just like on Unix.

Files with CR line endings are extremely rare nowadays; as far as I know only Mac OS up to version 9 used those. Everyone using a Mac is running OS X now, which uses LF line endings just like any other Unix-derived system.

And anyways, you cannot know the line endings of a file until you have read all of it, which is not really practical for a streaming IO layer.

Cheers,
-Jan
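You can check Jan's point about text-mode reading on any platform, because PerlIO's `:crlf` layer can be requested explicitly. It is only the default on Windows. The sample file contents below are illustrative. Requesting `:crlf` is also one way to read DOS text files on Unix. Unlike the proposed `:text` layer, though, it leaves bare-CR files untouched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative input mixing CRLF-terminated and LF-terminated lines.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                       # write the bytes exactly as given
print {$tmp} "first\r\nsecond\nthird\r\n";
close $tmp;

# Asking for :crlf explicitly gives Windows-style text-mode reading on any
# platform: each CR LF pair arrives as "\n", and a lone LF stays "\n".
open my $in, '<:crlf', $path or die "Couldn't open $path: $!";
my @lines = <$in>;
close $in;

my $with_cr = grep { /\r/ } @lines;   # scalar context: number of matching lines
print scalar(@lines), " lines, $with_cr containing a CR\n";   # 3 lines, 0 containing a CR
chomp @lines;
print "@lines\n";                                            # first second third
```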
Strange behavior with print
Sorry, more info. I am using:

Binary build 820 [274739] provided by ActiveState http://www.ActiveState.com
Built Jan 23 2007 15:57:46

---

I have a short program that extracts a text file from a .gz archive using Compress::Zlib. Using

    while ($gz->gzreadline($line) > 0)

I read each line, potentially process it, then write the line to a new file. If I print the length of $line, it shows the length of the text plus two characters for 0D and 0A. When I use:

    print "length=" . length($line) . "\n";
    my @a = split //, $line;
    foreach my $x (@a) {
        print $x . " " . ord($x) . "\n";
    }

I see exactly what I expect. However, when I print the line to a file and view the file using two different hex editors, I see that the line terminates with 0D 0D 0A. I am not using chomp or chop, or adding any characters to the variable during the print. When I again use Perl to read the new file and use the sample code above to analyze the input line, Perl again shows me the data I would expect to see: a text line ending in just 0D 0A. When I check the length of the input string, it matches the length of what was supplied to the print command. Is there some feature I am missing here, or is this a bug?
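You can reproduce the 0D 0D 0A result with a short script. The report shows 0D 0A already present in each `$line`, so this sketch assumes the archived file had DOS line endings and that no translation happened on the way in. Requesting `:crlf` explicitly mimics the Windows text-mode default on any platform. `hex_dump` is a hypothetical helper, and the sample line is illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a file's raw bytes as uppercase hex pairs.
sub hex_dump {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Couldn't open $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# A line as it would come out of the archive: text plus CR LF (0D 0A).
my $line = "abc\r\n";

# Text-mode output: every "\n" written becomes "\r\n", and the CR already
# in the data is left alone, giving CR CR LF on disk.
my (undef, $path) = tempfile(UNLINK => 1);
open my $out, '>:crlf', $path or die "Couldn't open $path: $!";
print {$out} $line;
close $out;
my $written = hex_dump($path);
print "$written\n";                 # 61 62 63 0D 0D 0A

# Text-mode input turns only the final CR LF back into "\n", so Perl
# reports the original "abc\r\n" again, matching the observed length.
open my $in, '<:crlf', $path or die "Couldn't open $path: $!";
my $read_back = <$in>;
close $in;
print length($read_back), "\n";     # 5
```

This is why the extra CR shows up only in a hex editor. Reading the file back through the same text-mode layer hides it again.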
RE: Strange behavior with print
It appears that perl's print command, when using a file handle, is changing each occurrence of chr(10) to chr(13) chr(10). This does not seem to be a helpful or proper action when working with defined literal strings. I can make some sense out of file:///C:/Perl/html/lib/Pod/perlport.html#newlines but don't see how that is relevant to printing literal strings.
RE: Strange behavior with print
You need to change your file handle to binary mode to prevent the line ending transformations:

    binmode(STDOUT);

Please read `perldoc -f binmode`.

Cheers,
-Jan

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:perl-win32-users-[EMAIL PROTECTED]] On Behalf Of Bullock, Howard A.
Sent: December 27, 2007 8:25 AM
To: perl-win32-users@listserv.activestate.com
Subject: RE: Strange behavior with print

It appears that perl's print command, when using a file handle, is changing each occurrence of chr(10) to chr(13) chr(10). This does not seem to be a helpful or proper action when working with defined literal strings. I can make some sense out of file:///C:/Perl/html/lib/Pod/perlport.html#newlines but don't see how that is relevant to printing literal strings.
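Here is a sketch of the binmode fix. In the original program, the `binmode` call would go on the output file handle rather than STDOUT. The handle is opened with an explicit `:crlf` layer to stand in for the Windows default, and the sample line is illustrative. After `binmode`, the CR LF already in the data is written unchanged.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $line = "abc\r\n";    # illustrative line that already ends in CR LF

# Open the way Windows text mode would, then apply the fix:
# binmode() with no layer argument makes the handle binary, which
# turns off the "\n" -> "\r\n" translation.
my (undef, $path) = tempfile(UNLINK => 1);
open my $out, '>:crlf', $path or die "Couldn't open $path: $!";
binmode($out);
print {$out} $line;
close $out;

# Inspect the raw bytes that reached the disk.
open my $in, '<:raw', $path or die "Couldn't open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;
print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";   # 61 62 63 0D 0A
```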