Re: Strange behavior with print

2008-01-03 Thread Mike Gillis
Chris Wagner wrote:
> What I was specifically thinking of was handling DOS text files on Unix.
> Activestate on Unix has no conception of CRLF as a new line sequence
> out-of-the-box.  This is a case where u have to manually inspect the file
> and either set $/ (and reread the file) or strip the CR's.  Having a :text
> IO discipline would fulfill portability.  What I did was create a chomp2
> function that does s/[\cM]?\cJ$//.

You might want to take a look at PerlIO::eol, an IO layer which does 
basically what you're asking for.

From the docs:

This layer normalizes any of CR, LF, CRLF and Native into the 
designated line ending. It works for both input and output handles.

http://search.cpan.org/~audreyt/PerlIO-eol-0.14/eol.pm


> I have an app that spits out CR new line
> sequences the output of which I parse with a perl script.  Normally I just
> slurp smaller files into an array but in this case I had to add a split /\r/
> to get the lines from the file handle.

open my $crfile, '<', $filename or die("Couldn't open $filename: $!");
my @lines;
{
    local $/ = "\r";    # treat a bare CR as the line terminator in this block
    @lines = <$crfile>;
}


> open FILE, "<:text", "obnoxious.txt";
> $LE = textmode(FILE); #$LE will be a discipline ref suitable for open()
> open OUTPUT, ">:$LE", "newfile.txt"; #OUTPUT will have the same NLS as FILE

PerlIO::eol operates on the IO stream, and silently translates line 
endings into whatever you request -- but it doesn't have a concept of 
what the default line ending for a file is, so you still couldn't do the 
above - you'd have to handle it yourself.

I would guess what you're asking for isn't implemented because it breaks 
the stream model, even though it is usually valid to assume the first 
line ending is the way it will be the rest of the way through. In any 
case, it's pretty easy to do yourself, and I don't think there's much 
demand for it.
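
As a rough illustration of the "do it yourself" approach, the sketch below reads raw bytes until it meets the first CRLF, LF or CR and reports which one it found. The name detect_eol is hypothetical; it is not part of PerlIO or PerlIO::eol, and the code assumes the first terminator is representative of the whole file.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not a PerlIO feature):
# returns the first line terminator found in the file ("\r\n", "\n" or "\r"),
# or undef if the file contains none.
sub detect_eol {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename
        or die "Couldn't open $filename: $!";
    my $data = '';
    while (read($fh, my $chunk, 4096)) {
        $data .= $chunk;
        if ($data =~ /(\r\n|\r|\n)/) {
            # A lone CR at the very end of the buffer may be the first half
            # of a CRLF pair, so read more before deciding.
            next if $1 eq "\r" && $+[0] == length $data;
            return $1;
        }
        $data = '';    # no terminator seen yet, nothing worth keeping
    }
    # End of file: a trailing bare CR is a terminator after all.
    return $1 if $data =~ /(\r\n|\r|\n)/;
    return;
}
```

The result could then be assigned to a localized $/ before the file is reopened and read line by line.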

Hope the above helps.

--
Mike Gillis
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Re: Strange behavior with print

2008-01-03 Thread Chris Wagner
At 10:17 AM 1/3/2008 -0800, Mike Gillis wrote:
> You might want to take a look at PerlIO::eol, an IO layer which does
> basically what you're asking for.
>
> From the docs:
>
> This layer normalizes any of CR, LF, CRLF and Native into the
> designated line ending. It works for both input and output handles.

Cool, I'll check that out.  It would be nice if this kind of functionality
as well as the ability to return the NLS could be incorporated into PerlIO core.



> open my $crfile, '<', $filename or die("Couldn't open $filename: $!");
> my @lines;
> {
>     local $/ = "\r";
>     @lines = <$crfile>;
> }

Right.  I just felt like doing split there for some reason. ;)





--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



RE: Strange behavior with print

2007-12-29 Thread Chris Wagner
At 05:27 PM 12/28/2007 -0800, Jan Dubois wrote:
> I'm not sure what you are trying to say here.  While reading a file
> in text mode on Windows, it does not really matter if the lines are
> terminated by CRLF or LF alone; the CR will be stripped and at the
> Perl level you always have just a LF, just like on Unix.

What I was specifically thinking of was handling DOS text files on Unix.
Activestate on Unix has no conception of CRLF as a new line sequence
out-of-the-box.  This is a case where u have to manually inspect the file
and either set $/ (and reread the file) or strip the CR's.  Having a :text
IO discipline would fulfill portability.  What I did was create a chomp2
function that does s/[\cM]?\cJ$//.
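
A helper along these lines might look like the sketch below. Only the name chomp2 and the regex come from the message; the in-place modification and the return value are assumptions modelled on the built-in chomp, and \z is used instead of $ so that only the very end of the string is touched.

```perl
use strict;
use warnings;

# Sketch of a chomp2-style helper: strips one trailing LF or CRLF from its
# argument in place and returns the number of characters removed.
# A bare trailing CR is left alone, as with the regex quoted above.
sub chomp2 {
    return 0 unless defined $_[0];
    my $before = length $_[0];
    $_[0] =~ s/\cM?\cJ\z//;    # $_[0] is an alias for the caller's variable
    return $before - length $_[0];
}
```

Because the argument is modified through the @_ alias, it has to be called on a variable, for example chomp2($line) inside a read loop.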


> Files with CR line endings are extremely rare nowadays; as far as I

Rare, true, but irrelevant.  I have an app that spits out CR new line
sequences the output of which I parse with a perl script.  Normally I just
slurp smaller files into an array but in this case I had to add a split /\r/
to get the lines from the file handle.  Not a burden at all but inelegant.


> And anyways, you cannot know the line endings of a file until you have
> read all of it, which is not really practical for a streaming IO layer.

Eh, not really.  If u have something that u know is a text file u can be
pretty sure of the line ending after hitting the first one.  A :text
discipline could function by reading in bytes until it hits a CRLF, LF, or
CR. At that point it would set that as $/ for that handle.  It would be nice
if there were a method for getting $/ on a per handle basis.  What if u need
to read in two files of different line endings at the same time and write
them out to similarly ended new files?  How do u set $/ in that case?  AFAIK
$/ is global and doesn't have a per handle character.

pseudo code:
open FILE, "<:text", "obnoxious.txt";
$LE = textmode(FILE); #$LE will be a discipline ref suitable for open()
open OUTPUT, ">:$LE", "newfile.txt"; #OUTPUT will have the same NLS as FILE

Portable and elegant. :)
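
Until something like that exists, one possible workaround for the "two files, two line endings" case is to localize $/ around each read, so the separator is in effect chosen per call rather than per process. The helpers read_lines and write_lines below are hypothetical names used only for this sketch, and the handles are assumed to be free of any newline-translating layer.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from $fh using $sep as the record
# separator. `local` restores the previous $/ when the sub returns.
sub read_lines {
    my ($fh, $sep) = @_;
    local $/ = $sep;
    my @lines = <$fh>;
    chomp @lines;          # chomp removes the current $/, i.e. $sep
    return @lines;
}

# Hypothetical helper: write the lines back out with an explicit terminator.
sub write_lines {
    my ($fh, $sep, @lines) = @_;
    print {$fh} $_, $sep for @lines;
    return;
}
```

A CRLF file and a CR file can then be read side by side with read_lines($dos_fh, "\r\n") and read_lines($mac_fh, "\r"), and each written back out with the same separator it came in with.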








RE: Strange behavior with print

2007-12-28 Thread Chris Wagner
I see what ur saying and actually agree with u.  But PerlIO automatically
(this is one automatic thing I don't like) translates between CRLF and LF on
Windows systems.  As Jan said setting binmode() will solve ur problem.  

I personally feel that binmode should be the default everywhere and it
should be the programmer's responsibility to decide how to interpret the new
line sequences.  FYI there are three possible new line sequences, CR, CRLF,
and LF, depending on what system ur on.  In this day and age it's no longer
useful to assume what kind of new line sequences u have based on ur system
type.  U might very well have to deal with all three at the same time or a
contrary type to ur system's native type.

It would be nice if there were a :text IO layer that would automatically
determine the new line sequence of a particular text file and do the
translation, or not, automatically.  Currently u have to do an obnoxious
manual inspection of the file to know what $/ is. *cough*Jan*cough* ;)

HTH


At 11:24 AM 12/27/2007 -0500, Bullock, Howard A. wrote:
> It appears that perl's print command when using a file handle is changing
> each occurrence of chr(10) to (chr(13) chr(10)). This does not seem to
> be a helpful or proper action when working with defined literal strings.
>
> I can make some sense out of
> file:///C:/Perl/html/lib/Pod/perlport.html#newlines but do not see how that
> is relevant to printing literal strings.








RE: Strange behavior with print

2007-12-28 Thread Jan Dubois
On Fri, 28 Dec 2007, Chris Wagner wrote:
> It would be nice if there were a :text IO layer that would automatically
> determine the new line sequence of a particular text file and do the
> translation, or not, automatically.  Currently u have to do an obnoxious
> manual inspection of the file to know what $/ is. *cough*Jan*cough* ;)

I'm not sure what you are trying to say here.  While reading a file
in text mode on Windows, it does not really matter if the lines are
terminated by CRLF or LF alone; the CR will be stripped and at the
Perl level you always have just a LF, just like on Unix.
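
The translation described here is performed by PerlIO's :crlf layer, which can also be requested explicitly on platforms where it is not the default. The sketch below, with the hypothetical helper name read_text_lines, shows one way that might be used to read a DOS-style file on Unix so that each line ends in a plain LF.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file through an explicit :crlf layer so that
# CRLF arrives as "\n" whatever the platform default is. Lines that already
# end in a bare LF pass through unchanged.
sub read_text_lines {
    my ($filename) = @_;
    open my $fh, '<:crlf', $filename
        or die "Couldn't open $filename: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

This only covers CRLF and LF; files that use a bare CR still need $/ set by hand.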

Files with CR line endings are extremely rare nowadays; as far as I
know only Mac OS up to version 9 used those.  Everyone using a Mac
is running OS X now, which uses LF line endings just like any other
Unix-derived system.

And anyways, you cannot know the line endings of a file until you have
read all of it, which is not really practical for a streaming IO layer.

Cheers,
-Jan



Strange behavior with print

2007-12-27 Thread Bullock, Howard A.
Sorry more info:  

I am using: Binary build 820 [274739] provided by ActiveState
http://www.ActiveState.com Built Jan 23 2007 15:57:46

---

I have a short program that extracts a text file from a .gz archive using
Compress::Zlib.
Using while ($gz->gzreadline($line) > 0) I read each line, potentially
process it, then write the line to a new file.

If I print the length of $line, it shows the length of the text plus two
characters for 0D and 0A.
When I use:

print " length=" . length($line) . "\n";
my @a = split //, $line;
foreach my $x (@a) {
    print $x . " " . ord($x) . "\n";    # each character and its code point
}

I see exactly what I expect.

However when I print the line to a file and view the file using two
different Hex editors, I see that the line terminates with 0D 0D 0A. I
am not using chomp, chop, or adding any characters to the variable
during the print.

When I again use Perl to read the new file and use the sample code above
to analyze the input line, Perl again shows me the data I would expect
to see. That is a text line ending in just 0D 0A.

When I check the length of the input string, it matches the length of what
was supplied to the print command.

Is there some feature I am missing here or is this a bug?




RE: Strange behavior with print

2007-12-27 Thread Bullock, Howard A.
It appears that perl's print command when using a file handle is changing
each occurrence of chr(10) to (chr(13) chr(10)). This does not seem to
be a helpful or proper action when working with defined literal strings.

I can make some sense out of
file:///C:/Perl/html/lib/Pod/perlport.html#newlines but do not see how that
is relevant to printing literal strings.



RE: Strange behavior with print

2007-12-27 Thread Jan Dubois
You need to change your file handle to binary mode to prevent the
line ending transformations:

binmode(STDOUT);

Please read `perldoc -f binmode`.
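
To make the effect visible, the sketch below writes the same string twice, once through a :crlf layer and once to a handle switched to binary mode, then returns the bytes that reached the disk. The helper name bytes_written is hypothetical, and the explicit :crlf layer is an assumption used to mimic the Windows text-mode default so the example behaves the same on other platforms.

```perl
use strict;
use warnings;

# Hypothetical helper: write $string to $filename and return the raw bytes
# that ended up in the file. With $binary false the handle gets an explicit
# :crlf layer; with $binary true, binmode() is applied before printing.
sub bytes_written {
    my ($filename, $string, $binary) = @_;
    my $mode = $binary ? '>' : '>:crlf';
    open my $out, $mode, $filename
        or die "Couldn't open $filename: $!";
    binmode($out) if $binary;      # no line-ending translation on output
    print {$out} $string;
    close $out or die "Couldn't close $filename: $!";

    open my $in, '<:raw', $filename
        or die "Couldn't open $filename: $!";
    local $/;                      # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close $in;
    return $bytes;
}
```

Given a line that already ends in 0D 0A, the text-mode write expands the 0A into 0D 0A and so produces 0D 0D 0A, which matches what the hex editors showed; the binary-mode write leaves the bytes as they were.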

Cheers,
-Jan

