RE: How to strip out EOF characters
I might also add that:

1) I did not have to use binmode.
2) I came up with the correct number of lines (5).

Note that even if you do not have an ending CrLf at the end of the file, the last line will be counted. If you have any characters after the final CrLf, they will be counted as a new line (another way of stating the same thing).

-- Chris Snyder

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
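A minimal sketch of the counting rule described above, assuming Perl 5.8 or later so that an in-memory filehandle can stand in for the file. The helper name count_lines is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines the way a while (<FH>) loop does.
sub count_lines {
    my ($data) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    binmode $fh;
    my $n = 0;
    while (<$fh>) {
        $n++;
    }
    close $fh;
    return $n;
}

print count_lines("one\r\ntwo\r\n"), "\n";     # file ends with CrLf: prints 2
print count_lines("one\r\ntwo"), "\n";         # no final CrLf: still prints 2
print count_lines("one\r\ntwo\r\nx"), "\n";    # characters after final CrLf: prints 3
```

The last partial line counts because readline returns whatever is left before end of file, newline or not.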
RE: How to strip out EOF characters
Charlie Anderson wrote:

--- snip ---
I need to count the number of lines in a series of files for a database load validation. However, some of the files have hex 1A imbedded in the middle of the file, so my while loop dies a premature death. Any suggestions on how to proceed would be welcome. Snip below, TIA,
--- snip ---

Seems like a little case of WYSINWYG or What You See is NOT What You Get. It may even be a bug somewhere (presumably in the command window).

I ran a simple test in the debugger with this file:

    this is a line
    this is a second line
    this one has the 1A (X) character
    this is the fourth line
    this is the fifth line

Where the X is, I replaced it with hex 1A. Then I ran a simple read and print script, outputting to STDOUT and a file:

BEGIN
use strict;
open FH, "<1a.txt" or die "cannot find file";
open OUT, ">test.txt" or die "no output";
my $numLines = 0;
while (<FH>) {
    print OUT "...$_";
    print "...$_";
    $numLines++;
}
#print "Number of lines = $numLines\n";
END

The screen output does not match the file output.

screen output:

    ...this is a line
    ...this is a second line
    ...this one has the 1A (...this is the fourth line
    ...this is the fifth line

file output:

    ...this is a line
    ...this is a second line
    ...this one has the 1A (X) character
    ...this is the fourth line
    ...this is the fifth line

The file output I retrieved by using a text editor. However, if you just use "type test.txt", the output looks like the screen output.

So, the first thing to learn is that if you print a string with a hex 1A directly to the screen, Windoze barfs. That does not mean that the string is corrupt, just the printing to the screen.

If you thought that was a little confusing, try it in a debugger. It gets worse. This is a screen capture from debugging after I got to the line with the hex 1A:

--- snip ---
  DB<4> p $_
this one has the 1A (
  DB<5> use Data::Dumper
  DB<6> p Data::Dumper::Dumper $_
$VAR1 = 'this one has the 1A (';
  DB<7> p length $_
34
  DB<8> p substr $_, 22
) character
--- snip ---

Printing the string appears to truncate it. Even using Data::Dumper it appears to truncate it (see DB<6>). However, the whole string is there (see DB<7>), as the length of the string reflects the correct length. If you print a substring after the hex 1A character (see DB<8>), you see that it is, in fact, still there.

So what conclusion can we reach, class? The obvious one is "Use Unix", but that is not germane. I, personally, conclude that we need to be wary when dealing with hex 1A and the command window in a Windoze environment -- WYSINWYG.

I would also imagine that it is dependent upon the version that you run. My company hates me, so I am running Windows 98. Hopefully, things are better under XP. I can test it at home (where my wife likes me and I run XP).

Finally, don't reach any conclusion based upon what is painted on the screen in this case. Print to a file if need be. Look at the original file and the output file (using a hex editor, if necessary) to reach the right conclusion.

Anyone want to repeat my steps under other versions of Windoze and let me know if you get the same results?

-- Chris Snyder
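If a hex editor is not handy, one way to avoid being fooled by the console is to escape every non-printable byte before printing. This is only a sketch: the helper name visible is hypothetical, and the test string is typed in by hand to match the third line of the test file above.

```perl
use strict;
use warnings;

# Hypothetical helper: show every byte outside printable ASCII as \xNN
# so that nothing reaches the console as a raw control character.
sub visible {
    my ($s) = @_;
    (my $out = $s) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $out;
}

my $line = "this one has the 1A (\x1A) character\n";
print visible($line), "\n";    # this one has the 1A (\x1A) character\x0A
print length($line), "\n";     # 34
```

The escaped form shows the 1A byte and the trailing newline explicitly, which agrees with the length of 34 reported in the debugger session.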
Re: How to strip out EOF characters
Isn't it a little silly to go on to try and read from whichever file handle after the open has failed? You've got two flags set (errorcheck2 and error_flag), so at least use one to skip around the "while <>". It looks like you're expecting to have a number of possible failures - make sure to include useful data like file name that failed and why:

if ($errorcheck2 eq "TRUE") {
    print mainlog "ERROR \- data table $table$data_ext missing for validation ($fileDir open failed: $!)\n";
    $error_flag = "TRUE";
}

Andy Bach, Sys. Mangler
Internet: [EMAIL PROTECTED]
VOICE: (608) 261-5738 FAX 264-5030
"I'm just doing my job, nothing personal, sorry."
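A sketch of that advice, assuming a lexical filehandle and three-argument open (Perl 5.6 or later). The helper name count_rows is hypothetical, and the fallback path data/foo.txt is just a placeholder. The read loop is never reached when the open fails, and the message carries both the file name and $!.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line count, or undef if the open fails.
sub count_rows {
    my ($path) = @_;
    open my $fh, '<', $path or do {
        warn "ERROR - cannot open $path for validation: $!\n";
        return;    # skip the read loop entirely
    };
    binmode $fh;
    my $rows = 0;
    while (<$fh>) {
        $rows++;
    }
    close $fh;
    return $rows;
}

my $path = shift @ARGV;
$path = 'data/foo.txt' unless defined $path;    # placeholder default
my $rows = count_rows($path);
print "total count $rows\n" if defined $rows;
```

Returning undef rather than 0 lets the caller tell a missing file apart from an empty one.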
Re: How to strip out EOF characters
Charlie Anderson wrote:

> Here is the complete (test) script, its part of a larger script, but this
> one fails all by itself.
>
> $fileDir = shift @ARGV;
> # $myvar = "srctree.txt";
> print "opening file $fileDir\n";
> open (tablehandle, $fileDir) || ($errorcheck2 = "TRUE");
> if ($errorcheck2 eq "TRUE")
> {
>     print mainlog "ERROR \- data table $table$data_ext missing for validation\n";
>     $error_flag = "TRUE";
> }
> $filerowcount = 0;
> binmode tablehande;
> while (<tablehandle>)
> # until (eof (tablehandle)==1)
> {
>     $tableline = $_;
>     # $tableline = <tablehandle>;
>     $filerowcount ++;
>     print "$tableline\n";
>     print "current count = $filerowcount\n";
> }
> close (tablehandle);
> print "total count $filerowcount\n";

This is not a complete program and doesn't compile. When you post a snippet to this group,

1) put a use strict and -w (or use warnings) at the top.
2) run the script as you will be posting it to make sure it fails as described.

Here's my altered version which runs fine (I inserted a ^Z into the middle of your data using VIM for testing). It produced an additional line of output:

current count = 12
?
current count = 13

use strict;

my $fileDir = shift @ARGV || 'data/foo.txt';
print "opening file $fileDir\n";
my $errorcheck2;
my $error_flag;
open TH, $fileDir or $errorcheck2 = "TRUE";
if ($errorcheck2 eq "TRUE") {
    print "ERROR - data table $fileDir missing for validation\n";
    $error_flag = "TRUE";
}
my $filerowcount = 0;
binmode TH;
while (<TH>) {
    my $tableline = $_;
    $filerowcount++;
    print "$tableline\n";
    print "current count = $filerowcount\n";
}
close TH;
print "total count $filerowcount\n";

__END__

--
$Bill Luebkert
DBE Collectibles
Castle of Medieval Myth & Magic http://www.todbe.com/
http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
Re: How to strip out EOF characters
Here is the complete (test) script; it's part of a larger script, but this one fails all by itself.

$fileDir = shift @ARGV;
# $myvar = "srctree.txt";
print "opening file $fileDir\n";
open (tablehandle, $fileDir) || ($errorcheck2 = "TRUE");
if ($errorcheck2 eq "TRUE")
{
    print mainlog "ERROR \- data table $table$data_ext missing for validation\n";
    $error_flag = "TRUE";
}
$filerowcount = 0;
binmode tablehande;
while (<tablehandle>)
# until (eof (tablehandle)==1)
{
    $tableline = $_;
    # $tableline = <tablehandle>;
    $filerowcount ++;
    print "$tableline\n";
    print "current count = $filerowcount\n";
}
close (tablehandle);
print "total count $filerowcount\n";

Also, here is part of the file to be processed.

DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99404Cumultv EOC in Actg Principle Cumultv EORT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99405Purchase Accounting Adj. Purchase ART_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99406Interacct Transfers (reclassesInteracct RT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99407Intercompany TransfersIntercompaRT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99408Add'tInvestment Expenditures Add'tInveRT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99409Add'tInc's in Resv/Accrl Exp. Add'tInc'RT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99411Add'tOth Bal Sheet Chn'g Add'tOth RT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99412Add't-Capital ExpendituresAdd't-CapiRT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING ROLLFORWARD_RPTG 99413Add't-Increase in DebtAdd't-IncrRT_GROUPS Accounting RT_DETAILSRoll Forward Reporting 4 5
DU

Charlie Anderson
Duke Energy Field Services - IT
303-605-1689

"$Bill Luebkert" <[EMAIL PROTECTED]>
02/03/2004 02:30 PM
To: Charlie Anderson <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: How to strip out EOF characters

Charlie Anderson wrote:

> I need to count the number of lines in a series of files for a database
> load validation. However, some of the files have hex 1A imbedded in the
> middle of the file, so my while loop dies a premature death. Any
> suggestions on how to proceed would be welcome. Snip below, TIA,

The binmode should allow you to read any binary data including a Windoze EOF. There must be something you're not showing us. How about a *complete* failing snippet (small as possible) - you have to be doing something wrong to come to your conclusion.

> open (tablehandle, $fileDir) || die;
> binmode tablehandle;
> while (<tablehandle>) {
>     $tableline = $_;
>     $filerowcount ++;
> }
> print "total count $filerowcount\n";
> close (tablehandle);

You might also wish to change tablehandle to FH or TH to eliminate any warnings.

--
$Bill Luebkert
DBE Collectibles
Castle of Medieval Myth & Magic http://www.todbe.com/
http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
Re: How to strip out EOF characters
Charlie Anderson wrote:

> I need to count the number of lines in a series of files for a database
> load validation. However, some of the files have hex 1A imbedded in the
> middle of the file, so my while loop dies a premature death. Any
> suggestions on how to proceed would be welcome. Snip below, TIA,

The binmode should allow you to read any binary data including a Windoze EOF. There must be something you're not showing us. How about a *complete* failing snippet (small as possible) - you have to be doing something wrong to come to your conclusion.

> open (tablehandle, $fileDir) || die;
> binmode tablehandle;
> while (<tablehandle>) {
>     $tableline = $_;
>     $filerowcount ++;
> }
> print "total count $filerowcount\n";
> close (tablehandle);

You might also wish to change tablehandle to FH or TH to eliminate any warnings.

--
$Bill Luebkert
DBE Collectibles
Castle of Medieval Myth & Magic http://www.todbe.com/
http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
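For anyone who wants a complete snippet to try, here is a small sketch that builds its own test file with a 1A byte in the third line and reads it back in binary mode. It assumes Perl 5.8 or later and uses the ':raw' layer on open in place of a separate binmode call; the temporary file comes from File::Temp, and the file contents are made up for the test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a five-line test file with a 0x1A byte in the middle of line three.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "line one\r\n", "line two\r\n",
             "has the 1A (\x1A) character\r\n",
             "line four\r\n", "line five\r\n";
close $tmp or die "close failed: $!";

# Read it back with the :raw layer so no byte gets special treatment.
open my $fh, '<:raw', $path or die "cannot open $path: $!";
my $count = 0;
while (<$fh>) {
    $count++;
}
close $fh;

print "total count $count\n";    # total count 5
```

Putting the layer in the open call means there is no second handle name to get wrong.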
How to strip out EOF characters
I need to count the number of lines in a series of files for a database load validation. However, some of the files have hex 1A imbedded in the middle of the file, so my while loop dies a premature death. Any suggestions on how to proceed would be welcome. Snip below, TIA,

open (tablehandle, $fileDir) || die;
binmode tablehandle;
while (<tablehandle>) {
    $tableline = $_;
    $filerowcount ++;
}
print "total count $filerowcount\n";
close (tablehandle);

Charlie Anderson
Duke Energy Field Services - IT
303-605-1689
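As for actually stripping the characters, one possible approach is to read in binary mode and delete the 1A bytes with tr while counting. This is a sketch only: the helper name strip_ctrl_z is hypothetical, and in-memory handles (Perl 5.8 or later) stand in for the real input and output files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, deleting every 0x1A byte.
# Returns (lines read, bytes removed).
sub strip_ctrl_z {
    my ($in, $out) = @_;
    binmode $in;
    binmode $out;
    my ($rows, $removed) = (0, 0);
    while (my $line = <$in>) {
        $removed += ($line =~ tr/\x1A//d);    # tr returns the number deleted
        print {$out} $line;
        $rows++;
    }
    return ($rows, $removed);
}

# Demo on in-memory handles; real code would open the data files instead.
my $dirty = "first\r\nsec\x1Aond\r\nthird\r\n";
my $clean = '';
open my $in,  '<', \$dirty or die "cannot open input: $!";
open my $out, '>', \$clean or die "cannot open output: $!";
my ($rows, $removed) = strip_ctrl_z($in, $out);
close $in;
close $out;

print "total count $rows, removed $removed\n";    # total count 3, removed 1
```

Reporting the number of bytes removed alongside the line count gives the load validation something to log when a file turns out to contain stray 1A bytes.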