RE: How to strip out EOF characters

2004-02-03 Thread Chris Snyder
I might also add that:
1) I did not have to use binmode.
2) I came up with the correct number of lines (5)

Note that even if the file does not end with a CrLf, the last line will still
be counted.  In other words, any characters after the final CrLf are counted
as one more line.
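A minimal sketch of that counting behaviour, assuming Perl 5.8 or later (for the in-memory filehandle). The sample data is a reconstruction of the five-line test file described in the earlier message, with a hex 1A in the third line and no CrLf after the last line. It is not the original file.

```perl
use strict;
use warnings;

# Five CrLf-separated lines, a Ctrl-Z (hex 1A) in the third, and no CrLf after the last.
my $data = "this is a line\r\n"
         . "this is a second line\r\n"
         . "this one has the 1A (\x1A) character\r\n"
         . "this is the fourth line\r\n"
         . "this is the fifth line";

# Read from an in-memory file so the example does not depend on a file on disk.
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
binmode $fh;    # raw bytes: no CrLf translation, no special meaning for hex 1A

my $count = 0;
$count++ while <$fh>;    # the unterminated last line is still returned as a line
close $fh;

print "Number of lines = $count\n";
```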

-- Chris Snyder



___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


RE: How to strip out EOF characters

2004-02-03 Thread Chris Snyder
Charlie Anderson wrote:

--- snip ---
I need to count the number of lines in a series of files for a database
load validation.  However, some of the files have hex 1A embedded in the
middle of the file, so my while loop dies a premature death.  Any
suggestions on how to proceed would be welcome.  Snip below, TIA,
--- snip ---

Seems like a little case of WYSINWYG or What You see is NOT What You Get.
It may even be a bug somewhere (presumably in the command window).

I ran a simple test in the debugger with this file:

this is a line
this is a second line
this one has the 1A (X) character
this is the fourth line
this is the fifth line

Where the X is, I replaced it with hex 1A.

Then I ran a simple read and print script, outputting to STDOUT and a file:

--- BEGIN ---
use strict;

open FH, "<1a.txt" or die "cannot find file";
open OUT, ">test.txt" or die "no output";

my $numLines = 0;
while (<FH>) {
    print OUT "...$_";
    print "...$_";
    $numLines++;
}
close FH;
close OUT;
#print "Number of lines = $numLines\n";
--- END ---

The screen output does not match the file output.
screen output:

...this is a line
...this is a second line
...this one has the 1A (...this is the fourth line
...this is the fifth line

file output:
...this is a line
...this is a second line
...this one has the 1A (X) character
...this is the fourth line
...this is the fifth line

I looked at the file output with a text editor.  However, if you just
use "type test.txt", the output looks like the screen output.  So, the first
thing to learn is that if you print a string with a hex 1A directly to the
screen, Windoze barfs.  That does not mean that the string is corrupt, only
that printing it to the screen goes wrong.

If you thought that was a little confusing, try it in a debugger.  It gets
worse.

This is a screen capture from debugging after I got to the line with the hex
1A:

--- snip ---
  DB<4> p $_
this one has the 1A (
  DB<5> use Data::Dumper

  DB<6> p Data::Dumper::Dumper $_
$VAR1 = 'this one has the 1A (';

  DB<7> p length $_
34
  DB<8> p substr $_, 22
) character
--- snip ---

Printing the string appears to truncate it.  Even Data::Dumper appears to
truncate it (see DB<6>).  However, the whole string is there: the length
reported in DB<7> is the full length of the line.  If you print the
substring after the hex 1A character (see DB<8>), you see that it is,
in fact, still there.
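If you want Data::Dumper itself to show the hidden byte, its `$Data::Dumper::Useqq` setting makes it emit double-quoted strings with control characters written as escape sequences, so the hex 1A never reaches the console as a raw byte. A minimal sketch, using the third line of the test file as sample data (the exact escape Dumper picks may vary between versions):

```perl
use strict;
use warnings;
use Data::Dumper;

# The third line of the test file, as read without binmode (the CrLf has become "\n").
my $line = "this one has the 1A (\x1A) character\n";

# Useqq: dump strings in double quotes, escaping control characters such as hex 1A.
$Data::Dumper::Useqq = 1;
my $dump = Dumper($line);
print $dump;
```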

So what conclusion can we reach, class?  The obvious one is "Use Unix", but
that is not germane.  I, personally, conclude that we need to be wary when
dealing with hex 1A in the command window in a Windoze environment -- WYSINWYG.
I would also imagine that the behavior depends on the version of Windows you
run.  My company hates me, so I am running Windows 98.  Hopefully, things are
better under XP.  I can test it at home (where my wife likes me and I run XP).

Finally, don't reach any conclusion based upon what is painted on the screen
in this case.  Print to a file if need be.  Look at the original file and
the output file (using a hex editor, if necessary) to reach the right
conclusion.
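In the spirit of the hex-editor advice, a short Perl check can do the same job from inside the script: print every byte as two hex digits and look for the `1a`. A minimal sketch, using the third line of the test file as sample data:

```perl
use strict;
use warnings;

my $line = "this one has the 1A (\x1A) character\n";

# Hex dump: every byte shown as two hex digits, so nothing is hidden by the console.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $line;
print "$hex\n";

# Offset of the first hex 1A byte (index() returns -1 if there is none).
my $pos = index($line, "\x1A");
print "hex 1A found at offset $pos of ", length($line), " characters\n";
```

The length it reports matches the 34 shown in DB<7> above, since the line was read without binmode.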

Anyone want to repeat my steps under other versions of Windoze and let me
know if you get the same results?

-- Chris Snyder




Re: How to strip out EOF characters

2004-02-03 Thread Andy_Bach
Isn't it a little silly to go on and try to read from the file handle
after the open has failed?  You've got two flags set (errorcheck2
and error_flag), so at least use one of them to skip around the "while <>".

It looks like you're expecting a number of possible failures, so make sure
to include useful data like the name of the file that failed and why:
if ($errorcheck2 eq "TRUE")
  {
  print mainlog "ERROR - data table $table$data_ext missing for validation ($fileDir open failed: $!)\n";
  $error_flag = "TRUE";
  }
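A sketch of that suggestion, assuming the per-file work can be wrapped in a helper. `count_rows` is a hypothetical name, and `warn` stands in for the original script's `mainlog` handle. A failed open is reported with the file name and `$!`, and the read loop is skipped for that file.

```perl
use strict;
use warnings;

# Hypothetical helper: number of lines in $path, or undef if it cannot be opened.
sub count_rows {
    my ($path) = @_;
    my $fh;
    unless (open($fh, '<', $path)) {
        # Say which file failed and why ($!), then skip the read loop entirely.
        warn "ERROR - data table $path missing for validation (open failed: $!)\n";
        return;
    }
    binmode $fh;    # read raw bytes so a hex 1A does not end the loop early
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

my $error_flag = 0;
for my $file (@ARGV) {
    my $rows = count_rows($file);
    if (!defined $rows) {
        $error_flag = 1;    # remember the failure but keep checking the other files
        next;
    }
    print "$file: total count $rows\n";
}
print "one or more tables could not be validated\n" if $error_flag;
```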

Andy Bach, Sys. Mangler
Internet: [EMAIL PROTECTED] 
VOICE: (608) 261-5738  FAX 264-5030

"I'm just doing my job, nothing personal, sorry."


Re: How to strip out EOF characters

2004-02-03 Thread $Bill Luebkert
Charlie Anderson wrote:

> Here is the complete (test) script, it's part of a larger script, but this
> one fails all by itself.
> 
> $fileDir = shift @ARGV;
> # $myvar = "srctree.txt";
> print "opening file $fileDir\n";
>   open (tablehandle, $fileDir) || ($errorcheck2 = "TRUE");
>if ($errorcheck2 eq "TRUE")
> {
>  print mainlog "ERROR \- data table $table$data_ext missing for
> validation\n";
>  $error_flag = "TRUE";
> }
>$filerowcount = 0;
>binmode tablehande;
>while (<tablehandle>)
> #  until (eof (tablehandle)==1)
> {
> $tableline = $_;
>  #  $tableline = <tablehandle>;
>   $filerowcount ++;
> print "$tableline\n";
>  print "current count = $filerowcount\n";
> }
> close (tablehandle);
> print "total count $filerowcount\n";

This is not a complete program and doesn't compile.  When you
post a snippet to this group, 1) put a use strict and -w (or use
warnings) at the top.  2) run the script as you will be posting
it to make sure it fails as described.
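Running with warnings enabled would probably have pointed at the real problem: the posted script calls `binmode tablehande;`, a misspelling of `tablehandle`, so binmode is applied to a handle that was never opened, and the handle actually being read never gets binmode, which would explain why the loop stops at the hex 1A. A minimal demonstration, using a hypothetical one-line in-memory table in place of the real data file:

```perl
use strict;
use warnings;

# Collect run-time warnings so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# In-memory stand-in for the data file (hypothetical sample data).
my $data = "a line with a \x1A in it\r\n";
open(tablehandle, '<', \$data) or die "cannot open in-memory file: $!";

binmode tablehande;    # the misspelling from the posted script: this handle was never opened
binmode tablehandle;   # what was intended

my $filerowcount = 0;
$filerowcount++ while <tablehandle>;
close(tablehandle);

print "warning: $_" for @warnings;
print "total count $filerowcount\n";
```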

Here's my altered version which runs fine (I inserted a ^Z into
the middle of your data using VIM for testing).  It produced an
additional line of output:

current count = 12
?

current count = 13

use strict;
my $fileDir = shift @ARGV || 'data/foo.txt';

print "opening file $fileDir\n";
my $errorcheck2;
my $error_flag;
open TH, $fileDir or $errorcheck2 = "TRUE";
if ($errorcheck2 eq "TRUE") {
    print "ERROR - data table $fileDir missing for validation\n";
    $error_flag = "TRUE";
}
my $filerowcount = 0;
binmode TH;
while (<TH>) {
    my $tableline = $_;
    $filerowcount++;
    print "$tableline\n";
    print "current count = $filerowcount\n";
}
close TH;

print "total count $filerowcount\n";

__END__




-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--<  o // //  Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)




Re: How to strip out EOF characters

2004-02-03 Thread Charlie Anderson

Here is the complete (test) script, it's part of a larger script, but this
one fails all by itself.

$fileDir = shift @ARGV;
# $myvar = "srctree.txt";
print "opening file $fileDir\n";
  open (tablehandle, $fileDir) || ($errorcheck2 = "TRUE");
   if ($errorcheck2 eq "TRUE")
{
 print mainlog "ERROR \- data table $table$data_ext missing for
validation\n";
 $error_flag = "TRUE";
}
   $filerowcount = 0;
   binmode tablehande;
   while (<tablehandle>)
#  until (eof (tablehandle)==1)
{
$tableline = $_;
 #  $tableline = <tablehandle>;
  $filerowcount ++;
print "$tableline\n";
 print "current count = $filerowcount\n";
}
close (tablehandle);
print "total count $filerowcount\n";


Also,

Here is part of the file to be processed.

DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99404Cumultv EOC in Actg Principle Cumultv EORT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99405Purchase Accounting Adj.  Purchase ART_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99406Interacct Transfers (reclassesInteracct RT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99407Intercompany TransfersIntercompaRT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99408Add'tInvestment Expenditures  Add'tInveRT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99409Add'tInc's in Resv/Accrl Exp. Add'tInc'RT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99411Add'tOth Bal Sheet Chn'g  Add'tOth RT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99412Add't-Capital ExpendituresAdd't-CapiRT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DUKENRESOURCE_TYPE 01/01/1940ACCOUNTING  ROLLFORWARD_RPTG
99413Add't-Increase in DebtAdd't-IncrRT_GROUPS Accounting
RT_DETAILSRoll Forward Reporting  4  5
DU

Charlie Anderson
Duke Energy Field Services - IT
303-605-1689




Re: How to strip out EOF characters

2004-02-03 Thread $Bill Luebkert
Charlie Anderson wrote:

> I need to count the number of lines in a series of files for a database
> load validation.  However, some of the files have hex 1A embedded in the
> middle of the file, so my while loop dies a premature death.  Any
> suggestions on how to proceed would be welcome.  Snip below, TIA,

The binmode should allow you to read any binary data, including a
Windoze EOF.  There must be something you're not showing us.  How
about a *complete* failing snippet (small as possible) - you have to
be doing something wrong to come to your conclusion.

> open (tablehandle, $fileDir) || die;
> binmode tablehandle;
> while (<tablehandle>) {
>   $tableline = $_;
>   $filerowcount ++;
> }
> print "total count $filerowcount\n";
> close (tablehandle);

You might also wish to change tablehandle to FH or TH to
eliminate any warnings.
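Both suggestions can be combined in a sketch: a lexical filehandle (under `use strict` a misspelled lexical is a compile error rather than a silently unopened bareword handle) opened with the `:raw` layer, which has the same effect as calling binmode on it. `count_lines` is a hypothetical helper, and File::Temp is used only to build a throwaway test file with an embedded Ctrl-Z.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: count the lines in $path, treating every byte (including hex 1A) as data.
sub count_lines {
    my ($path) = @_;
    # '<:raw' asks for binary mode at open time, like a separate binmode call.
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Build a three-line test file with a Ctrl-Z (hex 1A) in the middle line.
my $dir  = tempdir(CLEANUP => 1);
my $name = File::Spec->catfile($dir, 'table.txt');
open(my $out, '>:raw', $name) or die "cannot create $name: $!";
print $out "line one\r\nline \x1A two\r\nline three\r\n";
close $out;

print "total count ", count_lines($name), "\n";
```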



How to strip out EOF characters

2004-02-03 Thread Charlie Anderson
I need to count the number of lines in a series of files for a database
load validation.  However, some of the files have hex 1A embedded in the
middle of the file, so my while loop dies a premature death.  Any
suggestions on how to proceed would be welcome.  Snip below, TIA,


open (tablehandle, $fileDir) || die;
binmode tablehandle;
while (<tablehandle>) {
  $tableline = $_;
  $filerowcount ++;
}
print "total count $filerowcount\n";
close (tablehandle);
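If the goal really is to strip the characters, as the subject line asks, one option (not proposed in the thread) is to read in binmode and delete the hex 1A bytes from each line with `tr///d`, which also returns how many it removed. A minimal sketch on hypothetical in-memory sample data:

```perl
use strict;
use warnings;

# Hypothetical sample input: two CrLf lines, the first containing a hex 1A (Ctrl-Z) byte.
my $data = "ABC\x1ADEF\r\nGHI\r\n";
open(my $in, '<', \$data) or die "cannot open in-memory file: $!";
binmode $in;    # read raw bytes so hex 1A is not treated as end-of-file

my ($rows, $stripped, $clean) = (0, 0, '');
while (my $line = <$in>) {
    $stripped += ($line =~ tr/\x1A//d);   # delete every hex 1A, counting how many were removed
    $clean    .= $line;
    $rows++;
}
close $in;

print "rows=$rows stripped=$stripped\n";
```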



