Sharan Basappa on Monday, 18 December 2006 08:09:
> On 12/17/06, D. Bolliger <[EMAIL PROTECTED]> wrote:
> > Sharan Basappa on Sunday, 17 December 2006 16:22:
> >
> > Hello
> >
> > > While going through some of the old text files, which until now I thought
> > > were in good shape, I found
> > > that some of the files were corrupt and contained garbage data.
> >
> > We need to know more: what garbage? Where in the file? Are the files
> > corrupt because they were stored on, for example, old floppies?
> >
> > > I would
> > > like to know if there is a
> > > simple way to find this out using script since I have 100s of such
> > > files and it is difficult for me to go through all these files ..
> >
> > What's garbage and what's not depends on the format of the file
> > content, its intended usage...
> >
> > Your task may be easy or nearly impossible to solve automatically.
> > If there's a way to exactly separate garbage from non-garbage, and
> > express this by means of a script language, it may be easy.

Hello Sharan Basappa

(please don't top post)

> actually these look like invalid ASCII files to me (the files seem to contain
> binary content).
> Typically this happens when I transfer files from one machine to another
> using my USB key.
> But this is not the case with all files. So this is the reason I wanted to
> know if there is a way to
> recursively go through all files and report if a file does not seem to be a
> valid ASCII file ..

Maybe the following script [tested] can be a start; you'd have to adapt it to
your needs.

Dani

#!/usr/bin/perl

# usage: this_script filename1 [filename2 ...]

use strict;
use warnings;

# ***Adjust to your needs***, see perldoc perlre
#
# (invalid defined as "not in the set of valid chars")
#
my $invalid=qr/[^0-9a-zA-Z_!?.;,\s"'()-]/;

my @invalids=(); # contains filenames

for my $fn (@ARGV) {
  open my $fh, '<', $fn or die "Cannot open '$fn': $!";

  while (<$fh>) {
    if (/($invalid)/) {
      warn "'$fn' seems to have a first invalid char '$1' on line $.\n";
      push @invalids, $fn;
      last;
    }
  }

  close $fh or die $!;
}

warn "\nfiles with invalid chars:\n", (join "\n", @invalids), "\n" if @invalids;

__END__
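
Since you asked about going through all files recursively: the script above
only takes file names on the command line. Below is a rough, separate sketch
that walks directories with the core module File::Find. The helper names
first_invalid_line and find_invalid_files are made up for this example, and
the definition of "invalid" (any byte outside printable ASCII, tab, LF and
CR) is only my assumption about what you mean by a valid ASCII file, so
adjust it to your needs.

```perl
#!/usr/bin/perl

# usage: this_script dir1 [dir2 ...]

use strict;
use warnings;
use File::Find;

# Assumption: "valid ASCII text" means printable ASCII (0x20-0x7E)
# plus tab, line feed and carriage return. ***Adjust to your needs***
my $invalid = qr/[^\x20-\x7E\t\n\r]/;

# Hypothetical helper: returns the number of the first line in $fn that
# contains an invalid byte, or 0 if none was found (or the file could
# not be opened).
sub first_invalid_line {
  my ($fn) = @_;

  open my $fh, '<:raw', $fn or do {
    warn "Cannot open '$fn': $!\n";
    return 0;
  };

  my $line = 0;
  while (<$fh>) {
    if (/$invalid/) {
      $line = $.;
      last;
    }
  }

  close $fh or warn "Cannot close '$fn': $!\n";
  return $line;
}

# Hypothetical helper: returns the sorted names of all plain files below
# the given directories that contain at least one invalid byte.
sub find_invalid_files {
  my @dirs = @_;
  my @invalids;

  find(
    {
      no_chdir => 1,    # $_ then holds the full path of each entry
      wanted   => sub {
        return unless -f $_;
        push @invalids, $_ if first_invalid_line($_);
      },
    },
    @dirs
  );

  my @sorted = sort @invalids;
  return @sorted;
}

# Only does something when directories are given on the command line
if (@ARGV) {
  print "$_\n" for find_invalid_files(@ARGV);
}
```

Called as "this_script /path/to/textfiles" it prints one suspicious file name
per line. Files are read in raw mode so that every byte is checked as is;
text that uses umlauts or other non-ASCII characters will therefore be
reported too, which may or may not be what you want.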
