Thanks, I will give it a shot. I actually ended up doing a substitution of
([^\x09-\x0d\x20-\x7e]) on the whole file string and then going through it
line by line. Much like what you have, but uglier.
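For comparison, here is a rough Perl sketch of the substitute-then-split approach described above. The helper names printable_lines() and file_printable_lines() are hypothetical, and replacing each run of unwanted bytes with a newline is an assumption: the message does not say what the bytes were replaced with.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every run of bytes outside TAB..CR (\x09-\x0d)
# and space..'~' (\x20-\x7e) into a newline (the replacement character is
# an assumption), then keep the lines with at least $min characters.
sub printable_lines {
    my ($data, $min) = @_;
    (my $clean = $data) =~ s/[^\x09-\x0d\x20-\x7e]+/\n/g;
    return grep { length($_) >= $min } split /\n/, $clean;
}

# Hypothetical wrapper: read a whole file as raw bytes and pass it on.
sub file_printable_lines {
    my ($file, $min) = @_;
    open my $in, '<:raw', $file or die "Cannot open $file: $!";
    my $data = do { local $/; <$in> };    # slurp mode: whole file at once
    close $in;
    return printable_lines(defined $data ? $data : '', $min);
}

my @found = printable_lines("\x00\x01Hello\xffWorld, hi\x00ab\x02", 4);
print "$_\n" for @found;    # @found is ("Hello", "World, hi")
```

Note that \x09-\x0d keeps TAB, LF, VT, FF and CR, so the file's own line breaks survive the substitution and still act as line separators.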

Thanks.
Sumit

----- Original Message ----- 
From: <[EMAIL PROTECTED]>
To: "Sumit" <[EMAIL PROTECTED]>
Cc: "active" <[EMAIL PROTECTED]>
Sent: Wednesday, August 20, 2003 1:06 PM
Subject: Re: read binary files


> Hi,
>
> I recently put something together. Hope it's still useful. It even has
> some POD doc ... ;-)
>
> Ekkehard
>
> [snip]
>
> #!perl -w
> use English;
> use Getopt::Std;
> use strict;
>
>
> #==========================================================================
> =pod
>
> =item @str strings($file, $maxbytes, $translate, $minchar, $offset)
>
> This function is similar to the GNU version of the strings utility. It
> reads at most $maxbytes bytes (default 3000) of the file $file and
> extracts all strings it encounters. A string here is a run of at least
> $minchar bytes, each of which is either printable ASCII (0x20 - 0x7E)
> or a German umlaut or 'sz' in IBM or ISO-8859-1 encoding. If $translate
> is 1, umlauts and German 'sz' found in IBM encoding are translated to
> ISO-8859-1. If $offset is not 0, each element of the returned list
> contains the string's offset within the file, a tab, and the string
> itself. The offset is given in decimal (d), octal (o) or hexadecimal
> (x) notation, as selected by $offset.
>
> =cut
>
> #==========================================================================
> sub strings
> {
>     my ($file, $maxbytes, $translate, $minchar, $offset) = @ARG;
>
>     return unless -r $file;
>     return unless $translate =~ /^[01]$/;
>     return unless $minchar   =~ /^[1-9]\d*$/;   # 0 would match empty strings
>     return unless $offset    =~ /^[0dox]{1}$/;
>
>     $maxbytes = 3000 unless defined $maxbytes;
>     my $data;
>
>     open    my $in, '<', $file   or return;
>     binmode $in;
>     read    $in, $data, $maxbytes or return;
>     close   $in                  or return;
>
>     #            AE  OE  UE  ae  oe  ue  sz
>     my $IBM  = "\216\231\232\204\224\201\341";   # code page 437/850
>     my $ANSI = "\304\326\334\344\366\374\337";   # ISO-8859-1
>
>     # What we consider as 'ascii':
>     my $ascii="\040-\176$IBM$ANSI";
>
>     my @strings;
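>     # The lookahead also accepts a string that runs to the end of $data.
>     while ($data =~ /([$ascii]{$minchar,})(?=[^$ascii]|\z)/g)
>     {
>         if ($offset ne "0")
>         {
>             # $-[1] is the offset at which the captured string starts
>             push @strings, sprintf("%$offset", $-[1]) . "\t$1";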
>         }
>         else
>         {
>             push @strings, $1;
>         }
>     }
>
>     # Translate IBM to ANSI encoding. The variables cannot be used
>     # here because tr builds its translation table at compile time
>     # (see the tr documentation in perlop).
>     if ($translate == 1)
>     {
>         foreach (@strings)
>         { tr [\216\231\232\204\224\201\341]
>              [\304\326\334\344\366\374\337]; }
>     }
>
>     return @strings;
> }
>
>
> our $opt_b = 0;
> our $opt_n = 4;
> our $opt_t = 0;
> our $opt_T = 0;
> our $opt_h;
>
> getopts('b:n:t:Th');
>
> if ($opt_h || @ARGV==0)
> {
>     print <<HEND;
>
> Usage: pstrings [-n <minchars>] [-b <maxbytes>] [-t radix] [-T] [-h] <filename>
>
>      -n minimum number of characters that make up a string (default 4)
>      -b read at most maxbytes bytes from the file. If not specified,
>         the entire file is read.
>      -t print the offset within the file before each string. The single
>         character argument specifies the radix of the offset: o for octal,
>         x for hexadecimal, or d for decimal.
>      -T translate IBM-encoded umlauts to ISO-8859-1
>      -h show this help
>
> The filename may be given in NT (Windows) form; backslashes are
> converted to forward slashes.
>
> HEND
>
>     exit 0;
> }
>
> my $file = $ARGV[0];
> $file =~ s|\\|/|g;
>
> die "Cannot read from $file: $!" unless -r $file;
>
> $opt_b = -s _ if $opt_b == 0;
>
> my @s = strings($file, $opt_b, $opt_T, $opt_n, $opt_t)
>     or die "No strings found in $file\n";
>
> binmode STDOUT;
> print join("\n", @s), "\n";
>
> exit 0;
>
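A sketch of the matching loop from strings(), pulled out into a hypothetical extract_strings() that works on an in-memory string so it can be tried without a file. It uses a lookahead, (?=[^...]|\z), so that a string running to the end of the data is also reported, and it takes the offset from $-[1], the position where capture group 1 starts. The IBM and ISO-8859-1 tables are written in double quotes here so they hold the actual bytes.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the strings() matching loop.
sub extract_strings {
    my ($data, $minchar, $offset) = @_;

    #            AE  OE  UE  ae  oe  ue  sz
    my $IBM  = "\216\231\232\204\224\201\341";   # code page 437/850
    my $ANSI = "\304\326\334\344\366\374\337";   # ISO-8859-1
    my $ascii = "\040-\176$IBM$ANSI";            # space..'~' plus umlauts

    my @strings;
    # Greedy run of allowed bytes, ended by a disallowed byte or end of data.
    while ($data =~ /([$ascii]{$minchar,})(?=[^$ascii]|\z)/g) {
        push @strings, $offset ne '0'
            ? sprintf("%$offset", $-[1]) . "\t$1"   # offset, tab, string
            : $1;
    }
    return @strings;
}

my @s = extract_strings("\x00\x01Hello\x00\x02ab\x00tail", 4, 'x');
print "$_\n" for @s;    # gives "2\tHello" and "c\ttail"
```

The trailing "tail" is only found because of the \z alternative; a pattern that has to consume a terminating byte would miss it.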
>
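The comment about tr in strings() refers to a documented limitation: tr/// builds its translation table at compile time, so variables are not interpolated into it. perlop suggests compiling the tr/// at run time with a string eval when the lists live in variables. A minimal sketch of that, reusing the same IBM and ISO-8859-1 tables; the name $ibm_to_iso is hypothetical, and the string eval is only acceptable here because the tables are fixed constants.

```perl
use strict;
use warnings;

# Single quotes on purpose: the eval'd source then contains the octal
# escapes (\216 ...) themselves, which tr/// interprets.
my $IBM  = '\216\231\232\204\224\201\341';   # AE OE UE ae oe ue sz, IBM
my $ANSI = '\304\326\334\344\366\374\337';   # the same letters, ISO-8859-1

# Compile the tr/// once at run time; the sub returns a translated copy.
my $ibm_to_iso = eval "sub { my \$s = shift; \$s =~ tr/$IBM/$ANSI/; return \$s }"
    or die "Cannot build translator: $@";

my @strings = ("Gr\201\341e", "K\204se");
@strings = map { $ibm_to_iso->($_) } @strings;
# @strings is now ("Gr\374\337e", "K\344se")
```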
_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
