RE: Searching a binary file for a specific sequence of hex values?
On Mon, 18 Nov 2002, Peter Guzis wrote:

> You're right on the first issue. I need to slow down in my rush to be first
> reply :P
>
> The search string will NOT be truncated because of these lines:
>
> my $search_length = length ($search);
> my $chunk_size = $search_length > CHUNK_SIZE ? $search_length : CHUNK_SIZE;

The actual string to be matched in the file is what I was talking about.
Regardless of the size of the chunk, the actual matching string _can_ be
truncated by the chunk boundary, depending on the location of the matching
string in the file. For example, if the chunk was 1024 bytes long and the
matching string began at location 1020 and was longer than 4 bytes, or if
the chunk size was 12 and the matching string began at location 8 and was
more than 4 characters long, the match would be split across two chunks.

[EMAIL PROTECTED]
All opinions are my own and not necessarily those of my employer

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
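The boundary case described above can be handled by carrying the last length($search) - 1 bytes of everything read so far into the next round and putting them in front of the new chunk. What follows is only a sketch under stated assumptions, not the code from the thread: the name search_binary_chunked, the optional third argument, and the use of open/read instead of sysopen/sysread are all made up for illustration.

```perl
use strict;
use warnings;

use constant CHUNK_SIZE => 4096;

# Hypothetical helper (not from the thread): returns the byte offsets of
# every occurrence of $search in $file, reading $chunk_size bytes at a time.
sub search_binary_chunked {
    my ($search, $file, $chunk_size) = @_;
    $chunk_size ||= CHUNK_SIZE;
    die "empty search string\n" unless length $search;
    my $keep = length($search) - 1;    # bytes carried over between chunks

    open my $fh, '<:raw', $file or die "Could not read $file: $!\n";

    my @found;
    my $tail       = '';   # last $keep bytes of the data seen so far
    my $tail_start = 0;    # file offset of the first byte in $tail
    my $chunk;

    while (read($fh, $chunk, $chunk_size)) {
        my $buffer = $tail . $chunk;   # previous tail first, then new data
        my $pos = 0;
        while (($pos = index $buffer, $search, $pos) > -1) {
            push @found, $tail_start + $pos;
            $pos++;                    # step on so overlapping matches count
        }
        # Keep only the final $keep bytes for the next round.
        my $drop = length($buffer) - $keep;
        $drop = 0 if $drop < 0;
        $tail_start += $drop;
        $tail = substr $buffer, $drop;
    }
    close $fh;
    return @found;
}
```

Because the tail is one byte shorter than the search string, a complete match can never sit entirely inside it, so no match is reported twice. A call would look like my @offsets = search_binary_chunked($bytes, 'path/to/file.ext');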
RE: Searching a binary file for a specific sequence of hex values?
You're right on the first issue. I need to slow down in my rush to be first to reply :P

The search string will NOT be truncated because of these lines:

    my $search_length = length ($search);
    my $chunk_size = $search_length > CHUNK_SIZE ? $search_length : CHUNK_SIZE;

Peter Guzis
Web Administrator, Sr.
ENCAD, Inc. - A Kodak Company
email: [EMAIL PROTECTED]
www.encad.com

-----Original Message-----
From: Carl Jolley [mailto:[EMAIL PROTECTED]]
Sent: Sunday, November 17, 2002 7:35 PM
To: Peter Guzis
Cc: Perl Win32 Users (E-mail)
Subject: RE: Searching a binary file for a specific sequence of hex values?

Shouldn't the index function be:

    $idx = index "$last_chunk$chunk", $search;

instead of

    $idx = index "$chunk$last_chunk", $search;

Somehow appending the last several (i.e. match string length) bytes of the
previous chunk to the end of the next chunk would not seem to allow
a match on the search string if it was truncated due to the length of
the chunk.

[EMAIL PROTECTED]
All opinions are my own and not necessarily those of my employer

On Fri, 15 Nov 2002, Peter Guzis wrote:

> Try the code below. For a search string you can specify either hex codes
> (e.g. '70 65 72 6C' or '7065726C') or binary data (e.g. 'perl').
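The difference between the two index calls quoted above can be checked with a tiny example. This is an illustrative sketch with made-up data: the file content is assumed to be "xxperlyy" read in 4-byte chunks, so "perl" straddles the boundary between "xxpe" and "rlyy".

```perl
use strict;
use warnings;

my $search     = 'perl';
my $last_chunk = 'xpe';    # last length($search) - 1 bytes of the chunk "xxpe"
my $chunk      = 'rlyy';   # the next chunk read from the file

my $wrong = index "$chunk$last_chunk", $search;   # searches "rlyyxpe"
my $right = index "$last_chunk$chunk", $search;   # searches "xperlyy"

print "chunk then tail: $wrong\n";   # -1, the split match is missed
print "tail then chunk: $right\n";   # 1, the match is found across the boundary
```

With the tail placed first, the file position works out as the offset where the current chunk starts (4), minus the tail length (3), plus the index result (1), which gives 2.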
RE: Searching a binary file for a specific sequence of hex values?
Try the code below. For a search string you can specify either hex codes (e.g. '70 65 72 6C' or '7065726C') or binary data (e.g. 'perl').

---

use strict;
use Fcntl 'O_RDONLY';

use constant CHUNK_SIZE => 4096;

SearchBinary ('searchstring', 'path/to/file.ext');

sub SearchBinary {

  my $search = shift;
  my $file = shift;
  my ($chunk, $last_chunk, $chunks_read, $idx);
  die "expected: SearchBinary (\$search, \$file)\n" unless length $search && length $file;

  # convert hex code to binary data

  if ($search =~ /^(?:[0-9A-F]{2}\s*)+$/i) {

    $search =~ s/\s+//;
    $search = join '', (map { chr(hex $_) } $search =~ /../g);

  }

  my $search_length = length ($search);
  my $chunk_size = $search_length > CHUNK_SIZE ? $search_length : CHUNK_SIZE;
  die "File '$file' does not exist\n" unless -f $file;
  my $file_size = -s $file;
  sysopen BIN, $file, O_RDONLY or die "Could not read $file: $!\n";
  binmode BIN;

  while ($chunks_read * $chunk_size < $file_size) {

    sysread BIN, $chunk, $chunk_size;
    $idx = index "$chunk$last_chunk", $search;

    if ($idx > -1) {

      printf "Found string at position %d\n", $chunks_read * $chunk_size + $idx;

    }

    $last_chunk = substr $chunk, $chunk_size - $search_length, $search_length - 1;
    $chunks_read++;

  }

  close BIN;

}

Peter Guzis
Web Administrator, Sr.
ENCAD, Inc. - A Kodak Company
email: [EMAIL PROTECTED]
www.encad.com

-----Original Message-----
From: Thad Schultz [mailto:tschul@;woodward.com]
Sent: Friday, November 15, 2002 6:04 AM
To: Perl Win32 Users (E-mail)
Subject: Searching a binary file for a specific sequence of hex values?

What are the best ways to search a binary file for a specific sequence of hex values?
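On the hex-code conversion step: the posted line $search =~ s/\s+//; has no /g modifier, so it removes only the first run of whitespace, and a spaced string such as '70 65 72 6C' would then be split into the wrong pairs. Here is a minimal sketch of one way to do the conversion, using pack with the 'H*' template in place of the map/chr/hex combination. The name hex_to_bytes is hypothetical and not part of the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hex string such as '70 65 72 6C' or
# '7065726C' into the raw bytes it describes. Anything that does not
# look like pairs of hex digits is returned unchanged.
sub hex_to_bytes {
    my ($search) = @_;
    if ($search =~ /^(?:[0-9A-Fa-f]{2}\s*)+$/) {
        (my $hex = $search) =~ s/\s+//g;   # /g strips every run of whitespace
        return pack 'H*', $hex;            # two hex digits become one byte
    }
    return $search;
}

my $bytes = hex_to_bytes('70 65 72 6C');
print "$bytes\n";   # perl
```

One thing to keep in mind with this kind of auto-detection: a literal search string that happens to consist of hex digit pairs (for example 'cafe') will be treated as hex codes, so a separate flag might be safer if that matters.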
Searching a binary file for a specific sequence of hex values?
What are the best ways to search a binary file for a specific sequence of hex values? The sequence that I'm looking for is: FF D8 FF E0 00 10 4A 46 49 46 00. The files that I'm searching are 14K bytes in size. I suppose the easiest way would be to slurp the whole file in at once and then search for my sequence in some string variable. But what if my files were huge? Do I read in one character at a time until I find an FF, then look for the D8, and then the rest of the hex values, resetting my search if I fail to find the next value in the sequence? Or do I read in 1K of data at a time and search that for my sequence? Could some of you who have been down this road before point me in the right direction? I'm not asking you to write my code for me (unless you want to). I'm just looking for pointers and suggestions.

Thanks!

Thad Schultz
EDA Librarian / Sys Admin
Woodward Industrial Controls
[EMAIL PROTECTED]
ph (970)498-3570
fax (970)498-3077
www.woodward.com

***
The information in this e-mail is confidential and intended solely for the individual or entity to whom it is addressed. If you have received this e-mail in error please notify the sender by return e-mail, delete this e-mail, and refrain from any disclosure or action based on the information.
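For files of the size mentioned in the question (14K), the slurp approach could look something like the sketch below. It is only an illustration under assumptions: find_in_file is a made-up name, and the whole file is held in memory, so it is not meant for huge files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the byte offset of the first occurrence of
# $search in $file, or -1 if the sequence does not occur.
sub find_in_file {
    my ($search, $file) = @_;
    open my $fh, '<:raw', $file or die "Could not read $file: $!\n";
    my $data = do { local $/; <$fh> };   # with $/ undefined, one read gets it all
    close $fh;
    $data = '' unless defined $data;
    return index $data, $search;
}

# The sequence from the question: FF D8 FF E0 00 10 4A 46 49 46 00
my $signature = pack 'H*', 'FFD8FFE000104A46494600';
```

A call would then be my $pos = find_in_file($signature, 'path/to/file.ext'); with $pos set to -1 when the sequence is absent. The '<:raw' layer does the same job as binmode, which matters on Win32 where text mode would translate line endings.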