On Tue, 26 Oct 2004, Jim wrote:

> I have a binary file that I have been tasked to discover the format of 
> and somehow convert the records to readable text. Is there any way I 
> can find out what binary format the file is in, so I can create a 
> template for unpack() to convert the binary to text?

The best place to start is with the `file` command, and the magic 
numbers behind it, which not nearly enough people know about these days.

On Unix systems (or Cygwin on Windows), `file` uses a database of magic 
numbers -- fingerprints for different file types -- to identify files, 
regardless of how the file is named (i.e. the file extension doesn't 
matter here). For example, consider this output:

  % file ~/Movies/*
  61980main_PIA06410-movie.mov:   Apple QuickTime movie file (moov)
  CoLC_fog.mov:                   Apple QuickTime movie file (mdat)
  Don_Quijote_animation.avi:      RIFF (little-endian) data, AVI, 320 x 240, 25.00 fps, video: DivX 5, audio: (mono, 8000 Hz)
  Jon_Stewart_Crossfire.rm:       RealMedia file
  Mahnamahna.mpeg:                MPEG system stream data
  Movies:                         symbolic link to `/Volumes/d2/Movies'
  Tenacious D - Tribute.mpeg:     MPEG system stream data
  The Incredibles - trailer.mov:  Apple QuickTime movie file (moov)
  crossfire-20041015.wmv:         Microsoft ASF
  crossfire-20041015001.mp4:      Apple QuickTime movie file (ftyp)
  crossfire-20041015001.mp4.html: XML document text
  goingupriver.dmg:               Apple Partition data block size: 512, first type: Apple_partition_map, name: Apple, number of blocks: 63, second type: Apple_HFS, name: disk image, number of blocks: 1325920,
  goingupriver.mov:               Apple QuickTime movie file (moov)
  %

Note that this isn't looking at file extensions: there are several files 
with the ".mov" extension, but the command reports what it actually 
finds inside each one (note the differing "moov" and "mdat" markers), 
and it identifies the ".mp4" file as QuickTime as well. It works via the 
magic (ahem) of the magic database, which describes the markers expected 
for many file types.
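
To make the idea concrete, here is a minimal sketch of the same trick 
done by hand in Perl: read the first few bytes of a file and compare 
them against a short table of well-known signatures. The table and the 
sub name guess_type() are made up for illustration; the real magic 
database covers far more types, with far more careful rules.

```perl
use strict;
use warnings;

# A tiny, hand-picked signature table (illustrative only; the real
# magic database is much larger). Each entry: offset, bytes, label.
my @signatures = (
    [ 0, 'GIF8',              'GIF image data'   ],
    [ 0, "\x89PNG\r\n\x1a\n", 'PNG image data'   ],
    [ 0, '%PDF-',             'PDF document'     ],
    [ 0, "PK\x03\x04",        'Zip archive data' ],
);

# Hypothetical helper: return a label for the leading bytes in $data,
# or undef (in scalar context) if nothing in the table matches.
sub guess_type {
    my ($data) = @_;
    for my $sig (@signatures) {
        my ($offset, $bytes, $label) = @$sig;
        next if length($data) < $offset + length($bytes);
        return $label if substr($data, $offset, length($bytes)) eq $bytes;
    }
    return;
}

# If a filename was given on the command line, check its first 16 bytes.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Can't open $file: $!";
    binmode $fh;
    my $head = '';
    read($fh, $head, 16);
    close $fh;

    my $type = guess_type($head);
    $type = 'unknown data' unless defined $type;
    print "$file: $type\n";
}
```

Run it with a filename as its only argument. Anything not in the table 
comes back as "unknown data", which is exactly why the real database is 
worth using instead.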

To illustrate, consider the GIF format. Each GIF image begins with:

 * a signature, the three character string "GIF"
 * a version string, either "87a" or "89a"
 * image width & height, two bytes each (little-endian)
 * a packed flags byte describing the global color table, one byte
 * a background color index, one byte

Here's what the magic database entry for GIF looks like:

    # GIF
    0       string          GIF8            GIF image data
    >4      string          7a              \b, version 8%s,
    >4      string          9a              \b, version 8%s,
    >6      leshort         >0              %hd x
    >8      leshort         >0              %hd

You can puzzle out for yourself how this notation works, but it should 
be plain to see that the GIF fingerprint is being represented here.
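
Since the original question was about building a template for unpack(), 
here is how the GIF layout listed above might translate into one. This 
is only a sketch under the assumptions in that list: two three-character 
strings, two little-endian 16-bit integers, and two single bytes. The 
sub name parse_gif_header() and the sample bytes are invented for 
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the first 12 bytes of a GIF.
# Template pieces, in order:
#   A3  signature               ("GIF")
#   A3  version                 ("87a" or "89a")
#   v   width                   (16-bit little-endian, cf. "leshort")
#   v   height                  (16-bit little-endian)
#   C   color table byte        (packed flags in the GIF spec)
#   C   background color index
sub parse_gif_header {
    my ($data) = @_;
    return unless length($data) >= 12;

    my %header;
    @header{qw(signature version width height flags background)}
        = unpack 'A3 A3 v v C C', $data;

    return unless $header{signature} eq 'GIF';
    return \%header;
}

# Sample header built by hand: a 320 x 240 GIF89a.
my $sample = pack 'A3 A3 v v C C', 'GIF', '89a', 320, 240, 0xF7, 0;
my $h = parse_gif_header($sample);
printf "GIF version %s, %d x %d\n", @$h{qw(version width height)};
```

The same approach applies to an unknown format once you have worked out 
its layout: one template letter per field, in file order.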


SO, long preamble aside, you want to do this in Perl, right?

It looks like the module you want is File::Type or File::MMagic:

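    use strict;
    use warnings;
    use File::Type;

    my $file = '/somewhere/unknown/file';   # placeholder path
    my $ft   = File::Type->new();

    # read in data from file to $data, then
    open my $in, '<', $file or die "Can't open $file: $!";
    binmode $in;
    my $data = do { local $/; <$in> };
    close $in;
    my $type_from_data = $ft->checktype_contents($data);

    # alternatively, check file from disk
    my $type_from_file = $ft->checktype_filename($file);

    # convenient method for checking either a file or data
    my $type_1 = $ft->mime_type($file);
    my $type_2 = $ft->mime_type($data);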

-- or --

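    use strict;
    use warnings;
    use File::MMagic;
    use FileHandle;

    my $mm = File::MMagic->new; # use internal magic file
    # my $mm = File::MMagic->new('/etc/magic'); # use external magic file
    my $res = $mm->checktype_filename("/somewhere/unknown/file");

    my $fh = FileHandle->new("/somewhere/unknown/file2", "r")
        or die "Can't open /somewhere/unknown/file2: $!";
    $res = $mm->checktype_filehandle($fh);

    # rewind first, in case the previous check moved the read position
    $fh->seek(0, 0);
    my $data;
    $fh->read($data, 0x8564);
    $res = $mm->checktype_contents($data);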

See <http://search.cpan.org/~pmison/File-Type/lib/File/Type.pm> or 
<http://search.cpan.org/~knok/File-MMagic-1.22/MMagic.pm> for details. 
The File::Type page includes a brief overview of the different modules 
available, with critiques of why the author felt that the others didn't 
quite do the job (you may or may not agree with those, and that's okay).


Take a look over these modules, then try writing some code (or cheat and 
just look it up with the `file` command) and let us know how it goes.


-- 
Chris Devers



