On my Ubuntu system, the database underlying the file command is found at
/usr/share/file/magic.mgc . It is around 2.5M.

It is a binary file; I would imagine the best source for how to use that
database would be the source code for the file command itself. I read an
article on the internals some years back but otherwise have seen no
references.

On Tue, Oct 6, 2015, 1:22 PM Björn Helgason <[email protected]> wrote:

> read this better when I get back home.
> On 6 Oct 2015 15:03, "Dan Bron" <[email protected]> wrote:
>
> > Bjorn asked:
> > > I am thinking about if it is possible to get a util that
> > > can read parts or whole files and decide what they are.
> >
> > Raul wrote:
> > > If you are on linux, the 'file' utility will do it.
> >
> > Yes, and if you are not on Linux (or BSD, OS X, etc), you can still get
> > your hands on the “magic files”, which are plain-text files describing
> > the pattern which is used to determine a file’s type*. Just Google for
> > them.
> >
> > For example, on my Mac, there is a file /usr/share/file/magic/dyadic
> > which identifies Dyalog APL workspaces and component files (I was fairly
> > surprised to find this installed by default on OSX!).
> >
> >
> >
> > #------------------------------------------------------------------------------
> > # $File: dyadic,v 1.4 2009/09/19 16:28:09 christos Exp $
> > # Dyadic: file(1) magic for Dyalog APL.
> > #
> > 0       byte    0xaa
> > >1      byte    <4              Dyalog APL
> > >>1     byte    0x00            incomplete workspace
> > >>1     byte    0x01            component file
> > >>1     byte    0x02            external variable
> > >>1     byte    0x03            workspace
> > >>2     byte    x               version %d
> > >>3     byte    x               .%d
> >
> > This says: if the first byte of a file is 170 (i.e. 0xAA), and the 2nd
> > byte of the file is less than 4, then you’ve got a Dyalog APL object. If
> > that pattern doesn’t match, “file” will know it’s got something other
> > than a Dyalog APL object, so it will move on and try out the next magic
> > file pattern.
> >
> > If that pattern does match, however, the following lines help identify
> > the kind of Dyalog APL object more specifically.
> >
> > If the 2nd byte (which must be less than 4) is zero, then it’s an
> > “incomplete workspace”; if one, then a “component file”; if two, then an
> > “external variable”; if three, then a (not-incomplete) “workspace”.
> >
> > Again, if the initial test about (firstByte=170) *. (secondByte<4)
> > matched, and we know we’re dealing with a Dyalog APL object, then the 3rd
> > and 4th bytes will give the major and minor versions of the interpreter
> > which created it, respectively.
> >
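
The rules quoted above can be sketched in a few lines of Python. This is
only an illustration of how the dyadic magic entry reads, not how "file"
itself is implemented. The names identify_dyalog and KINDS are made up
for this example, the bytes are treated as unsigned (following Dan's
description), and the exact spacing of the output that "file" would print
is not reproduced.

```python
# Hypothetical sketch of the "dyadic" magic entry; not taken from file(1).

# Second byte -> kind of Dyalog APL object, as listed in the magic file.
KINDS = {
    0x00: "incomplete workspace",
    0x01: "component file",
    0x02: "external variable",
    0x03: "workspace",
}


def identify_dyalog(header):
    """Return a description for the leading bytes of a file, or None.

    `header` is a bytes object holding the start of the file
    (four bytes are enough for every test in the entry).
    """
    # Top-level test: the byte at offset 0 must be 0xAA (170).
    if len(header) < 2 or header[0] != 0xAA:
        return None

    # Next level: the byte at offset 1 must be less than 4.
    # Assumption: bytes are compared as unsigned values.
    kind = header[1]
    if kind >= 4:
        return None

    description = "Dyalog APL " + KINDS[kind]

    # Offsets 2 and 3 hold the major and minor version, if present.
    if len(header) > 2:
        description += " version %d" % header[2]
        if len(header) > 3:
            description += ".%d" % header[3]
    return description
```

Calling identify_dyalog on the first four bytes of a file (for example,
open(path, "rb").read(4)) returns a description string when the pattern
matches and None otherwise, at which point a fuller tool would go on to
try the next pattern.
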
> > Bjorn wrote:
> > > I know extensions are indications of what they are.
> >
> > Worth pointing out, pragmatically speaking: if a file’s type is not
> > self-evident on your OS, or if file extensions are insufficient or
> > misleading often enough that you need to use “file” with some
> > frequency, it might be more productive to identify the root cause of that
> > issue, rather than re-implementing the utility.
> >
> > I suppose one use case for “file” is increasing one’s confidence that a
> > file one downloaded from a not-perfectly-trustworthy source is indeed
> > what it advertises itself to be…
> >
> >
> > -Dan
> >
> > * Please note these “magic file tests” are applied at a specific point in
> > the utility’s workflow, after some preliminary tests at a higher level.
> >
> > So the files are useful, but not completely sufficient. If you can’t use
> > “file” directly, and want to reimplement it, you’ll have to reimplement
> > some of these preliminary tests as well.
> >
> > A good place to start is the manpage for file, followed by its source
> > code (if you really want to get into it).
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm