On my Ubuntu system, the database underlying the file command is found at /usr/share/file/magic.mgc. It is around 2.5 MB.
It is a binary file; I would imagine the best source for how to use that database would be the source code for the file command itself. I read an article on the internals some years back but otherwise have seen no references.

On Tue, Oct 6, 2015, 1:22 PM Björn Helgason <[email protected]> wrote:

> read this better when I get back home.
> On 6 Oct 2015 15:03, "Dan Bron" <[email protected]> wrote:
>
> > Bjorn asked:
> > > I am thinking about if it is possible to get a util that
> > > can read parts or whole files and decide what they are.
> >
> > Raul wrote:
> > > If you are on linux, the 'file' utility will do it.
> >
> > Yes, and if you are not on Linux (or BSD, OS X, etc), you can still get
> > your hands on the “magic files”, which are plain-text files describing the
> > pattern which is used to determine a file’s type*. Just Google for them.
> >
> > For example, on my Mac, there is a file /usr/share/file/magic/dyadic which
> > identifies Dyalog APL workspaces and component files (I was fairly
> > surprised to find this installed by default on OSX!).
> >
> > #------------------------------------------------------------------------------
> > # $File: dyadic,v 1.4 2009/09/19 16:28:09 christos Exp $
> > # Dyadic: file(1) magic for Dyalog APL.
> > #
> > 0 byte 0xaa
> > >1 byte <4 Dyalog APL
> > >>1 byte 0x00 incomplete workspace
> > >>1 byte 0x01 component file
> > >>1 byte 0x02 external variable
> > >>1 byte 0x03 workspace
> > >>2 byte x version %d
> > >>3 byte x .%d
> >
> > This says: if the first byte of a file is 170 (i.e. 0xAA), and the 2nd
> > byte of the file is less than 4, then you’ve got a Dyalog APL object. If
> > that pattern doesn’t match, “file” will know it’s got something other than
> > a Dyalog APL object, so it will move on and try out the next magic file
> > pattern.
> >
> > If that pattern does match, however, the following lines help identify the
> > kind of Dyalog APL object more specifically.
> >
> > If the 2nd byte (which must be less than 4) is zero, then it’s an
> > “incomplete workspace”; if one, then a “component file”, if two, then an
> > “external variable”; if three, then a (not-incomplete) “workspace”.
> >
> > Again, if the initial test about (firstByte=170) *. (secondByte<4)
> > matched, and we know we’re dealing with a Dyalog APL object, then the 3rd
> > and 4th bytes will give the major and minor versions of the interpreter
> > which created it, respectively.
> >
> > Bjorn wrote:
> > > I know extensions are indications of what they are.
> >
> > Worth pointing out, pragmatically speaking, if a file’s type is not
> > self-evident on your OS, or file extensions being insufficient or
> > misleading clues often enough that you need to use “file” with some
> > frequency, it might be more productive to identify the root cause of that
> > issue, rather than re-implementing the utility.
> >
> > I suppose one use case for “file” is increasing one’s confidence that a
> > file one downloaded from a not-perfectly-trustworthy source is indeed what
> > it advertises itself to be…
> >
> >
> > -Dan
> >
> > * Please note these “magic file tests” are applied at a specific point in
> > the utility’s workflow, after some preliminary tests at a higher level.
> >
> > So the files are useful, but not completely sufficient. If you can’t use
> > “file” directly, and want to reimplement it, you’ll have to reimplement
> > some of these preliminary tests as well.
> >
> > A good place to start is the manpage for file, followed by its source code
> > (if you really want to get into it).
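
For anyone who wants to experiment without the file utility, the Dyalog rules quoted above can be sketched in a few lines of Python. This is only an illustration of that one pattern, not how file(1) itself is implemented: the names describe_dyalog and DYALOG_KINDS are made up here, bytes are treated as unsigned (file's own handling of signed "byte" values may differ), the sketch requires at least four header bytes, and it skips the preliminary tests Dan mentions.

```python
from typing import Optional

# Subtype names keyed by the value of the 2nd byte, copied from the
# quoted "dyadic" magic file.
DYALOG_KINDS = {
    0x00: "incomplete workspace",
    0x01: "component file",
    0x02: "external variable",
    0x03: "workspace",
}


def describe_dyalog(header: bytes) -> Optional[str]:
    """Return a description if header looks like a Dyalog APL object, else None.

    Hypothetical helper: applies only the quoted magic rules to the first
    four bytes of a file.
    """
    if len(header) < 4:
        return None  # assumption: need all four bytes to report a version
    # Top-level test: first byte is 0xAA and the second byte is less than 4.
    if header[0] != 0xAA or header[1] >= 4:
        return None
    kind = DYALOG_KINDS[header[1]]
    # The 3rd and 4th bytes are the major and minor interpreter version.
    return f"Dyalog APL {kind} version {header[2]}.{header[3]}"


if __name__ == "__main__":
    print(describe_dyalog(bytes([0xAA, 0x03, 13, 0])))
    # prints: Dyalog APL workspace version 13.0
```

To try it on a real file, read the first four bytes with open(path, "rb").read(4) and pass them in; a result of None just means this particular pattern did not match.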
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
