Bjorn asked:
> I am thinking about if it is possible to get a util that
> can read parts or whole files and decide what they are.

Raul wrote:
> If you are on linux, the 'file' utility will do it.

Yes, and if you are not on Linux (or BSD, OS X, etc.), you can still get your hands on the “magic files”, which are plain-text files describing the patterns used to determine a file’s type*. Just Google for them.

For example, on my Mac, there is a file /usr/share/file/magic/dyadic which identifies Dyalog APL workspaces and component files (I was fairly surprised to find this installed by default on OS X!).

    #------------------------------------------------------------------------------
    # $File: dyadic,v 1.4 2009/09/19 16:28:09 christos Exp $
    # Dyadic: file(1) magic for Dyalog APL.
    #
    0     byte    0xaa
    >1    byte    <4      Dyalog APL
    >>1   byte    0x00    incomplete workspace
    >>1   byte    0x01    component file
    >>1   byte    0x02    external variable
    >>1   byte    0x03    workspace
    >>2   byte    x       version %d
    >>3   byte    x       .%d

This says: if the first byte of a file is 170 (i.e. 0xAA), and the 2nd byte of the file is less than 4, then you’ve got a Dyalog APL object. If that pattern doesn’t match, “file” will know it’s got something other than a Dyalog APL object, so it will move on and try out the next magic file pattern.

If that pattern does match, however, the following lines help identify the kind of Dyalog APL object more specifically. If the 2nd byte (which must be less than 4) is zero, then it’s an “incomplete workspace”; if one, then a “component file”; if two, then an “external variable”; if three, then a (not-incomplete) “workspace”.

Again, if the initial test (firstByte=170) *. (secondByte<4) matched, and we know we’re dealing with a Dyalog APL object, then the 3rd and 4th bytes will give the major and minor versions, respectively, of the interpreter which created it.

Bjorn wrote:
> I know extensions are indications of what they are.

Worth pointing out, pragmatically speaking: if a file’s type is not self-evident on your OS, or if file extensions are insufficient or misleading clues often enough that you need to use “file” with some frequency, it might be more productive to identify the root cause of that issue rather than re-implementing the utility.

I suppose one use case for “file” is increasing one’s confidence that a file one downloaded from a not-perfectly-trustworthy source is indeed what it advertises itself to be…

-Dan

* Please note these “magic file tests” are applied at a specific point in the utility’s workflow, after some preliminary tests at a higher level. So the files are useful, but not completely sufficient. If you can’t use “file” directly, and want to reimplement it, you’ll have to reimplement some of these preliminary tests as well. A good place to start is the manpage for file, followed by its source code (if you really want to get into it).

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
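
As a rough illustration of the dyadic magic entry discussed above, here is a minimal Python sketch of that one test only. The names describe_dyalog, describe_file and DYALOG_KINDS are made up for this example and are not part of file(1). The output string is loosely modelled on the messages in the magic entry and is not guaranteed to match what “file” prints. The sketch also assumes at least four bytes are available, and it skips the preliminary tests mentioned in the footnote.

```python
# Hypothetical helper names; this mirrors only the dyadic magic entry,
# not the full behaviour of file(1).
DYALOG_KINDS = {
    0x00: "incomplete workspace",
    0x01: "component file",
    0x02: "external variable",
    0x03: "workspace",
}


def describe_dyalog(header: bytes):
    """Return a description if the header matches the dyadic entry, else None."""
    # Assumption: we require all four bytes the entry inspects.
    if len(header) < 4:
        return None
    # "0 byte 0xaa": the first byte must be 170.
    if header[0] != 0xAA:
        return None
    # ">1 byte <4": the second byte must be less than 4.
    if header[1] >= 4:
        return None
    kind = DYALOG_KINDS[header[1]]
    # ">>2 byte x" and ">>3 byte x": major and minor interpreter version.
    return f"Dyalog APL {kind} version {header[2]}.{header[3]}"


def describe_file(path):
    """Read the first four bytes of a file and apply the test above."""
    with open(path, "rb") as f:
        return describe_dyalog(f.read(4))
```

A result of None corresponds to “file” moving on to the next magic pattern; a real reimplementation would try many such entries in turn.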
