Re: [9fans] a question of file and the history of magic

2008-07-11 Thread Dan Cross
I know!  We need dynamically loadable shared object files and a new
language to describe all the things that file can do, and then a
compiler for that language that generates shared objects that are
dynamically loaded at runtime.

Oh, wait.

But seriously (yes, Virginia, for the humor-impaired, that *was* a
joke...) there are benefits to external description files.  Somewhat
obviously, it's the whole little-language concept that we all know
and love, but the question becomes, for something like file, how
complex does one make that little language?  At what point does the
tradeoff between complexity of the description language and hand-coded
C break in favor of one versus the other?  How often are we updating
things?  That *is* a legitimate question, and I think it's the basis
of the original question.



Re: [9fans] a question of file and the history of magic^H^H^H^H^HUNIX

2008-07-08 Thread Lyndon Nerenberg

On 2008-Jul-6, at 14:59, Brantley Coile wrote:

> I remember the day I first saw a file magic file.  I welcomed it
> because for the first time I didn't have access to the source code.
> Those were the days when you had to have $45k to get the source.


Closer to $100K for most people. I had great fun writing nroff (yup,  
*n*roff) output device tables as binary blobs to interface with the  
non-source UNIXen of the day. And the Convergent Technologies X.25  
binary code was a wonder to configure/tune as an end user :-P


Remember kids: UNIX source code (BSD, really) wasn't free until 1994  
(give or take a bit). You haven't lived until you've resolved device  
driver configuration and ordering problems when trying to link a  
binary version of SunOS 2 or 3 (no, not Solaris :-).  Or even better,  
an NBI VME 68K box pretending it's a UNIBUS VAX.


Is alt.folklore.computers still alive? 99% of the list traffic  
[cs]hould be redirected there.


-- (creaky/grumpy olde) lyndon





[9fans] a question of file and the history of magic

2008-07-06 Thread Jeff Sickel
This is a comment/question about file(1) as implemented in Plan 9 and  
p9p.


Over the years I've been using various versions of file with editable  
magic files.  Though file can make mistakes, this worked out rather  
well when I just wanted a little more detail than 'binary' with the  
tradeoff of the command being a bit slow at times.  While deciding to
use p9p's rc for a script to help with some picture processing, I
realized I needed file to determine the type of data I'm checking on
the file system.  So I added the following (though it could just as
easily be added to long0tab):


% hg diff file.c
diff -r d7799c860a8f src/cmd/file.c
--- a/src/cmd/file.c	Sat Jul 05 10:01:43 2008 -0400
+++ b/src/cmd/file.c	Sun Jul 06 12:30:28 2008 -0500
@@ -655,6 +655,7 @@
 	"\377\330\377\340",	"jpeg",	4,	"image/jpeg",
 	"\377\330\377\341",	"jpeg",	4,	"image/jpeg",
 	"\377\330\377\333",	"jpeg",	4,	"image/jpeg",
+	"\106\117\126\142",	"x3f",	4,	"image/x3f",
 	"BM",	"bmp",	2,	"image/bmp",
 	"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",	"microsoft office document",	8,	"application/octet-stream",
 	"<MakerFile ",	"FrameMaker file",	11,	"application/framemaker",


This addition helped my scripts become a little more streamlined, but
of course it puts an additional entry into the source file that I need
to track.  As file name extensions don't always work across all sorts
of systems, many still hamstrung by 8.3, what is the preferred or
recommended mechanism for checking file types the Plan 9 way, since we
no longer have the System V magic?
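
For readers who have not looked at file.c, the kind of table that the
patch extends can be sketched roughly as below.  This is only an
illustration in portable C, not the actual Plan 9 or p9p code: the
struct, the table name magictab and the function magiclookup are all
made up here, and only the row contents are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical magic table: a byte prefix, a
 * human-readable name, the prefix length, and a MIME type.
 * The field order follows the rows in the diff; the names
 * Magic, magictab and magiclookup are invented for this sketch. */
typedef struct Magic Magic;
struct Magic {
	const char *key;
	const char *name;
	size_t len;
	const char *mime;
};

static const Magic magictab[] = {
	{ "\377\330\377\340", "jpeg", 4, "image/jpeg" },
	{ "\377\330\377\341", "jpeg", 4, "image/jpeg" },
	{ "\377\330\377\333", "jpeg", 4, "image/jpeg" },
	{ "\106\117\126\142", "x3f", 4, "image/x3f" },	/* the bytes "FOVb" */
	{ "BM", "bmp", 2, "image/bmp" },
};

/* Return the first row whose key is a prefix of buf[0..n), or NULL
 * if no row matches. */
const Magic *
magiclookup(const unsigned char *buf, size_t n)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < sizeof magictab / sizeof magictab[0]; i++)
		if (n >= magictab[i].len
		&& memcmp(buf, magictab[i].key, magictab[i].len) == 0)
			return &magictab[i];
	return NULL;
}
```

Adding a type is then one more row, which is the one-line change the
diff makes; the cost is that the row lives in the source file.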


In a sense, a modified xd(1) that has an option for a restricted range
of byte sequences would work.  That would at least provide a fast seek
into a file that can be pipelined into any other command sequence--no
need to dump the whole file when you just need the first four
bytes, but then it just gets to the point of having a magic file.


-jas




Re: [9fans] a question of file and the history of magic

2008-07-06 Thread erik quanstrom
> This addition helped my scripts become a little more streamlined, but
> of course it puts an additional entry into the source file that I need
> to track.  As file name extensions don't always work across all sorts
> of systems, many still hamstrung by 8.3, what is the preferred or
> recommended mechanism for checking file types the Plan 9 way, since we
> no longer have the System V magic?

i'm pretty confused by what you're saying here.  why doesn't file(1) work?
are you saying there's something wrong with editing the source as opposed
to editing a configuration file?

either way your system is equally non-standard.  in either event,
submitting a patch and having it accepted is the only way around this.

> In a sense, a modified xd(1) that has an option for a restricted range
> of byte sequences would work.  That would at least provide a fast seek
> into a file that can be pipelined into any other command sequence--no
> need to dump the whole file when you just need the first four
> bytes, but then it just gets to the point of having a magic file.

why would xd need modification?  how about

dd -if $infile -bs $nbytes -count 1 | xd

there are no restrictions placed by dd on $nbytes.  it could be
4 or 99132 or whatever.  dd's -iseek option similarly can specify
any offset.
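
The same "read a few bytes at some offset" operation is small when done
directly in C.  The following is a minimal sketch in portable C using
stdio, not Plan 9 code; readat is a hypothetical helper name, and it
takes a plain byte offset.

```c
#include <assert.h>
#include <stdio.h>

/* Read up to n bytes starting at byte offset off of the named file
 * into buf.  Returns the number of bytes actually read (possibly
 * fewer than n near end of file), or -1 on error.
 * readat is a made-up helper for this sketch. */
long
readat(const char *path, long off, unsigned char *buf, size_t n)
{
	FILE *f;
	size_t got;
	int failed;

	assert(path != NULL && buf != NULL);
	f = fopen(path, "rb");
	if (f == NULL)
		return -1;
	if (fseek(f, off, SEEK_SET) != 0) {
		fclose(f);
		return -1;
	}
	got = fread(buf, 1, n, f);
	failed = ferror(f);	/* distinguish a short read from an error */
	fclose(f);
	if (failed)
		return -1;
	return (long)got;
}
```

A script would still want the dd pipeline; the sketch only shows that
nothing beyond seek and read is involved.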

- erik



Re: [9fans] a question of file and the history of magic

2008-07-06 Thread erik quanstrom
> In a sense, the question is more about the historical change and/or
> adoption of a new file command for Plan 9 that doesn't use a magic
> file for references.  Why opt out of a magic file other than the
> obvious performance hit of scanning it each run?  Is it worth
> repeating the old forms that used magic, or has anyone in the Plan 9
> community already improved upon the idea and introduced a new, more
> adaptable tool?

what is the upside to an external magic file?  as you've shown, you
can add a file type in 1 line of code.  while the external magic file
isn't c, i would argue that it's still code.  

the disadvantage is that you need to write a parser for yet another
file format.  it turns out that linux file's maintainers felt that a text file
wasn't good enough so they implemented a magic compiler.  i really
don't understand the logic behind the compiler, since it would seem
to trade reduced cpu cycles for increased i/o.  that would seem to be
a terrible trade off these days.

; wc magic magic.mgc
  13469   69850  484372 magic
   1301   17997 1062400 magic.mgc   # compiled version

the source is pretty big, too:

; wc -l ffile-4.20/src/*.[ch]|grep total
  9273 total

according to wikipedia (http://en.wikipedia.org/wiki/File_(Unix)),
system v introduced the external magic file.  i don't think that system v
was in any way an ancestor of plan 9.  but i don't know anything of
the history of plan 9 file.
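
To give a feel for the parser cost mentioned above, here is a sketch of
a parser for one line of a deliberately tiny, made-up description
format: "offset hexbytes name".  This is not the System V or linux
magic syntax, which is considerably richer; Rule, hexval and parserule
are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { Maxkey = 16, Maxname = 64 };

typedef struct Rule Rule;
struct Rule {
	long off;			/* byte offset into the file */
	unsigned char key[Maxkey];	/* bytes expected at off */
	size_t len;			/* number of valid bytes in key */
	char name[Maxname];		/* type name to report */
};

static int
hexval(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Parse one line of the made-up format, e.g. "0 464f5662 x3f".
 * The name runs to the end of the line.  Returns 0 on success,
 * -1 on a malformed line. */
int
parserule(const char *line, Rule *r)
{
	char *end;
	size_t i, n;
	int hi, lo;

	assert(line != NULL && r != NULL);
	r->off = strtol(line, &end, 10);
	if (end == line || r->off < 0 || strspn(end, " \t") == 0)
		return -1;
	line = end + strspn(end, " \t");
	n = strcspn(line, " \t\n");	/* length of the hex field */
	if (n == 0 || n % 2 != 0 || n / 2 > Maxkey)
		return -1;
	for (i = 0; i < n / 2; i++) {
		hi = hexval((unsigned char)line[2*i]);
		lo = hexval((unsigned char)line[2*i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		r->key[i] = (unsigned char)(hi << 4 | lo);
	}
	r->len = n / 2;
	line += n;
	line += strspn(line, " \t");
	n = strcspn(line, "\n");	/* the rest of the line is the name */
	if (n == 0 || n >= Maxname)
		return -1;
	memcpy(r->name, line, n);
	r->name[n] = '\0';
	return 0;
}
```

Even this toy needs several times the code of one table row, before
adding masks, byte order, comparisons or anything else a real magic
file supports.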

- erik



Re: [9fans] a question of file and the history of magic

2008-07-06 Thread Brantley Coile
I remember the day I first saw a file magic file.  I welcomed it because 
for the first time I didn't have access to the source code.  Those were 
the days when you had to have $45k to get the source.  A hard thing to 
ask for.  Today a separate magic file is just a leftover vestige of the 
past.  There are a lot of things like that.  Do we still need to
compress man pages on a 1TB disk drive? :)







Re: [9fans] a question of file and the history of magic

2008-07-06 Thread Bakul Shah
On Sun, 06 Jul 2008 17:20:12 EDT erik quanstrom wrote:
> what is the upside to an external magic file?  as you've shown, you
> can add a file type in 1 line of code.  while the external magic file
> isn't c, i would argue that it's still code.

Yes it is code but the advantage is that the parser language
is factored out and anyone can add knowledge about new file
formats and it is easy to debug and experiment.

The main disadvantage of gnu file is performance.  As an
example, on about 6000 files totalling 200MB, gnu file takes
2s user, 1s system and 30s real time.  Compared to that p9p
file takes 1.65s user, 0.25s system and 1.9s real time.
Note: any cache effects have been accounted for by looking at
only the best 3 runs of each test.  As per csh, there were
no page faults and no disk io.

The magic file should really be compiled and linked w/
file(1) -- if that was done right, the rest of file(1) code
would be pretty trivial.  On the other hand file is usually
not a performance bottleneck.  On the gripping hand there are
a lot of similarities between cracking file formats and
packet formats so maybe there is value in factoring all that
out and sticking it in a library routine.
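
One possible shape for such a shared routine, as a sketch only: match
masked 32-bit big-endian words at given offsets, which covers both a
file magic number and a header field in a packet.  Field and
fieldsmatch are hypothetical names, and nothing here is taken from an
existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Field Field;
struct Field {
	size_t off;	/* where the 32-bit big-endian word starts */
	uint32_t mask;	/* bits that matter */
	uint32_t val;	/* expected value after masking */
};

/* Return 1 if every field in f[0..nf) matches buf[0..n), and 0 if
 * any field fails or the buffer is too short to hold it. */
int
fieldsmatch(const unsigned char *buf, size_t n, const Field *f, size_t nf)
{
	size_t i;
	uint32_t w;
	const unsigned char *p;

	assert(buf != NULL && f != NULL);
	for (i = 0; i < nf; i++) {
		if (f[i].off > n || n - f[i].off < 4)
			return 0;
		p = buf + f[i].off;
		w = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
			| (uint32_t)p[2] << 8 | (uint32_t)p[3];
		if ((w & f[i].mask) != f[i].val)
			return 0;
	}
	return 1;
}
```

With a mask of 0xFFFFFF00 and a value of 0xFFD8FF00 at offset 0, a
single rule matches any file beginning with the bytes FF D8 FF, which
includes all three jpeg prefixes listed earlier in the thread; a mask
of 0xF0000000 and a value of 0x40000000 would pick out the version
nibble of an IPv4 header in the same way.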



Re: [9fans] a question of file and the history of magic

2008-07-06 Thread Charles Forsyth
> The main disadvantage of gnu file is performance.

the magic file contains surprisingly many spells,
even excluding muttered incantations.