Hi Markus,

Jörg Schaible wrote on Wednesday, April 19, 2006 8:46 AM:
> Hi Markus,
>
> Markus Härnvi wrote on Wednesday, April 19, 2006 8:47 AM:
>
>> Hi!
>>
>>> Starting from scratch would be possibly the best anyway. I
>>> had it also on my todo list on a very low priority ... but
>>> just, because I found that jMimeMagic has a really worse
>>> implementation - extremely slow and not working correctly. I
>>> have a good pile of image files it does not detect. Main
>>> reason is, that the implementation is simply wrong. The
>>> original magic files have a clear idea of precedence of
>>> patterns - this has been lost completely in the
>>> conversion/implementation of jMimeMagic.
>>>
>>> - Jörg
>>>
>>
>> Using the original magic file and parse it in Java also makes it
>> easier to keep it updated. Just add the newest magic file to the jar
>> file and we are done.
>
> That would have been my approach also. I was just not sure,
> whether we should bundle the magic file or try to locate it
> (this is the interesting part and highly system dependent).
> And a user might have an additional magic file in its home -
> at least this can be located.

After looking into the magic files (magic and magic.mime) I am somewhat
disappointed. While file magic is good at binary formats with fixed
headers, its definition language is poor for string-based formats, e.g.
the rules for detecting XML & XSL:

===== %< =====
0       string/cb       \<?xml                  XML document text
0       string          \<?xml\ version "       XML
0       string          \<?xml\ version="       XML
>15     string          >\0                     %.3s document text
>>23    string          \<xsl:stylesheet        (XSL stylesheet)
>>24    string          \<xsl:stylesheet        (XSL stylesheet)
0       string/b        \<?xml                  XML document text
0       string/cb       \<?xml                  broken XML document text
===== %< =====

This is quite poor. The second line is invalid XML. It looks at offset
23 or 24 for "<xsl:stylesheet", completely ignoring the fact that the
offset might be quite different if the XML declaration contains an
encoding attribute, or depending on the whitespace and line endings.

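To make the offset problem concrete, here is a minimal Java sketch. The class and method names (XslOffsetDemo, rootOffset) are made up for this illustration and are not part of file(1) or jMimeMagic. It only prints where "<xsl:stylesheet" actually starts for three well-formed variants of the XML declaration:

```java
class XslOffsetDemo {
    // Hypothetical helper: offset of the XSL root element, or -1 if absent.
    static int rootOffset(String document) {
        return document.indexOf("<xsl:stylesheet");
    }

    public static void main(String[] args) {
        String[] samples = {
            // short declaration, Unix line ending
            "<?xml version=\"1.0\"?>\n<xsl:stylesheet version=\"1.0\">",
            // short declaration, DOS line ending
            "<?xml version=\"1.0\"?>\r\n<xsl:stylesheet version=\"1.0\">",
            // declaration with an encoding attribute
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<xsl:stylesheet version=\"1.0\">"
        };
        for (String sample : samples) {
            System.out.println(rootOffset(sample)); // prints 22, 23, 39
        }
    }
}
```

Only one of these three samples puts the root element at offset 23 or 24, so a rule with a hard-coded offset cannot cover all of them.
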
See the detection of XML MIME formats:

===== %< =====
0       string          \<?xml
>38     string          \<\!DOCTYPE\040svg      image/svg+xml
0       string          \<?xml                  text/xml
===== %< =====

Again I am quite sure that a lot of SVG documents are not recognized.
The main problem is that the format specification cannot deal with
variable length. See "man magic" for the format definition. You cannot
express that a file with an XML declaration followed by a non-empty
line with a DOCTYPE declaration for SVG is "image/svg+xml".

Bottom line: I am no longer sure whether a MIME detection based on the
definitions of file magic is really a good idea :-/

- Jörg
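
For comparison, a rough Java sketch of the variable-length rule that the magic format cannot express. Everything here is an assumption made for illustration: the class name XmlSniffer, the method sniff, and the choice of java.util.regex as the matching technique. It is not how file(1) or jMimeMagic work. It accepts an XML declaration of any length, then any amount of whitespace, then an SVG DOCTYPE:

```java
import java.util.regex.Pattern;

class XmlSniffer {
    // Optional XML declaration of any length, optional whitespace, SVG DOCTYPE.
    private static final Pattern SVG_DOCTYPE = Pattern.compile(
        "\\A\\s*(?:<\\?xml\\s[^>]*\\?>)?\\s*<!DOCTYPE\\s+svg\\b");

    // Anything that starts with an XML declaration.
    private static final Pattern XML_DECL = Pattern.compile("\\A\\s*<\\?xml\\s");

    // Returns a MIME type for the decoded head of a file, or null if unknown.
    static String sniff(String head) {
        if (SVG_DOCTYPE.matcher(head).find()) {
            return "image/svg+xml";
        }
        if (XML_DECL.matcher(head).find()) {
            return "text/xml";
        }
        return null;
    }

    public static void main(String[] args) {
        String svg = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n"
            + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" \"svg11.dtd\">";
        System.out.println(sniff(svg));                              // image/svg+xml
        System.out.println(sniff("<?xml version=\"1.0\"?><root/>")); // text/xml
    }
}
```

The sketch is deliberately incomplete: it ignores comments or processing instructions between the declaration and the DOCTYPE, SVG files without a DOCTYPE, and the decoding of the raw bytes into a string.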
