Andreas Meier created TIKA-2609:
-----------------------------------

             Summary: Refine Emacs Lisp file recognition (.elc)
                 Key: TIKA-2609
                 URL: https://issues.apache.org/jira/browse/TIKA-2609
             Project: Tika
          Issue Type: Improvement
          Components: core
            Reporter: Andreas Meier
Some newer .elc files are not recognized properly by the current matcher (tested with Emacs 24.4 files from [https://github.com/jwiegley/emacs-release/tree/master/lisp]).

I created a regex that should handle these files similarly to the Linux magic:
{noformat}
# Emacs 18 - this is always correct, but not very magical.
0       string  \012(           Emacs v18 byte-compiled Lisp data
!:mime  application/x-elc
# Emacs 19+ - ver. recognition added by Ian Springer
# Also applies to XEmacs 19+ .elc files; could tell them apart with regexs
# - Chris Chittleborough <cchittleboro...@yahoo.com.au>
0       string  ;ELC
>4      byte    >18
>4      byte    <32             Emacs/XEmacs v%d byte-compiled Lisp data
!:mime  application/x-elc
{noformat}

{code:xml}
<mime-type type="application/x-elc">
  <_comment>Emacs Lisp bytecode</_comment>
  <magic priority="50">
    <!-- Emacs 18 -->
    <match value="\012(" type="string" offset="0" />
    <!-- Emacs 19+ -->
    <match value=";ELC" type="string" offset="0">
      <match value="[\\x13-\\x1F]" type="regex" offset="4"/>
    </match>
  </magic>
  <glob pattern="*.elc"/>
</mime-type>
{code}

Please verify the hex values before committing.

Regards
Andreas

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
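To help with the requested hex check: the Linux magic rule accepts a byte at offset 4 with 18 < b < 32, i.e. decimal 19 to 31, and 0x13 = 19 while 0x1F = 31, so the proposed character class should cover exactly the same range. The Java sketch below checks that equivalence for all 256 byte values. It assumes that Tika ends up compiling the XML value `[\\x13-\\x1F]` into the Java regex `[\x13-\x1F]` and applies it to the byte at offset 4 decoded one byte per character; both points are assumptions to confirm against Tika's magic handling, not a description of its actual code. The class and method names (`ElcHeaderCheck`, `magicRule`, `regexRule`, `looksLikeElc`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class ElcHeaderCheck {
    // Assumed effective regex for the proposed <match type="regex" offset="4"/>.
    private static final Pattern VERSION_BYTE = Pattern.compile("[\\x13-\\x1F]");

    // Linux magic rule: byte at offset 4 is > 18 and < 32 (decimal 19..31).
    static boolean magicRule(int b) {
        return b > 18 && b < 32;
    }

    // Proposed regex applied to a single byte, decoded as ISO-8859-1 so
    // each byte value maps to exactly one char with the same code.
    static boolean regexRule(int b) {
        String s = new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1);
        return VERSION_BYTE.matcher(s).matches();
    }

    // Hypothetical whole-header check: ";ELC" at offset 0, version byte at offset 4.
    static boolean looksLikeElc(byte[] header) {
        if (header.length < 5) {
            return false;
        }
        byte[] prefix = ";ELC".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < prefix.length; i++) {
            if (header[i] != prefix[i]) {
                return false;
            }
        }
        return regexRule(header[4] & 0xFF);
    }

    public static void main(String[] args) {
        // Compare both rules on every possible byte value.
        for (int b = 0; b < 256; b++) {
            if (magicRule(b) != regexRule(b)) {
                throw new AssertionError("mismatch at byte " + b);
            }
        }
        System.out.println("0x13 = " + 0x13 + ", 0x1F = " + 0x1F);
        // Hypothetical sample header with version byte 0x18 (decimal 24).
        byte[] sample = {';', 'E', 'L', 'C', 0x18, 0, 0, 0};
        System.out.println("sample header matches: " + looksLikeElc(sample));
    }
}
```

If the loop in `main` completes without an `AssertionError`, the hex range and the decimal range in the magic rule agree; whether Tika decodes the doubled backslashes as assumed still needs to be checked against a real .elc file.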