[ https://issues.apache.org/jira/browse/JCR-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469546 ]
Jukka Zitting commented on JCR-728:
-----------------------------------

I've looked at jmimemagic too, but as you mentioned, it's a bit limited. It's also licensed under the LGPL, which makes it a bit troublesome for us. There's a recent codebase at http://hedges.net/archives/2006/11/08/java-shared-mime-info/ that seems pretty good, but the code is under the GPL.

I recently talked with some people from Apache Nutch about a project to implement the shared mime info standard from freedesktop.org (http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec). Apparently someone already had some Apache-licensed code for that, but I haven't seen it yet.

I've been planning to propose an implementation project for the mime info standard in Apache Labs (http://labs.apache.org/), but if there's more interest within the Jackrabbit community, we could also start working on it within the jackrabbit-text-extractors component.

> Automatic MIME type detection
> -----------------------------
>
>                 Key: JCR-728
>                 URL: https://issues.apache.org/jira/browse/JCR-728
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> Currently only the jcr:mimeType property is used to determine the MIME type
> and thus the applicable text extractor to use for indexing a document. If the
> jcr:mimeType property is not available or is set to a generic value like
> "application/octet-stream", then the indexer could also use some heuristics
> based on the node name or magic numbers within the binary stream to determine
> the type of the document.
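
To make the fallback in the issue description concrete, here is a minimal sketch in plain Java. It is not Jackrabbit code and not an implementation of the shared mime info standard. The class name MimeTypeGuesser, its method names, and the tiny hard-coded tables of magic numbers and file extensions are all hypothetical and chosen only for illustration. Checking magic numbers before the node name is also an assumption; the issue does not prescribe an order.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of Jackrabbit or any existing library.
final class MimeTypeGuesser {

    private static final String GENERIC = "application/octet-stream";

    // Illustrative extension table; a real one would be far larger.
    private static final Map<String, String> BY_EXTENSION = new HashMap<String, String>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("html", "text/html");
    }

    private MimeTypeGuesser() {
    }

    /**
     * Returns the declared type (for example the jcr:mimeType value) unless
     * it is missing or generic. Otherwise tries magic numbers in the first
     * bytes of the stream, then the node name, then gives up and returns
     * the generic type.
     */
    static String detect(String declaredType, String nodeName, byte[] head) {
        if (declaredType != null && !declaredType.isEmpty()
                && !GENERIC.equals(declaredType)) {
            return declaredType;
        }
        String byMagic = fromMagic(head);
        if (byMagic != null) {
            return byMagic;
        }
        String byName = fromName(nodeName);
        return byName != null ? byName : GENERIC;
    }

    // Compares the leading bytes against a few well-known signatures.
    static String fromMagic(byte[] head) {
        if (head == null) {
            return null;
        }
        if (startsWith(head, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) {
            return "image/png";
        }
        if (startsWith(head, "GIF8".getBytes(StandardCharsets.US_ASCII))) {
            return "image/gif";
        }
        return null;
    }

    // Looks up the part of the node name after the last dot, ignoring case.
    static String fromName(String nodeName) {
        if (nodeName == null) {
            return null;
        }
        int dot = nodeName.lastIndexOf('.');
        if (dot < 0 || dot == nodeName.length() - 1) {
            return null;
        }
        String ext = nodeName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

An indexer could call MimeTypeGuesser.detect(mimeTypeOrNull, node.getName(), firstBytes) and pick the text extractor from the result. A shared mime info implementation would replace the two hard-coded tables with the glob and magic rules from the freedesktop.org database.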