[ 
https://issues.apache.org/jira/browse/TIKA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357408#comment-14357408
 ] 

Nick Burch commented on TIKA-1573:
----------------------------------

My hunch is that your profiling will show almost no difference in detection 
speed... It would be good if you could try your detection with the normal copy 
of Tika, then again with a copy where you've unzipped the tika-core jar, 
removed the unwanted entries from the mimetypes file (tika-mimetypes.xml), and 
zipped it up again. (There's no point spending lots of time and effort removing 
a handful of mime types if profiling shows it makes almost no difference to 
actual runtimes!)
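
For what it's worth, a rough sketch of how such a before/after comparison could 
be timed. Everything here is hypothetical and not part of Tika: the class name, 
the method names and the stand-in detector are only there so the example runs 
on its own. In a real run you would pass in a lambda wrapping whatever Tika 
detect call you already use, once against each copy of the jar.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical timing harness (not part of Tika): times any
// "bytes -> mime type name" function over a fixed set of samples.
class DetectionTiming {

    // Returns the median wall-clock time, in nanoseconds, of one full pass
    // over the samples (upper median when timedPasses is even).
    static long medianPassNanos(Function<byte[], String> detector,
                                List<byte[]> samples,
                                int warmupPasses, int timedPasses) {
        if (timedPasses < 1) {
            throw new IllegalArgumentException("timedPasses must be >= 1");
        }
        // Untimed warm-up passes, so JIT compilation is less likely to
        // distort the measured passes.
        for (int i = 0; i < warmupPasses; i++) {
            runPass(detector, samples);
        }
        long[] times = new long[timedPasses];
        for (int i = 0; i < timedPasses; i++) {
            long start = System.nanoTime();
            runPass(detector, samples);
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[timedPasses / 2];
    }

    // Runs the detector over every sample and sums the lengths of the
    // returned names, so that each result is at least consumed.
    static int runPass(Function<byte[], String> detector, List<byte[]> samples) {
        int checksum = 0;
        for (byte[] sample : samples) {
            checksum += detector.apply(sample).length();
        }
        return checksum;
    }

    // Stand-in detector for the demo only: recognises the "%PDF" prefix.
    static String standInDetector(byte[] prefix) {
        if (prefix.length >= 4 && prefix[0] == '%' && prefix[1] == 'P'
                && prefix[2] == 'D' && prefix[3] == 'F') {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        List<byte[]> samples = new ArrayList<>();
        samples.add("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        samples.add(new byte[] {0, 1, 2, 3});
        long median = medianPassNanos(DetectionTiming::standInDetector,
                                      samples, 100, 1000);
        System.out.println("median ns per pass: " + median);
    }
}
```

Use samples that look like your real input, and compare the medians from the 
two runs rather than single timings, which tend to be noisy.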

Longer term, we may well want to offer a programmatic way for users to add or 
remove mime types or their magic at runtime. A recent thread on the mailing 
list gave one good reason for that; search for "Add Custom Mime Type 
Programmatically". Almost everything else can either be done with a custom 
mimetypes file, or falls under "stop, you don't want to be doing that in the 
first place", so we'd need to design this carefully, to prevent people getting 
into problems by using it for the wrong reasons.


> Not possible to restrict default mime types
> -------------------------------------------
>
>                 Key: TIKA-1573
>                 URL: https://issues.apache.org/jira/browse/TIKA-1573
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Pavel Micka
>            Priority: Minor
>              Labels: performance
>
> I am facing the following problem. I am using the MagicNumber detector, but 
> detection is too slow for my purposes, so I have decided to limit the number 
> of detected types. However, this is not easily possible because: 
>  * MimeTypes does not have any remove method.
>  * The getDefaultMimeTypes method always loads the full set.
>  * The MimeTypes constructor does not accept parameters (mimes with magics).
>  * The add method is package-private (so one must construct the wrapper in 
> the same package, which is awkward).
>  * The MimeTypes class is final, so it cannot be subclassed to improve the 
> implementation in an object-oriented way.
> My workaround was to force the expected implementation (public add) with 
> reflection:
>     Method addMethod = decrMimeTypes.getClass().getDeclaredMethod("add", MimeType.class);
>     addMethod.setAccessible(true);
>     addMethod.invoke(myMimeTypes, defaultMimeTypes.getRegisteredMimeType(m.toString()));
> I can imagine that the current implementation is done this way to be 
> immutable, but this can also be achieved with a parameterized constructor 
> (point 3) with no effect on the immutability of the class, or with an 
> explicit flag (set by a method call) that disallows any further modification 
> of the object.
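
To make the two options in that last paragraph concrete, here is a minimal 
sketch. It is not Tika's MimeTypes class: TypeRegistrySketch and all of its 
methods are made up for illustration, and a type is reduced to a name plus a 
magic string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry, NOT Tika's MimeTypes. It only illustrates the two
// ideas from the issue: a parameterized constructor and a "freeze" flag.
final class TypeRegistrySketch {
    private final Map<String, String> magicByType = new LinkedHashMap<>();
    private boolean frozen;

    // Flag variant: starts empty and stays mutable until freeze() is called.
    TypeRegistrySketch() {
    }

    // Constructor variant: copies the given entries and is frozen at once,
    // so the instance never changes after construction.
    TypeRegistrySketch(Map<String, String> initial) {
        magicByType.putAll(initial);
        frozen = true;
    }

    // Registers a type; rejected once the registry has been frozen.
    void add(String type, String magic) {
        if (frozen) {
            throw new IllegalStateException("registry is frozen");
        }
        magicByType.put(type, magic);
    }

    // One-way switch: there is deliberately no way to unfreeze.
    void freeze() {
        frozen = true;
    }

    boolean contains(String type) {
        return magicByType.containsKey(type);
    }

    int size() {
        return magicByType.size();
    }

    public static void main(String[] args) {
        TypeRegistrySketch registry = new TypeRegistrySketch();
        registry.add("application/pdf", "%PDF");
        registry.freeze();
        try {
            registry.add("image/png", "\u0089PNG");
        } catch (IllegalStateException e) {
            System.out.println("rejected after freeze: " + e.getMessage());
        }
    }
}
```

The sketch ignores thread safety: a real version would need to publish the 
frozen state safely before the object is shared between threads.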



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
