[ 
https://issues.apache.org/jira/browse/TIKA-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870087#comment-13870087
 ] 

Peter Ansell commented on TIKA-1217:
------------------------------------

[~jukkaz] The rationale for checking first on filename, in a Java-7 context, 
was that Path objects do not hold File Descriptors. Hence, a content type 
detection method taking a Path object may also be able to avoid getting a File 
Descriptor.
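
To illustrate the filename-first idea, here is a minimal sketch, not the code 
in the patch. The class name, the tiny extension table and the PDF-only 
content check are all hypothetical stand-ins for what Tika would actually do; 
the point is only that the name-based branch never opens the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical detector: name-based lookup first, content sniffing second.
class NameFirstDetector extends FileTypeDetector {

    // Illustrative extension table only; a real detector would use a full registry.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("pdf", "application/pdf");
    }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    @Override
    public String probeContentType(Path path) throws IOException {
        // Step 1: look only at the Path itself, so no file descriptor is needed.
        String byName = detectByName(path);
        if (byName != null) {
            return byName;
        }
        // Step 2: fall back to opening the file and inspecting its first bytes.
        return detectByContent(path);
    }

    static String detectByName(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return null; // e.g. a root path has no file name
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null; // no extension to look up
        }
        return BY_EXTENSION.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    static String detectByContent(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return null; // missing files and directories are left undetected
        }
        byte[] head = new byte[PDF_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break; // file is shorter than the magic prefix
                }
                total += n;
            }
        }
        return (total == head.length && Arrays.equals(head, PDF_MAGIC))
                ? "application/pdf" : null;
    }
}
```

The fidelity concern is visible here too: a file that is misnamed (say, a PDF 
saved as notes.txt) is reported by its name, because the content check is 
never reached.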

However, if there is an unacceptable loss in fidelity by checking first on the 
filename then feel free to remove that clause, as it isn't critical to the 
functionality for me.

There cannot, however, easily be two different implementations in the same 
module, as java.util.ServiceLoader offers no way to order providers, so one 
cannot be preferred over the other. In addition, Files.probeContentType 
accepts no OpenOption or LinkOption arguments, unlike other methods such as 
Files.isRegularFile. That makes it difficult for users to pass in their 
preferences about how Files.probeContentType should operate (i.e., whether it 
should try to avoid getting a file descriptor if possible, or whether it 
should follow symbolic links).
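
The difference in signatures can be sketched as follows. Because 
Files.probeContentType only takes a Path, a caller who does not want links 
followed has to apply that preference separately before probing. The class and 
method names below are hypothetical, and this is only one possible workaround.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical helper showing a caller-side "do not follow links" preference.
class ProbeOptions {

    // Returns the probed content type, or null when the path is not a
    // regular file once symbolic links are NOT followed.
    static String probeNoFollow(Path path) throws IOException {
        // isRegularFile accepts LinkOption varargs...
        if (!Files.isRegularFile(path, LinkOption.NOFOLLOW_LINKS)) {
            return null;
        }
        // ...whereas probeContentType has no such parameter.
        return Files.probeContentType(path);
    }
}
```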

If we wanted to add a second implementation that always used File it would be 
perfectly possible, but it would need to go in a separate module, so that the 
choice of implementation follows from which module's META-INF/services file is 
on the classpath. We would also have to rename the current module from 
tika-java7 to something more specific.
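
For anyone wanting to check which implementation a given set of modules 
actually installs, a rough sketch such as the one below lists the providers 
named in the META-INF/services/java.nio.file.spi.FileTypeDetector files visible 
to the system class loader. The class and method names are hypothetical; only 
the ServiceLoader call is standard API.

```java
import java.nio.file.spi.FileTypeDetector;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical diagnostic helper, not part of Tika.
class ListDetectors {

    // Collects the class names of all FileTypeDetector providers that
    // ServiceLoader finds through the system class loader.
    static List<String> installedDetectors() {
        List<String> names = new ArrayList<>();
        ServiceLoader<FileTypeDetector> loader =
                ServiceLoader.load(FileTypeDetector.class, ClassLoader.getSystemClassLoader());
        for (FileTypeDetector detector : loader) {
            names.add(detector.getClass().getName());
        }
        return names;
    }
}
```

With no provider module on the classpath the list is expected to be empty, 
since the platform's default detector is not registered through a services 
file.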

As you say, in a performance-critical application the results will be cached 
to avoid duplication, so it isn't a big deal in the greater scheme of things.

[~lewismc] You can find the patch that Jukka committed in the Tika trunk if you 
want to test it, but it isn't necessary to do it now if you have other things 
to do. 
https://github.com/apache/tika/commit/39370848b8bd9214dc4b7720539edc0eb595300c

> Integrate with Java-7 FileTypeDetector API
> ------------------------------------------
>
>                 Key: TIKA-1217
>                 URL: https://issues.apache.org/jira/browse/TIKA-1217
>             Project: Tika
>          Issue Type: New Feature
>          Components: detector, mime
>            Reporter: Peter Ansell
>         Attachments: TIKA-1217-v2.patch, TIKA-1217.patch
>
>
> It would be useful if Tika natively provided Java-7 FileTypeDetector [1] 
> implementations. Adding the corresponding 
> META-INF/services/java.nio.file.spi.FileTypeDetector files would allow the 
> use of Files.probeContentType [2] without any specific links to Tika for this 
> functionality.
> If you do not want to rely on Java-7 for the core, then this could be added 
> as an extension module.
> [1] 
> http://docs.oracle.com/javase/7/docs/api/java/nio/file/spi/FileTypeDetector.html
> [2] 
> http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#probeContentType(java.nio.file.Path)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)