[ 
https://issues.apache.org/jira/browse/TIKA-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728138#comment-17728138
 ] 

Tim Allison commented on TIKA-3941:
-----------------------------------

I started a branch to work on this: 
https://github.com/apache/tika/tree/TIKA-3941

Need to figure out whether we can bypass (or already are bypassing) double 
detection -- we're calling detect and then parse.

Need to add unit tests for digesting.

Need to figure out whether we are already bypassing double digesting...and, 
if not, how to make bypassing it the default option.

> Consider having pipesserver return intermediate results
> -------------------------------------------------------
>
>                 Key: TIKA-3941
>                 URL: https://issues.apache.org/jira/browse/TIKA-3941
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> If the pipes server crashes, the only information that the pipesclient 
> receives is that a crash occurred.  It would be useful to have the pipes 
> server report an intermediate result after file detection. 
> Ideally, the pipesclient could then report at least file type, 
> content-length (if possible) and digest information.
>  
> On another ticket (future work), we could extend intermediate results to 
> include partial parses/metadata extraction.  The challenge here is that the 
> underlying metadata objects are not thread safe...so we'll punt this to deal 
> with later if necessary.
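
To make the "intermediate result" idea concrete, here is a minimal sketch 
using only the JDK. It is not the code on the TIKA-3941 branch and does not 
use any Tika API: the class name, the IntermediateResult holder and the 
summarize method are all hypothetical, and the media type is assumed to come 
from whatever detector has already run. It computes the length and a SHA-256 
digest in a single pass over the stream, so the bytes are digested only once 
before any parse starts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: none of these names exist in Tika.
final class IntermediateResultSketch {

    // What a server could report to the client before parsing begins.
    static final class IntermediateResult {
        final String mediaType;    // assumed to be supplied by an earlier detect step
        final long contentLength;  // number of bytes actually read
        final String sha256Hex;    // lowercase hex SHA-256 of those bytes

        IntermediateResult(String mediaType, long contentLength, String sha256Hex) {
            this.mediaType = mediaType;
            this.contentLength = contentLength;
            this.sha256Hex = sha256Hex;
        }
    }

    // Reads the stream once, counting bytes and updating the digest as it goes.
    static IntermediateResult summarize(InputStream in, String detectedType)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long length = 0;
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = dis.read(buf)) != -1) {
                length += n;
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new IntermediateResult(detectedType, length, hex.toString());
    }
}
```

In a design like this, the result would be sent to the client as soon as it 
is available, so a later crash during parsing would still leave the client 
with type, length and digest for the file.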



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
