[ 
https://issues.apache.org/jira/browse/TIKA-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533184#comment-14533184
 ] 

Ahmed Owian edited comment on TIKA-634 at 5/7/15 8:39 PM:
----------------------------------------------------------

I'm writing a [unit test for 
ExternalParser|https://github.com/ahmedowian/tika-ffmpeg/commit/8366c2753aa762f55cd8533e8e22793638f7a229]
 to demonstrate the concurrency issue.  I started with {{cat}} as the external 
command.

First, when the output token is *not* included in the command, the parser 
writes the full standard output to the ContentHandler, but it does not parse 
metadata from standard output, only from standard error.
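
A rough Python sketch of the behaviour described above, for illustration only. This is not Tika's code: the {{run_and_split}} helper, the "key: value" pattern, and the stand-in child process are all assumptions made up for the example. It passes standard output through as content and applies the metadata pattern to standard error only.

```python
import re
import subprocess
import sys

# Hypothetical metadata pattern: lines of the form "key: value".
PATTERN = re.compile(r"^(\w+): (.+)$")


def run_and_split(command):
    """Run a command and return (content, metadata).

    Content is all of standard output, untouched. Metadata is parsed
    from standard error only, mirroring the behaviour described above.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    metadata = {}
    for line in result.stderr.splitlines():
        match = PATTERN.match(line)
        if match:
            metadata[match.group(1)] = match.group(2)
    return result.stdout, metadata


# Stand-in child process that writes a "key: value" line to both streams.
child = [
    sys.executable,
    "-c",
    "import sys; print('title: from stdout'); "
    "print('title: from stderr', file=sys.stderr)",
]
content, metadata = run_and_split(child)
print(repr(content))
print(metadata)
```

The line written to standard output ends up in the content, while only the line written to standard error is picked up as metadata.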

When the output token is included, cat writes an error to standard error, 
because the output token is never replaced with anything.  
https://issues.apache.org/jira/browse/TIKA-1620 fixes that, but cat has no 
option for specifying an output file other than shell redirection.  So when 
both input and output files are specified, cat simply treats them as two input 
files.
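
To make the last point concrete, here is a minimal Python sketch of token substitution in a command template. It is an illustration, not the ExternalParser implementation: the {{build_command}} helper and the file paths are hypothetical, and the token spellings {{${INPUT}}} and {{${OUTPUT}}} are assumed here.

```python
# Assumed token spellings; treat these as placeholders for the example.
INPUT_TOKEN = "${INPUT}"
OUTPUT_TOKEN = "${OUTPUT}"


def build_command(template, input_path, output_path):
    """Replace the input and output tokens in every argument of the template."""
    return [
        arg.replace(INPUT_TOKEN, input_path).replace(OUTPUT_TOKEN, output_path)
        for arg in template
    ]


# Hypothetical paths standing in for the temporary files.
command = build_command(
    ["cat", INPUT_TOKEN, OUTPUT_TOKEN], "/tmp/in.txt", "/tmp/out.txt"
)
print(command)
```

After substitution the command is just cat followed by two positional paths. cat has no notion of an output argument, so it reads both as input files.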

Incidentally, [~rgauss], we are extracting metadata from standard error when 
using ffmpeg, because it actually expects an output file.  Was that intended? 


> Command Line Parser for Metadata Extraction
> -------------------------------------------
>
>                 Key: TIKA-634
>                 URL: https://issues.apache.org/jira/browse/TIKA-634
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
>              Labels: new-parser
>
> As discussed on the mailing list:
> http://mail-archives.apache.org/mod_mbox/tika-dev/201104.mbox/%3calpine.deb.2.00.1104052028380.29...@urchin.earth.li%3E
> This issue is to track improvements in the ExternalParser support to handle 
> metadata extraction, and probably easier configuration of an external parser 
> too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
