[ https://issues.apache.org/jira/browse/TIKA-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533184#comment-14533184 ]
Ahmed Owian edited comment on TIKA-634 at 5/11/15 2:15 PM:
-----------------------------------------------------------

I created a unit test for ExternalParser, {{[WaitingExternalParserTest.java|https://github.com/ahmedowian/tika-ffmpeg/blob/master/src/test/java/org/apache/tika/parser/external/WaitingExternalParserTest.java]}}, which demonstrates the concurrency issue. It uses {{cat}} as the external command.

*Notes:*

Firstly, when the output token is *not* included in the command, the parser puts the full contents of standard out into the ContentHandler, but it does not parse metadata from standard out, only from standard error.

When the output token is included, {{cat}} writes an error to standard error, because the output token itself is never replaced with anything. https://issues.apache.org/jira/browse/TIKA-1620 fixes that, but {{cat}} has no option for an output file other than shell redirection. So when both input and output files are specified, {{cat}} simply treats them as two input files.

Incidentally, [~rgauss], we are extracting metadata from standard error when using ffmpeg, because it actually expects an output file. Was that intended?

> Command Line Parser for Metadata Extraction
> -------------------------------------------
>
>                 Key: TIKA-634
>                 URL: https://issues.apache.org/jira/browse/TIKA-634
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
>              Labels: new-parser
>
> As discussed on the mailing list:
> http://mail-archives.apache.org/mod_mbox/tika-dev/201104.mbox/%3calpine.deb.2.00.1104052028380.29...@urchin.earth.li%3E
> This issue is to track improvements in the ExternalParser support to handle
> metadata extraction, and probably easier configuration of an external parser
> too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
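
To make the output-token behaviour described in the comment above concrete, here is a minimal sketch of placeholder substitution in a command line. It is not Tika's implementation: the class {{OutputTokenSketch}} and its {{substitute}} method are hypothetical, the file paths are made up, and the placeholder strings are only assumed to match the ones ExternalParser uses. It shows what argument list {{cat}} would receive when the output token is left unreplaced (the behaviour reported above) and when it is replaced with a file path (the behaviour TIKA-1620 is said to introduce).

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not code from Apache Tika.
public class OutputTokenSketch {

    // Assumed placeholder strings, chosen to resemble ExternalParser's tokens.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";

    /**
     * Returns a copy of the command with the input token replaced by inputPath.
     * The output token is replaced by outputPath only if replaceOutput is true;
     * otherwise it is passed through literally.
     */
    static String[] substitute(String[] command, String inputPath,
                               String outputPath, boolean replaceOutput) {
        String[] result = new String[command.length];
        for (int i = 0; i < command.length; i++) {
            String arg = command[i].replace(INPUT_TOKEN, inputPath);
            if (replaceOutput) {
                arg = arg.replace(OUTPUT_TOKEN, outputPath);
            }
            result[i] = arg;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] command = {"cat", INPUT_TOKEN, OUTPUT_TOKEN};

        // Output token left as-is: cat is handed a literal "${OUTPUT}" argument,
        // which it tries to open as a file and reports on standard error.
        System.out.println(Arrays.toString(
                substitute(command, "/tmp/in.txt", "/tmp/out.txt", false)));

        // Output token replaced: cat now receives two paths and, having no
        // output-file option, reads both of them as input files.
        System.out.println(Arrays.toString(
                substitute(command, "/tmp/in.txt", "/tmp/out.txt", true)));
    }
}
```

In either case {{cat}} is not a good fit for a command template that names an output file, whereas a tool such as ffmpeg expects that argument.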