[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-1739:
-----------------------------
    Comment: was deleted

(was: We explicitly don't let you set an {{AutoDetectParser}} in the config. 
It's something you have to choose to use, giving it the parser(s) you want used 
post-detection.

In the non-cTAKES case, you get a Composite Parser that'll handle your formats 
(directly/explicitly, via Tika Config XML, or via the default Tika Config), 
then give that (perhaps implicitly) to {{AutoDetectParser}}. 
{{AutoDetectParser}} identifies the type of the document, then picks the right 
parser based on that type.

In the cTAKES case, you get your chosen Composite Parser again, and give that 
to cTAKES (possibly via Tika Config XML, e.g. in the case above). You then 
create an {{AutoDetectParser}} as before, and give it cTAKES. 
{{AutoDetectParser}} identifies the type, then gives the document *with the 
type* to cTAKES, because cTAKES claims all the mime types. cTAKES then uses its 
child Composite Parser to have the real parsing done, based on the type that 
{{AutoDetectParser}} supplied to it. When that's done, cTAKES decorates the 
output.

Or, if you know the type yourself, you give that directly to cTAKES, which 
passes it to the child Composite Parser for parsing and then decorates the 
result. No {{AutoDetectParser}} is needed.)
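
The delegation chain described in the deleted comment can be sketched as a toy 
model. This is only an illustration under stated assumptions: every type below 
({{ToyParser}}, {{ToyCompositeParser}}, {{ToyTextParser}}, 
{{ToyDecoratingParser}}, {{ToyAutoDetectParser}}) is a hypothetical stand-in 
written for this example and is not Tika's or cTAKES's real API, and the 
"detection" is a deliberately crude placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not the real Tika/cTAKES classes.
interface ToyParser {
    // True if this parser claims the given media type.
    boolean supports(String mediaType);

    // "Parses" the text, appending its own name to the trace as it runs.
    String parse(String text, String mediaType, List<String> trace);
}

// Stand-in for a Composite Parser: picks a child parser by media type.
final class ToyCompositeParser implements ToyParser {
    private final Map<String, ToyParser> byType = new LinkedHashMap<>();

    ToyCompositeParser register(String mediaType, ToyParser parser) {
        byType.put(mediaType, parser);
        return this;
    }

    public boolean supports(String mediaType) {
        return byType.containsKey(mediaType);
    }

    public String parse(String text, String mediaType, List<String> trace) {
        trace.add("composite");
        ToyParser child = byType.get(mediaType);
        if (child == null) {
            throw new IllegalArgumentException("No parser for " + mediaType);
        }
        return child.parse(text, mediaType, trace);
    }
}

// Stand-in for a format-specific parser that does the real work.
final class ToyTextParser implements ToyParser {
    public boolean supports(String mediaType) {
        return "text/plain".equals(mediaType);
    }

    public String parse(String text, String mediaType, List<String> trace) {
        trace.add("text");
        return text.trim();
    }
}

// Stand-in for the cTAKES role: claims every type, delegates the real
// parsing to its child, then decorates the child's output.
final class ToyDecoratingParser implements ToyParser {
    private final ToyParser child;

    ToyDecoratingParser(ToyParser child) {
        this.child = child;
    }

    public boolean supports(String mediaType) {
        return true; // claims all media types
    }

    public String parse(String text, String mediaType, List<String> trace) {
        trace.add("decorator");
        String parsed = child.parse(text, mediaType, trace);
        return parsed + " [annotated]";
    }
}

// Stand-in for the auto-detect role: works out the type, then hands the
// document *with the type* to whichever parser it was given.
final class ToyAutoDetectParser {
    private final ToyParser delegate;

    ToyAutoDetectParser(ToyParser delegate) {
        this.delegate = delegate;
    }

    // Crude placeholder for real type detection.
    static String detect(String text) {
        return text.startsWith("%PDF") ? "application/pdf" : "text/plain";
    }

    String parse(String text, List<String> trace) {
        trace.add("detect");
        String type = detect(text);
        if (!delegate.supports(type)) {
            throw new IllegalArgumentException("Unsupported type " + type);
        }
        return delegate.parse(text, type, trace);
    }
}

final class DelegationDemo {
    public static void main(String[] args) {
        ToyParser composite =
                new ToyCompositeParser().register("text/plain", new ToyTextParser());
        List<String> trace = new ArrayList<>();
        // cTAKES-style chain: detect -> decorator -> composite -> real parser.
        String out = new ToyAutoDetectParser(new ToyDecoratingParser(composite))
                .parse(" hello ", trace);
        System.out.println(trace + " -> " + out);
    }
}
```

In this model, the "known type" path from the last paragraph corresponds to 
calling {{ToyDecoratingParser.parse}} directly with the type, skipping 
{{ToyAutoDetectParser}} entirely.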

> cTAKESParser doesn't work in 1.11
> ---------------------------------
>
>                 Key: TIKA-1739
>                 URL: https://issues.apache.org/jira/browse/TIKA-1739
>             Project: Tika
>          Issue Type: Bug
>          Components: parser, server
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.11
>
>         Attachments: TIKA-1739.patch
>
>
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but 
> blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on https://github.com/chrismattmann/shangridocs/tree/convert-wicket
> which is where I first saw this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
