[ 
https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903123#comment-14903123
 ] 

Giuseppe Totaro commented on TIKA-1739:
---------------------------------------

Hi [~chrismattmann], Hi [~gagravarr],
I looked at the latest code of {{CTAKESParser.java}} and ran some experiments 
on my laptop.
Basically, the problem is caused by the default constructor of 
{{CTAKESParser.java}}:
{code:java}
/**
 * Wraps the default Parser
 */
public CTAKESParser() {
    this(TikaConfig.getDefaultConfig());
}
{code}

To use CTAKESParser, we need to create a specific configuration for it 
(unless we use the parser programmatically), as described in the 
[ctakesparser-utils|https://github.com/chrismattmann/ctakesparser-utils] 
repository.
While parsing, Tika instantiates CTAKESParser through its default constructor, 
which overrides the given configuration at runtime. As a result, Tika only 
"visits" CTAKESParser and falls back to the EmptyParser instead.
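
A quick way to spot this fallback in a {{/rmeta}} response is to check whether both parser names show up in {{X-Parsed-By}}. Below is a minimal sketch in plain Java (no Tika or JSON library needed). The names {{ParsedByCheck}} and {{fellBackToEmptyParser}} are hypothetical, and the substring test is only a shortcut, not real JSON parsing. The sample string is the response quoted in the issue description:

```java
public class ParsedByCheck {

    // Hypothetical helper: true when the response text names both
    // CTAKESParser and EmptyParser, i.e. the symptom described above.
    // Assumption: a plain substring test is good enough for a quick check.
    static boolean fellBackToEmptyParser(String rmetaResponse) {
        return rmetaResponse.contains("org.apache.tika.parser.ctakes.CTAKESParser")
            && rmetaResponse.contains("org.apache.tika.parser.EmptyParser");
    }

    public static void main(String[] args) {
        // Response copied from the issue description.
        String response = "[{\"Content-Type\":\"text/plain\","
            + "\"X-Parsed-By\":[\"org.apache.tika.parser.CompositeParser\","
            + "\"org.apache.tika.parser.ctakes.CTAKESParser\","
            + "\"org.apache.tika.parser.EmptyParser\"],"
            + "\"X-TIKA:parse_time_millis\":\"20371\","
            + "\"ctakes:schema\":\"coveredText:start:end:ontologyConceptArr\"}";

        System.out.println(fellBackToEmptyParser(response)); // prints "true"
    }
}
```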

For instance, if we go back to the previous default constructor (which does not 
override the given configuration), cTAKES works properly and we obtain 
the right metadata:
{code:java}
public CTAKESParser() {
    super(new AutoDetectParser());
}
{code}
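
To make the difference between the two constructors concrete, here is a toy model in plain Java, without Tika on the classpath. Everything in it ({{ToyParser}}, {{ToyConfig}}, {{PinnedWrapper}}, {{DeferringWrapper}}, {{activeConfig}}) is hypothetical and is not Tika code. It only illustrates the general pattern I am describing, under the assumption that one variant pins the default configuration at construction time while the other defers to whatever configuration is in use:

```java
import java.util.Collections;
import java.util.List;

public class PinnedConfigSketch {

    /** Hypothetical stand-in for a parser: returns the metadata keys it produced. */
    interface ToyParser {
        List<String> parse(String text);
    }

    /** Hypothetical stand-in for a configuration that decides which parser runs. */
    static final class ToyConfig {
        final ToyParser parser;

        ToyConfig(ToyParser parser) {
            this.parser = parser;
        }

        /** Assumption: the default config knows no annotator, so it produces nothing. */
        static ToyConfig defaultConfig() {
            return new ToyConfig(text -> Collections.emptyList());
        }
    }

    /** The configuration supplied by the user at runtime (toy global). */
    static ToyConfig activeConfig = ToyConfig.defaultConfig();

    /** Variant 1: the no-arg constructor pins the default config. */
    static final class PinnedWrapper implements ToyParser {
        private final ToyParser delegate;

        PinnedWrapper() {
            this.delegate = ToyConfig.defaultConfig().parser;
        }

        @Override
        public List<String> parse(String text) {
            return delegate.parse(text);
        }
    }

    /** Variant 2: nothing is pinned; the active config is consulted at parse time. */
    static final class DeferringWrapper implements ToyParser {
        @Override
        public List<String> parse(String text) {
            return activeConfig.parser.parse(text);
        }
    }

    public static void main(String[] args) {
        // The user supplies a config with a (toy) annotating parser.
        activeConfig = new ToyConfig(text -> Collections.singletonList("ctakes:annotation"));

        System.out.println(new PinnedWrapper().parse("some text"));    // prints "[]"
        System.out.println(new DeferringWrapper().parse("some text")); // prints "[ctakes:annotation]"
    }
}
```

In this toy, the pinned variant ignores the user-supplied configuration and returns no metadata, which mirrors the blank result reported in the issue.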

[~chrismattmann] and [~gagravarr], I would be really glad to hear your feedback.
Thanks a lot,
Giuseppe

> cTAKESParser doesn't work in 1.11
> ---------------------------------
>
>                 Key: TIKA-1739
>                 URL: https://issues.apache.org/jira/browse/TIKA-1739
>             Project: Tika
>          Issue Type: Bug
>          Components: parser, server
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.11
>
>
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but 
> blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on 
> https://github.com/chrismattmann/shangridocs/tree/convert-wicket which is 
> where I first saw this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
