[
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548732#comment-17548732
]
Cihad Guzel commented on CONNECTORS-1667:
-----------------------------------------
Hi [[email protected]] ,
Actually, I know that it is a different issue
([CONNECTORS-1699|https://issues.apache.org/jira/browse/CONNECTORS-1699]). On
the other hand, I checked the Tika API and saw some differences for the Tika
server. So I tested the new Tika server, and it does not run correctly. I will
check again with the new Tika version.
Some metadata keys have changed according to the migration document. For example,
the mcf-tika-service-rmeta-connector uses the "X-Parsed-By" metadata key at
line 800 of TikaExtractor.java. It should be changed to
"X-TIKA:Parsed-By". I will test it.
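To illustrate the key rename, here is a minimal sketch of a lookup that prefers the new key and falls back to the old one. The class ParsedByKeys and the method parsedBy are hypothetical names, not part of ManifoldCF or Tika, and the metadata is modelled as a plain Map with one string value per key, which is an assumption and not how TikaExtractor.java necessarily stores it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not ManifoldCF or Tika code.
class ParsedByKeys {
    // Key the connector currently reads, as quoted in this comment.
    static final String OLD_KEY = "X-Parsed-By";
    // Key proposed in this comment for the new Tika version.
    static final String NEW_KEY = "X-TIKA:Parsed-By";

    // Returns the value stored under the new key if present,
    // otherwise the value under the old key, otherwise null.
    static String parsedBy(Map<String, String> metadata) {
        String value = metadata.get(NEW_KEY);
        return value != null ? value : metadata.get(OLD_KEY);
    }

    public static void main(String[] args) {
        // Assumed sample metadata; the parser name is a placeholder value.
        Map<String, String> metadata = new HashMap<>();
        metadata.put(NEW_KEY, "example-parser");
        System.out.println(parsedBy(metadata));
    }
}
```

If only new Tika versions are supported, the fallback to the old key could simply be dropped.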
I inspected the Tika connector. If I can make time for it, maybe I can do it, but not
right now because, like you said, it is a lot of work. On the other hand,
ManifoldCF should perhaps only support new Tika versions for the
mcf-tika-service-rmeta-connector. That way, the maintenance cost can be reduced. What do you
think about that?
> New Tika Service Connector
> --------------------------
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Tika service connector
> Reporter: Julien Massiera
> Assignee: Julien Massiera
> Priority: Major
> Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector exploits the '/unpack/all' endpoint of a
> Tika Server. This endpoint is not optimal to only extract document's metadata
> and content. We should develop a new connector based on the 'rmeta' endpoint
> which is more suited for our needs.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)