[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548732#comment-17548732
 ] 

Cihad Guzel commented on CONNECTORS-1667:
-----------------------------------------

Hi [[email protected]] ,

Actually, I know that it is a different issue 
([CONNECTORS-1699|https://issues.apache.org/jira/browse/CONNECTORS-1699]). On 
the other hand, I checked the Tika API and saw some differences for the Tika 
server. So I tested the new Tika server, and it does not run correctly. I will 
check again with the new Tika version.

Some metadata keys have changed according to the migration document. For 
example, the mcf-tika-service-rmeta-connector uses the "X-Parsed-By" metadata 
key at line 800 in TikaExtractor.java. It should be changed to 
"X-TIKA:Parsed-By". I will test it.
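
To illustrate the key rename, here is a minimal sketch of a lookup that prefers 
the new key and falls back to the old one. It is not the actual code from 
TikaExtractor.java: the class name ParsedByKeyLookup, the parsedBy method, and 
the plain Map used for the metadata are assumptions made only for this example. 
Only the two key names come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ManifoldCF or Tika.
final class ParsedByKeyLookup {
    // Key name after the migration, as mentioned in this comment.
    static final String NEW_PARSED_BY = "X-TIKA:Parsed-By";
    // Key name currently used by the connector, as mentioned in this comment.
    static final String OLD_PARSED_BY = "X-Parsed-By";

    // Returns the value stored under the new key if that key is present,
    // otherwise the value stored under the old key, otherwise null.
    static String parsedBy(Map<String, String> metadata) {
        if (metadata.containsKey(NEW_PARSED_BY)) {
            return metadata.get(NEW_PARSED_BY);
        }
        return metadata.get(OLD_PARSED_BY);
    }

    public static void main(String[] args) {
        // Assumed sample metadata, as a newer server might return it.
        Map<String, String> fromNewServer = new HashMap<>();
        fromNewServer.put(NEW_PARSED_BY, "example-parser");

        // Assumed sample metadata, as an older server might return it.
        Map<String, String> fromOldServer = new HashMap<>();
        fromOldServer.put(OLD_PARSED_BY, "example-parser");

        System.out.println(parsedBy(fromNewServer));
        System.out.println(parsedBy(fromOldServer));
    }
}
```

A fallback like this would only be needed if both old and new Tika server 
versions have to be supported. If only new versions are supported, replacing 
the key name directly would be enough.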

I inspected the Tika connector. I may be able to work on it if I can make time, 
but not right now because, as you said, it is a lot of work. On the other hand, 
perhaps ManifoldCF should only support new Tika versions for the 
mcf-tika-service-rmeta-connector. That way, the maintenance cost can be 
reduced. What do you think about that?

> New Tika Service Connector
> --------------------------
>
>                 Key: CONNECTORS-1667
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Tika service connector
>            Reporter: Julien Massiera
>            Assignee: Julien Massiera
>            Priority: Major
>             Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector exploits the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal to only extract document's metadata 
> and content.  We should develop a new connector based on the 'rmeta' endpoint 
> which is more suited for our needs.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
