[https://issues.apache.org/jira/browse/OAK-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086095#comment-18086095]

Marco Matessi commented on OAK-9752:
------------------------------------

I've been looking into this issue and ran a local spike to understand the 
current state.

With TIKA-4698 now fixed upstream (PR apache/tika#2842, landing in 3.3.2), I 
started working on the classpath migration. I have a draft branch 
([basix86/jackrabbit-oak issue/OAK-9752|https://github.com/basix86/jackrabbit-oak/tree/issue/OAK-9752]) 
with the following changes:
* {{tika.version}} bumped to 3.3.1; the {{tika-parsers}} dependency replaced by 
{{tika-parsers-standard-package}}
* One API fix: {{ParseContext.getDocumentBuilder()}} replaced with 
{{XMLReaderUtils.getDocumentBuilder()}}
* {{tika-core}} added as an explicit test dependency in oak-search-elastic (it is 
no longer pulled in transitively with 3.x)
* {{TikaExtractionOsgiIT}} marked {{@Ignore}} for now: the OSGi part needs Tika 
3.3.2 (for the TIKA-4698 fix) and a rework of the bundle provisioning for the 
3.x topology
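Roughly, the dependency side of the list above looks like this. It is a simplified sketch: placing the {{tika.version}} property in the parent pom, and the exact modules carrying each dependency, are assumptions here and may differ from the branch.

{code:xml}
<!-- Parent pom (assumed location): Tika version property, 3.3.1 on the draft branch -->
<properties>
  <tika.version>3.3.1</tika.version>
</properties>

<!-- Modules that need the parsers: the 1.x tika-parsers artifact is
     replaced by the standard parser package used from Tika 2.x onward -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <version>${tika.version}</version>
</dependency>

<!-- oak-search-elastic: tika-core declared explicitly for tests,
     since it is no longer available transitively with 3.x -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>${tika.version}</version>
  <scope>test</scope>
</dependency>
{code}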

The classpath test suite passes (oak-run, oak-search, oak-lucene, 
oak-search-elastic). slf4j 1.7.36 remains sufficient, since Tika 3.x does not use 
any slf4j 2.0-only API. The legacy Lucene 4.7.2 integration is unaffected.

Before going further: does this approach look reasonable? Any guidance on the 
OSGi provisioning for 3.x would be especially appreciated. I saw Konrad's 
mailing-list thread but could not find a conclusion. Happy to bring this to 
oak-dev@ if that is the preferred channel.

> Update Tika version to 3.x
> --------------------------
>
>                 Key: OAK-9752
>                 URL: https://issues.apache.org/jira/browse/OAK-9752
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Mohit Kataria
>            Assignee: Konrad Windszus
>            Priority: Minor
>              Labels: indexing
>
> Currently oak uses tika-1.24.1
> We should upgrade tika version to latest tika -i.e. 2.x- meanwhile 3.x



--
This message was sent by Atlassian Jira
(v8.20.10#820010)