[jira] [Commented] (NUTCH-3006) Downgrade Tika dependency to 2.2.1 (core and parse-tika)

2023-09-29 Thread Sebastian Nagel (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770320#comment-17770320
 ] 

Sebastian Nagel commented on NUTCH-3006:


> revert CloseShieldInputStream.wrap(), which I think was the only conflict

Yes, it looks like that was the only conflict. If reverting it is an option, 
then yes, why not.

The idea behind the downgrade was mainly to keep this issue from blocking a 
release. And downgrading from 2.3.0 (current master) to 2.2.1 sounds less dramatic.

> how far out Hadoop 3.4.0 is

Even once it's released, it will take some time (a couple of months) until 
Hadoop distributions (for example Apache Bigtop) pick up the release and/or 
users deploy it.

> Downgrade Tika dependency to 2.2.1 (core and parse-tika)
> 
>
> Key: NUTCH-3006
> URL: https://issues.apache.org/jira/browse/NUTCH-3006
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Priority: Major
> Fix For: 1.20
>
>
> Tika 2.3.0 and upwards depend on commons-io 2.11.0 (or even higher), which 
> is not available when Nutch is used on Hadoop. Only Hadoop 3.4.0 is expected 
> to ship with commons-io 2.11.0 (HADOOP-18301); all currently released 
> versions provide commons-io 2.8.0. Because Hadoop-required dependencies are 
> enforced in (pseudo)distributed mode, using Tika may cause issues, see 
> NUTCH-2937 and NUTCH-2959.
> [~lewismc] suggested in the discussion of [GitHub PR 
> #776|https://github.com/apache/nutch/pull/776] to downgrade to Tika 2.2.1 to 
> resolve these issues for now, until Hadoop 3.4.0 becomes available.
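
To see which commons-io a job actually gets at runtime, a small reflection 
probe can help. The following is only a sketch, not code from Nutch or Tika: 
the class name CommonsIoProbe and both helper methods are made up for 
illustration. It uses JDK reflection only, so it also runs (and reports 
"not on classpath") when commons-io is absent. As far as I know, 
CloseShieldInputStream.wrap(InputStream) only exists from commons-io 2.9.0 
on, so with the commons-io 2.8.0 provided by Hadoop the probe would be 
expected to report it as missing.

```java
import java.io.InputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical helper, not part of Nutch, Tika or Hadoop.
class CommonsIoProbe {

    /**
     * Returns true if the class can be loaded and exposes a public static
     * method with the given name and parameter types.
     */
    static boolean hasStaticMethod(String className, String methodName,
                                   Class<?>... parameterTypes) {
        try {
            Class<?> clazz = Class.forName(className);
            Method method = clazz.getMethod(methodName, parameterTypes);
            return Modifier.isStatic(method.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class failed to link.
            return false;
        }
    }

    /** Returns the jar or directory a class was loaded from, if known. */
    static String locationOf(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            return source == null
                ? "unknown (no code source)"
                : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String shield = "org.apache.commons.io.input.CloseShieldInputStream";
        System.out.println("loaded from: " + locationOf(shield));
        System.out.println("wrap(InputStream) available: "
            + hasStaticMethod(shield, "wrap", InputStream.class));
    }
}
```

Run with the same classpath as the Nutch job (for example in 
(pseudo)distributed mode), the "loaded from" line shows whether the 
commons-io jar comes from Hadoop or from the Nutch job file.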



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-3006) Downgrade Tika dependency to 2.2.1 (core and parse-tika)

2023-09-28 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770059#comment-17770059
 ] 

Tim Allison commented on NUTCH-3006:


An alternative approach would be for Tika to revert 
CloseShieldInputStream.wrap(), which I think was the only conflict?!  Should I 
check with the Tika community about that?
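
For context on what a close shield does, here is a minimal JDK-only sketch. 
It is an illustration under my own assumptions, not the commons-io 
implementation and not what a Tika revert would necessarily look like: 
NonClosingInputStream and TrackingInputStream are hypothetical names, and 
unlike the real CloseShieldInputStream this stand-in does not block reads 
after close().

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration, not taken from commons-io or Tika.
class CloseShieldSketch {

    /** Forwards all reads to the wrapped stream but ignores close(). */
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the caller keeps ownership of the stream.
        }
    }

    /** Test double that records whether close() reached it. */
    static final class TrackingInputStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads one byte through the shield, closes the shield, reports the result. */
    static boolean underlyingClosedAfterShieldedUse(byte[] data) throws IOException {
        TrackingInputStream underlying = new TrackingInputStream(data);
        try (InputStream shielded = new NonClosingInputStream(underlying)) {
            shielded.read();
        }
        return underlying.closed;
    }

    public static void main(String[] args) throws IOException {
        boolean closed = underlyingClosedAfterShieldedUse(new byte[] {1, 2, 3});
        System.out.println("underlying stream closed: " + closed);
    }
}
```

The point is only that close-shielding itself does not need a newer 
commons-io; whether Tika can avoid the wrap() call is for the Tika community 
to decide.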

The notion of downgrading Tika to a December 2021 release unsettles me, and I 
have no idea how far out Hadoop 3.4.0 is.

WDYT?
