> Migrating javax->jakarta has been quite a chore on Tika because of
> dependencies.  Given back-compat issues with hadoop, is this even on the
> horizon for Nutch?

Good point. I think we are pretty free to replace javax packages in Nutch core
and plugins - they're used in multiple classes.

If it's about transitive dependencies of mandatory dependencies such as Hadoop:
well, strictly speaking that's not our job. But there should be few or no Nutch
classes that rely on javax classes shared with dependencies.


>  Y, I'd like to get a working Tika version in a release fairly soon.

Definitely.


> Not sure how much effort a release is?

See https://cwiki.apache.org/confluence/display/NUTCH/Release_HOWTO

Frankly, it's too much effort. And if you take testing seriously,
it's even more, because there are no automated tests to verify that everything
runs well on a Hadoop cluster or to test indexing into Solr, Elasticsearch, and
OpenSearch.

On 9/28/23 15:37, Tim Allison wrote:
Sorry for two emails...

Migrating javax->jakarta has been quite a chore on Tika because of dependencies. Given back-compat issues with hadoop, is this even on the horizon for Nutch?

On Thu, Sep 28, 2023 at 9:29 AM Tim Allison <talli...@apache.org> wrote:

    Y, I'd like to get a working Tika version in a release fairly soon. Not sure
    how much effort a release is?


    On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel <sna...@apache.org> wrote:

        Hi Lewis,

        thanks!

        I'd put on top of the list

        * release 1.20

        Since the release of 1.19 more than one year has elapsed.

        Otherwise I agree with all points on the road map, even
        in this order / priority.

        Best,
        Sebastian


        On 9/26/23 18:37, lewis john mcgibbney wrote:
         > Hi dev@,
         >
         > I've been at arm's length for a while as $dayjob changed and then
         > changed again over the last number of years.
         >
         > With that being said, I wanted to start a thread on $title with the
         > goal of establishing some "big items" we could put on the roadmap and
         > maybe even publish...
         >
         > Here are some of the things I've been thinking about (unordered):
         >
         > * NUTCH-2940 Develop Gradle Core Build for Apache Nutch
         > * Metrics system integration, cf.
         > https://github.com/apache/nutch/pull/712
         > * Upgrading Javac version > 11
         > * Trade study to consider integrating (something like) Plugin
         > Framework for Java (PF4J) into Nutch
         > * Porting Nutch to run on Apache Beam https://beam.apache.org/
         >
         > Does anyone else have candidates they wish to add?
         >
         > Thanks for your consideration.
         >
         > lewismc
         >
         >
