Re: Establishing a Nutch development roadmap

2023-09-29 Thread Sebastian Nagel

> Migrating javax->jakarta has been quite a chore on Tika because of
> dependencies.  Given back-compat issues with hadoop, is this even on the
> horizon for Nutch?

Good point. I think we are pretty free to replace javax packages in Nutch core
and plugins - they're used in multiple classes.

If it's about transitive dependencies of mandatory dependencies such as Hadoop:
well, that's strictly speaking not our job. But there should be no or very few 
Nutch classes which rely on javax classes shared with dependencies.



>  Y, I'd like to get a working Tika version in a release fairly soon.

Definitely.


> Not sure how much effort a release is?

See https://cwiki.apache.org/confluence/display/NUTCH/Release_HOWTO

Plainly spoken, it's too much effort. And if you take testing seriously,
it's even more, because there are no automated tests to verify that everything
runs well on a Hadoop cluster and to test indexing into Solr, ES, OpenSearch.

On 9/28/23 15:37, Tim Allison wrote:

Sorry for two emails...

Migrating javax->jakarta has been quite a chore on Tika because of dependencies. 
Given back-compat issues with hadoop, is this even on the horizon for Nutch?


On Thu, Sep 28, 2023 at 9:29 AM Tim Allison wrote:


Y, I'd like to get a working Tika version in a release fairly soon. Not sure
how much effort a release is?


On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel <sna...@apache.org> wrote:

Hi Lewis,

thanks!

I'd put on top of the list

* release 1.20

Since the release of 1.19 more than one year has elapsed.

Otherwise I agree with all points on the road map, even
in this order / priority.

Best,
Sebastian


On 9/26/23 18:37, lewis john mcgibbney wrote:
 > Hi dev@,
 >
 > I've been at arm's length for a while as $dayjob changed and then
 > changed again over the last number of years.
 >
 > With that being said, I wanted to start a thread on $title with the
 > goal of establishing some "big items" we could put on the roadmap and
 > maybe even publish...
 >
 > Here are some of the things I've been thinking about (unordered)
 >
 > * NUTCH-2940 Develop Gradle Core Build for Apache Nutch
 > * Metrics system integration cf. https://github.com/apache/nutch/pull/712
 > * Upgrading Javac version > 11
 > * Trade study to consider integrating (something like) Plugin
 > Framework for Java (PF4J) into Nutch
 > * porting Nutch to run on Apache Beam https://beam.apache.org/
 >
 > Does anyone else have candidates they wish to add?
 >
 > Thanks for your consideration.
 >
 > lewismc
 >
 >



Re: Establishing a Nutch development roadmap

2023-09-28 Thread Tim Allison
Sorry for two emails...

Migrating javax->jakarta has been quite a chore on Tika because of
dependencies. Given back-compat issues with hadoop, is this even on the
horizon for Nutch?

On Thu, Sep 28, 2023 at 9:29 AM Tim Allison  wrote:

> Y, I'd like to get a working Tika version in a release fairly soon. Not
> sure how much effort a release is?
>
>
> On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel  wrote:
>
>> Hi Lewis,
>>
>> thanks!
>>
>> I'd put on top of the list
>>
>> * release 1.20
>>
>> Since the release of 1.19 more than one year has elapsed.
>>
>> Otherwise I agree with all points on the road map, even
>> in this order / priority.
>>
>> Best,
>> Sebastian
>>
>>
>> On 9/26/23 18:37, lewis john mcgibbney wrote:
>> > Hi dev@,
>> >
>> > I've been at arm's length for a while as $dayjob changed and then
>> > changed again over the last number of years.
>> >
>> > With that being said, I wanted to start a thread on $title with the
>> > goal of establishing some "big items" we could put on the roadmap and
>> > maybe even publish...
>> >
>> > Here are some of the things I've been thinking about (unordered)
>> >
>> > * NUTCH-2940 Develop Gradle Core Build for Apache Nutch
>> > * Metrics system integration cf. https://github.com/apache/nutch/pull/712
>> > * Upgrading Javac version > 11
>> > * Trade study to consider integrating (something like) Plugin
>> > Framework for Java (PF4J) into Nutch
>> > * porting Nutch to run on Apache Beam https://beam.apache.org/
>> >
>> > Does anyone else have candidates they wish to add?
>> >
>> > Thanks for your consideration.
>> >
>> > lewismc
>> >
>> >
>>
>


Re: Establishing a Nutch development roadmap

2023-09-28 Thread Tim Allison
Y, I'd like to get a working Tika version in a release fairly soon. Not
sure how much effort a release is?


On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel  wrote:

> Hi Lewis,
>
> thanks!
>
> I'd put on top of the list
>
> * release 1.20
>
> Since the release of 1.19 more than one year has elapsed.
>
> Otherwise I agree with all points on the road map, even
> in this order / priority.
>
> Best,
> Sebastian
>
>
> On 9/26/23 18:37, lewis john mcgibbney wrote:
> > Hi dev@,
> >
> > I've been at arm's length for a while as $dayjob changed and then
> > changed again over the last number of years.
> >
> > With that being said, I wanted to start a thread on $title with the
> > goal of establishing some "big items" we could put on the roadmap and
> > maybe even publish...
> >
> > Here are some of the things I've been thinking about (unordered)
> >
> > * NUTCH-2940 Develop Gradle Core Build for Apache Nutch
> > * Metrics system integration cf. https://github.com/apache/nutch/pull/712
> > * Upgrading Javac version > 11
> > * Trade study to consider integrating (something like) Plugin
> > Framework for Java (PF4J) into Nutch
> > * porting Nutch to run on Apache Beam https://beam.apache.org/
> >
> > Does anyone else have candidates they wish to add?
> >
> > Thanks for your consideration.
> >
> > lewismc
> >
> >
>


Re: Establishing a Nutch development roadmap

2023-09-28 Thread Sebastian Nagel

Hi Lewis,

thanks!

I'd put on top of the list

* release 1.20

Since the release of 1.19 more than one year has elapsed.

Otherwise I agree with all points on the road map, even
in this order / priority.

Best,
Sebastian


On 9/26/23 18:37, lewis john mcgibbney wrote:

Hi dev@,

I've been at arm's length for a while as $dayjob changed and then
changed again over the last number of years.

With that being said, I wanted to start a thread on $title with the
goal of establishing some "big items" we could put on the roadmap and
maybe even publish...

Here are some of the things I've been thinking about (unordered)

* NUTCH-2940 Develop Gradle Core Build for Apache Nutch
* Metrics system integration cf. https://github.com/apache/nutch/pull/712
* Upgrading Javac version > 11
* Trade study to consider integrating (something like) Plugin
Framework for Java (PF4J) into Nutch
* porting Nutch to run on Apache Beam https://beam.apache.org/

Does anyone else have candidates they wish to add?

Thanks for your consideration.

lewismc




Establishing a Nutch development roadmap

2023-09-26 Thread lewis john mcgibbney
Hi dev@,

I've been at arm's length for a while as $dayjob changed and then
changed again over the last number of years.

With that being said, I wanted to start a thread on $title with the
goal of establishing some "big items" we could put on the roadmap and
maybe even publish...

Here are some of the things I've been thinking about (unordered)

* NUTCH-2940 Develop Gradle Core Build for Apache Nutch
* Metrics system integration cf. https://github.com/apache/nutch/pull/712
* Upgrading Javac version > 11
* Trade study to consider integrating (something like) Plugin
Framework for Java (PF4J) into Nutch
* porting Nutch to run on Apache Beam https://beam.apache.org/

Does anyone else have candidates they wish to add?

Thanks for your consideration.

lewismc


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc