[ https://issues.apache.org/jira/browse/NUTCH-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-3034:
----------------------------------------
    Description: 
h1. Motivation

Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer a lot of value, for example:
 # some aspects, e.g. examples, are [fairly well documented|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral],
 # it is generally stable,
 # it offers reasonable test coverage (on a plugin-by-plugin basis), and
 # … probably many more positives which I am overlooking.

… there are also several aspects which could be improved:
 # the [core framework is sparsely documented|https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]; this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers.
 # the core framework is somewhat [sparsely tested|https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java], with 7 tests as of writing. Traditionally, developers have focused on providing unit tests at the plugin level rather than for the legacy plugin framework itself.
 # the core framework sees very little maintenance/attention. I _think_ (and this is only a gut feeling, so I may be totally wrong here) that not many people know much about the core legacy plugin framework.
 # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but that being said, it is clunky nonetheless.
 # generally speaking, any reduction of code in the Nutch codebase through careful selection of, and dependence on, well-maintained, well-tested third-party libraries would be a good thing.

*This issue therefore proposes to overhaul the legacy Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).*

h1. Task Breakdown

The following is a proposed breakdown of this overall initiative into Epics. These Epics should likely be decomposed further, but that will be left to the implementer(s).
 # {*}Document the legacy Nutch plugin lifecycle{*}: taking inspiration from [PF4J’s plugin lifecycle documentation|https://pf4j.org/doc/plugin-lifecycle.html], provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. It might also be a good idea to contribute to PF4J by providing them with a diagram to accompany their documentation :). Generally speaking, familiarize oneself with the legacy plugin framework and understand where the gaps are.
 # {*}Study the PF4J framework and perform a feasibility study{*}: this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch needs) vs. what PF4J provides. Touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J, and obtain guidance on how to proceed. Document all of this in the Nutch wiki. Create a mapping of [legacy classes|https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin] to [PF4J equivalents|https://github.com/pf4j/pf4j/tree/master/pf4j/src/main/java/org/pf4j].
 # {*}Restructure the legacy Nutch plugin package{*}: [https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin]
 # {*}Restructure each plugin in the plugins directory{*}: [https://github.com/apache/nutch/tree/master/src/plugin]
 # *Update Nutch plugin documentation* 
 # {*}Create/propose plugin utility tooling{*}: item #4 in the list of improvements above states that developing plugins is clunky. A utility tool which streamlines the creation of new plugins would be ideal. For example, this could take the form of a [new bash script|https://github.com/apache/nutch/tree/master/src/bin] which prompts the developer for input and then generates the plugin skeleton. {*}This is a nice to have{*}.

h1. Google Summer of Code Details

This initiative is being proposed as a GSoC 2024 project. 

{*}Proposed Mentor{*}: [~lewismc] 

{*}Proposed Co-Mentor{*}:

 

  was:
h1. Motivation

Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer a lot of value, for example:
 # some aspects, e.g. examples, are [fairly well documented|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral],
 # it is generally stable,
 # it offers reasonable test coverage (on a plugin-by-plugin basis), and
 # … probably many more positives which I am overlooking.

… there are also several aspects which could be improved:
 # the [core framework is sparsely documented|https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]; this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers.
 # the core framework is somewhat [sparsely tested|https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java], with 7 tests as of writing. Traditionally, developers have focused on providing unit tests at the plugin level rather than for the legacy plugin framework itself.
 # the core framework sees very little maintenance/attention. I _think_ (and this is only a gut feeling, so I may be totally wrong here) that not many people know much about the core legacy plugin framework.
 # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but that being said, it is clunky nonetheless.
 # generally speaking, any reduction of code in the Nutch codebase through careful selection of, and dependence on, well-maintained, well-tested third-party libraries would be a good thing.

*This issue therefore proposes to overhaul the legacy Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).*

h1. Task Breakdown

The following is a proposed breakdown of this overall initiative into Epics. These Epics should likely be decomposed further, but that will be left to the implementer(s).
 # {*}Document the legacy Nutch plugin lifecycle{*}: taking inspiration from [PF4J’s plugin lifecycle documentation|https://pf4j.org/doc/plugin-lifecycle.html], provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. It might also be a good idea to contribute to PF4J by providing them with a diagram to accompany their documentation :). Generally speaking, familiarize oneself with the legacy plugin framework and understand where the gaps are.
 # {*}Study the PF4J framework and perform a feasibility study{*}: this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch needs) vs. what PF4J provides. Touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J, and obtain guidance on how to proceed. Document all of this in the Nutch wiki. Create a mapping of [legacy classes|https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin] to [PF4J equivalents|https://github.com/pf4j/pf4j/tree/master/pf4j/src/main/java/org/pf4j].
 # {*}Restructure the legacy Nutch plugin package{*}: [https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin]
 # {*}Restructure each plugin in the plugins directory{*}: [https://github.com/apache/nutch/tree/master/src/plugin]

 
h1. Google Summer of Code Details

This initiative is being proposed as a GSoC 2024 project. 

{*}Proposed Mentor{*}: [~lewismc] 

{*}Proposed Co-Mentor{*}:

 


> Overhaul the legacy Nutch plugin framework and replace it with PF4J
> -------------------------------------------------------------------
>
>                 Key: NUTCH-3034
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3034
>             Project: Nutch
>          Issue Type: Improvement
>          Components: pf4j, plugin
>            Reporter: Lewis John McGibbney
>            Priority: Major
>              Labels: gsoc2024



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
