Re: Self Introduction - Xuanwo
Nice welcome Xuanwo, thanks for introducing yourself. lewismc On 2024/03/10 05:20:20 Xuanwo wrote: > Hello, everyone > > I'm Xuanwo, and I'm following the "Contribute" guide in > comdev-working-groups[1] to introduce myself and kickstart my contributions :) > > My personal vision is "Empowering freely data access from ANY storage service > in ANY method". Open source is definitely an important part of achieving my > vision. > > - I'm the PMC Chair for Apache OpenDAL [2], a project that graduated in > January 2024, aimed at enabling free data access. > - I work at Databend Labs [3], focusing on cost-effective data analysis. > - I'm also contributing to Apache Iceberg [4] to simplify reading SQL tables. > > My current interest lies in open source sustainability. I want to learn how > to ensure a project's sustainability and foster community growth. I'm here to > explore how I can contribute to expanding the ASF community. > > Pleased to meet you here; I'm looking forward to working together with you. > > [1]: https://github.com/apache/comdev-working-groups > [2]: https://github.com/apache/opendal > [3]: https://github.com/datafuselabs/databend/ > [4]: https://github.com/apache/iceberg-rust > > Xuanwo > > - > To unsubscribe, e-mail: dev-unsubscr...@community.apache.org > For additional commands, e-mail: dev-h...@community.apache.org
Re: [QUESTION] What should community do in GSoC timeline?
Hi Xuanwo, It’s been a few years since I participated in GSoC as a mentor… but this year I intend to. Let me see if I can provide answers to some of your questions. On 2024/03/11 03:07:29 Xuanwo wrote: > > 2024-02-22: Potential GSoC contributors discuss application ideas with > mentoring organizations > > Q: Should those ideas/proposals be posted to the mailing list? Or just discussed > with mentors? The mailing list is great; however, I don’t think there are any hard rules. This period is really just for attracting interest in the initiative (if it was created by the PMC/committership) or convincing a PMC to take on your initiative (if it was created by a potential GSoC student). > Q: Should student-submitted ideas/proposals be added to Jira? Yes, absolutely. Make sure the JIRA issue is labeled with “gsoc2024” as well. That way it will show up in the filter at https://issues.apache.org/jira/issues/?jql=labels+%3D+gsoc2024. > > 2024-04-15: Proposals to ASF projects must be reviewed roughly and have a potential mentor so that we know how many slots to request. > > Q: Who will review/rank/score those proposals? The corresponding community's > PMC? > In short, yes, but really it is down to the mentor(s). It is always good to have a backup mentor as well, in case the primary mentor is unable to see the project through. HTH lewismc
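The "gsoc2024" label filter quoted above can also be queried programmatically through JIRA's REST search endpoint, which is handy when scripting over candidate issues. A minimal sketch (the `/rest/api/2/search` path and the `fields` parameter follow JIRA's public REST API; verify against the instance you target):

```python
from urllib.parse import urlencode

# Build the same "labels = gsoc2024" query the web filter uses,
# but against JIRA's REST search endpoint instead of the UI.
base = "https://issues.apache.org/jira/rest/api/2/search"
params = {"jql": "labels = gsoc2024", "fields": "summary,assignee"}
url = f"{base}?{urlencode(params)}"

print(url)
# The encoded JQL matches the filter URL quoted above: labels+%3D+gsoc2024
```

Fetching `url` with any HTTP client returns the matching issues as JSON.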
[jira] [Closed] (NUTCH-3033) Upgrade Ivy to v2.5.2
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3033. --- > Upgrade Ivy to v2.5.2 > - > > Key: NUTCH-3033 > URL: https://issues.apache.org/jira/browse/NUTCH-3033 > Project: Nutch > Issue Type: Task > Components: ivy > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > Ivy v2.5.2 was released August 20th 2023. Let’s upgrade. > [https://ant.apache.org/ivy/history/2.5.2/release-notes.html] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (NUTCH-3033) Upgrade Ivy to v2.5.2
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3033. - Resolution: Fixed > Upgrade Ivy to v2.5.2 > - > > Key: NUTCH-3033 > URL: https://issues.apache.org/jira/browse/NUTCH-3033 > Project: Nutch > Issue Type: Task > Components: ivy > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > Ivy v2.5.2 was released August 20th 2023. Let’s upgrade. > [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]
Re: [DISCUSS] Release Nutch 1.20
I submitted a patch for the Ivy 2.5.2 upgrade. If folks could have a look at that it would be ideal. https://github.com/apache/nutch/pull/803 I am free to roll a release candidate towards the end of this week. lewismc On 2024/03/10 15:08:36 Lewis John McGibbney wrote: > Nice > I see that we are a couple of releases behind on Ivy as well, so I’ll submit a > patch for that. > I can push this release this time. It’s been a while since I exercised the > workflow and it would be good to blow away the cobwebs. > lewismc > > On 2024/03/10 11:55:20 Markus Jelsma wrote: > > Good idea! I'll finish work on three open issues the next week. > > > > Op za 9 mrt 2024 om 13:02 schreef Sebastian Nagel < > > wastl.na...@googlemail.com>: > > > > > Hi Lewis, > > > > > > yes, of course! > > > > > > Some points we should do before the release: > > > > > > - address the ES licensing issue, > > >the easiest way is to downgrade, see NUTCH-3008 > > >If done, update the license-related files. > > > > > > - there are three short PRs open > > > > > > I'll try to have a look at these points in the next days. > > > > > > Best, > > > Sebastian > > > > > > > > > On 3/8/24 01:43, lewis john mcgibbney wrote: > > > > Hi dev@, > > > > As of today, 51 issues have been addressed in the 1.20 development > > > > drive. > > > > https://issues.apache.org/jira/projects/NUTCH/versions/12352190 > > > > I would like to push a release soon and ship it to the user community. > > > > Any objections? > > > > Thank you > > > > lewismc > > > > > > > > > >
[jira] [Updated] (NUTCH-3033) Upgrade Ivy to v2.5.2
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3033: Due Date: 12/Mar/24 (was: 11/Mar/24) > Upgrade Ivy to v2.5.2 > - > > Key: NUTCH-3033 > URL: https://issues.apache.org/jira/browse/NUTCH-3033 > Project: Nutch > Issue Type: Task > Components: ivy > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > Ivy v2.5.2 was released August 20th 2023. Let’s upgrade. > [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]
[jira] [Work stopped] (NUTCH-3033) Upgrade Ivy to v2.5.2
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3033 stopped by Lewis John McGibbney. --- > Upgrade Ivy to v2.5.2 > - > > Key: NUTCH-3033 > URL: https://issues.apache.org/jira/browse/NUTCH-3033 > Project: Nutch > Issue Type: Task > Components: ivy > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > Ivy v2.5.2 was released August 20th 2023. Let’s upgrade. > [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]
Re: Differences in retrieve pattern between Ivy 2.5.0/2.5.1 & 2.5.2?
Thanks for this guidance Stefan :) I was able to get a patch together at https://github.com/apache/nutch/pull/803 Hopefully this helps others who may be as confused as I was. Thank you lewismc On 2024/03/12 18:57:51 Stefan Bodewig wrote: > On 2024-03-11, lewis john mcgibbney wrote: > > > I am working on upgrading Ivy to latest over in the Apache Nutch project. > > The build works just fine with 2.5.0 and 2.5.1 but with 2.5.2 the CI > > fails with the following complaint > > > /home/runner/work/nutch/nutch/src/plugin/build-plugin.xml:234: > > impossible to ivy retrieve: java.lang.RuntimeException: problem during > > retrieve of org.apache.nutch#lib-htmlunit: java.lang.RuntimeException: > > Multiple artifacts of the module > > io.netty#netty-transport-native-kqueue;4.1.84.Final are retrieved to > > the same file! Update the retrieve pattern to fix this error. > > Ivy 2.5.2 fixes a bug[1] when dealing with dependencies that have > multiple Maven artifacts with different Maven classifiers. Prior to > 2.5.2 Ivy would think they'd all be the same and just pick one. > > io.netty#netty-transport-native-kqueue has several artifacts, at least > this is what the repo looks like. I completely fail to understand the > POM :-) > > Your pattern probably needs a [classifier] to make sure two artifacts > that differ by Maven classifier also target different file names. > > Something like > > pattern="${local-maven2-dir}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]" > > Stefan > > [1] https://issues.apache.org/jira/browse/IVY-1642 >
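For context, the fix boils down to adding an optional `(-[classifier])` group to the retrieve pattern so that artifacts differing only by Maven classifier map to distinct file names. A hypothetical excerpt of what that looks like in an Ant build file (element and attribute names per the standard Ivy Ant tasks; `${local-maven2-dir}` is the property from Stefan's suggested pattern above):

```xml
<!-- The (-[classifier]) group is optional: when an artifact has no
     classifier, the parenthesised part (including the dash) is dropped,
     so unclassified artifacts keep their usual file names. -->
<ivy:retrieve
    pattern="${local-maven2-dir}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]"
    sync="true"/>
```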
[GSoC 2024 PROPOSAL] Overhaul the legacy Nutch plugin framework and replace it with PF4J
Hi user@ & dev@, I decided to write up a GSoC’24 proposal and encourage interested applicants to register your interest in the JIRA issue or else reach out to the Nutch PMC over on dev@nutch.apache.org (please CC lewi...@apache.org). Title: Overhaul the legacy Nutch plugin framework and replace it with PF4J JIRA: https://issues.apache.org/jira/browse/NUTCH-3034 Thanks in advance, and good luck to prospective GSoC applicants. lewismc -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
[jira] [Updated] (NUTCH-3034) Overhaul the legacy Nutch plugin framework and replace it with PF4J
[ https://issues.apache.org/jira/browse/NUTCH-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3034: Description: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value, i.e., # some aspects, e.g. examples, are [fairly well documented|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]; this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]… currently 7 tests as of writing. Traditionally, developers have focused on providing unit tests at the plugin level as opposed to the legacy plugin framework. # it sees very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but, that being said, it is clunky nonetheless. # generally speaking, any reduction of code in the Nutch codebase through careful selection of, and dependence on, well-maintained, well-tested 3rd-party libraries would be a good thing for the Nutch codebase. 
*This issue therefore proposes to overhaul the legacy Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative into Epics. These Epics should likely be decomposed further, but that will be left down to the implementer(s). # {*}document the legacy Nutch plugin lifecycle{*}; taking inspiration from [PF4J’s plugin lifecycle documentation|https://pf4j.org/doc/plugin-lifecycle.html], provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. It might also be a good idea to make a contribution to PF4J and provide them with a diagram to accompany their documentation :). Generally speaking, just familiarize oneself with the legacy plugin framework and understand where the gaps are. # {*}study the PF4J framework and perform a feasibility study{*}; this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch needs) vs what PF4J provides. Touch base with the PF4J community and describe the intention to replace the legacy Nutch plugin framework with PF4J. Obtain guidance on how to proceed. Document this all in the Nutch wiki. Create a mapping of [legacy classes|https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin] to [PF4J equivalents|https://github.com/pf4j/pf4j/tree/master/pf4j/src/main/java/org/pf4j]. # {*}Restructure the legacy Nutch plugin package{*}: https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin # {*}Restructure each plugin in the plugins directory{*}: https://github.com/apache/nutch/tree/master/src/plugin # *Update Nutch plugin documentation* # {*}Create/propose plugin utility toolings{*}: #4 in the motivation section states that developing plugins is clunky. A utility tool which streamlines the creation of new plugins would be ideal. 
For example, this could take the form of a [new bash script|https://github.com/apache/nutch/tree/master/src/bin] which prompts the developer for input and then generates the plugin skeleton. {*}This is a nice-to-have{*}. h1. Google Summer of Code Details This initiative is being proposed as a GSoC 2024 project. {*}Proposed Mentor{*}: [~lewismc] {*}Proposed Co-Mentor{*}: was: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https
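The utility-tooling idea in the last Epic can be sketched concretely. Below is a hypothetical shell helper (the script name, skeleton contents, and layout are illustrative only, modelled on the existing src/plugin conventions); a real implementation would prompt interactively and emit a fuller build.xml/ivy.xml:

```shell
# Hypothetical scaffolding helper for the "plugin utility toolings" Epic.
# new_plugin <name> creates src/plugin/<name> with a stub plugin.xml
# and a build.xml wired into the shared build-plugin.xml.
new_plugin() {
  name="$1"
  dir="src/plugin/$name"
  mkdir -p "$dir/src/java"

  # Minimal plugin descriptor; real plugins declare extensions too.
  cat > "$dir/plugin.xml" <<EOF
<plugin id="$name" name="$name" version="1.0.0" provider-name="org.apache.nutch">
   <runtime>
      <library name="$name.jar">
         <export name="*"/>
      </library>
   </runtime>
</plugin>
EOF

  # Delegate the build to the shared plugin build file.
  cat > "$dir/build.xml" <<EOF
<project name="$name" default="jar-core">
   <import file="../build-plugin.xml"/>
</project>
EOF

  echo "Created skeleton under $dir"
}

# Example: scaffold a (made-up) urlfilter-demo plugin.
new_plugin urlfilter-demo
```

A tool like this would remove most of the copy-paste currently needed to start a plugin, which is exactly the clunkiness point #4 in the motivation calls out.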
[jira] [Updated] (NUTCH-3034) Overhaul the legacy Nutch plugin framework and replace it with PF4J
[ https://issues.apache.org/jira/browse/NUTCH-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3034: Description: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]], this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|[https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]]… currently 7 tests as of writing. Traditionally, developers have focused on providing unit tests on the plugin-level as opposed to the legacy plugin framework. # see’s very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but that being said, it is clunky non-the-less. # generally speaking, any reduction of code in the Nutch codebase through careful selection and dependence of well maintained, well tested 3rd party libraries would be a good thing for the Nutch codebase. 
*This issue therefore proposes to overhaul the* *legacy* *Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative intp Epics. These Epics should likely be decomposed further but that will be left down to the implementer(s). # {*}document the legacy Nutch plugin lifecycle{*}; taking inspiration from [PF4J’s plugin lifecycle documentaiton|[https://pf4j.org/doc/plugin-lifecycle.html]] provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. Might also be a good idea to make a contribution to PF4J and provide them with a diagram to accompany their documentation :). Generally speaking just familiarize ones-self with the legacy plugin framework and understand where the gaps are. # *study PF4J framework and* {*}perform feasibility study{*}{*};{*} this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch) needs Vs what PF4J provides. Touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J. Obtain guidance on how to proceed. Document this all in the Nutch wiki. Create mapping of [legacy Classes|[https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin]] to [PF4J equivalents|[https://github.com/pf4j/pf4j/tree/master/pf4j/src/main/java/org/pf4j]]. # {*}Restructure the legacy Nutch plugin package{*}: [https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin] # {*}Restructure each plugin in the plugins directory{*}: [https://github.com/apache/nutch/tree/master/src/plugin] h1. Google Summer of Code Details This initiative is being proposed as a GSoC 2024 project. {*}Proposed Mentor{*}: [~lewismc] {*}Proposed Co-Mentor{*}: was: h1. Motivation Plugins provide a large part of the functionality of Nutch. 
Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]], this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|[https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin
[jira] [Updated] (NUTCH-3034) Overhaul the legacy Nutch plugin framework and replace it with PF4J
[ https://issues.apache.org/jira/browse/NUTCH-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3034: Description: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]], this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|[https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]]… currently 7 tests as of writing. Traditionally, developers have focused on providing unit tests on the plugin-level as opposed to the legacy plugin framework. # see’s very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but that being said, it is clunky non-the-less. # generally speaking, any reduction of code in the Nutch codebase through careful selection and dependence of well maintained, well tested 3rd party libraries would be a good thing for the Nutch codebase. 
*This issue therefore proposes to overhaul the* *legacy* *Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative intp Epics. These Epics should likely be decomposed further but that will be left down to the implementer(s). # {*}document the legacy Nutch plugin lifecycle{*}; taking inspiration from [PF4J’s plugin lifecycle documentaiton|[https://pf4j.org/doc/plugin-lifecycle.html]] provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. Might also be a good idea to make a contribution to PF4J and provide them with a diagram to accompany their documentation :). Generally speaking just familiarize ones-self with the legacy plugin framework and understand where the gaps are. # *study PF4J framework and* {*}perform feasibility study{*}{*};{*} this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch) needs Vs what PF4J provides. Touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J. Obtain guidance on how to proceed. Document this all in the Nutch wiki. Create mapping of [legacy Classes|[https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin]] to [PF4J equivalents|[https://github.com/pf4j/pf4j/tree/master/pf4j/src/main/java/org/pf4j]]. # {*}Restructure the legacy Nutch plugin package{*}: [https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin] # {*}Restructure each plugin in the plugins directory{*}: [https://github.com/apache/nutch/tree/master/src/plugin] # was: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. 
examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]], this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|[https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]]… only 7 tests. Traditionally, developers have focused on providing unit tests on the plugin-level as opposed to the legacy plugin framework
[jira] [Updated] (NUTCH-3034) Overhaul the legacy Nutch plugin framework and replace it with PF4J
[ https://issues.apache.org/jira/browse/NUTCH-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3034: Description: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]], this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|[https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]]… only 7 tests. Traditionally, developers have focused on providing unit tests on the plugin-level as opposed to the legacy plugin framework. # see’s very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but that being said, it is clunky non-the-less. *This issue therefore proposes to overhaul the* *legacy* *Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative intp Epics. 
These Epics should likely be decomposed further but that will be left down to the implementer(s). * {*}document the legacy Nutch plugin lifecycle{*}; taking inspiration from [PF4J’s plugin lifecycle documentaiton|[https://pf4j.org/doc/plugin-lifecycle.html]] provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. Might also be a good idea to make a contribution to PF4J and provide them with a diagram to accompany their documentation :). * *study PF4J framework and* {*}perform feasibility study{*}{*};{*} this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch) needs Vs what PF4J provides. Touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J. Obtain guidance on how to proceed. Document this all in the Nutch wiki. * was: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value i.e., # [some aspects e.g. examples, are [fairly well documented|h[ttps://cwiki.apache.org/confluence/display/NUTCH/PluginCentral|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral]] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|[https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]], this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|[https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]]… only 7 tests. 
Traditionally, developers have focused on providing unit tests on the plugin-level as opposed to the legacy plugin framework. # see’s very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but that being said, it is clunky non-the-less. *This issue therefore proposes to overhaul the* *legacy* *Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative intp Epics. These Epics should likely be decomposed further but that will be left down to the implementer(s). * {*}perform feasibility study{*}; touch base with the PF4J community, describe the intention to replace the legacy Nutch
[jira] [Updated] (NUTCH-3034) Overhaul the legacy Nutch plugin framework and replace it with PF4J
[ https://issues.apache.org/jira/browse/NUTCH-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3034: Description: h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value, i.e., # [some aspects, e.g. examples, are fairly well documented|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]; this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]… only 7 tests. Traditionally, developers have focused on providing unit tests at the plugin level as opposed to the legacy plugin framework. # sees very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but it is clunky nonetheless. *This issue therefore proposes to overhaul the legacy Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative into Epics. These Epics should likely be decomposed further, but that will be left to the implementer(s). * {*}perform feasibility study{*}; touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J, obtain guidance on how to proceed, and document this all in the Nutch wiki. * {*}document the legacy Nutch plugin lifecycle{*}; taking inspiration from [PF4J's plugin lifecycle documentation|https://pf4j.org/doc/plugin-lifecycle.html], provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. Might also be a good idea to make a contribution to PF4J and provide them with a diagram to accompany their documentation :) *
[jira] [Created] (NUTCH-3034) Overhaul the legacy Nutch plugin framework and replace it with PF4J
Lewis John McGibbney created NUTCH-3034: --- Summary: Overhaul the legacy Nutch plugin framework and replace it with PF4J Key: NUTCH-3034 URL: https://issues.apache.org/jira/browse/NUTCH-3034 Project: Nutch Issue Type: Improvement Components: pf4j, plugin Reporter: Lewis John McGibbney h1. Motivation Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer lots of value, i.e., # [some aspects, e.g. examples, are fairly well documented|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral] # it is generally stable, and # offers reasonable test coverage (on a plugin-by-plugin basis) # … probably loads more positives which I am overlooking... … there are also several aspects which could be improved # the [core framework is sparsely documented|https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]; this extends to very important aspects like the {*}plugin lifecycle{*}, {*}classloading{*}, {*}packaging{*}, {*}thread safety{*}, and lots of other topics which are of intrinsic value to developers and maintainers. # the core framework is somewhat [sparsely tested|https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java]… only 7 tests. Traditionally, developers have focused on providing unit tests at the plugin level as opposed to the legacy plugin framework. # sees very low maintenance/attention. It is my gut feeling (and I may be totally wrong here) but I _think_ that not many people know much about the core legacy plugin framework. # writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system, but it is clunky nonetheless. *This issue therefore proposes to overhaul the legacy Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).* h1. Task Breakdown The following is a proposed breakdown of this overall initiative into Epics. These Epics should likely be decomposed further, but that will be left to the implementer(s). * {*}perform feasibility study{*}; touch base with the PF4J community, describe the intention to replace the legacy Nutch plugin framework with PF4J, obtain guidance on how to proceed, and document this all in the Nutch wiki. * {*}document the legacy Nutch plugin lifecycle{*}; taking inspiration from [PF4J's plugin lifecycle documentation|https://pf4j.org/doc/plugin-lifecycle.html], provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. Might also be a good idea to make a contribution to PF4J and provide them with a diagram to accompany their documentation :) * -- This message was sent by Atlassian Jira (v8.20.10#820010)
Differences in retrieve pattern between Ivy 2.5.0/2.5.1 & 2.5.2?
Hi ivy-user@, I am working on upgrading Ivy to latest over in the Apache Nutch project. The build works just fine with 2.5.0 and 2.5.1, but with 2.5.2 the CI fails with the following complaint /home/runner/work/nutch/nutch/src/plugin/build-plugin.xml:234: impossible to ivy retrieve: java.lang.RuntimeException: problem during retrieve of org.apache.nutch#lib-htmlunit: java.lang.RuntimeException: Multiple artifacts of the module io.netty#netty-transport-native-kqueue;4.1.84.Final are retrieved to the same file! Update the retrieve pattern to fix this error. I’m not sure what to do here… any ideas would be appreciated. The Nutch ivysettings.xml can be found at https://github.com/apache/nutch/blob/master/ivy/ivysettings.xml Thanks for any assistance. lewismc -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
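For anyone landing on this thread later: this error tends to surface with modules such as netty that publish several artifacts differing only by classifier (e.g. per-OS native jars), all of which collide onto one file name when the retrieve pattern has no classifier token. A common remedy, sketched below against a hypothetical retrieve call (the actual task and pattern in Nutch's src/plugin/build-plugin.xml may differ), is to add the optional (-[classifier]) token to the pattern:

```xml
<!-- Sketch only: ${deploy.dir} and the surrounding target are illustrative.
     The optional (-[classifier]) token gives artifacts that differ only by
     classifier (e.g. netty-transport-native-kqueue's osx-x86_64 jar) distinct
     file names instead of all resolving to the same file. -->
<ivy:retrieve pattern="${deploy.dir}/[artifact]-[revision](-[classifier]).[ext]"
              sync="true"/>
```

In Ivy pattern syntax, a token wrapped in parentheses is simply omitted (together with the literal text inside the parentheses) when the artifact has no classifier, so artifacts without classifiers keep their existing file names.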
[jira] [Created] (NUTCH-3033) Upgrade Ivy to v2.5.2
Lewis John McGibbney created NUTCH-3033: --- Summary: Upgrade Ivy to v2.5.2 Key: NUTCH-3033 URL: https://issues.apache.org/jira/browse/NUTCH-3033 Project: Nutch Issue Type: Task Components: ivy Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 Ivy v2.5.2 was released August 20th 2023. Let’s upgrade. [https://ant.apache.org/ivy/history/2.5.2/release-notes.html] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (NUTCH-3033) Upgrade Ivy to v2.5.2
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3033 started by Lewis John McGibbney. --- > Upgrade Ivy to v2.5.2 > - > > Key: NUTCH-3033 > URL: https://issues.apache.org/jira/browse/NUTCH-3033 > Project: Nutch > Issue Type: Task > Components: ivy > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > Ivy v2.5.2 was released August 20th 2023. Let’s upgrade. > [https://ant.apache.org/ivy/history/2.5.2/release-notes.html] -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Release Nutch 1.20
Nice. I see that we are a couple of releases behind on Ivy as well, so I’ll submit a patch for that. I can push this release this time. It’s been a while since I exercised the workflow and it would be good to blow away the cobwebs. lewismc On 2024/03/10 11:55:20 Markus Jelsma wrote: > Good idea! I'll finish work on three open issues the next week. > > On Sat, 9 Mar 2024 at 13:02, Sebastian Nagel < > wastl.na...@googlemail.com> wrote: > > > Hi Lewis, > > > > yes, of course! > > > > Some points we should do before the release: > > > > - address the ES licensing issue, > >the easiest way is to downgrade, see NUTCH-3008 > >If done update the license-related files. > > > > - there are three short PRs open > > > > I'll try to have a look at these points the next days. > > > > Best, > > Sebastian > > > > > > On 3/8/24 01:43, lewis john mcgibbney wrote: > > > Hi dev@, > > > As of today, 51 issues have been addressed in the 1.20 development drive. > > > https://issues.apache.org/jira/projects/NUTCH/versions/12352190 > > > <https://issues.apache.org/jira/projects/NUTCH/versions/12352190> > > > I would like to push a release soon and ship it to the user community. > > > Any objections? > > > Thank you > > > lewismc > > > > > >
Re: Indexing arbitrary fields
Hi Joe, Thanks for describing your work in detail. It provides a great utility which I think could be of immense value. Please feel free to create a JIRA ticket which can be used as the basis for linking to the prior similar examples you referenced. A WIP pull request would be ideal. Thanks lewismc On 2024/03/08 01:06:18 Joe Gilvary wrote: > Good day, all, > > I wanted to index some values that I had to derive from fields in the > NutchDocument. I started on an indexing plugin. Then I realized I would > need more than one, or I could generalize the plugin. I went with the > generalizing and wrote a plugin that will use custom POJOs to process & > inject whatever the Nutch user wants, based on properties in > NUTCH_CONF_DIR/nutch-site.xml. I've tested it so far with > > one POJO that uses jsoup to extract values from the page based on a CSS > selector specified in nutch-site.xml, > > another POJO that takes a regex from nutch-site.xml and applies it to > the URL to determine how "deep" the URL directory structure goes for the > document, > > and a third toy POJO to take multiple arguments from nutch-site.xml and > return their product. That last test was just to be sure the plug-in > would handle more than two arguments in the property value. > > There's an optional boolean in the config to set whether to overwrite an > existing field, or (by default) add to it. Finally, I hacked a naming > convention and the way the plugin uses the setConf() call so the plugin > will accept configuration for multiple different POJOs to set multiple > fields in the NutchDocument. I didn't see any examples of a plugin > running more than once for each document quite that way, so I'm not sure > if this conforms to whatever canonical approach might exist. > > I think of this plugin as a way to extend the reach of the plugin > architecture's flexibility out to POJO-land :) for anyone who > can't/won't for whatever reason write a plugin of their own. 
The POJOs > have to accept a String in a constructor, but they don't work on > NutchDocument or CrawlDatum or anything. I think if the plugin wants to > pass all that to a POJO for reflection, it's a clever way to waste time > when the work could be done in the plugin itself. For some subset of > indexing requirements, I think this could be useful to a wider set of > users. Still, I'm not a wider set of users, so I'm asking here. > > NUTCH-585 has a lot of discussion about a concern similar to what this > jsoup example enables and Solr itself includes the > URLClassifierProcessor that addresses the same type of task that the > regex example shows, so is there any interest in this kind of > generalized plugin? Just from those examples, it could enable some > altered version of those capabilities. I've only built and tested with > the 1.19 branch and main branch code so far, and only with a Solr 9.2.1 > cloud install, 'cause that's what I'm running, but if it seems > worthwhile to others, I'll beef up the documentation and write JUnit cases. > > Thanks, stay safe, stay healthy, > > Joe > >
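To make the configuration approach Joe describes concrete, here is a purely hypothetical nutch-site.xml fragment: every property name, class name, and value below is invented for illustration only, and his plugin's actual naming convention may well differ.

```xml
<!-- Hypothetical illustration: names and values invented, not the plugin's
     real convention. One group of properties wires one POJO (with a
     String-argument constructor) to one NutchDocument field. -->
<property>
  <name>index.custom.field.urldepth.class</name>
  <value>org.example.UrlDepthPojo</value>
</property>
<property>
  <name>index.custom.field.urldepth.arg</name>
  <!-- regex applied to the URL to measure directory depth -->
  <value>/([^/]+)</value>
</property>
<property>
  <name>index.custom.field.urldepth.overwrite</name>
  <!-- false = add to an existing field rather than replace it -->
  <value>false</value>
</property>
```

The appeal of this pattern is exactly what Joe notes: users who cannot or will not write a full indexing plugin can drop in a small POJO and drive it entirely from configuration.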
[DISCUSS] Release Nutch 1.20
Hi dev@, As of today, 51 issues have been addressed in the 1.20 development drive. https://issues.apache.org/jira/projects/NUTCH/versions/12352190 I would like to push a release soon and ship it to the user community. Any objections? Thank you lewismc
Re: [DISCUSS] Graduate Apache SDAP (Incubating) as a Top Level Project
Julien has very succinctly described the community growth challenges and podling direction. For a number of years I acted as mentor for SDAP and was puzzled by the inability of the community to push releases. This still concerns me... That being said, there is definitely potential (the software is being used) and I do feel that SDAP should graduate. Please carry my +1 through to a VOTE. Thanks, and congratulations to the SDAP community… and a HUGE thanks to Julien as well. lewismc On 2024/02/22 18:01:31 Riley Kuttruff wrote: > Hi all, > > Apache SDAP joined Incubator in October 2017. In the time since, we've > made significant progress towards maturing our community and our > project and adopting the Apache Way. > > After community discussion [1][2][3], the community has voted [4] that we > would like to proceed with graduation [5]. We now call upon the Incubator > PMC to review and discuss our progress and would appreciate any and all > feedback towards graduation. > > Below are some facts and project highlights from the incubation phase as > well as the draft resolution: > > - Our community consists of 21 committers, with 2 being mentors and > the remaining 19 serving as our PPMC > - Several pending and planned invites to bring on new committers and/or > PPMC members from additional organizations > - Completed 2 releases with 2 release managers - with a 3rd release run by > a 3rd release manager in progress > - Our software is currently being utilized by organizations such as NASA > Jet Propulsion Laboratory, NSF National Center for Atmospheric Research, > Florida State University, and George Mason University in support of projects > such as the NASA Sea Level Change Portal, Estimating the Circulation and > Climate of the Ocean (ECCO) project, GRACE/GRACE-FO, Cloud-based > Data Match-Up Service, Integrated Digital Earth Analysis System (IDEAS), > and many others. 
> - Opened 400+ PRs across 3 main code repositories, 350+ of which are > merged or closed (some are pending our next release) > - Maturity model self assessment [6] > > We have resolved all branding issues we are aware of: logo, GitHub, > Website, etc > > We’d like to also extend a sincere thank you to our mentors, current and > former for their invaluable insight and assistance with getting us to this > point. > > Thank you, Julian, Jörn, Trevor, Lewis, Suneel, and Raphael! > > --- > > Establish the Apache SDAP Project > > WHEREAS, the Board of Directors deems it to be in the best interests of > the Foundation and consistent with the Foundation's purpose to establish > a Project Management Committee charged with the creation and maintenance > of open-source software, for distribution at no charge to the public, > related to an integrated data analytic center for Big Science problems. > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee > (PMC), to be known as the "Apache SDAP Project", be and hereby is > established pursuant to Bylaws of the Foundation; and be it further > > RESOLVED, that the Apache SDAP Project be and hereby is responsible > for the creation and maintenance of software related to an integrated data > analytic center for Big Science problems; and be it further > > RESOLVED, that the office of "Vice President, Apache SDAP" be and > hereby is created, the person holding such office to serve at the > direction of the Board of Directors as the chair of the Apache SDAP > Project, and to have primary responsibility for management of the > projects within the scope of responsibility of the Apache SDAP > Project; and be it further > > RESOLVED, that the persons listed immediately below be and hereby are > appointed to serve as the initial members of the Apache SDAP Project: > > - Edward M Armstrong > - Nga Thien Chung > - Thomas Cram > - Frank Greguska > - Thomas Huang > - Julian Hyde > - Joseph C. 
Jacob > - Jason Kang > - Riley Kuttruff > - Thomas G Loubrieu > - Kevin Marlis > - Stepheny Perez > - Wai Linn Phyo > > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Nga Thien Chung > be appointed to the office of Vice President, Apache SDAP, to serve in > accordance with and subject to the direction of the Board of Directors > and the Bylaws of the Foundation until death, resignation, retirement, > removal or disqualification, or until a successor is appointed; and be it > further > > RESOLVED, that the Apache SDAP Project be and hereby is tasked with > the migration and rationalization of the Apache Incubator SDAP > podling; and be it further > > RESOLVED, that all responsibilities pertaining to the Apache Incubator > SDAP podling encumbered upon the Apache Incubator PMC are hereafter > discharged. > > [1] https://lists.apache.org/thread/vjwjmp0h2f22dv423h262cvdg5x7jl03 > [2] https://lists.apache.org/thread/m9vqwv23jdsofwgmhgxg25f5l1v2j7nz > [3]
Re: [DISCUSS] Incubating Proposal for StormCrawler
I think StormCrawler would be an excellent candidate for the Incubator. If the podling is looking for an additional mentor, I would be happy to chip in. lewismc On 2024/03/03 23:24:38 PJ Fanning wrote: > Hi everyone, > > I would like to propose StormCrawler [1] as a new Apache Incubator project, > and you can examine the proposal [2] for more details. > > StormCrawler is a collection of resources for building low-latency, > customisable and scalable web crawlers on Apache Storm. > > Proposal > > The aim of StormCrawler is to help build web crawlers that are: > > * scalable > * resilient > * low latency > * easy to extend > * polite yet efficient > > StormCrawler achieves this partly with Apache Storm, which it is based > on. To use an analogy, Apache Storm is to StormCrawler what Apache > Hadoop is to Apache Nutch. > > StormCrawler is mature (26 releases to date) and is used by many > organisations world-wide. > > Initial Committers > > Julien Nioche [jnio...@apache.org https://github.com/jnioche] > Sebastian Nagel [sna...@apache.org https://github.com/sebastian-nagel] > Richard Zowalla [r...@apache.org https://github.com/rzo1] > Tim Allison [talli...@apache.org https://github.com/tballison] > Michael Dinzinger [michael.dinzin...@uni-passau.de > https://github.com/michaeldinzinger] > > Most of the existing StormCrawler contributors are existing ASF > committers and are looking to build a vibrant community following the > Apache Way. > > I will help this project as the champion and mentor. We would welcome > additional mentors, if anyone has an interest in helping. > > We are looking forward to your questions and feedback. 
> > Thanks, > PJ > > [1] https://github.com/DigitalPebble/storm-crawler > [2] > https://cwiki.apache.org/confluence/display/INCUBATOR/StormCrawler+Proposal > > - > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > > - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [DISCUSS] Graduate Apache Celeborn (Incubating) as a Top Level Project
+1 Excellent work on the Incubating releases and community building, lewismc On 2024/03/05 06:00:49 Yu Li wrote: > Hi All, > > Apache Celeborn joined Incubator in October 2022 [1]. Since then, > we've made significant progress towards maturing our community and > adopting the Apache Way. > > After a thorough discussion [2], the community has voted [3] that we > would like to proceed with graduation [4]. Furthermore, we'd like to > call upon the Incubator PMC to review and discuss our progress and > would appreciate any and all feedback towards graduation. > > Below are some facts and project highlights from the incubation phase > as well as the draft resolution: > > - Currently, our community consists of 19 committers (including > mentors) from more than 10 companies, with 13 serving as PPMC members > [5]. > - So far, we have boasted 81 contributors. > - Throughout the incubation period, we've made 6 releases [6] in 16 > months, at a stable pace. > - We've had 6 different release managers to date. > - Our software is used in production by 10+ well known entities [7]. > - As yet, we have opened 1,302 issues with 1,191 successfully resolved [8]. > - We have submitted a total of 1,840 PRs, out of which 1,830 have been > merged or closed [9]. > - Through self-assessment [10], we have met all maturity criteria as > outlined in [11]. > > We've resolved all branding issues which include Logo, GitHub repo, > document, website, and others [12] [13]. > > We'd also like to take this opportunity to extend a sincere thank you > to our mentors, for their invaluable insight and assistance with > getting us to this point. > > Thanks a lot, Becket Qin, Duo Zhang, Lidong Dai, Willem Ning Jiang and Yu Li! 
> > --- > > Establish the Apache Celeborn Project > > WHEREAS, the Board of Directors deems it to be in the best interests of > the Foundation and consistent with the Foundation's purpose to establish > a Project Management Committee charged with the creation and maintenance > of open-source software, for distribution at no charge to the public, > related to an intermediate data service for big data computing engines > to boost performance, stability, and flexibility. > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee > (PMC), to be known as the "Apache Celeborn Project", be and hereby is > established pursuant to Bylaws of the Foundation; and be it further > > RESOLVED, that the Apache Celeborn Project be and hereby is responsible > for the creation and maintenance of software related to an intermediate > data service for big data computing engines to boost performance, > stability, and flexibility; and be it further > > RESOLVED, that the office of "Vice President, Apache Celeborn" be and > hereby is created, the person holding such office to serve at the > direction of the Board of Directors as the chair of the Apache Celeborn > Project, and to have primary responsibility for management of the > projects within the scope of responsibility of the Apache Celeborn > Project; and be it further > > RESOLVED, that the persons listed immediately below be and hereby are > appointed to serve as the initial members of the Apache Celeborn > Project: > > * Becket Qin > * Cheng Pan > * Duo Zhang > * Ethan Feng > * Fu Chen > * Jiashu Xiong > * Kerwin Zhang > * Keyong Zhou > * Lidong Dai > * Willem Ning Jiang > * Wu Wei > * Yi Zhu > * Yu Li > > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Keyong Zhou be appointed to > the office of Vice President, Apache Celeborn, to serve in accordance > with and subject to the direction of the Board of Directors and the > Bylaws of the Foundation until death, resignation, retirement, removal > or disqualification, or 
until a successor is appointed; and be it > further > > RESOLVED, that the Apache Celeborn Project be and hereby is tasked with > the migration and rationalization of the Apache Incubator Celeborn > podling; and be it further > > RESOLVED, that all responsibilities pertaining to the Apache Incubator > Celeborn podling encumbered upon the Apache Incubator PMC are hereafter > discharged. > > --- > > Best Regards, > Yu (on behalf of the Apache Celeborn PPMC) > > [1] https://incubator.apache.org/projects/celeborn.html > [2] https://lists.apache.org/thread/z17rs0mw4nyv0s112dklmv7s3j053mby > [3] https://lists.apache.org/thread/p1gykvxog456v5chvwmr4wk454qzmh3o > [4] https://lists.apache.org/thread/tqhh28q9r38czx677nh2ktc97tnlndw3 > [5] https://celeborn.apache.org/community/project_management_committee > [6] > https://issues.apache.org/jira/projects/CELEBORN?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page=released > [7] https://github.com/apache/incubator-celeborn/issues/2140 > [8] https://s.apache.org/celeborn_jira_issues > [9] https://github.com/apache/incubator-celeborn/pulls > [10] >
[jira] [Closed] (NUTCH-3024) Remove flaky 'dependency check' target
[ https://issues.apache.org/jira/browse/NUTCH-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3024. --- > Remove flaky 'dependency check' target > -- > > Key: NUTCH-3024 > URL: https://issues.apache.org/jira/browse/NUTCH-3024 > Project: Nutch > Issue Type: Task > Components: build >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > I [started a > thread|https://lists.apache.org/thread/ol3ssjphdqqxwsxhc65qoqg1dj1kjbxb] > covering my observations running the ant _*dependency-check*_ target. It > fails unpredictably in both GitHub actions and our trusty Jenkins builds on > ci-builds.apache.org. > I propose to simply remove this target (and associated configuration) in a > bid to clean up some flaky legacy build code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (NUTCH-3024) Remove flaky 'dependency check' target
[ https://issues.apache.org/jira/browse/NUTCH-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3024. - Resolution: Fixed > Remove flaky 'dependency check' target > -- > > Key: NUTCH-3024 > URL: https://issues.apache.org/jira/browse/NUTCH-3024 > Project: Nutch > Issue Type: Task > Components: build >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > I [started a > thread|https://lists.apache.org/thread/ol3ssjphdqqxwsxhc65qoqg1dj1kjbxb] > covering my observations running the ant _*dependency-check*_ target. It > fails unpredictably in both GitHub actions and our trusty Jenkins builds on > ci-builds.apache.org. > I propose to simply remove this target (and associated configuration) in a > bid to clean up some flaky legacy build code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TIKA-4169) Create a parser for Functional Mockup Unit (FMU) media type with .fmu extension
[ https://issues.apache.org/jira/browse/TIKA-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-4169: --- Description: A Functional Mockup Unit (FMU) is a software component used for exchanging and simulating dynamic system models. It is designed to enable simulations of system models regardless of the simulation tool, programming language, or hardware platform. This is made possible through a standard interface that allows FMUs to be exported and imported across different simulation environments. The FMU media type ships with the .fmu file suffix. I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] can be used as the underlying parser implementation. I will go on the hunt for some sample files we can use in unit tests. I think we can make some available via [https://github.com/Open-MBEE/perseverance-modelica] was: A Functional Mockup Unit (FMU) is a software component used for exchanging and simulating dynamic system models. It is designed to enable simulations of system models regardless of the simulation tool, programming language, or hardware platform. This is made possible through a standard interface that allows FMUs to be exported and imported across different simulation environments. The FMU media type ships with the .fmu file suffix. I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] can be used as the underlying parser implementation. > Create a parser for Functional Mockup Unit (FMU) media type with .fmu > extension > --- > > Key: TIKA-4169 > URL: https://issues.apache.org/jira/browse/TIKA-4169 > Project: Tika > Issue Type: New Feature > Components: parser > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > > A Functional Mockup Unit (FMU) is a software component used for exchanging > and simulating dynamic system models.
It is designed to enable simulations of > system models regardless of the simulation tool, programming language, or > hardware platform. This is made possible through a standard interface that > allows FMUs to be exported and imported across different simulation > environments. > The FMU media type ships with the .fmu file suffix. > I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] > can be used as the underlying parser implementation. > I will go on the hunt for some sample files we can use in unit tests. I think > we can make some available via > [https://github.com/Open-MBEE/perseverance-modelica] -- This message was sent by Atlassian Jira (v8.20.10#820010)
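For readers unfamiliar with the container format: an FMU is distributed as a ZIP archive that, per the FMI standard, must contain a top-level modelDescription.xml. A parser or type detector can cheaply confirm the media type by probing for that entry. The sketch below is illustrative only — it is not Tika's detection API, and the class name is hypothetical:

```java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class FmuSniffer {
    // An .fmu file is a ZIP archive that (per the FMI standard) carries a
    // top-level modelDescription.xml declaring the model. Probing for that
    // entry distinguishes a genuine FMU from an arbitrary ZIP file.
    public static boolean looksLikeFmu(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry desc = zip.getEntry("modelDescription.xml");
            return desc != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(looksLikeFmu(args[0]));
        }
    }
}
```

A real Tika parser would go further and map elements of modelDescription.xml onto Tika Metadata fields, but entry probing is enough for detection.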
[jira] [Updated] (TIKA-4169) Create a parser for Functional Mockup Unit (FMU) media type with .fmu extension
[ https://issues.apache.org/jira/browse/TIKA-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-4169: --- Description: A Functional Mockup Unit (FMU) is a software component used for exchanging and simulating dynamic system models. It is designed to enable simulations of system models regardless of the simulation tool, programming language, or hardware platform. This is made possible through a standard interface that allows FMUs to be exported and imported across different simulation environments. The FMU media type ships with the .fmu file suffix. I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] can be used as the underlying parser implementation. was: A Functional Mockup Unit (FMU) is a software component used for exchanging and simulating dynamic system models. It is designed to enable simulations of system models regardless of the simulation tool, programming language, or hardware platform. This is made possible through a standard interface that allows FMUs to be exported and imported across different simulation environments. The FMU media type ships with the .fmu file suffix. I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] can be used as the underlying parser implementation. > Create a parser for Functional Mockup Unit (FMU) media type with .fmu > extension > --- > > Key: TIKA-4169 > URL: https://issues.apache.org/jira/browse/TIKA-4169 > Project: Tika > Issue Type: New Feature > Components: parser > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > > A Functional Mockup Unit (FMU) is a software component used for exchanging > and simulating dynamic system models. It is designed to enable simulations of > system models regardless of the simulation tool, programming language, or > hardware platform.
This is made possible through a standard interface that > allows FMUs to be exported and imported across different simulation > environments. > The FMU media type ships with the .fmu file suffix. > I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] > can be used as the underlying parser implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-4169) Create a parser for Functional Mockup Unit (FMU) media type with .fmu extension
Lewis John McGibbney created TIKA-4169: -- Summary: Create a parser for Functional Mockup Unit (FMU) media type with .fmu extension Key: TIKA-4169 URL: https://issues.apache.org/jira/browse/TIKA-4169 Project: Tika Issue Type: New Feature Components: parser Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney A Functional Mockup Unit (FMU) is a software component used for exchanging and simulating dynamic system models. It is designed to enable simulations of system models regardless of the simulation tool, programming language, or hardware platform. This is made possible through a standard interface that allows FMUs to be exported and imported across different simulation environments. The FMU media type ships with the .fmu file suffix. I think the MIT licensed [NTNU-IHB/FMI4j|https://github.com/NTNU-IHB/FMI4j] can be used as the underlying parser implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (NUTCH-3007) Fix impossible casts
[ https://issues.apache.org/jira/browse/NUTCH-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3007. --- > Fix impossible casts > > > Key: NUTCH-3007 > URL: https://issues.apache.org/jira/browse/NUTCH-3007 > Project: Nutch > Issue Type: Sub-task >Affects Versions: 1.19 >Reporter: Sebastian Nagel >Assignee: Sebastian Nagel >Priority: Major > Fix For: 1.20 > > > Spotbugs reports two occurrences of > Impossible cast from java.util.ArrayList to String[] in > org.apache.nutch.fetcher.Fetcher.run(Map, String) > Both were introduced later into the {{run(Map args, String > crawlId)}} method and obviously never used (would throw a > ClassCastException). The code blocks should be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
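For context, the bug class Spotbugs flags in NUTCH-3007 can be illustrated as follows (the class and method names are hypothetical, not Nutch's actual Fetcher code): a reference cast from an ArrayList to String[] always throws ClassCastException at runtime, whereas List.toArray copies the elements into a genuine array.

```java
import java.util.ArrayList;
import java.util.List;

public class CastFix {
    // Wrong: "(String[]) list" compiles only with an intermediate Object
    // cast and always throws ClassCastException at runtime, because an
    // ArrayList instance is never a String[].
    // Right: copy the elements into a fresh String[] via toArray.
    public static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>(List.of("http://a/", "http://b/"));
        String[] arr = toStringArray(urls);
        System.out.println(arr.length); // 2
    }
}
```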
[jira] [Closed] (NUTCH-2846) Fix various bugs spotted by NUTCH-2815
[ https://issues.apache.org/jira/browse/NUTCH-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2846. --- > Fix various bugs spotted by NUTCH-2815 > -- > > Key: NUTCH-2846 > URL: https://issues.apache.org/jira/browse/NUTCH-2846 > Project: Nutch > Issue Type: Sub-task >Affects Versions: 1.18 >Reporter: Sebastian Nagel >Priority: Major > Fix For: 1.19 > > > This issue addresses various bugs spotted by Spotbugs (NUTCH-2815): > - use static method Integer.parseInt(...) > - use integer arithmetic instead of floating point with rounding floats > afterwards > - erroneous declaration of constructor in BasicURLNormalizer > - fix bracketing when calculating hash code of CrawlDatum -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (NUTCH-2852) Method invokes System.exit(...) 9 bugs
[ https://issues.apache.org/jira/browse/NUTCH-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2852. --- > Method invokes System.exit(...) 9 bugs > -- > > Key: NUTCH-2852 > URL: https://issues.apache.org/jira/browse/NUTCH-2852 > Project: Nutch > Issue Type: Sub-task >Affects Versions: 1.18 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > org.apache.nutch.indexer.IndexingFiltersChecker since first historized release > In class org.apache.nutch.indexer.IndexingFiltersChecker > In method org.apache.nutch.indexer.IndexingFiltersChecker.run(String[]) > At IndexingFiltersChecker.java:[line 96] > Another occurrence at IndexingFiltersChecker.java:[line 129] > org.apache.nutch.indexer.IndexingFiltersChecker.run(String[]) invokes > System.exit(...), which shuts down the entire virtual machine > Invoking System.exit shuts down the entire Java virtual machine. This should > only be done when it is appropriate. Such calls make it hard or impossible > for your code to be invoked by other code. Consider throwing a > RuntimeException instead. > Also occurs in >org.apache.nutch.net.URLFilterChecker since first historized release >org.apache.nutch.net.URLNormalizerChecker since first historized release >org.apache.nutch.parse.ParseSegment since first historized release >org.apache.nutch.parse.ParserChecker since first historized release >org.apache.nutch.service.NutchServer since first historized release >org.apache.nutch.tools.CommonCrawlDataDumper since first historized release >org.apache.nutch.tools.DmozParser since first historized release >org.apache.nutch.util.AbstractChecker since first historized release -- This message was sent by Atlassian Jira (v8.20.10#820010)
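The refactoring Spotbugs recommends can be sketched like this (hypothetical class, not the actual IndexingFiltersChecker code): the worker method signals failure by throwing or by returning a status code, and only the true process boundary decides whether to terminate the JVM.

```java
public class ExitFix {
    // Before: calling System.exit(-1) inside run() would kill the whole
    // JVM, including any host application that embedded this tool.
    // After: signal failure with an exception (or a non-zero return code)
    // and let the outermost entry point decide whether to exit.
    public static int run(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Usage: run <url>");
        }
        // ... do the actual work here ...
        return 0;
    }

    public static void main(String[] args) {
        // A real command-line entry point would translate the outcome into
        // a process exit code, e.g.:
        //   int rc; try { rc = run(args); } catch (RuntimeException e) { rc = -1; }
        //   System.exit(rc);
        try {
            run(new String[] { "http://example.org/" });
            System.out.println("ok");
        } catch (RuntimeException e) {
            System.err.println(e.getMessage());
        }
    }
}
```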
[jira] [Closed] (NUTCH-2819) Move spotbugs "installation" directory to avoid that spotbugs is shipped in Nutch runtime
[ https://issues.apache.org/jira/browse/NUTCH-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2819. --- > Move spotbugs "installation" directory to avoid that spotbugs is shipped in > Nutch runtime > - > > Key: NUTCH-2819 > URL: https://issues.apache.org/jira/browse/NUTCH-2819 > Project: Nutch > Issue Type: Sub-task >Affects Versions: 1.18 >Reporter: Sebastian Nagel >Assignee: Shashanka Balakuntala Srinivasa >Priority: Minor > Fix For: 1.19 > > > With NUTCH-2816 the Spotbugs tool is "installed" in lib/. However, files in > lib/ are copied to build/ and runtime/. To avoid that the spotbugs jars are > shipped in runtime and eventually also releases, the spotbugs installation > folder should be moved into a different directory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (NUTCH-2851) Random object created and used only once
[ https://issues.apache.org/jira/browse/NUTCH-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2851. --- > Random object created and used only once > > > Key: NUTCH-2851 > URL: https://issues.apache.org/jira/browse/NUTCH-2851 > Project: Nutch > Issue Type: Sub-task > Components: dmoz, generator, indexer, segment >Affects Versions: 1.18 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.19 > > > In class org.apache.nutch.crawl.Generator > In method org.apache.nutch.crawl.Generator.partitionSegment(Path, Path, int) > Called method java.util.Random.nextInt() > At Generator.java:[line 1016] > Random object created and used only once in > org.apache.nutch.crawl.Generator.partitionSegment(Path, Path, int) > This code creates a java.util.Random object, uses it to generate one random > number, and then discards the Random object. This produces mediocre quality > random numbers and is inefficient. If possible, rewrite the code so that the > Random object is created once and saved, and each time a new random number is > required invoke a method on the existing Random object to obtain it. > If it is important that the generated Random numbers not be guessable, you > must not create a new Random for each random number; the values are too > easily guessable. You should strongly consider using a > java.security.SecureRandom instead (and avoid allocating a new SecureRandom > for each random number needed). > This bad practice also affects the following > org.apache.nutch.indexer.IndexingJob since first historized release > org.apache.nutch.segment.SegmentReader since first historized release > org.apache.nutch.tools.DmozParser$RDFProcessor since first historized release -- This message was sent by Atlassian Jira (v8.20.10#820010)
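The fix Spotbugs describes is simply hoisting the Random into a field that is created once and reused, sketched below with hypothetical names (not the actual Generator code):

```java
import java.util.Random;

public class RandomReuse {
    // Before: "new Random().nextInt(n)" at every call site creates and
    // discards a Random per call — inefficient and poorer-quality output.
    // After: create the Random once and reuse it for every draw.
    // (If the values must be unguessable, use java.security.SecureRandom,
    // likewise allocated once.)
    private static final Random RANDOM = new Random();

    public static int nextPartition(int numPartitions) {
        return RANDOM.nextInt(numPartitions);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(nextPartition(4)); // always in [0, 4)
        }
    }
}
```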
[jira] [Closed] (NUTCH-2850) Method ignores exceptional return value
[ https://issues.apache.org/jira/browse/NUTCH-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2850. --- > Method ignores exceptional return value > --- > > Key: NUTCH-2850 > URL: https://issues.apache.org/jira/browse/NUTCH-2850 > Project: Nutch > Issue Type: Sub-task > Components: dumpers >Affects Versions: 1.18 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.19 > > > In class org.apache.nutch.tools.FileDumper > In method org.apache.nutch.tools.FileDumper.dump(File, File, String[], > boolean, boolean, boolean) > Called method java.io.File.mkdirs() > At FileDumper.java:[line 237] > Exceptional return value of java.io.File.mkdirs() ignored in > org.apache.nutch.tools.FileDumper.dump(File, File, String[], boolean, > boolean, boolean) > This method returns a value that is not checked. The return value should be > checked since it can indicate an unusual or unexpected function execution. > For example, the File.delete() method returns false if the file could not be > successfully deleted (rather than throwing an Exception). If you don't check > the result, you won't notice if the method invocation signals unexpected > behavior by returning an atypical return value. -- This message was sent by Atlassian Jira (v8.20.10#820010)
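The fix for the ignored return value is to inspect the boolean that File.mkdirs() yields, sketched here with a hypothetical helper (not the actual FileDumper code):

```java
import java.io.File;
import java.io.IOException;

public class MkdirsFix {
    // File.mkdirs() returns false instead of throwing when it cannot
    // create the directories — but also when they already exist. Checking
    // both conditions turns a silently ignored failure into an exception.
    public static void ensureDir(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Failed to create directory: " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File out = new File(System.getProperty("java.io.tmpdir"), "dump-out");
        ensureDir(out); // creates it, or verifies it already exists
        ensureDir(out); // idempotent: second call is a no-op
        System.out.println(out.isDirectory()); // true
    }
}
```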
[jira] [Created] (NUTCH-3024) Remove flaky 'dependency check' target
Lewis John McGibbney created NUTCH-3024: --- Summary: Remove flaky 'dependency check' target Key: NUTCH-3024 URL: https://issues.apache.org/jira/browse/NUTCH-3024 Project: Nutch Issue Type: Task Components: build Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 I [started a thread|https://lists.apache.org/thread/ol3ssjphdqqxwsxhc65qoqg1dj1kjbxb] covering my observations running the ant _*dependency-check*_ target. It fails unpredictably in both GitHub actions and our trusty Jenkins builds on ci-builds.apache.org. I propose to simply remove this target (and associated configuration) in a bid to clean up some flaky legacy build code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Removing “dependency-check” target from build.xml
Hi dev@, Recently I was doing a bit of work on CI and made an attempt to activate the “dependency-check” target (previously named “report-vulnerabilities”). The underlying “dependency-check” tooling appears flaky at best: it takes an awfully long time to execute and seems prone to hanging. I propose to remove this target and implement something more stable in the future… when I work on finishing the Gradle build. lewismc
[jira] [Created] (NUTCH-3023) Use mikepenz/action-junit-report to improve interpretation of failed tests during CI
Lewis John McGibbney created NUTCH-3023: --- Summary: Use mikepenz/action-junit-report to improve interpretation of failed tests during CI Key: NUTCH-3023 URL: https://issues.apache.org/jira/browse/NUTCH-3023 Project: Nutch Issue Type: Task Components: build, test Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 The following GitHub action could help improve the interpretation of unit test anomalies during a CI run. [https://github.com/mikepenz/action-junit-report] Rather than having to grep through the GitHub Action log, one could save time by interpreting the comments posted to the PR conversation thread. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3014. --- Thanks [~snagel] for the review > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others don't. Some use upper case, others > don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _{*}Nutch ${ClassName}{*}: *${additional info}*_ > _Examples:_ > * _Nutch LinkRank: Inverter_ > * _Nutch CrawlDb: + $crawldb_ > * _Nutch LinkDbReader: + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
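The proposed "Nutch ${ClassName}: ${additional info}" convention could be captured in a small helper along these lines (the helper name and placement are hypothetical, not part of Nutch):

```java
public class NutchJobName {
    // Builds a job name following the proposed convention:
    //   "Nutch" (mandatory) + class simple name (mandatory)
    //   + ": <additional info>" (optional).
    public static String jobName(Class<?> jobClass, String info) {
        String base = "Nutch " + jobClass.getSimpleName();
        return (info == null || info.isEmpty()) ? base : base + ": " + info;
    }

    public static void main(String[] args) {
        // With a real job class this would read e.g. "Nutch LinkRank: Inverter";
        // String.class stands in here for an actual Nutch job class.
        System.out.println(jobName(String.class, "Inverter")); // Nutch String: Inverter
    }
}
```

Call sites would then replace ad-hoc strings like "read " + segment with jobName(LinkDbReader.class, segment.toString()), making jobs uniformly filterable in the YARN ResourceManager UI.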
[jira] [Resolved] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3014. - Resolution: Fixed > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others don't. Some use upper case, others > don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _{*}Nutch ${ClassName}{*}: *${additional info}*_ > _Examples:_ > * _Nutch LinkRank: Inverter_ > * _Nutch CrawlDb: + $crawldb_ > * _Nutch LinkDbReader: + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (NUTCH-3022) Experiment formatting codebase per google-java-format
Lewis John McGibbney created NUTCH-3022: --- Summary: Experiment formatting codebase per google-java-format Key: NUTCH-3022 URL: https://issues.apache.org/jira/browse/NUTCH-3022 Project: Nutch Issue Type: Task Components: build Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 I [started a mailing list thread|https://lists.apache.org/thread/ssmm6djyk5syvhmq701zjf0d9bobpk5n] which quizzed whether we should integrate code linting/formatting into the CI. Seb provided some excellent, calculated input which inspired me to create this ticket. I will create a PR which lints the Nutch codebase per the *google-java-format* and discuss the results. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work stopped] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3014 stopped by Lewis John McGibbney. --- > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others don't. Some use upper case, others > don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _{*}Nutch ${ClassName}{*}: *${additional info}*_ > _Examples:_ > * _Nutch LinkRank: Inverter_ > * _Nutch CrawlDb: + $crawldb_ > * _Nutch LinkDbReader: + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Nutch codebase formatting
Thanks Seb. I'll go ahead and try to build in the google Java format via super-linter and see where we get...! lewismc On 2023/10/29 17:04:47 Sebastian Nagel wrote: > Hi Lewis, > > >> whether we need a Nutch custom code style at all… why don’t we just use > >> some other existing style and then enforce it? > > Enforcing: yes! > > However, I would try hard to keep the changes to a reasonable minimum. For > example, if we change the indentation, almost every code line is affected > which > makes > - "git annotate" mostly useless (or more difficult to use because you need to > look >back) > - merges of open PRs, custom patches or modifications in custom repositories >might get quite painful, until the formatting is synchronized. > > > >> * google Java format [1] which offers a GitHub action for easy integration > >> into our CI process, or > > +1 > > + available also for Intellij, Eclipse > + indentation stays the same > +/- about 25% of the code lines are changed (might be acceptable) > > > >> * superlinter [3] basically emerging as the industry OSS default, offers a > >> GitHub action and could also be configured to lint dockerfile, and other > >> artifacts. It can also be configured to use the google Java style as well… > > +1 (with Google Java style) > > > > I’ll submit a PR for superlinter so everyone can see what it would look > like. > > Great! Thanks! > > > Best, > Sebastian > > On 10/29/23 00:38, Lewis John McGibbney wrote: > > Any thoughts on this folks. > > I’ll submit a PR for superlinter so everyone can see what it would look > > like. > > lewismc > > > > On 2023/10/23 19:28:45 lewis john mcgibbney wrote: > >> Hi dev@, > >> > >> For the longest time the Nutch codebase has shipped with a > >> eclipse-codeformat.xml [0] file. > >> Whilst this has been largely successful in keeping the codebase uniform, it > >> cannot/has not been integrated into continuous integration (CI) and > >> subsequently not really enforced!
> >> > >> Whilst I’m a big fan of “if it ain’t broken don’t fix it”, I think we > >> should have some CI code formatting checks. Additionally I really question > >> whether we need a Nutch custom code style at all… why don’t we just use > >> some other existing style and then enforce it? > >> > >> I therefore propose that we replace the legacy code formatter with a > >> convention such as > >> > >> * google Java format [1] which offers a GitHub action for easy integration > >> into our CI process, or > >> * Checkstyle [2] which offers an Ant task which we could use, this is of > >> less utility as we think about the move to Gradle > >> * superlinter [3] basically emerging as the industry OSS default, offers a > >> GitHub action and could also be configured to lint dockerfile, and other > >> artifacts. It can also be configured to use the google Java style as well… > >> > >> My preference would be [3] because it offers a more comprehensive linting > >> package for the entire codebase not just the Java code. > >> > >> Thanks for your consideration. > >> lewismc > >> > >> [0] > >> https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml > >> [1] > >> https://github.com/google/google-java-format > >> [2] > >> https://checkstyle.sourceforge.io/ > >> [3] > >> https://github.com/marketplace/actions/super-linter > >> >
Re: Nutch codebase formatting
Any thoughts on this, folks? I’ll submit a PR for superlinter so everyone can see what it would look like. lewismc On 2023/10/23 19:28:45 lewis john mcgibbney wrote: > Hi dev@, > > For the longest time the Nutch codebase has shipped with a > eclipse-codeformat.xml [0] file. > Whilst this has been largely successful in keeping the codebase uniform, it > cannot/has not been integrated into continuous integration (CI) and > subsequently not really enforced! > > Whilst I’m a big fan of “if it ain’t broken don’t fix it”, I think we > should have some CI code formatting checks. Additionally I really question > whether we need a Nutch custom code style at all… why don’t we just use > some other existing style and then enforce it? > > I therefore propose that we replace the legacy code formatter with a > convention such as > > * google Java format [1] which offers a GitHub action for easy integration > into our CI process, or > * Checkstyle [2] which offers an Ant task which we could use, this is of > less utility as we think about the move to Gradle > * superlinter [3] basically emerging as the industry OSS default, offers a > GitHub action and could also be configured to lint dockerfile, and other > artifacts. It can also be configured to use the google Java style as well… > > My preference would be [3] because it offers a more comprehensive linting > package for the entire codebase not just the Java code. > > Thanks for your consideration. > lewismc > > [0] > https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml > [1] > https://github.com/google/google-java-format > [2] > https://checkstyle.sourceforge.io/ > [3] > https://github.com/marketplace/actions/super-linter >
[jira] [Work stopped] (NUTCH-3015) Add more CI steps to GitHub master-build.yml
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3015 stopped by Lewis John McGibbney. --- > Add more CI steps to GitHub master-build.yml > > > Key: NUTCH-3015 > URL: https://issues.apache.org/jira/browse/NUTCH-3015 > Project: Nutch > Issue Type: Improvement > Components: build >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > With specific reference to the GitHub master-build.yml, we currently run > _*ant clean nightly javadoc -buildfile build.xml*_ as one mammoth task and if > something fails it is unclear exactly what failed. > > There are several improvements I want to propose to the GitHub CI > * run workflows in multiple environments/OSs e.g. ubuntu, macos & > windows > * define multiple jobs which can run in parallel to speed up CI e.g. javadoc > and nightly targets > * run more targets e.g. linting, rat-sources, report-vulnerabilities, > report-licenses, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (NUTCH-3015) Add more CI steps to GitHub master-build.yml
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3015. --- > Add more CI steps to GitHub master-build.yml > > > Key: NUTCH-3015 > URL: https://issues.apache.org/jira/browse/NUTCH-3015 > Project: Nutch > Issue Type: Improvement > Components: build >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > With specific reference to the GitHub master-build.yml, we currently run > _*ant clean nightly javadoc -buildfile build.xml*_ as one mammoth task and if > something fails it is unclear exactly what failed. > > There are several improvements I want to propose to the GitHub CI > * run workflows in multiple environments/OSs e.g. ubuntu, macos & > windows > * define multiple jobs which can run in parallel to speed up CI e.g. javadoc > and nightly targets > * run more targets e.g. linting, rat-sources, report-vulnerabilities, > report-licenses, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (NUTCH-3015) Add more CI steps to GitHub master-build.yml
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3015. - Resolution: Fixed > Add more CI steps to GitHub master-build.yml > > > Key: NUTCH-3015 > URL: https://issues.apache.org/jira/browse/NUTCH-3015 > Project: Nutch > Issue Type: Improvement > Components: build >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > With specific reference to the GitHub master-build.yml, we currently run > _*ant clean nightly javadoc -buildfile build.xml*_ as one mammoth task and if > something fails it is unclear exactly what failed. > > There are several improvements I want to propose to the GitHub CI > * run workflows in multiple environments/OSs e.g. ubuntu, macos & > windows > * define multiple jobs which can run in parallel to speed up CI e.g. javadoc > and nightly targets > * run more targets e.g. linting, rat-sources, report-vulnerabilities, > report-licenses, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (NUTCH-2887) Migrate to JUnit 5 Jupiter
[ https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2887 started by Lewis John McGibbney. --- > Migrate to JUnit 5 Jupiter > -- > > Key: NUTCH-2887 > URL: https://issues.apache.org/jira/browse/NUTCH-2887 > Project: Nutch > Issue Type: Improvement > Components: test > Environment: Migrate > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > This effort is a bit of a beast. See the [JUnit migration > tips|https://junit.org/junit5/docs/current/user-guide/#migrating-from-junit4-tips] > for general guidance. A general grep for junit in src produces the following > {code:bash} > ./test/nutch-site.xml > ./test/org/apache/nutch/tools/TestCommonCrawlDataDumper.java > ./test/org/apache/nutch/net/TestURLNormalizers.java > ./test/org/apache/nutch/net/protocols/TestHttpDateFormat.java > ./test/org/apache/nutch/net/TestURLFilters.java > ./test/org/apache/nutch/util/TestStringUtil.java > ./test/org/apache/nutch/util/TestSuffixStringMatcher.java > ./test/org/apache/nutch/util/TestEncodingDetector.java > ./test/org/apache/nutch/util/TestMimeUtil.java > ./test/org/apache/nutch/util/TestPrefixStringMatcher.java > ./test/org/apache/nutch/util/DumpFileUtilTest.java > ./test/org/apache/nutch/util/TestNodeWalker.java > ./test/org/apache/nutch/util/WritableTestUtils.java > ./test/org/apache/nutch/util/TestTableUtil.java > ./test/org/apache/nutch/util/TestURLUtil.java > ./test/org/apache/nutch/util/TestGZIPUtils.java > ./test/org/apache/nutch/parse/TestParseText.java > ./test/org/apache/nutch/parse/TestOutlinks.java > ./test/org/apache/nutch/parse/TestParseData.java > ./test/org/apache/nutch/parse/TestOutlinkExtractor.java > ./test/org/apache/nutch/parse/TestParserFactory.java > ./test/org/apache/nutch/segment/TestSegmentMerger.java > ./test/org/apache/nutch/segment/TestSegmentMergerCrawlDatums.java > ./test/org/apache/nutch/plugin/TestPluginSystem.java > 
./test/org/apache/nutch/fetcher/TestFetcher.java > ./test/org/apache/nutch/protocol/TestProtocolFactory.java > ./test/org/apache/nutch/protocol/TestContent.java > ./test/org/apache/nutch/protocol/AbstractHttpProtocolPluginTest.java > ./test/org/apache/nutch/crawl/TestCrawlDbFilter.java > ./test/org/apache/nutch/crawl/TestTextProfileSignature.java > ./test/org/apache/nutch/crawl/TestCrawlDbStates.java > ./test/org/apache/nutch/crawl/TestGenerator.java > ./test/org/apache/nutch/crawl/TestAdaptiveFetchSchedule.java > ./test/org/apache/nutch/crawl/TODOTestCrawlDbStates.java > ./test/org/apache/nutch/crawl/TestSignatureFactory.java > ./test/org/apache/nutch/crawl/ContinuousCrawlTestUtil.java > ./test/org/apache/nutch/crawl/TestInjector.java > ./test/org/apache/nutch/crawl/TestLinkDbMerger.java > ./test/org/apache/nutch/crawl/TestCrawlDbMerger.java > ./test/org/apache/nutch/service/TestNutchServer.java > ./test/org/apache/nutch/metadata/TestMetadata.java > ./test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java > ./test/org/apache/nutch/indexer/TestIndexingFilters.java > ./test/org/apache/nutch/indexer/TestIndexerMapReduce.java > ./bin/nutch > ./plugin/scoring-orphan/src/test/org/apache/nutch/scoring/orphan/TestOrphanScoringFilter.java > ./plugin/index-basic/src/test/org/apache/nutch/indexer/basic/TestBasicIndexingFilter.java > ./plugin/urlfilter-domaindenylist/build.xml > ./plugin/urlfilter-domaindenylist/src/test/org/apache/nutch/urlfilter/domaindenylist/TestDomainDenylistURLFilter.java > ./plugin/protocol-imaps/plugin.xml > ./plugin/protocol-imaps/ivy.xml > ./plugin/protocol-imaps/lib/junit-4.13.jar > ./plugin/protocol-imaps/lib/greenmail-junit4-1.6.0.jar > ./plugin/protocol-imaps/lib/greenmail-1.6.0.jar > ./plugin/protocol-imaps/src/test/org/apache/nutch/protocol/imaps/TestImaps.java > ./plugin/protocol-file/build.xml > ./plugin/protocol-file/src/test/org/apache/nutch/protocol/file/TestProtocolFile.java > ./plugin/urlnormalizer-regex/build.xml > 
./plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java > ./plugin/build-plugin.xml > ./plugin/creativecommons/src/test/org/creativecommons/nutch/TestCCParseFilter.java > ./plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java > ./plugin/urlnormalizer-protocol/build.xml > ./plugin/urlnormalizer-protocol/src/test/org/apache/nutch/net/urlnormalizer/protocol/TestProtocolURLNormalizer.java > ./plugin/urlfilter-prefix/src/test/org/apache/nutch/urlfilter/prefi
[jira] [Created] (NUTCH-3016) Upgrade Apache Ivy to 2.5.2
Lewis John McGibbney created NUTCH-3016: --- Summary: Upgrade Apache Ivy to 2.5.2 Key: NUTCH-3016 URL: https://issues.apache.org/jira/browse/NUTCH-3016 Project: Nutch Issue Type: Task Components: ivy, build Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 [Apache Ivy v2.5.2|https://ant.apache.org/ivy/history/2.5.2/release-notes.html] was released on August 20, 2023! We should upgrade. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (NUTCH-2887) Migrate to JUnit 5 Jupiter
[ https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2887: --- Assignee: Lewis John McGibbney > Migrate to JUnit 5 Jupiter > -- > > Key: NUTCH-2887 > URL: https://issues.apache.org/jira/browse/NUTCH-2887 > Project: Nutch > Issue Type: Improvement > Components: test > Environment: Migrate > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.20 > > > This effort is a bit of a beast. See the [JUnit migration > tips|https://junit.org/junit5/docs/current/user-guide/#migrating-from-junit4-tips] > for general guidance. A general grep for junit in src produces the following > {code:bash} > ./test/nutch-site.xml > ./test/org/apache/nutch/tools/TestCommonCrawlDataDumper.java > ./test/org/apache/nutch/net/TestURLNormalizers.java > ./test/org/apache/nutch/net/protocols/TestHttpDateFormat.java > ./test/org/apache/nutch/net/TestURLFilters.java > ./test/org/apache/nutch/util/TestStringUtil.java > ./test/org/apache/nutch/util/TestSuffixStringMatcher.java > ./test/org/apache/nutch/util/TestEncodingDetector.java > ./test/org/apache/nutch/util/TestMimeUtil.java > ./test/org/apache/nutch/util/TestPrefixStringMatcher.java > ./test/org/apache/nutch/util/DumpFileUtilTest.java > ./test/org/apache/nutch/util/TestNodeWalker.java > ./test/org/apache/nutch/util/WritableTestUtils.java > ./test/org/apache/nutch/util/TestTableUtil.java > ./test/org/apache/nutch/util/TestURLUtil.java > ./test/org/apache/nutch/util/TestGZIPUtils.java > ./test/org/apache/nutch/parse/TestParseText.java > ./test/org/apache/nutch/parse/TestOutlinks.java > ./test/org/apache/nutch/parse/TestParseData.java > ./test/org/apache/nutch/parse/TestOutlinkExtractor.java > ./test/org/apache/nutch/parse/TestParserFactory.java > ./test/org/apache/nutch/segment/TestSegmentMerger.java > ./test/org/apache/nutch/segment/TestSegmentMergerCrawlDatums.java > 
./test/org/apache/nutch/plugin/TestPluginSystem.java > ./test/org/apache/nutch/fetcher/TestFetcher.java > ./test/org/apache/nutch/protocol/TestProtocolFactory.java > ./test/org/apache/nutch/protocol/TestContent.java > ./test/org/apache/nutch/protocol/AbstractHttpProtocolPluginTest.java > ./test/org/apache/nutch/crawl/TestCrawlDbFilter.java > ./test/org/apache/nutch/crawl/TestTextProfileSignature.java > ./test/org/apache/nutch/crawl/TestCrawlDbStates.java > ./test/org/apache/nutch/crawl/TestGenerator.java > ./test/org/apache/nutch/crawl/TestAdaptiveFetchSchedule.java > ./test/org/apache/nutch/crawl/TODOTestCrawlDbStates.java > ./test/org/apache/nutch/crawl/TestSignatureFactory.java > ./test/org/apache/nutch/crawl/ContinuousCrawlTestUtil.java > ./test/org/apache/nutch/crawl/TestInjector.java > ./test/org/apache/nutch/crawl/TestLinkDbMerger.java > ./test/org/apache/nutch/crawl/TestCrawlDbMerger.java > ./test/org/apache/nutch/service/TestNutchServer.java > ./test/org/apache/nutch/metadata/TestMetadata.java > ./test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java > ./test/org/apache/nutch/indexer/TestIndexingFilters.java > ./test/org/apache/nutch/indexer/TestIndexerMapReduce.java > ./bin/nutch > ./plugin/scoring-orphan/src/test/org/apache/nutch/scoring/orphan/TestOrphanScoringFilter.java > ./plugin/index-basic/src/test/org/apache/nutch/indexer/basic/TestBasicIndexingFilter.java > ./plugin/urlfilter-domaindenylist/build.xml > ./plugin/urlfilter-domaindenylist/src/test/org/apache/nutch/urlfilter/domaindenylist/TestDomainDenylistURLFilter.java > ./plugin/protocol-imaps/plugin.xml > ./plugin/protocol-imaps/ivy.xml > ./plugin/protocol-imaps/lib/junit-4.13.jar > ./plugin/protocol-imaps/lib/greenmail-junit4-1.6.0.jar > ./plugin/protocol-imaps/lib/greenmail-1.6.0.jar > ./plugin/protocol-imaps/src/test/org/apache/nutch/protocol/imaps/TestImaps.java > ./plugin/protocol-file/build.xml > ./plugin/protocol-file/src/test/org/apache/nutch/protocol/file/TestProtocolFile.java > 
./plugin/urlnormalizer-regex/build.xml > ./plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java > ./plugin/build-plugin.xml > ./plugin/creativecommons/src/test/org/creativecommons/nutch/TestCCParseFilter.java > ./plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java > ./plugin/urlnormalizer-protocol/build.xml > ./plugin/urlnormalizer-protocol/src/test/org/apache/nutch/net/urlnormalizer/protocol/TestProtocolURLNormalizer.java > ./plugin/urlfilter-prefix/src/test/org/apache/n
[jira] [Work started] (NUTCH-3015) Add more CI steps to GitHub master-build.yml
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3015 started by Lewis John McGibbney. --- > Add more CI steps to GitHub master-build.yml > > > Key: NUTCH-3015 > URL: https://issues.apache.org/jira/browse/NUTCH-3015 > Project: Nutch > Issue Type: Improvement > Components: build >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > With specific reference to the GitHub master-build.yml, we currently run > _*ant clean nightly javadoc -buildfile build.xml*_ as one mammoth task and if > something fails it is unclear as to exactly what failed. > > There are several improvements I want to propose to the GitHub CI > * run workflows in multiple environments/OSes, e.g. ubuntu, macos & > windows > * define multiple jobs which can run in parallel to speed up CI, e.g. the javadoc > and nightly targets > * run more targets, e.g. linting, rat-sources, report-vulnerabilities, > report-licenses, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
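The split into parallel jobs and an OS matrix could look roughly like the fragment below. This is a sketch only — the job names, OS list, Java version and Ant target grouping are illustrative assumptions, not the actual Nutch master-build.yml:

```yaml
# Hypothetical fragment of .github/workflows/master-build.yml
name: master-build
on: [push, pull_request]
jobs:
  nightly:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v3
        with:
          distribution: temurin
          java-version: '11'
      - run: ant clean nightly -buildfile build.xml
  javadoc:   # runs in parallel with the nightly job
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v3
        with:
          distribution: temurin
          java-version: '11'
      - run: ant javadoc rat-sources -buildfile build.xml
```

Separate jobs mean a failure surfaces against a named job rather than somewhere inside one mammoth task.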
[jira] [Work started] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3014 started by Lewis John McGibbney. --- > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others don't. Some use upper case, others > don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _Nutch ${ClassName}: ${additional info}_ > _Examples:_ > * _Nutch LinkRank: Inverter_ > * _Nutch CrawlDb: + $crawldb_ > * _Nutch LinkDbReader: + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Nutch codebase formatting
Hi dev@, For the longest time the Nutch codebase has shipped with an eclipse-codeformat.xml [0] file. Whilst this has been largely successful in keeping the codebase uniform, it cannot be/has not been integrated into continuous integration (CI) and is subsequently not really enforced! Whilst I'm a big fan of "if it ain't broke, don't fix it", I think we should have some CI code formatting checks. Additionally, I really question whether we need a custom Nutch code style at all… why don't we just adopt an existing style and then enforce it? I therefore propose that we replace the legacy code formatter with a convention such as * Google Java Format [1], which offers a GitHub Action for easy integration into our CI process, or * Checkstyle [2], which offers an Ant task we could use; this is of less utility as we think about the move to Gradle, or * Super-Linter [3], basically emerging as the industry OSS default; it offers a GitHub Action and can also be configured to lint Dockerfiles and other artifacts. It can be configured to use the Google Java style as well… My preference would be [3] because it offers a more comprehensive linting package for the entire codebase, not just the Java code. Thanks for your consideration. lewismc [0] https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml [1] https://github.com/google/google-java-format [2] https://checkstyle.sourceforge.io/ [3] https://github.com/marketplace/actions/super-linter
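For option [3], wiring the linter in is a small workflow addition. A sketch, assuming the marketplace action; the exact repository path, version tag and env flags should be double-checked against the Super-Linter documentation:

```yaml
# Hypothetical .github/workflows/lint.yml
name: lint
on: [push, pull_request]
jobs:
  super-linter:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # Super-Linter diffs against the default branch
      - uses: super-linter/super-linter@v5
        env:
          DEFAULT_BRANCH: master
          VALIDATE_ALL_CODEBASE: false  # lint only files changed in the PR
          VALIDATE_JAVA: true
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Starting with only changed files keeps the first rollout from flagging the whole legacy codebase at once.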
[jira] [Created] (NUTCH-3015) Add more CI steps to GitHub master-build.yml
Lewis John McGibbney created NUTCH-3015: --- Summary: Add more CI steps to GitHub master-build.yml Key: NUTCH-3015 URL: https://issues.apache.org/jira/browse/NUTCH-3015 Project: Nutch Issue Type: Improvement Components: build Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 With specific reference to the GitHub master-build.yml, we currently run _*ant clean nightly javadoc -buildfile build.xml*_ as one mammoth task and if something fails it is unclear as to exactly what failed. There are several improvements I want to propose to the GitHub CI * run workflows in multiple environments/OSes, e.g. ubuntu, macos & windows * define multiple jobs which can run in parallel to speed up CI, e.g. the javadoc and nightly targets * run more targets, e.g. linting, rat-sources, report-vulnerabilities, report-licenses, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3014: Description: There is a large degree of variability when we set the job name: {{Job job = NutchJob.getInstance(getConf());}} {{job.setJobName("read " + segment);}} Some examples mention the job name, others don't. Some use upper case, others don't, etc. I think we can standardize the NutchJob job names. This would help when filtering jobs in YARN ResourceManager UI as well. I propose we implement the following convention * *Nutch* (mandatory) - static value which prepends the job name, assists with distinguishing the Job as a NutchJob and making it easily findable. * *${ClassName}* (mandatory) - literally the name of the Class the job is encoded in * *${additional info}* (optional) - value could further distinguish the type of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) _Nutch ${ClassName}: ${additional info}_ _Examples:_ * _Nutch LinkRank: Inverter_ * _Nutch CrawlDb: + $crawldb_ * _Nutch LinkDbReader: + $linkdb_ Thanks for any suggestions/comments. was: There is a large degree of variability when we set the job name: {{Job job = NutchJob.getInstance(getConf());}} {{job.setJobName("read " + segment);}} Some examples mention the job name, others don't. Some use upper case, others don't, etc. I think we can standardize the NutchJob job names. This would help when filtering jobs in YARN ResourceManager UI as well. I propose we implement the following convention * *Nutch* (mandatory) - static value which prepends the job name, assists with distinguishing the Job as a NutchJob and making it easily findable. * *${ClassName}* (mandatory) - literally the name of the Class the job is encoded in * *${additional info}* (optional) - value could further distinguish the type of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) 
_Nutch ${ClassName}: ${additional info}_ _Examples:_ * _Nutch LinkRank Inverter_ * _Nutch CrawlDb + $crawldb_ * _Nutch LinkDbReader + $linkdb_ Thanks for any suggestions/comments. > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime > Affects Versions: 1.19 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others > don't. Some use upper case, others don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _Nutch ${ClassName}: ${additional info}_ > _Examples:_ > * _Nutch LinkRank: Inverter_ > * _Nutch CrawlDb: + $crawldb_ > * _Nutch LinkDbReader: + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3014: Description: There is a large degree of variability when we set the job name: {{Job job = NutchJob.getInstance(getConf());}} {{job.setJobName("read " + segment);}} Some examples mention the job name, others don't. Some use upper case, others don't, etc. I think we can standardize the NutchJob job names. This would help when filtering jobs in YARN ResourceManager UI as well. I propose we implement the following convention * *Nutch* (mandatory) - static value which prepends the job name, assists with distinguishing the Job as a NutchJob and making it easily findable. * *${ClassName}* (mandatory) - literally the name of the Class the job is encoded in * *${additional info}* (optional) - value could further distinguish the type of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) _Nutch ${ClassName}: ${additional info}_ _Examples:_ * _Nutch LinkRank Inverter_ * _Nutch CrawlDb + $crawldb_ * _Nutch LinkDbReader + $linkdb_ Thanks for any suggestions/comments. was: There is a large degree of variability when we set the job name: {{Job job = NutchJob.getInstance(getConf());}} {{job.setJobName("read " + segment);}} Some examples mention the job name, others don't. Some use upper case, others don't, etc. I think we can standardize the NutchJob job names. This would help when filtering jobs in YARN ResourceManager UI as well. I propose we implement the following convention * *Nutch* (mandatory) - static value which prepends the job name, assists with distinguishing the Job as a NutchJob and making it easily findable. * *${ClassName}* (mandatory) - literally the name of the Class the job is encoded in * *${additional info}* (optional) - value could further distinguish the type of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) 
_*Nutch ${ClassName}* *${additional info}*_ _Examples:_ * _Nutch LinkRank Inverter_ * _Nutch CrawlDb + $crawldb_ * _Nutch LinkDbReader + $linkdb_ Thanks for any suggestions/comments. > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime > Affects Versions: 1.19 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others > don't. Some use upper case, others don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _Nutch ${ClassName}: ${additional info}_ > _Examples:_ > * _Nutch LinkRank Inverter_ > * _Nutch CrawlDb + $crawldb_ > * _Nutch LinkDbReader + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (NUTCH-3014) Standardize Job names
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3014: Summary: Standardize Job names (was: Standardize NutchJob job names) > Standardize Job names > - > > Key: NUTCH-3014 > URL: https://issues.apache.org/jira/browse/NUTCH-3014 > Project: Nutch > Issue Type: Improvement > Components: configuration, runtime >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.20 > > > There is a large degree of variability when we set the job name: > > {{Job job = NutchJob.getInstance(getConf());}} > {{job.setJobName("read " + segment);}} > > Some examples mention the job name, others > don't. Some use upper case, others don't, etc. > I think we can standardize the NutchJob job names. This would help when > filtering jobs in YARN ResourceManager UI as well. > I propose we implement the following convention > * *Nutch* (mandatory) - static value which prepends the job name, assists > with distinguishing the Job as a NutchJob and making it easily findable. > * *${ClassName}* (mandatory) - literally the name of the Class the job is > encoded in > * *${additional info}* (optional) - value could further distinguish the type > of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) > _*Nutch ${ClassName}* *${additional info}*_ > _Examples:_ > * _Nutch LinkRank Inverter_ > * _Nutch CrawlDb + $crawldb_ > * _Nutch LinkDbReader + $linkdb_ > Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3013. - Resolution: Fixed Thanks for the review [~snagel] > Employ commons-lang3's StopWatch to simplify timing logic > - > > Key: NUTCH-3013 > URL: https://issues.apache.org/jira/browse/NUTCH-3013 > Project: Nutch > Issue Type: Improvement > Components: logging, runtime, util >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Labels: timing > Fix For: 1.20 > > > I ended up running some experiments integrating Nutch and [Celeborn > (Incubating)|https://celeborn.apache.org/] and it got me thinking about > runtime timings. After some investigation I came across [common-lang3's > StopWatch > Class|https://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html?org/apache/commons/lang3/time/StopWatch.html] > which provides a convenient API for timings. > Seeing as we already declare the commons-lang3 dependency, I think StopWatch > could help us clean up some timing logic in Nutch. Specifically, it would > reduce redundancy in terms of duplicated code and logic. It would also open > the door to introduce timing _*splits*_ if anyone is so inclined to dig > deeper into runtime timings. > A cursory search for *_"long start = System.currentTimeMillis();"_* returns > hits for 32 files so it's fair to say that timing already affects lots of > aspects of the Nutch execution workflow. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3013. --- > Employ commons-lang3's StopWatch to simplify timing logic > - > > Key: NUTCH-3013 > URL: https://issues.apache.org/jira/browse/NUTCH-3013 > Project: Nutch > Issue Type: Improvement > Components: logging, runtime, util >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Labels: timing > Fix For: 1.20 > > > I ended up running some experiments integrating Nutch and [Celeborn > (Incubating)|https://celeborn.apache.org/] and it got me thinking about > runtime timings. After some investigation I came across [common-lang3's > StopWatch > Class|https://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html?org/apache/commons/lang3/time/StopWatch.html] > which provides a convenient API for timings. > Seeing as we already declare the commons-lang3 dependency, I think StopWatch > could help us clean up some timing logic in Nutch. Specifically, it would > reduce redundancy in terms of duplicated code and logic. It would also open > the door to introduce timing _*splits*_ if anyone is so inclined to dig > deeper into runtime timings. > A cursory search for *_"long start = System.currentTimeMillis();"_* returns > hits for 32 files so it's fair to say that timing already affects lots of > aspects of the Nutch execution workflow. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Roll-Call for Apache Flagon
I’m here. lewismc On Sat, Oct 21, 2023 at 08:28 Christofer Dutz wrote: > Hi all, > > > > I was tasked at the last board report to pursue a roll call for Apache > Flagon after we saw that a VOTE thread has currently been open for over 2 > weeks with only one vote (which was “-0”). > > Also seeing that only 2 people have done any commits in the last few > months feels rather strange for a project that has been a TLP for only 7 > months now. > > > > > Please reply to this thread if you’re still willing and able to contribute > to this project. > > > > Thanks, > > Chris >
[jira] [Created] (NUTCH-3014) Standardize NutchJob job names
Lewis John McGibbney created NUTCH-3014: --- Summary: Standardize NutchJob job names Key: NUTCH-3014 URL: https://issues.apache.org/jira/browse/NUTCH-3014 Project: Nutch Issue Type: Improvement Components: configuration, runtime Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 There is a large degree of variability when we set the job name: {{Job job = NutchJob.getInstance(getConf());}} {{job.setJobName("read " + segment);}} Some examples mention the job name, others don't. Some use upper case, others don't, etc. I think we can standardize the NutchJob job names. This would help when filtering jobs in YARN ResourceManager UI as well. I propose we implement the following convention * *Nutch* (mandatory) - static value which prepends the job name, assists with distinguishing the Job as a NutchJob and making it easily findable. * *${ClassName}* (mandatory) - literally the name of the Class the job is encoded in * *${additional info}* (optional) - value could further distinguish the type of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.) _*Nutch ${ClassName}* *${additional info}*_ _Examples:_ * _Nutch LinkRank Inverter_ * _Nutch CrawlDb + $crawldb_ * _Nutch LinkDbReader + $linkdb_ Thanks for any suggestions/comments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
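The convention could be centralized in a small helper so every call site formats the name the same way. This is a sketch only — NutchJobNames and its of() method are hypothetical names, not existing Nutch code:

```java
/**
 * Hypothetical helper sketching the proposed "Nutch ${ClassName}: ${additional info}"
 * convention; the class and method names are illustrative, not existing Nutch code.
 */
public final class NutchJobNames {

    private NutchJobNames() {
    }

    /** Builds "Nutch ${ClassName}" plus an optional ": ${additional info}" suffix. */
    public static String of(Class<?> jobClass, String additionalInfo) {
        String base = "Nutch " + jobClass.getSimpleName();
        return (additionalInfo == null || additionalInfo.isEmpty())
                ? base
                : base + ": " + additionalInfo;
    }

    public static void main(String[] args) {
        // At a real call site this might read:
        //   job.setJobName(NutchJobNames.of(getClass(), "Inverter"));
        System.out.println(of(String.class, "Inverter")); // Nutch String: Inverter
        System.out.println(of(Integer.class, null));      // Nutch Integer
    }
}
```

Centralizing the format would also make the YARN ResourceManager filtering mentioned above a simple prefix match on "Nutch ".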
[jira] [Work started] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3013 started by Lewis John McGibbney. --- > Employ commons-lang3's StopWatch to simplify timing logic > - > > Key: NUTCH-3013 > URL: https://issues.apache.org/jira/browse/NUTCH-3013 > Project: Nutch > Issue Type: Improvement > Components: logging, runtime, util >Affects Versions: 1.19 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > Labels: timing > Fix For: 1.20 > > > I ended up running some experiments integrating Nutch and [Celeborn > (Incubating)|https://celeborn.apache.org/] and it got me thinking about > runtime timings. After some investigation I came across [common-lang3's > StopWatch > Class|https://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html?org/apache/commons/lang3/time/StopWatch.html] > which provides a convenient API for timings. > Seeing as we already declare the commons-lang3 dependency, I think StopWatch > could help us clean up some timing logic in Nutch. Specifically, it would > reduce redundancy in terms of duplicated code and logic. It would also open > the door to introduce timing _*splits*_ if anyone is so inclined to dig > deeper into runtime timings. > A cursory search for *_"long start = System.currentTimeMillis();"_* returns > hits for 32 files so it's fair to say that timing already affects lots of > aspects of the Nutch execution workflow. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic
Lewis John McGibbney created NUTCH-3013: --- Summary: Employ commons-lang3's StopWatch to simplify timing logic Key: NUTCH-3013 URL: https://issues.apache.org/jira/browse/NUTCH-3013 Project: Nutch Issue Type: Improvement Components: logging, runtime, util Affects Versions: 1.19 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.20 I ended up running some experiments integrating Nutch and [Celeborn (Incubating)|https://celeborn.apache.org/] and it got me thinking about runtime timings. After some investigation I came across [commons-lang3's StopWatch Class|https://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html?org/apache/commons/lang3/time/StopWatch.html] which provides a convenient API for timings. Seeing as we already declare the commons-lang3 dependency, I think StopWatch could help us clean up some timing logic in Nutch. Specifically, it would reduce redundancy in terms of duplicated code and logic. It would also open the door to introduce timing _*splits*_ if anyone is so inclined to dig deeper into runtime timings. A cursory search for *_"long start = System.currentTimeMillis();"_* returns hits for 32 files so it's fair to say that timing already affects lots of aspects of the Nutch execution workflow. -- This message was sent by Atlassian Jira (v8.20.10#820010)
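To make the duplication concrete, the sketch below shows the repeated System.currentTimeMillis() pattern factored into one hypothetical stdlib helper. It deliberately avoids the commons-lang3 dependency here; the real StopWatch API (e.g. createStarted()/stop()/getTime()) plays the same role and additionally offers suspend/resume and splits:

```java
/** Hypothetical stdlib sketch; commons-lang3's StopWatch would fill this role in Nutch. */
public final class Timing {

    private Timing() {
    }

    /** Runs the task and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        // The pattern currently duplicated across ~32 Nutch files:
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // stand-in for real job work, e.g. a segment merge
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

With a helper (or StopWatch) in place, each job logs timing through one code path instead of its own start/end arithmetic.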
No appenders could be found for logger (org.apache.celeborn.mapreduce.v2.app.MRAppMasterWithCeleborn)
Hi user@, I am making progress in my experiments integrating Nutch 1.20-SNAPSHOT, Hadoop 3.3.4 and Celeborn 0.4.0-SNAPSHOT-incubating! In both the Hadoop word count example and with all of the Nutch MapReduce jobs I run, I see the following output present in the YARN container stderr log output:

log4j:WARN No appenders could be found for logger (org.apache.celeborn.mapreduce.v2.app.MRAppMasterWithCeleborn).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Looking into the Celeborn source [0] I see that Celeborn uses Slf4j over Log4j2, but I am not sure how that plays with the above Hadoop distribution. I think some further configuration is required... lewismc [0] https://github.com/apache/incubator-celeborn/blob/a5dfd67d5b9bcb7d5da59f441ed1d60b4bc27cd3/client-mr/mr/src/main/java/org/apache/celeborn/mapreduce/v2/app/MRAppMasterWithCeleborn.java#L50 -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
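For reference, that warning generally just means no log4j 1.x configuration was found on the application master's classpath. A minimal log4j.properties along these lines (property names per the log4j 1.2 docs; whether the Celeborn AM actually picks it up is exactly the open configuration question here) would silence it:

```properties
# Minimal log4j 1.2 configuration routing everything to the container's stderr
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```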
Re: java.lang.NumberFormatException: null when running Hadoop Mapreduce Wordcount example
Hi Ethan, Thanks for the advice! As I am in an experimental phase, I decided to try again in pseudo-distributed mode... I tried downgrading to Hadoop 3.2.1 (OpenJDK8) but apparently that Hadoop distribution doesn't run on the Apple M1 chip! I therefore tried again on Hadoop 3.3.4 and was successfully able to get the Hadoop MapReduce word count example running. I needed to configure and start YARN (which I did not do previously). I will create a pull request such that this is reflected in the Celeborn documentation. I am running into other issues which I will create a new thread for. Thank you lewismc On 2023/10/18 05:05:12 Ethan Feng wrote: > Hi Lewis, > > Sorry to hear that you're having trouble running the wordcount example > with Celeborn. > > Based on the information you shared, I would suggest you run MapReduce > with Celeborn on a Hadoop cluster instead of pseudo-distributed mode. > The Celeborn client in MapReduce needs to write a config file into the > Hadoop file system. > > If that doesn't resolve the issue, please let me know and I'll be > happy to help you troubleshoot further. > > Celeborn has a Slack workspace which you are welcome to join. ( > https://join.slack.com/t/apachecelebor-kw08030/shared_invite/zt-1ju3hd5j8-4Z5keMdzpcVMspe4UJzF4Q > ) > > Best regards, > Ethan Feng >
Re: [NEW FEATURE AVAILABLE] Celeborn support MapReduce engine.
Excellent. Thanks for the heads up :) lewismc On 2023/10/18 03:44:54 Ethan Feng wrote: > Hi Lewis, > > Thanks for reaching out. > > I can confirm that future Celeborn releases will include the "mr" > client jars since Celeborn 0.4.0 and it will start the release process > in a short period. > > If you have any further questions or concerns about using MapReduce > with Celeborn, please don't hesitate to let me know. > > Best regards, > Ethan Feng
java.lang.NumberFormatException: null when running Hadoop Mapreduce Wordcount example
Hi user@, I cloned Celeborn (0.4.0-Incubating) 69defcad7f9423c9c24d2d22ead856b4225671c6 today and built it with the -Pmr profile. openjdk version "11.0.20.1" 2023-08-24 OpenJDK Runtime Environment Homebrew (build 11.0.20.1+0) OpenJDK 64-Bit Server VM Homebrew (build 11.0.20.1+0, mixed mode) Apache Hadoop 3.3.4 running in pseudo-distributed mode. I made an attempt to start MapReduce with Celeborn as documented at https://celeborn.apache.org/docs/latest/#start-mapreduce-with-celeborn Everything goes well until it fails when I attempt to run the wordcount example. The exceptions and stack traces are available at https://paste.apache.org/3vvy6. Is it likely that this has to do with the Hadoop version or is this a known issue? Thanks in advance for any help. lewismc -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
Re: [NEW FEATURE AVAILABLE] Celeborn support MapReduce engine.
Hi Ethan, I'm just picking up Celeborn now and plan on running some experiments with the Apache Nutch (https://nutch.apache.org) project. I downloaded Celeborn 0.3.1-incubating (2023-10-13) from the downloads page and noticed that no Celeborn client jars for MapReduce exist at $CELEBORN_HOME/mr/*.jar as suggested within the documentation at https://celeborn.apache.org/docs/latest/#add-celeborn-client-jar-to-mapreduces-classpath I cloned the source (69defca) just now and built it with ./build/make-distribution.sh -Pmr; I now see the 'mr' directory which I can use... Out of curiosity, will future Celeborn releases include the "mr" client jars? Thank you lewismc On 2023/09/14 11:43:25 Ethan Feng wrote: > Hello developers and users, > I am glad to announce that Celeborn now supports the MapReduce > engine. Both Hadoop 2 and 3 are supported. If you are interested, > please try it out and give feedback on anything you want. > > > The quick start guide can be found here [ > https://celeborn.apache.org/docs/latest/ ]. > The design doc can be found here [ > https://docs.google.com/document/d/1g4irlBucIAFNI42cFSuOVWYqOWSvuqpw_VBHmyyv8zo/edit?usp=sharing > ]. > > Thanks, > Ethan. >
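For reference, the build-from-source route described in the message can be sketched as below. The -Pmr profile and the mr/ output directory come from the thread itself; the clone URL, install path, and the classpath property named in the comment are assumptions worth checking against the Celeborn docs:

```shell
# Build Celeborn with the MapReduce client profile (requires git + a JDK):
#   git clone https://github.com/apache/celeborn.git && cd celeborn
#   ./build/make-distribution.sh -Pmr
# After the build, the MR client jars land under the distribution's mr/ dir.
CELEBORN_HOME=/opt/celeborn              # illustrative install path
MR_JARS="$CELEBORN_HOME/mr/*.jar"        # jars to expose to MapReduce jobs
# These would then be added to the MapReduce classpath, e.g. via the
# mapreduce.application.classpath property or HADOOP_CLASSPATH.
echo "add to MapReduce classpath: $MR_JARS"
```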
Establishing a Nutch development roadmap
Hi dev@, I've been at arm's length for a while as $dayjob changed and then changed again over the last number of years. With that being said, I wanted to start a thread on $title with the goal of establishing some "big items" we could put on the roadmap and maybe even publish... Here are some of the things I've been thinking about (unordered) * NUTCH-2940 Develop Gradle Core Build for Apache Nutch * Metrics system integration cf. https://github.com/apache/nutch/pull/712 * Upgrading the Javac version to > 11 * Trade study to consider integrating (something like) the Plugin Framework for Java (PF4J) into Nutch * Porting Nutch to run on Apache Beam https://beam.apache.org/ Does anyone else have candidates they wish to add? Thanks for your consideration. lewismc -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
Re: [DISCUSS] Removing Any23 from Nutch?
+1 Tim. On Wed, Sep 13, 2023 at 16:50 > > > > -- Forwarded message -- > From: Tim Allison > To: user@nutch.apache.org, d...@nutch.apache.org > Cc: > Bcc: > Date: Wed, 13 Sep 2023 10:50:08 -0400 > Subject: [DISCUSS] Removing Any23 from Nutch? > All, > I opened https://issues.apache.org/jira/browse/NUTCH-2998 a few weeks > ago. Any23 was moved to the attic in June. Unless there are objections, I > propose removing it from Nutch before the next release. > Any objections? > >Best, > >Tim >
Yahoo's Burst
Hi user@, I stumbled across Burst today... It looks like it is under active development and the documentation is lacking for loading data via a client. https://github.com/yahoo/burst lewismc -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
Re: [VOTE] Move OODT to Attic?
+1 move to the attic. I share Sean's sentiment entirely. A real success story. Thanks Imesha for representing the project to the Board. lewismc On 2023/04/03 01:02:01 Imesha Sudasingha wrote: > Hello everyone: > > Due to inactivity, Apache OODT is considering moving to the Attic [1]. This > email serves as a call to all PMC members to vote whether to retire OODT to > the attic, or not. Note that three -1 votes will be sufficient to cancel > retirement to the attic no matter how many +1 votes there are. > > PMC members, please reply to this email with your vote: > > +1 [ ] I wish for Apache OODT to be retired to the Apache Attic > +0 [ ] I do not care > -1 [ ] Apache OODT should not be retired to the Attic > > Here's my +1. > > Thanks, > Imesha > > [1] https://attic.apache.org/ >
Re: FLAGON IS A TOP LEVEL PROJECT
Congrats community. lewismc On Wed, Mar 22, 2023 at 19:55 Joshua Poore wrote: > All, > > I’m so excited to tell you that the ASF Board unanimously approved the > resolution to establish Apache Flagon as an ASF Top Level Project. > > HUGE thanks to our community—PMC, committers, contributors, users. Apache > projects are built from communities. Thank You! > > I also want to congratulate @Jyyjy—his recent pull request is our first > official commit to Master as a TLP! > > PMC will work to migrate Apache Flagon from incubator to an autonomous > TLP. Stay Tuned! > > Thanks to all of you—you all made this possible. > > > Respectfully, > > Josh (VP, Apache Flagon) > > -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
Re: Tika server crashes
Bit of a plug for tika-helm here folks... Horizontal pod autoscaling [0] is available (off by default) and can be configured via values.yaml or overridden on the CLI. This would mean that a tika-server would still be available in the event that one particular pod went down due to OOM. See [1] for more details. lewismc [0] https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/ [1] https://github.com/apache/tika-helm/blob/main/values.yaml#L99-L105 On 2023/03/08 20:29:49 Tim Allison wrote: > HIT_MAX_FILES is expected. We designed that in to periodically > restart the server to avoid memory leaks in badly behaving parsers. > You can configure a value for the max file threshold if necessary. > > The restart failed, and that's a problem. Let me look into the code, > I thought we offered more grace than 6 seconds to restart the server. > > Can you share any server settings in your config.xml? > > Please remember that as of 2.x tika-server will shutdown on oom, > timeouts and max_files, and clients should be able to handle waiting > for tika-server restarts. > > On Wed, Mar 8, 2023 at 2:58 PM Konstantin Gribov wrote: > > > > Hello, Artur. > > > > How many concurrent requests did you have and are you running Tika Server > > on Windows? And what kind of files did you use? > > > > You may have hit number of open files limit due to lot of reasons starting > > from known Windows issue (JVM process holds file descriptors for mmaped > > files until process killed) through just too low nofile limit to some Tika > > bug with handling for example stdin/stdout for forked processes. > > > > Could you provide jvm thread dump and lsof output (or Windows analog)? > > > > -- > > Best regards, > > Konstantin Gribov. > > > > > > On Wed, Mar 8, 2023 at 4:26 PM Artur Auhatov via user > > wrote: > >> > >> Hello! > >> > >> I have a few questions related to Tika server. > >> > >> > >> > >> We’ve started using tika server in our environment.
While testing > >> reliability of the tika-server we found it crashes during fork and can’t > >> fork anymore until the main process is restarted. Is this a known problem? > >> > >> In order to start tika-server we use command: java -jar > >> tika-server-2.7.0.jar -c myconfig.xml > >> > >> > >> > >> There is a log message: > >> > >> > >> > >> 14:50:30,272 [INFO] [Thread-9] - Shutting down forked process with > >> status: HIT_MAX_FILES [org.apache.tika.server.core.ServerStatusWatcher] > >> > >> INFO [pool-2-thread-1] 14:50:30,816 > >> org.apache.tika.server.core.TikaServerWatchDog forked process exited with > >> exit value 2 > >> > >> INFO [pool-2-thread-1] 14:50:36,876 > >> org.apache.tika.server.core.TikaServerWatchDog about to shutdown process > >> > >> ERROR [main] 14:50:36,878 org.apache.tika.server.core.TikaServerCli Can't > >> start: > >> > >> java.util.concurrent.ExecutionException: java.lang.RuntimeException: > >> Forked process failed to start after 6022 (ms) > >> > >> at > >> java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] > >> > >> at > >> java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
> >> > >> at > >> org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:121) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:93) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:80) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> Caused by: java.lang.RuntimeException: Forked process failed to start > >> after 6022 (ms) > >> > >> at > >> org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.(TikaServerWatchDog.java:316) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.(TikaServerWatchDog.java:287) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> org.apache.tika.server.core.TikaServerWatchDog.startForkedProcess(TikaServerWatchDog.java:224) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:143) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:53) > >> ~[tika-server-standard-2.7.0.jar:2.7.0] > >> > >> at > >> java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > >> > >> at > >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > >> ~[?:?] > >> > >> at > >> java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > >> > >> at > >>
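Picking up the tika-helm plug from the first message in this thread: enabling the HPA from the CLI rather than editing values.yaml could look like the sketch below. The key names (autoscaling.enabled and friends) and the replica bounds are assumptions and should be confirmed against the values.yaml lines linked in the message:

```shell
# Sketch of a CLI override for tika-helm autoscaling (requires helm to run).
# Key names and replica bounds are assumptions, not verified values.yaml keys.
HELM_CMD='helm install tika ./tika-helm \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=2 \
  --set autoscaling.maxReplicas=5'
echo "$HELM_CMD"
```

With more than one replica behind the Service, a single pod dying on OOM leaves the remaining pods serving requests while Kubernetes restarts the failed one.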
[jira] [Updated] (TIKA-3989) Upgrade tika-helm Horizontal Pod Autoscaling from autoscaling/v2beta1 to autoscaling/v2
[ https://issues.apache.org/jira/browse/TIKA-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-3989: --- Description: The _*autoscaling/v2beta1*_ API is superseded with {_}*autoscaling/v2*{_}. This is documented thoroughly at [https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/] (was: The _*autoscaling/v2beta1*_ API is superseded with autoscaling/v2. This is documented thoroughly at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) > Upgrade tika-helm Horizontal Pod Autoscaling from autoscaling/v2beta1 to > autoscaling/v2 > -- > > Key: TIKA-3989 > URL: https://issues.apache.org/jira/browse/TIKA-3989 > Project: Tika > Issue Type: Task > Components: helm > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > > The _*autoscaling/v2beta1*_ API is superseded with {_}*autoscaling/v2*{_}. > This is documented thoroughly at > [https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3989) Upgrade tika-helm Horizontal Pod Autoscaling from autoscaling/v2beta1 to autoscaling/v2
Lewis John McGibbney created TIKA-3989: -- Summary: Upgrade tika-helm Horizontal Pod Autoscaling from autoscaling/v2beta1 to autoscaling/v2 Key: TIKA-3989 URL: https://issues.apache.org/jira/browse/TIKA-3989 Project: Tika Issue Type: Task Components: helm Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney The _*autoscaling/v2beta1*_ API is superseded by autoscaling/v2. This is documented thoroughly at [https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TIKA-3989) Upgrade tika-helm Horizontal Pod Autoscaling from autoscaling/v2beta1 to autoscaling/v2
[ https://issues.apache.org/jira/browse/TIKA-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-3989: --- Description: The _*autoscaling/v2beta1*_ API is superseded with autoscaling/v2. This is documented thoroughly at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/ (was: The _*autoscaling/v2beta1*_ API is superseded with autoscaling/v2. This is documented thoroughly in [https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/|https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) > Upgrade tika-helm Horizontal Pod Autoscaling from autoscaling/v2beta1 to > autoscaling/v2 > -- > > Key: TIKA-3989 > URL: https://issues.apache.org/jira/browse/TIKA-3989 > Project: Tika > Issue Type: Task > Components: helm > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Minor > > The _*autoscaling/v2beta1*_ API is superseded with autoscaling/v2. This is > documented thoroughly at > https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
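For a plain CPU-based HPA, the v2beta1-to-v2 migration described in TIKA-3989 is mostly the apiVersion line plus the reworked metrics layout. A hedged sketch follows; the field layout matches the autoscaling/v2 API, while the names, replica bounds, and 80% target are illustrative rather than taken from tika-helm:

```shell
# Write a sample autoscaling/v2 HPA manifest to illustrate the API bump.
cat > hpa-v2.yaml <<'EOF'
apiVersion: autoscaling/v2        # was: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: tika
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tika
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:                   # v2 nests utilization under "target";
          type: Utilization       # v2beta1 used targetAverageUtilization
          averageUtilization: 80
EOF
# Check that the cluster serves the new API before applying (needs kubectl):
#   kubectl api-versions | grep '^autoscaling/v2$'
echo "wrote hpa-v2.yaml"
```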
[jira] [Closed] (TIKA-3988) Add Github Action to Lint and Test Charts
[ https://issues.apache.org/jira/browse/TIKA-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed TIKA-3988. -- > Add Github Action to Lint and Test Charts > - > > Key: TIKA-3988 > URL: https://issues.apache.org/jira/browse/TIKA-3988 > Project: Tika > Issue Type: Improvement > Components: helm >Affects Versions: 2.7.0 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 2.7.0 > > > The [chart-testing-action|https://github.com/helm/chart-testing-action] will > improve CI for the tika-helm. PR coming up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (TIKA-3988) Add Github Action to Lint and Test Charts
[ https://issues.apache.org/jira/browse/TIKA-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved TIKA-3988. Resolution: Fixed > Add Github Action to Lint and Test Charts > - > > Key: TIKA-3988 > URL: https://issues.apache.org/jira/browse/TIKA-3988 > Project: Tika > Issue Type: Improvement > Components: helm >Affects Versions: 2.7.0 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 2.7.0 > > > The [chart-testing-action|https://github.com/helm/chart-testing-action] will > improve CI for the tika-helm. PR coming up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[ANNOUNCEMENT] Apache Tika Helm Chart v2.7.0 and v2.7.0-full released
The Tika PMC is happy to announce that tika-helm v2.7.0 and v2.7.0-full Charts are now available. Documentation can be found at https://github.com/apache/tika-helm#readme Please register support and feedback at https://github.com/apache/tika-helm/pulls Thanks to everyone who contributed to these releases. Happy Helm'ing... lewismc -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
[jira] [Commented] (TIKA-3988) Add Github Action to Lint and Test Charts
[ https://issues.apache.org/jira/browse/TIKA-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702421#comment-17702421 ] Lewis John McGibbney commented on TIKA-3988: It looks like there are some permissions issues which needs to be configured before the Github action can be run. I got in touch with INFRA about this. The Github Action output is as follows {quote} Error: .github#L1 helm/chart-testing-action@v2.3.1 and helm/kind-action@v1.4.0 are not allowed to be used in apache/tika-helm. Actions in this workflow must be: within a repository owned by apache, created by GitHub, verified in the GitHub Marketplace, or matching the following: {*}/{*}@[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]+, AdoptOpenJDK/install-jdk@{*}, JamesIves/github-pages-deploy-action@5dc1d5a192aeb5ab5b7d5a77b7d36aea4a7f5c92, TobKed/label-when-approved-action@{*}, actions-cool/issues-helper@{*}, actions-rs/{*}, al-cheb/configure-pagefile-action@{*}, amannn/action-semantic-pull-request@{*}, apache/{*}, burrunan/gradle-cache-action@{*}, bytedeco/javacpp-presets/.github/actions/{*}, chromaui/action@{*}, codecov/codecov-action@{*}, conda-incubator/setup-miniconda@{*}, container-tools/kind-action@{*}, container-tools/microshift-action@{*}, dawidd6/action-download-artifact@{*}, delaguardo/setup-graalvm@{*}, docker://jekyll/jekyll:{*}, docker://pandoc/core:2.9, eps1lon/actions-label-merge-conflict@{*}, gaurav-nelson/gith... {quote} > Add Github Action to Lint and Test Charts > - > > Key: TIKA-3988 > URL: https://issues.apache.org/jira/browse/TIKA-3988 > Project: Tika > Issue Type: Improvement > Components: helm >Affects Versions: 2.7.0 > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 2.7.0 > > > The [chart-testing-action|https://github.com/helm/chart-testing-action] will > improve CI for the tika-helm. PR coming up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3988) Add Github Action to Lint and Test Charts
Lewis John McGibbney created TIKA-3988: -- Summary: Add Github Action to Lint and Test Charts Key: TIKA-3988 URL: https://issues.apache.org/jira/browse/TIKA-3988 Project: Tika Issue Type: Improvement Components: helm Affects Versions: 2.7.0 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 2.7.0 The [chart-testing-action|https://github.com/helm/chart-testing-action] will improve CI for the tika-helm. PR coming up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
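The same lint and test passes that the chart-testing-action runs in CI can also be run locally with the chart-testing CLI ("ct"). The commands below are a sketch; the flags and the chart directory are assumptions worth verifying against the ct documentation:

```shell
# Local equivalents of the chart-testing-action steps (require the ct CLI).
CT_LINT='ct lint --charts .'
CT_INSTALL='ct install --charts .'   # installs into a throwaway kind cluster
printf '%s\n%s\n' "$CT_LINT" "$CT_INSTALL"
```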
[jira] [Commented] (TIKA-3985) Automate tika-helm Chart releases with helm/chart-releaser-action
[ https://issues.apache.org/jira/browse/TIKA-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702402#comment-17702402 ] Lewis John McGibbney commented on TIKA-3985: https://github.com/marketplace/actions/jfrog-cli-for-github-actions https://github.com/helm/chart-releaser-action > Automate tika-helm Chart releases with helm/chart-releaser-action > -- > > Key: TIKA-3985 > URL: https://issues.apache.org/jira/browse/TIKA-3985 > Project: Tika > Issue Type: Improvement > Components: helm > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 2.7.0 > > > I've received several requests for > [tika-helm|https://github.com/apache/tika-helm] releases to shadow > [tika-docker|https://github.com/apache/tika-docker]. > I found a Github action which will enable that. PR coming up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Userale Schema
Big +1 on this. This would help, as we are thinking about potentially pushing data into OpenSearch in the future; a schema and data types would be very useful. Lewis On Wed, Mar 15, 2023 at 1:48 PM Gedd Johnson wrote: > > Hi all, > > As discussed in this PR, we'd like to ideate on the topic of implementing a > schema for the Userale client payloads that are sent to backend servers. > > First stab at a problem statement: Userale in its current state does not > implement any sort of schema for its payloads. Changes to the payload's shape > (as referenced in the PR linked above) can break data pipelines for > downstream users. How might we: > > 1. Validate and version a schema so that downstream users know the shape of > data they will receive > > 2. Maintain the flexible schema management that Userale currently offers > > Looking forward to the discussion! > > Best, > Gedd Johnson > -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc
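One way to address both points raised above is a versioned JSON Schema published alongside releases: point 1 is covered by validation against the schema, and point 2 by leaving additional properties open. A sketch follows; the field names are illustrative assumptions, not Userale's actual log format:

```shell
# Write a sample versioned schema; field names are hypothetical placeholders.
cat > userale-log.v1.schema.json <<'EOF'
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "userale-log-v1",
  "type": "object",
  "required": ["type", "clientTime"],
  "properties": {
    "type": { "type": "string" },
    "clientTime": { "type": "integer" }
  },
  "additionalProperties": true
}
EOF
echo "wrote userale-log.v1.schema.json"
```

Bumping the `$id` (v1, v2, ...) whenever the required shape changes would give downstream consumers a stable contract to validate against.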
[jira] [Created] (TIKA-3985) Automate tika-helm Chart releases with helm/chart-releaser-action
Lewis John McGibbney created TIKA-3985: -- Summary: Automate tika-helm Chart releases with helm/chart-releaser-action Key: TIKA-3985 URL: https://issues.apache.org/jira/browse/TIKA-3985 Project: Tika Issue Type: Improvement Components: helm Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 2.7.0 I've received several requests for [tika-helm|https://github.com/apache/tika-helm] releases to shadow [tika-docker|https://github.com/apache/tika-docker]. I found a Github action which will enable that. PR coming up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TIKA-3452) java.nio.file.FileSystemException Read-only file system
[ https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-3452: --- Fix Version/s: 2.7.0 (was: 2.0.0-BETA) > java.nio.file.FileSystemException Read-only file system > --- > > Key: TIKA-3452 > URL: https://issues.apache.org/jira/browse/TIKA-3452 > Project: Tika > Issue Type: Bug > Components: docker, helm > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 2.7.0 > > > The following ExecutionException is thrown when I attempt to run [tika-docker > 2.0.0-BETA|https://hub.docker.com/layers/apache/tika/2.0.0-BETA-full/images/sha256-2d735f7bdf86e618a5390d92614a310697f9134d11a2b2e4c1c0cfcde1f68b1d?context=explore] > {code:bash} > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > java.util.concurrent.ExecutionException: java.nio.file.FileSystemException: > /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system > at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116) > at > org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88) > at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66) > Caused by: java.nio.file.FileSystemException: > /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > 
java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) > at java.base/java.nio.file.Files.newByteChannel(Files.java:375) > at java.base/java.nio.file.Files.createFile(Files.java:652) > at > java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:137) > at > java.base/java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:160) > at java.base/java.nio.file.Files.createTempFile(Files.java:917) > at > org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.(TikaServerWatchDog.java:220) > at > org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.(TikaServerWatchDog.java:210) > at > org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:117) > at > org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:50) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) > at java.base/java.lang.Thread.run(Thread.java:832) > {code} > There are differences/improvements in the way the [tika-server child process > is > spawned|https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-MakingTikaServerRobusttoOOMs,InfiniteLoopsandMemoryLeaks] > in the 2.0.0-BETA docker image. I am investigating a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (TIKA-3452) java.nio.file.FileSystemException Read-only file system
[ https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved TIKA-3452. Resolution: Fixed > java.nio.file.FileSystemException Read-only file system > --- > > Key: TIKA-3452 > URL: https://issues.apache.org/jira/browse/TIKA-3452 > Project: Tika > Issue Type: Bug > Components: docker, helm > Reporter: Lewis John McGibbney > Assignee: Lewis John McGibbney >Priority: Major > Fix For: 2.7.0 > > > The following ExecutionException is thrown when I attempt to run [tika-docker > 2.0.0-BETA|https://hub.docker.com/layers/apache/tika/2.0.0-BETA-full/images/sha256-2d735f7bdf86e618a5390d92614a310697f9134d11a2b2e4c1c0cfcde1f68b1d?context=explore] > {code:bash} > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > java.util.concurrent.ExecutionException: java.nio.file.FileSystemException: > /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system > at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116) > at > org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88) > at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66) > Caused by: java.nio.file.FileSystemException: > /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) > at 
java.base/java.nio.file.Files.newByteChannel(Files.java:375) > at java.base/java.nio.file.Files.createFile(Files.java:652) > at > java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:137) > at > java.base/java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:160) > at java.base/java.nio.file.Files.createTempFile(Files.java:917) > at > org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.(TikaServerWatchDog.java:220) > at > org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.(TikaServerWatchDog.java:210) > at > org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:117) > at > org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:50) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) > at java.base/java.lang.Thread.run(Thread.java:832) > {code} > There are differences/improvements in the way the [tika-server child process > is > spawned|https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-MakingTikaServerRobusttoOOMs,InfiniteLoopsandMemoryLeaks] > in the 2.0.0-BETA docker image. I am investigating a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (TEZ-4371) Implement ClientServiceDelegate.getJobCounters
[ https://issues.apache.org/jira/browse/TEZ-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned TEZ-4371: - Assignee: Lewis John McGibbney > Implement ClientServiceDelegate.getJobCounters > -- > > Key: TEZ-4371 > URL: https://issues.apache.org/jira/browse/TEZ-4371 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor > Assignee: Lewis John McGibbney >Priority: Major > > Details are > [here|https://issues.apache.org/jira/browse/NUTCH-2839?focusedCommentId=17471115=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17471115] > currently when tez ClientProtocol intercepts MR job submission (YARNRunner), > the collection of counters is not implemented > {code} > public Counters getJobCounters(JobID jobId) > throws IOException, InterruptedException { > // FIXME needs counters support from DAG > // with a translation layer on client side > Counters empty = new Counters(); > return empty; > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (TEZ-4371) Implement ClientServiceDelegate.getJobCounters
[ https://issues.apache.org/jira/browse/TEZ-4371 ] Lewis John McGibbney deleted comment on TEZ-4371: --- was (Author: lewismc): [~abstractdog] I have to finish off NUTCH-2856 then I could make an effort to investigate and implement this improvement. I'll write here once I finish NUTCH-2856. > Implement ClientServiceDelegate.getJobCounters > -- > > Key: TEZ-4371 > URL: https://issues.apache.org/jira/browse/TEZ-4371 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > > Details are > [here|https://issues.apache.org/jira/browse/NUTCH-2839?focusedCommentId=17471115=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17471115] > currently when tez ClientProtocol intercepts MR job submission (YARNRunner), > the collection of counters is not implemented > {code} > public Counters getJobCounters(JobID jobId) > throws IOException, InterruptedException { > // FIXME needs counters support from DAG > // with a translation layer on client side > Counters empty = new Counters(); > return empty; > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney reassigned NUTCH-2856:
-------------------------------------------

    Assignee: (was: Lewis John McGibbney)

> Implement a protocol-smb plugin based on hierynomus/smbj
> --------------------------------------------------------
>
>                 Key: NUTCH-2856
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2856
>             Project: Nutch
>          Issue Type: New Feature
>          Components: external, plugin, protocol
>            Reporter: Hiran Chaudhuri
>            Priority: Major
>             Fix For: 1.20
>
> The plugin protocol-smb advertised on [https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral] actually refers to the JCIFS library. According to this library's homepage [https://www.jcifs.org/]:
> _If you're looking for the latest and greatest open source Java SMB library, this is not it. JCIFS has been in maintenance-mode-only for several years and although what it does support works fine (SMB1, NTLMv2, midlc, MSRPC and various utility classes), jCIFS does not support the newer SMB2/3 variants of the SMB protocol which is slowly becoming required (Windows 10 requires SMB2/3). JCIFS only supports SMB1 but Microsoft has deprecated SMB1 in their products. *So if SMB1 is disabled on your network, JCIFS' file related operations will NOT work.*_
> Looking at [https://en.wikipedia.org/wiki/Server_Message_Block#SMB_/_CIFS_/_SMB1]:
> _Microsoft added SMB1 to the Windows Server 2012 R2 deprecation list in June 2013. Windows Server 2016 and some versions of Windows 10 Fall Creators Update do not have SMB1 installed by default._
> In conclusion, the chances that the SMB1 protocol is installed and/or configured are getting vastly smaller. Therefore some migration towards SMB2/3 is required.
> Luckily, the JCIFS homepage lists alternatives:
> * [jcifs-codelibs|https://github.com/codelibs/jcifs]
> * [jcifs-ng|https://github.com/AgNO3/jcifs-ng]
> * [smbj|https://github.com/hierynomus/smbj]
[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694741#comment-17694741 ]

Lewis John McGibbney commented on NUTCH-2988:
---------------------------------------------

Actually, digging deeper, it looks like the v7.13.2 client we consume is licensed under the [Elastic License 2.0|https://raw.githubusercontent.com/elastic/elasticsearch/v7.13.2/licenses/ELASTIC-LICENSE-2.0.txt]. This is confirmed by
# https://central.sonatype.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.2, and
# https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.2

> Elasticsearch 7.13.2 compatible with ASL 2.0?
> ---------------------------------------------
>
>                 Key: NUTCH-2988
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2988
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> In the latest release of at least the 1.x branch of Nutch, the Elasticsearch high-level Java client is at 7.13.2, which is after the great schism. The last purely ASL 2.0-licensed release was 7.10.2.
> So, do we need to downgrade to 7.10.2, or is Elasticsearch's new licensing plan suitable for release within an ASF project?
> Or is the client, as opposed to the main search project, still actually ASL 2.0?
> Ref: https://github.com/elastic/elasticsearch/blob/v7.13.2/LICENSE.txt
[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694736#comment-17694736 ]

Lewis John McGibbney commented on NUTCH-2988:
---------------------------------------------

It looks like the [elasticsearch-java client|https://github.com/elastic/elasticsearch-java/blob/v8.6.2/LICENSE.txt] is licensed under ALv2.0.

> Elasticsearch 7.13.2 compatible with ASL 2.0?
> ---------------------------------------------
>
>                 Key: NUTCH-2988
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2988
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> In the latest release of at least the 1.x branch of Nutch, the Elasticsearch high-level Java client is at 7.13.2, which is after the great schism. The last purely ASL 2.0-licensed release was 7.10.2.
> So, do we need to downgrade to 7.10.2, or is Elasticsearch's new licensing plan suitable for release within an ASF project?
> Or is the client still actually ASL 2.0?
> Ref: https://github.com/elastic/elasticsearch/blob/v7.13.2/LICENSE.txt
[jira] [Updated] (TIKA-3452) java.nio.file.FileSystemException Read-only file system
[ https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated TIKA-3452:
---------------------------------------

    Summary: java.nio.file.FileSystemException Read-only file system  (was: java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA tika-docker)

> java.nio.file.FileSystemException Read-only file system
> --------------------------------------------------------
>
>                 Key: TIKA-3452
>                 URL: https://issues.apache.org/jira/browse/TIKA-3452
>             Project: Tika
>          Issue Type: Bug
>          Components: docker, helm
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 2.0.0-BETA
>
> The following ExecutionException is thrown when I attempt to run [tika-docker 2.0.0-BETA|https://hub.docker.com/layers/apache/tika/2.0.0-BETA-full/images/sha256-2d735f7bdf86e618a5390d92614a310697f9134d11a2b2e4c1c0cfcde1f68b1d?context=explore]:
> {code:bash}
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> java.util.concurrent.ExecutionException: java.nio.file.FileSystemException: /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system
> 	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
> 	at org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116)
> 	at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88)
> 	at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
> Caused by: java.nio.file.FileSystemException: /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system
> 	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
> 	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> 	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> 	at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
> 	at java.base/java.nio.file.Files.newByteChannel(Files.java:375)
> 	at java.base/java.nio.file.Files.createFile(Files.java:652)
> 	at java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:137)
> 	at java.base/java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:160)
> 	at java.base/java.nio.file.Files.createTempFile(Files.java:917)
> 	at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:220)
> 	at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:210)
> 	at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:117)
> 	at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:50)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
> 	at java.base/java.lang.Thread.run(Thread.java:832)
> {code}
> There are differences/improvements in the way the [tika-server child process is spawned|https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-MakingTikaServerRobusttoOOMs,InfiniteLoopsandMemoryLeaks] in the 2.0.0-BETA docker image. I am investigating a fix.
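The stack trace above shows the failure originates in `Files.createTempFile`, which writes under the directory named by the `java.io.tmpdir` system property. A common workaround in a read-only container is to point `java.io.tmpdir` at a writable mount (e.g. via `-Djava.io.tmpdir=...` on the tika-server command line). The sketch below demonstrates the mechanism with plain JDK calls; it is a general illustration of the `java.io.tmpdir` behavior, not code from tika-server itself, and the property must be set before the JDK's temp-file helper first reads it.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class TmpDirCheck {

    // Returns true if a temp file can be created under the current java.io.tmpdir.
    // In the TIKA-3452 scenario this would return false while /tmp is read-only.
    static boolean tmpDirWritable() {
        try {
            Path p = Files.createTempFile("tika-server-check-", ".tmp");
            Files.delete(p);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Redirect temp files to the working directory, standing in for a
        // writable volume mounted into an otherwise read-only container.
        System.setProperty("java.io.tmpdir", System.getProperty("user.dir"));
        System.out.println(tmpDirWritable());
    }
}
```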
Re: [ROLL CALL] Project status of Any23
Hi,
I am here, but have been busy with other tasks right now. The project has been very quiet for quite a while and this has been reported to the Board in many previous reports. I neglected to submit reports for two or three months but will attempt to rectify that. I also try to push releases and ensure that dependency updates are made.
lewismc

On 2023/01/19 02:10:56 Willem Jiang wrote:
> Hi,
>
> There has been no development activity on this project for more than 6
> months. The PMC failed to submit several board reports. Without a
> community or anyone working on the project, there is no project.
>
> If any of the PMC members are still active, please indicate so by
> responding to this email.
>
> Thank you, all! I hope everyone is safe and healthy.
>
> Willem Jiang
Re: user Digest 8 Nov 2022 10:16:05 -0000 Issue 3169
Hi Mike,
Yes, it is possible to extend the TLD list. In fact, when the TLD list was compiled, the author left a note explicitly stating that it may not be complete.
https://github.com/apache/nutch/blob/master/conf/domain-suffixes.xml.template
Please submit a PR if you wish to make any changes or additions. You can use the parser checker tool to validate your change before creating the PR.
Thanks
lewismc

On Tue, Nov 8, 2022 at 02:16 wrote:
>
> ---------- Forwarded message ----------
> From: Mike
> To: user@nutch.apache.org
> Date: Tue, 8 Nov 2022 11:15:51 +0100
> Subject: Incomplete TLD List
> Hi!
> Some of the new TLDs are wrongly indexed by Nutch, is it possible to extend
> the TLD list?
>
> "url":"https://about.google/intl/en_FR/how-our-business-works/",
> "tstamp":"2022-11-06T17:22:14.808Z",
> "domain":"google",
> "digest":"3b9a23d42f200392d12a697bbb8d4d87",
>
> Thanks
>
> Mike

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
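For readers following along: extending the TLD list means adding an entry to conf/domain-suffixes.xml. The fragment below sketches what an entry for the `google` brand gTLD (the TLD from Mike's example URL) might look like; the element and attribute names here are assumptions based on a reading of the template file linked above and should be verified against it before opening a PR.

```xml
<!-- Illustrative fragment only: verify element names against
     conf/domain-suffixes.xml.template before submitting a PR. -->
<tlds>
  <gtlds>
    <tld domain="google">
      <status>IN_USE</status>
      <description>Google brand gTLD</description>
    </tld>
  </gtlds>
</tlds>
```

After rebuilding with the change in place, the parser checker tool mentioned above can be run against a URL such as https://about.google/ to confirm the domain is extracted as expected.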
Re: [ VOTE ] Graduation of Flagon Project
Excellent, please close the thread off with a [RESULT] title and then tally the VOTEs and who VOTEd. Thanks

On Sat, Nov 5, 2022 at 15:37 Austin Bennett wrote:
> On this thread, we have 6 +1s [also another on the incubator thread] and
> no 0 or -1 votes, so *the VOTE passes*, and we can confidently say the
> community is in favor of Graduation.
>
> On Thu, Nov 3, 2022 at 8:00 AM Evan Jones wrote:
>> +1
>>
>> Best
>>
>> Evan Jones
>> Website: www.ea-jones.com
>>
>> On Thu, Nov 3, 2022 at 4:44 AM Furkan KAMACI wrote:
>>> Definitely +1!
>>>
>>> On Tue, Nov 1, 2022 at 7:58 PM Amir Ghaemi wrote:
>>>> +1
>>>>
>>>> Best Regards,
>>>> *Amir M. Ghaemi*
>>>>
>>>> On Tue, Nov 1, 2022 at 7:06 AM Gedd Johnson wrote:
>>>>> +1
>>>>>
>>>>> Best,
>>>>> Gedd Johnson
>>>>>
>>>>> On Mon, Oct 31, 2022 at 23:22 Joshua Poore wrote:
>>>>>> Emphatic +1 for me.
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> Josh
>>>>>>
>>>>>> On Oct 31, 2022, at 5:13 PM, lewis john mcgibbney <lewi...@apache.org> wrote:
>>>>>> +1
>>>>>>
>>>>>> On Mon, Oct 31, 2022 at 09:31 Austin Bennett wrote:
>>>>>>> Hi Flagon Community,
>>>>>>> +1
>>>>>>> Given recent discussions around the graduation status of the project, it
>>>>>>> is time to work through the process. We have had a recent discussion
>>>>>>> on-list, and consensus seems to be in favor of graduation. The next step
>>>>>>> seems to be a recommendation that we make an official VOTE, per:
>>>>>>> https://incubator.apache.org/guides/graduation.html#community_graduation_vote
>>>>>>>
>>>>>>> *Please VOTE* for the actual record. I will also let the Incubator know
>>>>>>> the vote is occurring [per the link above], and imagine that I will tally
>>>>>>> the votes later on Friday the 4th to allow for >= 72 hours; we'll see
>>>>>>> how this thread goes.
>>>>>>>
>>>>>>> Per https://www.apache.org/foundation/voting.html *ideally votes will
>>>>>>> be +1, 0, or -1*. And, as I understand it, only IPMC votes are binding,
>>>>>>> as found in https://incubator.apache.org/guides/participation.html.
>>>>>>>
>>>>>>> Please consider this my +1.
>>>>>>>
>>>>>>> The existing community has demonstrated addressing the requirements as
>>>>>>> found in the incubator guidelines for graduation. Graduation is an
>>>>>>> important milestone signaling project maturity, and a great step towards
>>>>>>> ongoing growth and evolution.
>>>>>>>
>>>>>>> Cheers -
>>>>>>> Austin
--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc