[jira] [Commented] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions
[ https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842209#comment-17842209 ] ASF GitHub Bot commented on NUTCH-3054: --- lewismc opened a new pull request, #817: URL: https://github.com/apache/nutch/pull/817 Addresses https://issues.apache.org/jira/browse/NUTCH-3054 > Address deprecation of Node16 for all GitHub Actions > > > Key: NUTCH-3054 > URL: https://issues.apache.org/jira/browse/NUTCH-3054 > Project: Nutch > Issue Type: Task > Components: ci/cd >Affects Versions: 1.20 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.21 > > > See > [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/] > We need to upgrade the setup-java action in > [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml] > > Patch coming up -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions
[ https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3054: Affects Version/s: 1.20 > Address deprecation of Node16 for all GitHub Actions > > > Key: NUTCH-3054 > URL: https://issues.apache.org/jira/browse/NUTCH-3054 > Project: Nutch > Issue Type: Task > Components: ci/cd >Affects Versions: 1.20 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.21 > > > See > [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/] > We need to upgrade the setup-java action in > [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml] > > Patch coming up -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions
Lewis John McGibbney created NUTCH-3054: --- Summary: Address deprecation of Node16 for all GitHub Actions Key: NUTCH-3054 URL: https://issues.apache.org/jira/browse/NUTCH-3054 Project: Nutch Issue Type: Task Components: ci/cd Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 1.21 See [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/] We need to upgrade the setup-java action in [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml] Patch coming up -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions
[ https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3054 started by Lewis John McGibbney. --- > Address deprecation of Node16 for all GitHub Actions > > > Key: NUTCH-3054 > URL: https://issues.apache.org/jira/browse/NUTCH-3054 > Project: Nutch > Issue Type: Task > Components: ci/cd >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.21 > > > See > [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/] > We need to upgrade the setup-java action in > [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml] > > Patch coming up -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (NUTCH-3049) Investigate using Records
[ https://issues.apache.org/jira/browse/NUTCH-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842208#comment-17842208 ]

Lewis John McGibbney commented on NUTCH-3049:
---------------------------------------------

I think that each of the Writable classes mentioned in NutchWritable may be fair game:

{{ org.apache.nutch.crawl.CrawlDatum.class,}}
{{ org.apache.nutch.crawl.Inlink.class,}}
{{ org.apache.nutch.crawl.Inlinks.class,}}
{{ org.apache.nutch.indexer.NutchIndexAction.class,}}
{{ org.apache.nutch.metadata.Metadata.class,}}
{{ org.apache.nutch.parse.Outlink.class,}}
{{ org.apache.nutch.parse.ParseText.class,}}
{{ org.apache.nutch.parse.ParseData.class,}}
{{ org.apache.nutch.parse.ParseImpl.class,}}
{{ org.apache.nutch.parse.ParseStatus.class,}}
{{ org.apache.nutch.protocol.Content.class,}}
{{ org.apache.nutch.protocol.ProtocolStatus.class,}}
{{ org.apache.nutch.scoring.webgraph.LinkDatum.class,}}
{{ org.apache.nutch.hostdb.HostDatum.class}}

> Investigate using Records
> -------------------------
>
> Key: NUTCH-3049
> URL: https://issues.apache.org/jira/browse/NUTCH-3049
> Project: Nutch
> Issue Type: Sub-task
> Reporter: Lewis John McGibbney
> Priority: Major
>
> Guidance at [https://www.baeldung.com/java-migrate-8-to-17#records]
> I think there are multiple areas where we could use Records. This ticket will
> document the opportunities and structure that work.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Consolidating Nutch Continuous Integration
Hi Sebastian, Understood. If it ain’t broke don’t fix it. Thanks for the input. On 2024/04/28 12:08:27 Sebastian Nagel wrote: > > From my side: no. It may not harm to have both. > > Best, > Sebastian
[jira] [Created] (NUTCH-3053) Upgrade build and CI to JDK17
Lewis John McGibbney created NUTCH-3053:
-------------------------------------------

Summary: Upgrade build and CI to JDK17
Key: NUTCH-3053
URL: https://issues.apache.org/jira/browse/NUTCH-3053
Project: Nutch
Issue Type: Sub-task
Components: build, ci/cd
Reporter: Lewis John McGibbney

This will involve changes to
* [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
* [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/]
* [https://github.com/apache/nutch/blob/master/default.properties#L46]
* [https://github.com/apache/nutch/blob/master/default.properties#L57]
* We should also investigate any deprecation notices in the build output
* [https://github.com/apache/nutch/blob/master/ivy/mvn.template#L128-L129]

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (NUTCH-3052) Investigate using sealed classes
Lewis John McGibbney created NUTCH-3052: --- Summary: Investigate using sealed classes Key: NUTCH-3052 URL: https://issues.apache.org/jira/browse/NUTCH-3052 Project: Nutch Issue Type: Sub-task Reporter: Lewis John McGibbney Guidance available at [https://www.baeldung.com/java-migrate-8-to-17#sealed-classes] First document if and where sealed classes would add value. -- This message was sent by Atlassian Jira (v8.20.10#820010)
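To make the question in NUTCH-3052 concrete, here is a minimal sketch of what a sealed hierarchy looks like on JDK 17. The types `FetchOutcome`, `Success`, `Redirect` and `Failure` are hypothetical and do not exist in Nutch; they only illustrate the language feature, not a proposed design.

```java
public class SealedSketch {

  // Hypothetical hierarchy for illustration only; not part of Nutch.
  // "permits" closes the set of implementations: no other type may implement FetchOutcome.
  public sealed interface FetchOutcome permits Success, Redirect, Failure {}

  public record Success(int bytes) implements FetchOutcome {}
  public record Redirect(String target) implements FetchOutcome {}
  public record Failure(String reason) implements FetchOutcome {}

  public static String describe(FetchOutcome outcome) {
    if (outcome instanceof Success s) {
      return "fetched " + s.bytes() + " bytes";
    }
    if (outcome instanceof Redirect r) {
      return "redirect to " + r.target();
    }
    if (outcome instanceof Failure f) {
      return "failed: " + f.reason();
    }
    // Cannot happen for a non-null argument: the permits clause lists every implementation.
    throw new IllegalStateException("unexpected outcome: " + outcome);
  }

  public static void main(String[] args) {
    System.out.println(describe(new Redirect("https://nutch.apache.org/")));
  }
}
```

The value is mainly documentary on JDK 17: readers and the compiler both know the full set of subtypes. Whether any existing Nutch hierarchy is closed in this sense is exactly what the ticket would have to establish.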
[jira] [Created] (NUTCH-3051) Investigate using new pattern matching syntax in switch expressions
Lewis John McGibbney created NUTCH-3051: --- Summary: Investigate using new pattern matching syntax in switch expressions Key: NUTCH-3051 URL: https://issues.apache.org/jira/browse/NUTCH-3051 Project: Nutch Issue Type: Sub-task Reporter: Lewis John McGibbney Guidance available at [https://www.baeldung.com/java-migrate-8-to-17#2-switch-expressions] Apparently we use switch in 35 files [https://github.com/search?q=repo%3Aapache%2Fnutch+switch+language%3AJava=code=Java] -- This message was sent by Atlassian Jira (v8.20.10#820010)
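For reference, the linked guidance section covers switch expressions (final since JDK 14). A minimal before/after sketch follows; the `Status` enum and both methods are hypothetical and not taken from the Nutch code base.

```java
public class SwitchSketch {

  // Hypothetical enum for illustration only.
  public enum Status { FETCHED, REDIRECTED, GONE, RETRY }

  // Classic switch statement: needs a mutable local, break statements and a default branch.
  public static String labelOld(Status status) {
    String label;
    switch (status) {
      case FETCHED:
        label = "ok";
        break;
      case REDIRECTED:
        label = "moved";
        break;
      case GONE:
      case RETRY:
        label = "failed";
        break;
      default:
        throw new IllegalArgumentException("unknown status: " + status);
    }
    return label;
  }

  // Switch expression: yields a value, has no fall-through, and the compiler
  // checks that every enum constant is covered, so no default branch is needed.
  public static String label(Status status) {
    return switch (status) {
      case FETCHED -> "ok";
      case REDIRECTED -> "moved";
      case GONE, RETRY -> "failed";
    };
  }

  public static void main(String[] args) {
    for (Status s : Status.values()) {
      System.out.println(s + " -> " + label(s));
    }
  }
}
```

Note that type patterns in `switch` were still a preview feature in JDK 17, so a JDK 17 baseline would give us the arrow form above but not pattern matching in `switch` without preview flags.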
[jira] [Created] (NUTCH-3050) Investigate use of the enhanced instanceof operator
Lewis John McGibbney created NUTCH-3050: --- Summary: Investigate use of the enhanced instanceof operator Key: NUTCH-3050 URL: https://issues.apache.org/jira/browse/NUTCH-3050 Project: Nutch Issue Type: Sub-task Reporter: Lewis John McGibbney Guidance at [https://www.baeldung.com/java-migrate-8-to-17#1-enhanced-instanceof-operator] Apparently we use instanceof operator in 50 files [https://github.com/search?q=repo%3Aapache%2Fnutch%20instanceof=code] -- This message was sent by Atlassian Jira (v8.20.10#820010)
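As a reminder of what the change would look like at each call site, here is a small before/after sketch of pattern matching for `instanceof` (final since JDK 16). The class and method names are made up for illustration.

```java
public class InstanceofSketch {

  // Before: test, then cast into a separately declared local.
  public static int lengthOld(Object value) {
    if (value instanceof String) {
      String s = (String) value;
      return s.length();
    }
    return -1;
  }

  // After: the pattern variable "s" is bound only when the test succeeds,
  // so the explicit cast disappears.
  public static int length(Object value) {
    if (value instanceof String s) {
      return s.length();
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(length("nutch"));
    System.out.println(length(42));
  }
}
```

The rewrite is mechanical and behavior-preserving where the cast immediately follows the test, which is probably the common case among the files found by the search above.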
[jira] [Created] (NUTCH-3049) Investigate using Records
Lewis John McGibbney created NUTCH-3049: --- Summary: Investigate using Records Key: NUTCH-3049 URL: https://issues.apache.org/jira/browse/NUTCH-3049 Project: Nutch Issue Type: Sub-task Reporter: Lewis John McGibbney Guidance at [https://www.baeldung.com/java-migrate-8-to-17#records] I think there are multiple areas where we could use Records. This ticket will document the opportunities and structure that work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
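A minimal sketch of a record (final since JDK 16), to anchor the discussion. `LinkPair` is a hypothetical value type and not an existing Nutch class.

```java
import java.util.Objects;

public class RecordSketch {

  // Hypothetical value type for illustration only; not an existing Nutch class.
  // The compiler generates the accessors, equals(), hashCode() and toString().
  public record LinkPair(String fromUrl, String anchor) {

    // Compact constructor: validates and normalizes the arguments
    // before they are assigned to the (final) fields.
    public LinkPair {
      Objects.requireNonNull(fromUrl, "fromUrl");
      if (anchor == null) {
        anchor = "";
      }
    }
  }

  public static void main(String[] args) {
    LinkPair a = new LinkPair("https://nutch.apache.org/", null);
    LinkPair b = new LinkPair("https://nutch.apache.org/", "");
    System.out.println(a.equals(b)); // equality is based on the components
    System.out.println(a);
  }
}
```

One point worth checking during the investigation: record fields are final, while Hadoop's `Writable` contract fills an existing instance through `readFields(DataInput)`. Classes that are deserialized that way may therefore not be straightforward candidates, and plain value holders that never pass through `readFields` may be the easier starting point.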
[jira] [Created] (NUTCH-3048) Investigate where/if new string utility methods could be used
Lewis John McGibbney created NUTCH-3048: --- Summary: Investigate where/if new string utility methods could be used Key: NUTCH-3048 URL: https://issues.apache.org/jira/browse/NUTCH-3048 Project: Nutch Issue Type: Sub-task Components: util Reporter: Lewis John McGibbney Guidance at [https://www.baeldung.com/java-migrate-8-to-17#3-new-string-methods] We may also be able to revisit our usage of common-* libraries with the goal of using native methods from the JDK. -- This message was sent by Atlassian Jira (v8.20.10#820010)
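A short sketch of the `String` methods added in JDK 11 (`lines()`, `strip()`, `isBlank()`, `repeat()`). The helper names `cleanLines` and `indent` are made up for illustration and are not Nutch methods.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StringMethodsSketch {

  // Splits on line terminators, trims Unicode whitespace and drops blank lines.
  public static List<String> cleanLines(String raw) {
    return raw.lines()
        .map(String::strip)
        .filter(line -> !line.isBlank())
        .collect(Collectors.toList());
  }

  // Prefixes the text with two spaces per nesting level.
  public static String indent(String text, int depth) {
    return "  ".repeat(depth) + text;
  }

  public static void main(String[] args) {
    System.out.println(cleanLines("  http://a.example/ \n\n http://b.example/\n"));
    System.out.println(indent("nested", 2));
  }
}
```

One caveat when replacing library calls: `String.isBlank()` is an instance method and throws a `NullPointerException` on a `null` reference, whereas `StringUtils.isBlank(null)` from commons-lang returns `true`. Call sites that rely on the null-safe behavior would need an explicit null check.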
[jira] [Created] (NUTCH-3047) Use multi-line text blocks
Lewis John McGibbney created NUTCH-3047: --- Summary: Use multi-line text blocks Key: NUTCH-3047 URL: https://issues.apache.org/jira/browse/NUTCH-3047 Project: Nutch Issue Type: Sub-task Components: CLI Reporter: Lewis John McGibbney Guidance available at [https://www.baeldung.com/java-migrate-8-to-17#2-text-block] This will help to clean up our CLI *usage()* messages at a bare minimum. -- This message was sent by Atlassian Jira (v8.20.10#820010)
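To illustrate the kind of cleanup meant here, a sketch of a usage message written first with concatenation and then as a text block (final since JDK 15). `ExampleTool` and its `-limit` option are hypothetical and do not correspond to an actual Nutch tool.

```java
public class TextBlockSketch {

  // Before: string concatenation with explicit newline escapes.
  public static final String USAGE_OLD =
      "Usage: ExampleTool <input_dir> [-limit <n>]\n"
      + "  <input_dir>  directory to read\n"
      + "  -limit <n>   maximum number of entries to print\n";

  // After: a text block. Indentation common to all lines (measured against the
  // closing delimiter as well) is stripped, so the content starts at column zero.
  public static final String USAGE = """
      Usage: ExampleTool <input_dir> [-limit <n>]
        <input_dir>  directory to read
        -limit <n>   maximum number of entries to print
      """;

  public static void main(String[] args) {
    System.err.print(USAGE);
  }
}
```

Both constants hold the same string; the text block simply keeps the message readable in the source as it will appear on the terminal.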
[jira] [Updated] (NUTCH-3046) Use compact strings
[ https://issues.apache.org/jira/browse/NUTCH-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3046: Description: Follow the guidance at [https://www.baeldung.com/java-migrate-8-to-17#1-compact-string] It looks like there are 9 instances where we use _*char []*_ |[https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]]. was: Follow the guidance at [https://www.baeldung.com/java-migrate-8-to-17#1-compact-string] It looks like there are [9 instances where we use char[]|[https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]]. > Use compact strings > --- > > Key: NUTCH-3046 > URL: https://issues.apache.org/jira/browse/NUTCH-3046 > Project: Nutch > Issue Type: Sub-task >Reporter: Lewis John McGibbney >Priority: Major > > Follow the guidance at > [https://www.baeldung.com/java-migrate-8-to-17#1-compact-string] > It looks like there are 9 instances where we use _*char []*_ > |[https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (NUTCH-1806) Delegate processing of URL domains to crawler commons
[ https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841995#comment-17841995 ]

ASF GitHub Bot commented on NUTCH-1806:
---------------------------------------

sebastian-nagel opened a new pull request, #816:
URL: https://github.com/apache/nutch/pull/816

and NUTCH-1942 Remove TopLevelDomain

- use methods from crawler-commons' EffectiveTldFinder in URLUtil, replacing classes and methods from the "org.apache.nutch.util.domain" package
- adapt and extend unit tests
- add tests for URLUtil.getTopLevelDomainName(url)
- reflect changes to the public suffix list since 2014 ("xyz" is now a public suffix / ICANN suffix)
- adapt to minor API changes
- URLUtil.getDomainName(url) returns the host name in case no valid public suffix is found
- for Unicode suffixes and TLDs, the methods URLUtil.getDomainSuffix(url) and URLUtil.getTopLevelDomainName(url), respectively, now return the ASCII representation
- add unit tests for host names with trailing dot ("www.apache.org.")
- add unit test for URLs without host/domain (cf. NUTCH-2450)
- update and complete Javadoc
- update DomainStatistics, TLDIndexingFilter and domain URL filters to use the updated methods in URLUtil
- remove the class TLDScoringFilter. Its configuration is bound to domain-suffixes.xml, which was no longer maintained and is now removed
- remove package org.apache.nutch.util.domain
- move DomainStatistics to org.apache.nutch.util
- remove configuration files of domain utils

> Delegate processing of URL domains to crawler commons
> -----------------------------------------------------
>
> Key: NUTCH-1806
> URL: https://issues.apache.org/jira/browse/NUTCH-1806
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.8
> Reporter: Julien Nioche
> Priority: Major
> Labels: crawler-commons
> Fix For: 1.21
>
> We have code in src/java/org/apache/nutch/util/domain and a resource file
> conf/domain-suffixes.xml to handle URL domains. This is used mostly from
> URLUtil.getDomainName.
> The resource file is not necessarily up to date and since crawler commons has
> a similar functionality we should use it instead of having to maintain our
> own resources.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] NUTCH-1806 Delegate processing of URL domains to crawler-common [nutch]
sebastian-nagel opened a new pull request, #816:
URL: https://github.com/apache/nutch/pull/816

and NUTCH-1942 Remove TopLevelDomain

- use methods from crawler-commons' EffectiveTldFinder in URLUtil, replacing classes and methods from the "org.apache.nutch.util.domain" package
- adapt and extend unit tests
- add tests for URLUtil.getTopLevelDomainName(url)
- reflect changes to the public suffix list since 2014 ("xyz" is now a public suffix / ICANN suffix)
- adapt to minor API changes
- URLUtil.getDomainName(url) returns the host name in case no valid public suffix is found
- for Unicode suffixes and TLDs, the methods URLUtil.getDomainSuffix(url) and URLUtil.getTopLevelDomainName(url), respectively, now return the ASCII representation
- add unit tests for host names with trailing dot ("www.apache.org.")
- add unit test for URLs without host/domain (cf. NUTCH-2450)
- update and complete Javadoc
- update DomainStatistics, TLDIndexingFilter and domain URL filters to use the updated methods in URLUtil
- remove the class TLDScoringFilter. Its configuration is bound to domain-suffixes.xml, which was no longer maintained and is now removed
- remove package org.apache.nutch.util.domain
- move DomainStatistics to org.apache.nutch.util
- remove configuration files of domain utils

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org