commits
Messages by Thread
(nutch) branch master updated: NUTCH-2771 Tests in nightly builds: skip long runners
snagel
(nutch) branch master updated: NUTCH-3084 Improve CI by filtering and separating plugin and core test execution (#833)
lewismc
(nutch) branch master updated (a99bd8ea6 -> b02340dfe)
snagel
(nutch) 01/01: Merge pull request #827 from sebastian-nagel/NUTCH-3067
snagel
(nutch) branch master updated: Unlock database when Injector finishes - regardless of result
snagel
(nutch) branch master updated: NUTCH-3075 tld plugin makes injector crash NUTCH-1942 Remove TopLevelDomain
snagel
(nutch) branch master updated (d6f55b8ea -> 4a61208f4)
snagel
(nutch) 01/01: Merge pull request #828 from sebastian-nagel/NUTCH-3073
snagel
(nutch) branch master updated: NUTCH-2812 Methods returning array may expose internal representation
snagel
(nutch) branch master updated (8b11962a4 -> c137b4e0b)
snagel
(nutch) 01/01: Merge pull request #798 from GabeHaegele/NUTCH-2812
snagel
(nutch) branch master updated (582cdd417 -> 8b11962a4)
snagel
(nutch) 01/01: Merge pull request #816 from sebastian-nagel/NUTCH-1942-domain-utils-to-use-crawler-commons
snagel
(nutch) branch master updated: NUTCH-3058 Fetcher: counter for hung threads (#820)
snagel
(nutch) branch master updated: NUTCH-3061 URL filters to log name of the rules file
snagel
(nutch) branch master updated: NUTCH-3062 protocol-okhttp: optionally record HTTP and SSL/TLS versions (#822)
snagel
(nutch) branch master updated (309bc1863 -> bc8bd317f)
snagel
(nutch) 01/01: Merge pull request #823 from sebastian-nagel/NUTCH-3065-changelog-markdown
snagel
(nutch) branch master updated: NUTCH-3066 Protocol plugin unit tests fail randomly
snagel
(nutch) branch master updated (ac03cf164 -> e09d40cbd)
joegilvary
(nutch) 01/01: Merge pull request #819 from CatChullain/NUTCH-3057
joegilvary
(nutch) branch master updated: NUTCH-3063 Support for "addBinaryContent" from REST API
snagel
(nutch) branch master updated: NUTCH-3055 README: fix Github "hub" commands - replace "git" with "hub" where necessary - improve formatting of "contributing" steps
snagel
(nutch) branch master updated (8abc78a65 -> bfa07df29)
snagel
(nutch) 01/01: Merge pull request #815 from sebastian-nagel/NUTCH-3044-generator-npe
snagel
(nutch) branch master updated: NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters (#813)
lewismc
(nutch) branch master updated: NUTCH-3043 Generator: count URLs rejected by URL filters (#814)
snagel
(nutch) branch master updated: NUTCH-3039 Failure to handle ftp:// URLs
snagel
(nutch-site) branch asf-site updated: Revert incorrect change in doap.rdf (see #2)
snagel
(nutch-site) branch asf-staging updated: Revert incorrect change in doap.rdf (see #2)
snagel
(nutch-site) branch main updated: Revert incorrect change (#2)
snagel
(nutch) branch master updated: NUTCH-3054 Address deprecation of Node16 for all GitHub Actions (#817)
lewismc
(nutch) branch master updated: Bootstrap Nutch 1.21 development drive.
lewismc
(nutch) branch master updated: Add GitHub CI badge to README
lewismc
svn commit: r68753 - in /release/nutch: 1.19/ 1.20/apache-nutch-1.20-bin.tar.gz.sha512 1.20/apache-nutch-1.20-bin.zip.sha512 1.20/apache-nutch-1.20-src.tar.gz.sha512 1.20/apache-nutch-1.20-src.zip.sha512 2.4/
lewismc
svn commit: r68752 - /dev/nutch/1.20/ /release/nutch/1.20/
lewismc
svn commit: r68410 [1/3] - /dev/nutch/1.20/
lewismc
svn commit: r68410 [2/3] - /dev/nutch/1.20/
lewismc
svn commit: r68410 [3/3] - /dev/nutch/1.20/
lewismc
(nutch) annotated tag release-1.20 updated (a2cb6aa5d -> 6510cb241)
lewismc
(nutch) branch branch-1.20 updated: Prepare Nutch 1.20 release candidate
lewismc
(nutch) branch branch-1.20 created (now f141a398c)
lewismc
(nutch) 01/01: Prepare Nutch 1.20 release candidate
lewismc
(nutch) branch master updated: NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811)
lewismc
(nutch) branch branch-1.20 deleted (was 9cfe3d7f9)
lewismc
(nutch) branch branch-1.20 created (now 9cfe3d7f9)
lewismc
(nutch) 01/01: Prepare for Nutch 1.20 release
lewismc
(nutch) branch master updated: NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810)
lewismc
(nutch) branch master updated (5a95bc653 -> 1563396d9)
lewismc
(nutch) branch master updated (3905a8df7 -> 5a95bc653)
lewismc
(nutch) branch master updated (367988dfd -> 3905a8df7)
lewismc
(nutch) branch master updated: NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
snagel
(nutch) branch master updated: NUTCH-3029
markus
(nutch) branch master updated: NUTCH-3033 Upgrade Ivy to v2.5.2 (#803)
lewismc
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3030 Use system default cipher suites instead of hard-coded set
markus
(nutch) branch master updated: Update Dockerfile / JAVA_HOME - 2nd try (#805)
lewismc
(nutch) branch master updated: NUTCH-3031 ProtocolFactory host mapper to support domains
markus
(nutch) branch revert-801-patch-2 deleted (was 54394b9ed)
lewismc
(nutch) branch branch-1.19 updated: Revert "Update Dockerfile / JAVA_HOME (#801)" (#804)
lewismc
(nutch) branch revert-801-patch-2 created (now 54394b9ed)
lewismc
(nutch) 01/01: Revert "Update Dockerfile / JAVA_HOME (#801)"
lewismc
(nutch) branch branch-1.19 updated: Update Dockerfile / JAVA_HOME (#801)
lewismc
(nutch) branch master updated: Update crawl documentation
snagel
(nutch) branch master updated: NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java
markus
(nutch) branch master updated: NUTCH-3024 Remove flaky 'dependency check' target (#795)
lewismc
(nutch) branch NUTCH-3026 created (now 3a294709d)
tallison
(nutch) 01/01: NUTCH-3026 -- first steps towards statusOnly option in IndexingJob
tallison
(nutch) branch master updated (adadc43fb -> 7ad382d95)
snagel
(nutch) branch master updated (90849124d -> adadc43fb)
snagel
(nutch) 01/02: [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - use Hadoop-provided compression codecs - update description of property urlfilter.fast.file
snagel
(nutch) 02/02: Merge branch 'NUTCH-3017', closes #793
snagel
(nutch) branch master updated: NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794)
tallison
(nutch) branch master updated: NUTCH-3019 -- update Tika (#797)
tallison
(nutch) branch master updated: NUTCH-3014 Standardize Job names (#789)
lewismc
(nutch) branch master updated: NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790)
lewismc
[nutch] branch master updated: NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788)
lewismc
[nutch] branch master updated: NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents - fall back to UTF-8 when stringifying the content of unparsed documents
snagel
[nutch] branch master updated: NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)
snagel
[nutch] branch master updated: NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779)
snagel
[nutch] branch master updated: NUTCH-3009 Upgrade to Hadoop 3.3.6
snagel
[nutch] branch master updated: NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive - implement class CaseInsensitiveMetadata providing case-insensitive metadata look-ups (but no spell-checking) - use CaseInsensitiveMetadata to hold HTTP header metadata in the class OkHttpResponse of protocol-okhttp - add unit tests to prove the fix (and also case-insensitive look-ups and spell-checking in protocol-http)
snagel
[nutch] branch master updated (a74b57b90 -> 97eb0b5ac)
tallison
[nutch] branch master updated (a1ab4333e -> a74b57b90)
snagel
[nutch] branch master updated: NUTCH-2897 Do not suppress deprecated API warnings - deprecate constructor of NutchJob - remove deprecated call to Object.finalize() from Plugin.finalize()
snagel
[nutch] branch master updated: NUTCH-3010 Injector: count unique number of injected URLs - add counter urls_injected_unique - improve log messages reporting the counts of injected/merged URLs
snagel
[nutch] branch master updated (417b87732 -> a72a53a32)
snagel
[nutch] branch master updated: NUTCH-2852 SpotBugs: Method invokes System.exit(...) - remove all calls of System.exit(...) in methods except main(args) of various "checker" tools
snagel
[nutch] branch master updated: NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..."
tallison
[nutch] branch master updated (0ad935fdc -> d81be5181)
tallison
[nutch] branch master updated: Remove Any23 from Nutch
tallison
[nutch] branch master updated: NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element.
tallison
[nutch] branch master updated: NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header
tallison
[nutch] branch master updated: NUTCH-2999 -- upgrade lucene to latest 8.x throughout
tallison
[nutch] branch NUTCH-2999 deleted (was 3bb8b0eeb)
tallison
[nutch] branch master updated (f5cd0d633 -> e93aa977e)
tallison
[nutch] 01/01: Merge pull request #770 from apache/NUTCH-2999
tallison
[nutch] branch NUTCH-2999 created (now 3bb8b0eeb)
tallison
[nutch] branch master updated: NUTCH-2989 -- ElasticIndexWriter should enable auth for https, too
tallison
[nutch] branch master updated: NUTCH-2997 Add Override annotations
snagel
[nutch] branch master updated: NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4
snagel
[nutch] branch master updated: NUTCH-2995 Upgrade to crawler-commons 1.4
snagel
[nutch] branch master updated: NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern - apply patch contributed by Markus Jelsma
snagel
[nutch-site] branch asf-staging updated: Add logo on URL path where requested README.md in source code repository
snagel
[nutch-site] branch main updated: Add logo on URL path where requested README.md in source code repository
snagel
[nutch-site] branch asf-site updated: Add logo on URL path where requested README.md in source code repository
snagel
[nutch-site] branch asf-site updated: Add link to ASF privacy policies
snagel
[nutch-site] branch main updated: Add link to ASF privacy policies
snagel
[nutch-site] branch asf-staging updated: Add link to ASF privacy policies
snagel
[nutch-site] branch main updated (aa45c17 -> db7208f)
snagel
[nutch-site] 02/03: Update copyright year 2022 -> 2023
snagel
[nutch-site] 03/03: Add new committer / PMC
snagel
[nutch-site] 01/03: - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
[nutch-site] branch asf-site updated: - add new committer / PMC - update copyright year 2022 -> 2023 - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
[nutch-site] branch asf-staging updated: - add new committer / PMC - update copyright year 2022 -> 2023 - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
[nutch-webapp] branch dependabot/maven/com.h2database-h2-2.2.220 created (now 0b5fed6)
github-bot
[nutch-webapp] branch dependabot/maven/com.google.guava-guava-32.0.0-jre created (now b38a4ff)
github-bot
[nutch] branch master updated: NUTCH-2991 Support HTTP/S Header Authorization for Solr connections (#763)
snagel
[nutch] branch master updated: NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached - if QueueFeeder is still alive, also block queues which are empty right now
snagel
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.24.RELEASE created (now 70deb3a)
github-bot
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.23.RELEASE created (now 9e33145)
github-bot
[nutch] branch master updated: NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty - upgrade from org.mortbay.jetty 6.1.26 to org.eclipse.jetty 9.4.50 (Hadoop depends on 9.4.43) - remove obsolete dependency exclusions of hadoop-common - upgrade Fetcher unit tests to use org.eclipse.jetty
snagel
[nutch] branch master updated: NUTCH-2984 Drop test proxy server and benchmark tool
snagel
[nutch] branch master updated: NUTCH-2985 Disable plugin urlfilter-validator by default
snagel
[nutch] branch master updated: NUTCH-2983 nutch-default.xml improvements - remove property "hadoop.job.history.user.location", obsolete since Hadoop 0.21.0 - normalize spelling (case) of URL and CrawlDb - trim trailing space - fix typos - improve description of properties {db,linkdb}.ignore.{ex,in}ternal.links
snagel
[nutch] branch master updated: NUTCH-2972 Javadoc build fails using JDK 17 - fix Javadoc issues when building with JDK 17
snagel
[nutch] branch master updated: NUTCH-2982 Generator: parameter for URL normalization not passed forward - pass forward params `norm` and `maxNumSegments` - fix typos in Javadoc
snagel
[nutch] branch master updated (383aeca5d -> e8fd21090)
snagel
[nutch] 02/07: NUTCH-2920 -- fix imports
snagel
[nutch] 03/07: NUTCH-2920 -- add keystore for 2-way tls; add back in no-tls option with a stern warning and possibly helpful links.
snagel
[nutch] 04/07: NUTCH-2920 -- improve handling for missing trust.store.path in the index-writers.xml
snagel
[nutch] 07/07: Add indexer-opensearch-1x to 4 more targets...feedback from sebastian-nagel
snagel
[nutch] 05/07: NUTCH-2920 -- improve username/pw logic and update README.md
snagel
[nutch] 01/07: NUTCH-2920 -- first working attempt at migrating ElasticsearchIndexWriter to OpenSearch
snagel
[nutch] 06/07: fix template to include new key store info. remove unused auth
snagel
[nutch] branch master updated: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit
snagel
[nutch] branch master updated: NUTCH-2974 Ant build fails with "Unparseable date" on certain platforms
snagel
[nutch] branch master updated: NUTCH-2634 Some links marked as "nofollow" are followed anyway - fix detection of nofollow in multi-valued rel attributes
snagel
[nutch] branch master updated: NUTCH-2924 Generate maxCount expr evaluated only once
markus
[nutch-webapp] branch dependabot/maven/org.springframework-spring-web-6.0.0 created (now dc3ba0a)
github-bot
[nutch] branch master updated: NUTCH-2977
markus
[nutch-webapp] branch dependabot/maven/org.springframework-spring-web-4.2.7.RELEASE created (now faede31)
github-bot
[nutch] branch master updated (85f7bcb63 -> ed7b6615b)
snagel
svn commit: r56776 - /release/nutch/1.18/
snagel
[nutch] branch master updated (ffe059892 -> 85f7bcb63)
snagel
[nutch] 01/02: Nutch 1.19 release - update current year in API docs etc. - update version number - add changes / release notes - update links to Hadoop API docs
snagel
[nutch] 02/02: Prepare for new development after release of 1.19 - bump version number (-> 1.20-SNAPSHOT)
snagel
[nutch-site] branch main updated (4efc5a9 -> aa45c17)
snagel
[nutch-site] 02/02: Announce release of Nutch 1.19 - fix release data in announcement
snagel
[nutch-site] branch asf-site updated: Announce release of Nutch 1.19 - fix release data in announcement
snagel
[nutch-site] branch asf-staging updated: Announce release of Nutch 1.19 - fix release data in announcement
snagel
[nutch-site] branch asf-site updated (a41c7ef -> 314b1b2)
snagel
[nutch-site] 01/03: - add README for branch asf-site - modify .asf.yaml to contain only instructions required in branch asf-site
snagel
[nutch-site] 02/03: Update content from Hugo build after adding Kube modified templates
snagel
[nutch-site] branch asf-staging updated (3e9e725 -> 2cfe00d)
snagel
[nutch-site] branch asf-staging updated: Announce release of Nutch 1.19
snagel
svn commit: r56738 [1/3] - /release/nutch/1.19/CHANGES.txt
snagel
svn commit: r56738 [3/3] - /release/nutch/1.19/CHANGES.txt
snagel
svn commit: r56738 [2/3] - /release/nutch/1.19/CHANGES.txt
snagel
[nutch-site] branch main updated: NUTCH-1999 Add /robots.txt to Nutch site (#1)
snagel
[nutch-site] branch asf-staging updated: - add README for branch asf-staging - modify .asf.yaml to contain only instructions required in branch asf-staging
snagel
[nutch-site] branch NUTCH-1999-nutch-site-robots-txt updated (142489f -> f863c1f)
snagel
[nutch-site] branch asf-staging updated: Sync .asf.yaml file with main branch
snagel
[nutch-site] branch asf-staging created (now d77dbb5)
snagel
[nutch-site] 01/01: Update content from Hugo build after adding Kube modified templates
snagel
svn commit: r56686 - /dev/nutch/1.19/ /release/nutch/1.19/
snagel
svn commit: r56398 - /dev/nutch/1.19/
snagel
[nutch] branch branch-1.19 created (now 63d4f11c0)
snagel
[nutch] annotated tag release-1.19 updated (63d4f11c0 -> 5d7660ceb)
snagel
[nutch] branch master updated: NUTCH-2969 Javadoc: Javascript search is not working when built on JDK 11 - pass --no-module-directories to javadoc target when building on JDK 11 - remove obsolete condition to fail javadoc builds on JDK 7u25 and earlier
snagel
[nutch] branch master updated (bca5fc0d0 -> 635ef2f3b)
snagel
[nutch] branch master updated (bec577d50 -> bca5fc0d0)
snagel
[nutch] branch master updated: NUTCH-2863 Injector to parse command-line flags case-insensitive
snagel
[nutch] branch master updated: NUTCH-2962 Update and complete package info of protocol plugins
snagel
[nutch] branch master updated: NUTCH-2930 Protocol-okhttp: implement IP filter (#736)
snagel
[nutch] branch master updated (c0f723e99 -> 05afebd03)
snagel
[nutch] branch master updated (edebfe49f -> c0f723e99)
snagel
[nutch] branch master updated (a5a630055 -> edebfe49f)
snagel
[nutch] branch master updated (82f9530dc -> a5a630055)
snagel
[nutch] branch master updated (b7b834501 -> 82f9530dc)
snagel
[nutch] branch master updated (8fc4f17ac -> b7b834501)
snagel
[nutch] branch master updated: NUTCH-2956 index-geoip: dependency upgrades and improvements - upgrade to geoip2 3.0.1 - exclude transitive dependencies (Jackson) provided as Nutch core deps - read also GeoLite2-*.mmdb files - review index field names in plugin and Nutch Solr schema: - fix typos in field names - remove unused fields from schema
snagel
[nutch] branch master updated: NUTCH-2953 Indexer Elastic to ignore SSL issues - apply patch contributed by Markus Jelsma - fix class imports
snagel
[nutch] branch master updated: NUTCH-2952 Upgrade core dependencies - Hadoop 3.1.3 -> 3.3.3 - log4j 2.17.0 -> 2.17.2 - and some more
snagel
[nutch] branch master updated (5b970ff22 -> 487110b07)
snagel
[nutch] 03/03: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used NUTCH-2949 Tasks of a multi-threaded map runner may fail because of slow creation of URL stream handlers
snagel
[nutch] 02/03: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used - code improvements Nutch plugin system: - use `Class<?>` and remove suppressions of warnings - javadocs: fix typos - remove superfluous white space - autoformat using code style template
snagel
[nutch] 01/03: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used - protocol-okhttp: initialize SSLContext used to ignore SSL/TLS certificate verification not in a static code block
snagel
[nutch] branch master updated: NUTCH-2951 Crawl datum with metadata WRITABLE_GENERATE_TIME_KEY awaits fetching forever - bug fix: add missing braces (bug introduced with NUTCH-2737, solution contributed by Lapadula Alessandro)
snagel
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.22.RELEASE created (now e19e71e)
github-bot
[nutch] branch master updated (02dca3b6d -> 47d3fe607)
snagel
[nutch] branch master updated: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode (#726)
lewismc
[nutch] branch master updated (568993b90 -> bdbe7b330)
snagel
[nutch] 01/02: NUTCH-2946 Fetcher: slow down fetching from hosts where requests fail repeatedly with exceptions or HTTP status codes mapped to ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too many requests, 5xx server errors, etc.)
snagel
[nutch] 02/02: NUTCH-2946 Fetcher: optionally slow down fetching from hosts with repeated exceptions - configure the delay in seconds as a float instead of milliseconds - use the value of fetcher.server.delay as default - double the delay with every observed exception (exponential backoff) but cap the growth at 2**31 to avoid overflows
snagel
[nutch] branch master updated: NUTCH-2948 Upgrade dependencies to Any23 2.7 and Tika 2.3.0
snagel
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.3.19 created (now b974d7a)
github-bot