[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766078#comment-17766078 ]
ASF GitHub Bot commented on NUTCH-2978:
---------------------------------------

sebastian-nagel commented on PR #772:
URL: https://github.com/apache/nutch/pull/772#issuecomment-1722472438

+1

A test with the [pseudo-distributed Hadoop setup](https://github.com/sebastian-nagel/nutch-test-single-node-cluster/) was successful:
- Nutch tools work properly, no issues
- as expected, Hadoop puts slf4j-api-1.7.36.jar and slf4j-reload4j-1.7.36.jar on the classpath in front of the Nutch job jars
- consequently, task logs are formatted using the format defined in `$HADOOP_HOME/etc/hadoop/log4j.properties`
- (the good thing) log messages from Nutch classes appear in the task logs, e.g.
  ```
  2023-09-17 07:29:21,726 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 33 fetching https://nutch.apache.org/ (queue crawl delay=5000ms)
  ```
- the log format defined in `$NUTCH_HOME/conf/log4j2.xml` is only applied to the logs of the YARN job client, e.g.
  ```
  2023-09-17 07:29:32,432 INFO fetcher.Fetcher: Fetcher: finished at 2023-09-17 07:29:32, elapsed: 00:00:25
  ```
- in addition, I included two PDFs, an XLSX and an EPUB document to test the Tika parser: the documents were successfully parsed using Tika 2.3.0
- if necessary, I can repeat the test for NUTCH-2959


> Move to slf4j2 and remove log4j1 and reload4j
> ---------------------------------------------
>
>                 Key: NUTCH-2978
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2978
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Markus Jelsma
>            Priority: Major
>         Attachments: NUTCH-2978-1.patch, NUTCH-2978-2.patch,
> NUTCH-2978-3.patch, NUTCH-2978-any23.patch, NUTCH-2978.patch
>
>
> I got into trouble today upgrading some dependencies: I got a lot of
> LinkageErrors and, with a Tika upgrade, disappearing logs. This patch fixes
> that by moving to slf4j2, using the correct log4j-slf4j2-impl binding and
> getting rid of the old log4j -> reload4j.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
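The problems discussed in this thread come down to which SLF4J jars win on the classpath. They include Hadoop placing slf4j-api-1.7.36.jar and slf4j-reload4j-1.7.36.jar ahead of the Nutch job jars, plus the LinkageErrors and disappearing logs. A small diagnostic can show which jars win. The following is a minimal sketch and is not part of the patch or PR #772. It lists every classpath copy of a few SLF4J marker resources in lookup order.

The class name `Slf4jBindingCheck` is hypothetical. The resource names assume two binding mechanisms:
- SLF4J 1.7.x bindings, which ship `org/slf4j/impl/StaticLoggerBinder.class`
- SLF4J 2.x providers, which register through `META-INF/services/org.slf4j.spi.SLF4JServiceProvider`

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class Slf4jBindingCheck {

    // Assumed marker resources: slf4j-api itself, the SLF4J 1.7.x binding
    // class, and the SLF4J 2.x ServiceLoader registration file.
    static final String[] RESOURCES = {
        "org/slf4j/LoggerFactory.class",
        "org/slf4j/impl/StaticLoggerBinder.class",
        "META-INF/services/org.slf4j.spi.SLF4JServiceProvider"
    };

    /** Returns every location of {@code resource} visible to {@code loader}, in lookup order. */
    static List<URL> locate(ClassLoader loader, String resource) {
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prefer the context class loader (what frameworks usually consult),
        // falling back to the loader that loaded this class.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = Slf4jBindingCheck.class.getClassLoader();
        }
        for (String resource : RESOURCES) {
            List<URL> hits = locate(loader, resource);
            System.out.println(resource + " (" + hits.size() + " found)");
            for (URL url : hits) {
                // The first entry is normally the copy that actually gets loaded.
                System.out.println("  " + url);
            }
        }
    }
}
```

Run it with the same classpath a task sees. Two things are worth checking in the output:
- More than one `org/slf4j/LoggerFactory.class` hit means competing slf4j-api versions.
- Hits for both `StaticLoggerBinder` and `SLF4JServiceProvider` mean a 1.7-style binding and a 2.x provider are present side by side.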