[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766078#comment-17766078 ]

ASF GitHub Bot commented on NUTCH-2978:
---------------------------------------

sebastian-nagel commented on PR #772:
URL: https://github.com/apache/nutch/pull/772#issuecomment-1722472438

   +1
   
   A test with the [pseudo-distributed Hadoop setup](https://github.com/sebastian-nagel/nutch-test-single-node-cluster/) was successful:
   - Nutch tools work properly, no issues
   - as expected, Hadoop puts slf4j-api-1.7.36.jar and slf4j-reload4j-1.7.36.jar on the classpath ahead of the Nutch job jars
   - consequently, task logs are formatted using the format defined in `$HADOOP_HOME/etc/hadoop/log4j.properties`
   - (the good thing) log messages from Nutch classes appear in the task logs, e.g.
     ```
     2023-09-17 07:29:21,726 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 33 fetching https://nutch.apache.org/ (queue crawl delay=5000ms)
     ```
   - the log format defined in `$NUTCH_HOME/conf/log4j2.xml` is only applied to the logs of the YARN job client, e.g.
     ```
     2023-09-17 07:29:32,432 INFO fetcher.Fetcher: Fetcher: finished at 2023-09-17 07:29:32, elapsed: 00:00:25
     ```
   - in addition, I've included two PDFs, an XLSX and an ePub document to test the Tika parser: the documents were successfully parsed using Tika 2.3.0. If necessary, I can repeat the test for NUTCH-2959.
   




> Move to slf4j2 and remove log4j1 and reload4j
> ---------------------------------------------
>
>                 Key: NUTCH-2978
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2978
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Markus Jelsma
>            Priority: Major
>         Attachments: NUTCH-2978-1.patch, NUTCH-2978-2.patch, 
> NUTCH-2978-3.patch, NUTCH-2978-any23.patch, NUTCH-2978.patch
>
>
> I ran into trouble upgrading some dependencies today: I got a lot of
> LinkageErrors or, with a Tika upgrade, disappearing logs. This patch fixes
> that by moving to slf4j2, using the correct log4j-slf4j2-impl and getting rid
> of the old log4j 1 and reload4j.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)