[ https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-3012:
-----------------------------------
    Description:
SegmentReader when called with the flag {{-recode}} fails with a NPE when trying to stringify the raw content of unparsed documents:
{noformat}
$> bin/nutch readseg -dump crawl/segments/20231009065431 crawl/segreader/20231009065431 -recode
...
2023-10-09 07:55:18,451 INFO mapreduce.Job: Task Id : attempt_1696825862783_0005_r_000000_0, Status : FAILED
Error: java.lang.NullPointerException: charset
        at java.base/java.lang.String.<init>(String.java:504)
        at java.base/java.lang.String.<init>(String.java:561)
        at org.apache.nutch.protocol.Content.toString(Content.java:297)
        at org.apache.nutch.segment.SegmentReader$InputCompatReducer.reduce(SegmentReader.java:189)
{noformat}

> SegmentReader when dumping with option -recode: NPE on unparsed documents
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-3012
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3012
>             Project: Nutch
>          Issue Type: Bug
>          Components: segment
>    Affects Versions: 1.19
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.20
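
The message "charset" in the stack trace above is what java.lang.String's (byte[], Charset) constructor reports when it is handed a null Charset. A minimal sketch of a null-safe decode follows. It is not the Nutch fix: the class SafeDecode, the method decode, and the choice of UTF-8 as the fallback are all assumptions made for illustration, and how Content.toString actually selects its charset is not shown in this report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Apache Nutch.
public class SafeDecode {

    // Decode raw bytes, falling back to UTF-8 (an assumed default)
    // when no charset could be determined for the document.
    static String decode(byte[] raw, Charset charset) {
        if (raw == null) {
            return "";
        }
        Charset effective = (charset != null) ? charset : StandardCharsets.UTF_8;
        return new String(raw, effective);
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);

        // Passing null straight to the constructor reproduces the failure mode.
        try {
            new String(raw, (Charset) null);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException: " + e.getMessage());
        }

        // The guarded variant substitutes the fallback charset instead.
        System.out.println("guarded: " + decode(raw, null));
    }
}
```

Whether a silent fallback is appropriate depends on the caller: for a diagnostic dump such as readseg, printing possibly mis-decoded text may be preferable to failing the whole reduce task, but that is a design judgment rather than something this report states.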