Kim Whitehall created NUTCH-2100:
------------------------------------

             Summary: Nutch dump command doesnt dump anything
                 Key: NUTCH-2100
                 URL: https://issues.apache.org/jira/browse/NUTCH-2100
             Project: Nutch
          Issue Type: Bug
            Reporter: Kim Whitehall

When running the command

nutch dump -segment segment -outputDir dumpFolder -mimeStats

I receive the following output:

Dumper File Stats: TOTAL Stats: [ ]

The log indicates that segments are being skipped. Note that if I use nutch/readseg -dump, I can see there is content there. The log is shown below:

2015-09-15 20:10:56,142 INFO  tools.FileDumper - Accepting all mimetypes.
2015-09-15 20:10:56,782 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-15 20:10:57,057 INFO  tools.FileDumper - Processing segment: [/.../segments/20150915195411/crawl_generate]
2015-09-15 20:10:57,057 WARN  tools.FileDumper - Skipping segment: [/.../segments/20150915195411/crawl_generate/content/part-00000/data]: no data directory present
2015-09-15 20:10:57,057 INFO  tools.FileDumper - Processing segment: [/.../segments/20150915195411/crawl_fetch]
2015-09-15 20:10:57,057 WARN  tools.FileDumper - Skipping segment: [/.../segments/20150915195411/crawl_fetch/content/part-00000/data]: no data directory present
2015-09-15 20:10:57,058 INFO  tools.FileDumper - Processing segment: [/.../segments/20150915195411/content]
2015-09-15 20:10:57,058 WARN  tools.FileDumper - Skipping segment: [/.../segments/20150915195411/content/content/part-00000/data]: no data directory present
2015-09-15 20:10:57,058 INFO  tools.FileDumper - Processing segment: [/.../segments/20150915195411/parse_text]
2015-09-15 20:10:57,058 WARN  tools.FileDumper - Skipping segment: [/.../segments/20150915195411/parse_text/content/part-00000/data]: no data directory present
2015-09-15 20:10:57,058 INFO  tools.FileDumper - Processing segment: [/.../segments/20150915195411/parse_data]
2015-09-15 20:10:57,058 WARN  tools.FileDumper - Skipping segment: [/.../segments/20150915195411/parse_data/content/part-00000/data]: no data directory present
2015-09-15 20:10:57,058 INFO  tools.FileDumper - Processing segment: [/.../segments/20150915195411/crawl_parse]
2015-09-15 20:10:57,058 WARN  tools.FileDumper - Skipping segment: [/.../segments/20150915195411/crawl_parse/content/part-00000/data]: no data directory present
2015-09-15 20:10:57,059 INFO  tools.FileDumper - Dumper File Stats: TOTAL Stats: [ ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
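
One possible reading of the log above, not confirmed against the FileDumper source: every subdirectory of the path given to -segment appears to be treated as a segment in its own right, and content/part-00000/data is looked up beneath it. The sketch below only reproduces that path pattern on a throwaway directory tree. The function names and the demo layout are hypothetical and are not Nutch code; it may help to check whether pointing at the parent segments directory behaves differently from pointing at a single segment.

```python
from pathlib import Path
import tempfile

# Subdirectory names taken from the log in this report.
SEGMENT_PARTS = ("crawl_generate", "crawl_fetch", "content",
                 "parse_text", "parse_data", "crawl_parse")


def resolve_data_files(given_dir):
    """Hypothetical mimic of the lookup seen in the log: for each child
    directory of given_dir, check <child>/content/part-00000/data."""
    found = {}
    for child in sorted(p for p in given_dir.iterdir() if p.is_dir()):
        data_file = child / "content" / "part-00000" / "data"
        found[data_file] = data_file.is_file()
    return found


def build_demo_segment(root):
    """Create an assumed single-segment layout under root (illustration only)."""
    segment = root / "segments" / "20150915195411"
    for part in SEGMENT_PARTS:
        (segment / part).mkdir(parents=True)
    data_file = segment / "content" / "part-00000" / "data"
    data_file.parent.mkdir(parents=True)
    data_file.write_bytes(b"")  # empty placeholder, not a real Hadoop file
    return segment


with tempfile.TemporaryDirectory() as tmp:
    segment = build_demo_segment(Path(tmp))
    inside = resolve_data_files(segment)          # path of one segment
    parent = resolve_data_files(segment.parent)   # parent "segments" directory
    found_inside = sum(inside.values())
    found_parent = sum(parent.values())
    print("single segment given:", found_inside, "of", len(inside), "found")
    print("parent directory given:", found_parent, "of", len(parent), "found")
```

If the tool really resolves paths this way, the six "Skipping segment" warnings would follow from passing one segment directory rather than its parent. That remains an assumption until it is checked against the source.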