lewismc commented on code in PR #906:
URL: https://github.com/apache/nutch/pull/906#discussion_r3067764239
##########
src/java/org/apache/nutch/indexer/IndexingJob.java:
##########
@@ -155,6 +159,25 @@ public void index(Path crawlDb, Path linkDb, List<Path> segments,
LOG.error(StringUtils.stringifyException(e));
throw e;
}
+ Path latencyDir = new Path(tmp, "_latency");
+ FileSystem fs = tmp.getFileSystem(conf);
+ if (fs.exists(latencyDir)) {
+ try (Job mergeJob = IndexerMapReduce.createLatencyMergeJob(conf, latencyDir)) {
+ FileOutputFormat.setOutputPath(mergeJob, new Path(tmp, "_latency_merge_out"));
+ boolean mergeSuccess = mergeJob.waitForCompletion(true);
Review Comment:
Yes, I wondered about this. I am not a huge fan of the intermediate output
being written for IndexerJob either. I think we could even remove the changes
for this job and address them separately. This will NOT have an impact on job
execution; however, the counters are not accurate.
--
This is an automated message from the Apache Git Service.