> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 45
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line45>
> >
> > this should be in the scope of the main method.
> > If you wanted to write unit tests it would be inconvenient as calling
> > main more than once would accumulate the stats.
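The reviewer's point about state that outlives a single run can be illustrated with a minimal Java sketch. It assumes, purely for illustration, that "the stats" are a per-MIME-type count of dumped records; the class `DumpStatsSketch` and its methods are hypothetical and do not come from FileDumper.java.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting shared (static) and per-call stats.
// Assumption: "stats" means a count of dumped records per MIME type.
class DumpStatsSketch {

    // Problematic pattern: this map outlives any single call, so invoking the
    // entry point twice in one JVM (e.g. from two unit tests) mixes the counts.
    static final Map<String, Integer> SHARED_COUNTS = new HashMap<>();

    static Map<String, Integer> dumpWithSharedStats(List<String> mimeTypes) {
        for (String type : mimeTypes) {
            SHARED_COUNTS.merge(type, 1, Integer::sum);
        }
        return SHARED_COUNTS;
    }

    // Suggested pattern: the counter is local to the call, so every run
    // starts from an empty map and reports only its own records.
    static Map<String, Integer> dump(List<String> mimeTypes) {
        Map<String, Integer> typeCounts = new HashMap<>();
        for (String type : mimeTypes) {
            typeCounts.merge(type, 1, Integer::sum);
        }
        return typeCounts;
    }
}
```

Calling `dump` twice on the same input returns the same counts each time, while `dumpWithSharedStats` reports doubled counts on the second call, which is the cumulative effect the reviewer warns about.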
Thanks, addressed.


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 64
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line64>
> >
> > you might want to throw an exception if this returns false

Fixed.


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 88
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line88>
> >
> > as this is all working from the local file system, using Files all the way
> > and converting to Path when needed seems more natural.
> > new Path(file.toURI()) for example.

Not sure how to address this one?


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 117
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line117>
> >
> > we create new File(outputFullPath) twice.

Fixed.


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 116
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line116>
> >
> > does content close the stream?

Not sure, but I put the close in the finally block now and refactored.


- Chris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/#review52809
-----------------------------------------------------------


On Sept. 10, 2014, 3:15 a.m., Chris Mattmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9119/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2014, 3:15 a.m.)
> 
> 
> Review request for nutch.
> 
> 
> Bugs: NUTCH-1526
>     https://issues.apache.org/jira/browse/NUTCH-1526
> 
> 
> Repository: nutch
> 
> 
> Description
> -------
> 
> Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:
> 
> ./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
>   -segmentRootDir    full file path to the root segment directory, e.g., crawl/segments
>   -regexUrlPattern   a regex URL pattern to select URL keys to dump from the content DB in each segment
>   -outputDir         the output directory to write file names to.
>   -metadata          --key=value where key is a Content Metadata key and value is a value to check.
> 
> 
> Diffs
> -----
> 
>   ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/9119/diff/
> 
> 
> Testing
> -------
> 
> Testing it on DARPA XDATA XNET.
> 
> 
> Thanks,
> 
> Chris Mattmann
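The fixes described in the replies above (failing when a call returns false, constructing the output File only once, and closing the stream in a finally block), together with the reviewer's advice to stay with java.io.File and convert to a Hadoop Path only when needed, can be combined with the -regexUrlPattern and -outputDir options into one write step. This is a sketch under stated assumptions, not the code under review: the class `DumpWriteSketch`, its methods, and the `fileName` parameter are hypothetical; the call flagged at line 64 is assumed to be a directory creation such as `File.mkdirs()`; record content is assumed to be a byte array; and the URL regex is assumed to require a full match.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.regex.Pattern;

// Hypothetical sketch of the write path discussed in the review; it is not
// the FileDumper.java code, and all names here are illustrative.
class DumpWriteSketch {

    // Creates the output directory, throwing instead of silently ignoring a
    // false return from mkdirs(). mkdirs() also returns false when the
    // directory already exists, hence the isDirectory() check.
    static void ensureDir(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create output directory: " + dir);
        }
    }

    // Writes one record's bytes under outputDir if its URL key matches the
    // pattern (full match via matches(), an assumption). Returns the written
    // file, or null if the URL was not selected.
    static File dumpIfMatches(String url, byte[] content, Pattern urlPattern,
                              File outputDir, String fileName) throws IOException {
        if (!urlPattern.matcher(url).matches()) {
            return null;
        }
        ensureDir(outputDir);
        // Build the File once and reuse it; stay with java.io.File throughout.
        // If a Hadoop API needs a Path, convert at that point, e.g.
        // new org.apache.hadoop.fs.Path(outputFile.toURI()) (not compiled here).
        File outputFile = new File(outputDir, fileName);
        // try-with-resources closes both streams on exit, even when a write
        // throws, which is what an explicit close in a finally block achieves.
        try (InputStream in = new ByteArrayInputStream(content);
             OutputStream out = new FileOutputStream(outputFile)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return outputFile;
    }
}
```

Checking `isDirectory()` after `mkdirs()` matters because `mkdirs()` also returns false when the directory already exists, so throwing on a bare false return would break reruns against an existing output directory.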