> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 45
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line45>
> >
> >     this should be in the scope of the main method.
> >     If you wanted to write unit tests it would be inconvenient, as calling
> >     main more than once would accumulate the stats.

Thanks, addressed.
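
For reference, a minimal sketch of the scoping change, not the actual FileDumper code. It assumes the stats are a per-type counter map; the class name `StatsScopeSketch`, the method `dump`, and the MIME-type keys are all hypothetical. Because the map is created inside the call rather than held in a static field, a second invocation starts from zero.

```java
import java.util.HashMap;
import java.util.Map;

final class StatsScopeSketch {

    // Hypothetical stand-in for the dumper's work: counts entries per type.
    // The map is local to the call, so nothing carries over between runs.
    static Map<String, Integer> dump(String[] types) {
        Map<String, Integer> typeCounts = new HashMap<String, Integer>();
        for (String type : types) {
            Integer seen = typeCounts.get(type);
            typeCounts.put(type, seen == null ? 1 : seen + 1);
        }
        return typeCounts;
    }

    public static void main(String[] args) {
        String[] input = {"text/html", "image/png", "text/html"};
        // Both calls report the same counts; with a static map the second
        // call would report doubled values.
        System.out.println(dump(input));
        System.out.println(dump(input));
    }
}
```

A unit test can then call the method repeatedly and compare each result independently.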


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 64
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line64>
> >
> >     you might want to throw an exception if this returns false

Fixed.
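
A sketch of what such a check might look like, assuming the call at that line is a directory-creation call such as `File.mkdirs()`; the review does not say which method it is, and `EnsureDirSketch` and `ensureDirectory` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

final class EnsureDirSketch {

    // Creates the directory (and any missing parents) or fails loudly.
    static void ensureDirectory(File dir) throws IOException {
        // mkdirs() also returns false when the directory already exists,
        // so only treat false as an error if the directory is still missing.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Could not create directory: " + dir.getAbsolutePath());
        }
    }
}
```

Checking `isDirectory()` as well keeps the call safe to repeat on an output directory that is already there.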


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 88
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line88>
> >
> >     as this is all working from the local file system, using File objects
> >     all the way and converting to Path when needed seems more natural.
> >     new Path(file.toURI()) for example.

Not sure how to address this one?


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 117
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line117>
> >
> >     we create new File(outputFullPath) twice.

Fixed.


> On Sept. 10, 2014, 1:24 a.m., Julien Le Dem wrote:
> > ./trunk/src/java/org/apache/nutch/tools/FileDumper.java, line 116
> > <https://reviews.apache.org/r/9119/diff/1/?file=681989#file681989line116>
> >
> >     does content close the stream?

Not sure, but I put the close in the finally block now and refactored.
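
A minimal sketch of the pattern, assuming the stream is a plain `java.io.OutputStream`; `CloseInFinallySketch` and `writeAndClose` are hypothetical names and this is not the refactored FileDumper code. Closing in `finally` means the stream is released whether or not the write itself closes it or throws.

```java
import java.io.IOException;
import java.io.OutputStream;

final class CloseInFinallySketch {

    // Writes the bytes and always closes the stream, even if write() throws.
    static void writeAndClose(OutputStream out, byte[] content) throws IOException {
        try {
            out.write(content);
        } finally {
            out.close();
        }
    }
}
```

On Java 7 and later, a try-with-resources statement gives the same guarantee with less code.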


- Chris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/#review52809
-----------------------------------------------------------


On Sept. 10, 2014, 3:15 a.m., Chris Mattmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9119/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2014, 3:15 a.m.)
> 
> 
> Review request for nutch.
> 
> 
> Bugs: NUTCH-1526
>     https://issues.apache.org/jira/browse/NUTCH-1526
> 
> 
> Repository: nutch
> 
> 
> Description
> -------
> 
> Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:
> 
> ./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
>    -segmentRootDir full file path to the root segment directory, e.g., crawl/segments
>    -regexUrlPattern a regex URL pattern to select URL keys to dump from the content DB in each segment
>    -outputDir The output directory to write file names to.
>    -metadata --key=value where key is a Content Metadata key and value is a value to check.
> 
> 
> Diffs
> -----
> 
>   ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9119/diff/
> 
> 
> Testing
> -------
> 
> Testing it on DARPA XDATA XNET.
> 
> 
> Thanks,
> 
> Chris Mattmann
> 
>
