----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8151/#review13700 -----------------------------------------------------------
crunch/src/main/java/org/apache/crunch/lib/PTables.java <https://reviews.apache.org/r/8151/#comment29361> This method doesn't seem to be used anywhere. Remove? crunch/src/main/java/org/apache/crunch/lib/text/AbstractCompositeExtractor.java <https://reviews.apache.org/r/8151/#comment29362> Missing javadoc crunch/src/main/java/org/apache/crunch/lib/text/AbstractCompositeExtractor.java <https://reviews.apache.org/r/8151/#comment29363> This class generates lots of warnings in Eclipse; using Extractor<?> instead of just Extractor will shut Eclipse up. crunch/src/main/java/org/apache/crunch/lib/text/AbstractSimpleExtractor.java <https://reviews.apache.org/r/8151/#comment29364> Needs javadoc describing the contract. I think GoF suggests doExtract() for template methods. crunch/src/main/java/org/apache/crunch/lib/text/Extractors.java <https://reviews.apache.org/r/8151/#comment29365> I think all inner classes here should be private, like in Aggregators. Otherwise we'll add a lot of new classes to our javadocs. Also, this class needs some more stating-the-obvious javadoc ;-) crunch/src/main/java/org/apache/crunch/lib/text/Parse.java <https://reviews.apache.org/r/8151/#comment29366> Final, with private constructor? crunch/src/main/java/org/apache/crunch/lib/text/Parse.java <https://reviews.apache.org/r/8151/#comment29367> Does this need to be public? crunch/src/main/java/org/apache/crunch/lib/text/ScannerFactory.java <https://reviews.apache.org/r/8151/#comment29369> I think a fluent Builder pattern would be more appropriate than a Factory. This would solve the problem of bulky method names like forDelimiterAnSkip(). crunch/src/main/java/org/apache/crunch/lib/text/ScannerFactory.java <https://reviews.apache.org/r/8151/#comment29368> As per classic Code Conventions from Sun, attributes should be at the top of the class. Pretty cool, looking forward to the next round. It was too late already when I noticed that you can annotate multiple lines, but otherwise I think I could get used to RB ;-) - Matthias Friedrich On Nov. 21, 2012, 2:31 a.m., Josh Wills wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8151/ > ----------------------------------------------------------- > > (Updated Nov. 21, 2012, 2:31 a.m.) > > > Review request for crunch. > > > Description > ------- > > Latest and greatest rev of the extraction library for text parsing. I ended > up refactoring the approach so that we could support nested parsing (e.g., > using different Scanner instances for different parts of a line) and > collections of items on a single line. > > > This addresses bug CRUNCH-97. > https://issues.apache.org/jira/browse/CRUNCH-97 > > > Diffs > ----- > > crunch/src/main/java/org/apache/crunch/lib/PTables.java e788656 > > crunch/src/main/java/org/apache/crunch/lib/text/AbstractCompositeExtractor.java > PRE-CREATION > > crunch/src/main/java/org/apache/crunch/lib/text/AbstractSimpleExtractor.java > PRE-CREATION > crunch/src/main/java/org/apache/crunch/lib/text/Extractor.java PRE-CREATION > crunch/src/main/java/org/apache/crunch/lib/text/ExtractorStats.java > PRE-CREATION > crunch/src/main/java/org/apache/crunch/lib/text/Extractors.java > PRE-CREATION > crunch/src/main/java/org/apache/crunch/lib/text/Parse.java PRE-CREATION > crunch/src/main/java/org/apache/crunch/lib/text/ScannerFactory.java > PRE-CREATION > crunch/src/test/java/org/apache/crunch/lib/text/ParseTest.java PRE-CREATION > > Diff: https://reviews.apache.org/r/8151/diff/ > > > Testing > ------- > > Unit tests so far, still gathering feedback on the approach. > > > Thanks, > > Josh Wills > >
