Jean-Daniel Cryans has posted comments on this change. Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files. ......................................................................
Patch Set 2: (29 comments) http://gerrit.cloudera.org:8080/#/c/7421/2//COMMIT_MSG Commit Message: Line 7: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files. Please follow best practices for git commit messages, such as https://chris.beams.io/posts/git-commit/ http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java: PS2, Line 35: a CSV files. nit: is there one or many files? Line 39: public class ExportCsv extends Configured implements Tool { This needs a test. PS2, Line 50: The current configuration. I know the ImportCsv* classes is like this, but the javadoc format we use dictates starting @param line with a lower case and and not ending with a period. See http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format No need to fix ImportCsv in this patch but you'd be welcome to contribute a separate patch for it :) PS2, Line 85: with columns nit: rephrase to something like "the given table and columns into..." PS2, Line 86: "T nit: mind putting this on a new line? Then the code looks like what the error would look like. PS2, Line 86: kudu nit: Kudu http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java: Line 32: nit: remove the extra blank line PS2, Line 48: (i.e., the parsing). That made more sense in ImportCsbMapper, I'd just remove this javadoc completely. Line 54: this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY); single line: this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY, ExportCsv.DEFAULT_SEPARATOR); PS2, Line 61: Insert nits: s/Convert/Converts s/Insert/RowResult also end the sentence with a period. Line 74: * converts RowResult $this.separator string Please revise this whole javadoc section. FWIW it might not be necessary, since it's a private method. Also it's not doing anything surprising. Line 114: default: Missing UNIXTIME_MICROS? http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: PS2, Line 43: PARQUET nit: Apache Parquet Line 47: public class ImportParquet extends Configured implements Tool { This needs a test. Line 59: * @param conf The current configuration. Same comment as before regarding javadoc. Line 75: LOG.info(schema); Did you forget to remove this? Line 88: FileInputFormat.setInputPaths(job, inputDir); You could run some pre-flight checks like making sure that the columns match. PS2, Line 105: PARQUET Apache Parquet http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java: PS2, Line 39: PARQUET lines and turns them into Kudu Inserts. some comments as before regarding PARQUET and Inserts Line 101: case DOUBLE: UNIXTIME_MICROS would be recognized but not supported, someone might have TIMESTAMP and think it's compatible, maybe you can catch that before like in createSubmittableJob? http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java: Line 27: * Created by sany on 26/06/2017 AD. Remove. PS2, Line 29: nit Line 37: Set<String> counts = context.getKeyValueMetadata().get("my.count"); What's this about? Line 38: // assertTrue("counts: " + counts, counts.size() > 0); Remove. http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: Line 1: remove this blank line Line 18: package org.apache.kudu.spark.tools add a blank line Line 24: import org.slf4j.{Logger, LoggerFactory} please re-order this Line 29: object ImportExportKudu { I'm less familiar with scala but at least we'll need a test. -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HN <sanysand...@gmail.com> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN <sanysand...@gmail.com> Gerrit-HasComments: Yes