Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

Patch Set 3:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 69: * converts RowResult to string.
I'd still advocate removing this javadoc section: it adds nothing, and it's on a private method.


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 97: //pre-flight checks of input parquet schema and table schema
nit: missing space after the //. Also, end your sentence with a period.

Line 99: if (!schema.containsField(sche.getName())) {
Why do you not also check the type?

Line 101: System.exit(0);
Having System.exit calls in the code isn't good. Ideally this case would be tested, and if you exit, how can you catch the error?

Line 104: //Kudu does not recommend using TIMESTAMP
Well, Kudu doesn't support Parquet's TIMESTAMP; it's not about a recommendation. Also, same nit as the comment above, and same comment regarding exit.


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java:

Line 17: package org.apache.kudu.mapreduce.tools;
nit: missing a blank line.

Line 68: // Create a 2 lines input file
nit: end comments with a period. Also, I'm not following this comment. The next line creates a table with 4 tablets, 3 of which have 3 rows. Where's the 2 lines input file coming from?
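
To illustrate the ImportParquet comments on Lines 99 and 101, here is a minimal sketch of a pre-flight check that compares types as well as names and reports failure by throwing instead of exiting. It is not the patch's code: the class PreflightCheck, the exception SchemaMismatchException, and the modeling of both schemas as column-name to type-name maps are assumptions made so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the pre-flight check; not the actual ImportParquet code. */
public class PreflightCheck {

  /** Thrown instead of calling System.exit, so callers and tests can catch it. */
  public static class SchemaMismatchException extends RuntimeException {
    public SchemaMismatchException(String message) {
      super(message);
    }
  }

  /**
   * Verifies that every table column exists in the input schema with the same type.
   * Both schemas are modeled as column-name to type-name maps (an assumption).
   */
  public static void validate(Map<String, String> tableSchema,
                              Map<String, String> inputSchema) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> column : tableSchema.entrySet()) {
      String name = column.getKey();
      String expectedType = column.getValue();
      String inputType = inputSchema.get(name);
      if (inputType == null) {
        // The name check from the original code.
        problems.add("missing column " + name);
      } else if (!inputType.equals(expectedType)) {
        // The additional type check asked for in the review.
        problems.add("column " + name + " is " + inputType
            + ", expected " + expectedType);
      }
    }
    if (!problems.isEmpty()) {
      throw new SchemaMismatchException(
          "Pre-flight check failed: " + String.join("; ", problems));
    }
  }
}
```

With this shape, the job driver can catch the exception and return a non-zero status, and a unit test can assert on the exception message without the JVM terminating.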


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java:

Line 17: package org.apache.kudu.mapreduce.tools;
nit: missing blank line.

Line 50: public class ITImportParquet extends BaseKuduTest {
I'd suggest having a separate test that specifically verifies the pre-flight checks that are running.

Line 107: String[] args = new String[] { "-D" + CommandLineParser.MASTER_ADDRESSES_KEY + "=" + getMasterAddresses(),
nit: long line

Line 111: Job job = ImportParquet.createSubmittableJob(parser.getConfiguration(), parser.getRemainingArgs());
nit: long line

Line 115: client.newScannerBuilder(openTable(TABLE_NAME)).build()));
openTable isn't a cheap call, do it only once.

Line 116: assertEquals(4,getTableRows(openTable(TABLE_NAME)).get(0).getInt("key"));
Use scanTableToStrings and verify all the rows instead. Better for type conversion checking.

Line 130: ParquetWriter<Group> writer = new ParquetWriter<Group>(data, new GroupWriteSupport(), UNCOMPRESSED, 1024, 1024, 512,
nit: long line

Line 133: writer.write(f.newGroup().append("key", 1).append("column1_i", 3).append("column2_d", 2.3).append("column3_s",
Those lines are all too long, also you could probably refactor this?


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java:

PS3, Line 204: rowStrings
What strings?


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 119: LOG.info(args.header+":"+args.delimiter+":"+args.path)
Forgot to remove?
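
On the ITImportParquet Line 116 comment, a rough sketch of why comparing every row as a string is a stronger check than asserting on a single key. The class RowComparison and the row-string format are hypothetical; no assumption is made here about what scanTableToStrings actually returns beyond a list of strings, one per row.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; rows are modeled as already-rendered strings. */
public class RowComparison {

  /** Returns one message per mismatch; an empty list means the rows agree. */
  public static List<String> diffRows(List<String> expected, List<String> actual) {
    List<String> diffs = new ArrayList<>();
    if (expected.size() != actual.size()) {
      diffs.add("expected " + expected.size() + " rows, got " + actual.size());
    }
    // Compare row by row over the range both lists cover.
    int common = Math.min(expected.size(), actual.size());
    for (int i = 0; i < common; i++) {
      if (!expected.get(i).equals(actual.get(i))) {
        diffs.add("row " + i + ": expected <" + expected.get(i)
            + "> got <" + actual.get(i) + ">");
      }
    }
    return diffs;
  }
}
```

A check on the key column alone would pass even if a double such as 2.3 were imported as 2.0, whereas the full-row comparison reports that row as a mismatch.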


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala:

Line 17: package org.apache.kudu.spark.tools
nit: add a blank line.

Line 66: //val table = kuduClient.openTable(TABLE_NAME)
?


--
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sanysand...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sanysand...@gmail.com>
Gerrit-HasComments: Yes