[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Jean-Daniel Cryans (Code Review) Tue, 18 Jul 2017 14:56:32 -0700

Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different 
format of files such as parquet,avro,csv in and to from kudu tables 
kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................



Patch Set 2:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/7421/2//COMMIT_MSG
Commit Message:

Line 7: kudu-spark-tools: Spark tool for Import & Export different format of 
files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: 
mapreduced base export to csv and import parquet files.
Please follow best practices for git commit messages, such as 
https://chris.beams.io/posts/git-commit/


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java:

PS2, Line 35: a CSV files.
nit: is there one or many files?


Line 39: public class ExportCsv extends Configured implements Tool {
This needs a test.


PS2, Line 50: The current configuration.
I know the ImportCsv* classes is like this, but the javadoc format we use 
dictates starting @param line with a lower case and and not ending with a 
period. See 
http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format

No need to fix ImportCsv in this patch but you'd be welcome to contribute a 
separate patch for it :)


PS2, Line 85: with columns
nit: rephrase to something like "the given table and columns into..."


PS2, Line 86: "T
nit: mind putting this on a new line? Then the code looks like what the error 
would look like.


PS2, Line 86: kudu
nit: Kudu


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 32: 
nit: remove the extra blank line


PS2, Line 48: (i.e., the parsing).
That made more sense in ImportCsbMapper, I'd just remove this javadoc 
completely.


Line 54:     this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY);
single line:

this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY, 
ExportCsv.DEFAULT_SEPARATOR);


PS2, Line 61: Insert
nits:
s/Convert/Converts
s/Insert/RowResult
also end the sentence with a period.


Line 74:    * converts RowResult $this.separator string
Please revise this whole javadoc section. FWIW it might not be necessary, since 
it's a private method. Also it's not doing anything surprising.


Line 114:         default:
Missing UNIXTIME_MICROS?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

PS2, Line 43: PARQUET
nit: Apache Parquet


Line 47: public class ImportParquet extends Configured implements Tool {
This needs a test.


Line 59:    * @param conf The current configuration.
Same comment as before regarding javadoc.


Line 75:     LOG.info(schema);
Did you forget to remove this?


Line 88:     FileInputFormat.setInputPaths(job, inputDir);
You could run some pre-flight checks like making sure that the columns match.


PS2, Line 105: PARQUET
Apache Parquet


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java:

PS2, Line 39: PARQUET lines and turns them into Kudu Inserts.
some comments as before regarding PARQUET and Inserts


Line 101:           case DOUBLE:
UNIXTIME_MICROS would be recognized but not supported, someone might have 
TIMESTAMP and think it's compatible, maybe you can catch that before like in 
createSubmittableJob?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java:

Line 27:  * Created by sany on 26/06/2017 AD.
Remove.


PS2, Line 29:  
nit


Line 37:     Set<String> counts = context.getKeyValueMetadata().get("my.count");
What's this about?


Line 38:     // assertTrue("counts: " + counts, counts.size() > 0);
Remove.


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File 
java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 1: 
remove this blank line


Line 18: package org.apache.kudu.spark.tools
add a blank line


Line 24: import org.slf4j.{Logger, LoggerFactory}
please re-order this


Line 29: object ImportExportKudu {
I'm less familiar with scala but at least we'll need a test.


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sanysand...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sanysand...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Reply via email to