Dan Burkert has posted comments on this change. Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert ......................................................................
Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala: Line 61: new KuduRelation(tableName, kuduMaster)(sqlContext) Should the upsert flag get passed here as well? Line 78: override def createRelation(sqlContext: SQLContext, mode: SaveMode, How would you feel about removing the method and not implementing CreatableRelationProvider? As far as I can tell we are not, and can not, implement it correctly, since the main point of it is to create a new table when it doesn't exist. Line 94: case Overwrite => kuduRelation.insert(data, overwrite = true) Throw Unsupported here as well for the reasons below. Line 218: context.writeRows(data, tableName, overwrite, upsert) Could you actually change this to throw a NotSupported exception when overwrite = true? I think the Spark semantics are supposed to be that overwrite will truncate the dataset then do inserts, which we can't yet support. Changing it to throw will allow us to backwards compatibly support it later when we have lightweight table truncation. http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: Line 132: def writeRows(data: DataFrame, tableName: String, overwrite: Boolean, upsert: Boolean) { Instead of writeRows taking two booleans, could you split this into insertRows, updateRows, and upsertRows? I think that will be a lot more clear. The boolean was already really confusing. Line 151: def writeRows(rows: Iterator[Row], Same here. There shouldn't be too much code duplication if you pull out the row building into a helper function. Should this be private? -- To view, visit http://gerrit.cloudera.org:8080/3871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Chris George <chris.geo...@rms.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes