Will Berkeley has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 61:     new KuduRelation(tableName, kuduMaster)(sqlContext)
> Should the upsert flag get passed here as well?
N/A. See below.


Line 78:   override def createRelation(sqlContext: SQLContext, mode: SaveMode,
> How would you feel about removing the method and not implementing Creatable
You're right we shouldn't implement CreatableRelationProvider if we can't do it 
right. Alternatively, couldn't createRelation take a dataframe, construct a 
Kudu schema from its schema, and then allow it to be persisted to Kudu? Working 
this way, we could support everything except Overwrite right now...buuuuut it's 
going to be really awkward to specify the key and the partition schema. Awkward 
enough I think we should remove creatableRelationProvider until we can 
implement it correctly as it is now (based on an existing table).

All the operations will still be available on a dataframe via KuduContext.


Line 94:       case Overwrite => kuduRelation.insert(data, overwrite = true)
> Throw Unsupported here as well for the reasons below.
N/A. See above.


Line 218:     context.writeRows(data, tableName, overwrite, upsert)
> Could you actually change this to throw a NotSupported exception when overw
Done


http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 132:   def writeRows(data: DataFrame, tableName: String, overwrite: 
Boolean, upsert: Boolean) {
> Instead of writeRows taking two booleans, could you split this into insertR
Done


Line 151:   def writeRows(rows: Iterator[Row],
> Same here.  There shouldn't be too much code duplication if you pull out th
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Chris George <chris.geo...@rms.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to