Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
I'm not sure exactly what the semantics will be, but at least one of them will be upsert. These modes come from Spark, and they were really designed for file-backed storage rather than table storage. We may want to do append = upsert, and overwrite = truncate + insert. I think that may match the

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
Right now append uses an update Kudu operation, which requires the row already be present in the table. Overwrite maps to insert. Kudu very recently got upsert support baked in, but it hasn't yet been integrated into the Spark connector. So pretty soon these sharp edges will get a lot better,

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
I tried to use the “append” mode, and it worked. Over 3.8 million rows in 64s. I would assume that now I can use the “overwrite” mode on existing data. Now, I have to find answers to these questions. What would happen if I “append” to the data in the Kudu table if the data already exists? What

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
Hi,
Now, I’m getting this error when trying to write to the table.
import scala.collection.JavaConverters._
val key_seq = Seq("my_id")
val key_list = List("my_id").asJava
kuduContext.createTable(tableName, df.schema, key_seq, new

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
On Tue, Jun 14, 2016 at 4:20 PM, Benjamin Kim wrote:
> Dan,
>
> Thanks! It got further. Now, how do I set the Primary Key to be a
> column(s) in the DataFrame and set the partitioning? Is it like this?
>
> kuduContext.createTable(tableName, df.schema, Seq("my_id"), new
>

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
Dan,
Thanks! It got further. Now, how do I set the Primary Key to be a column(s) in the DataFrame and set the partitioning? Is it like this?
kuduContext.createTable(tableName, df.schema, Seq("my_id"), new CreateTableOptions().setNumReplicas(1).addHashPartitions("my_id"))

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
Looks like we're missing an import statement in that example. Could you try:
import org.kududb.client._
and try again?
- Dan
On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim wrote:
> I encountered an error trying to create a table based on the documentation
> from a