I'm not sure exactly what the semantics will be, but at least one of them
will be upsert. These modes come from Spark, and they were really designed
for file-backed storage, not table storage. We may want to do append =
upsert, and overwrite = truncate + insert; I think that would match user
expectations more closely.
Right now append uses a Kudu update operation, which requires the row to
already be present in the table. Overwrite maps to insert. Kudu very
recently got upsert support baked in, but it hasn't yet been integrated
into the Spark connector, so these sharp edges should get a lot better
pretty soon.
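To make the proposed mapping concrete, here is a minimal, self-contained sketch in plain Scala (a model only, not the actual connector code) that treats a Kudu table as a map keyed by primary key. It shows why append-as-upsert updates rows that already exist, while overwrite-as-truncate-plus-insert drops everything first:

```scala
// A toy model of the proposed save-mode semantics; the "table" is just
// a mutable map keyed by primary key.
object SaveModeSemantics {
  type Table = scala.collection.mutable.Map[Int, String]

  // append = upsert: insert new keys, update rows whose key already exists
  def appendAsUpsert(table: Table, rows: Seq[(Int, String)]): Unit =
    rows.foreach { case (k, v) => table(k) = v }

  // overwrite = truncate + insert: drop all existing rows, then insert
  def overwriteAsTruncateInsert(table: Table, rows: Seq[(Int, String)]): Unit = {
    table.clear()
    rows.foreach { case (k, v) => table(k) = v }
  }
}
```

Under this model, appending a row whose key is already present simply replaces its value, which is the upsert behavior being discussed.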
I tried the “append” mode, and it worked: over 3.8 million rows in 64s.
I assume I can now use the “overwrite” mode on existing data. Now, I
have to find the answer to this question: what would happen if I “append” to
the Kudu table when the data already exists?
Hi,
Now, I’m getting this error when trying to write to the table.
import scala.collection.JavaConverters._
val key_seq = Seq("my_id")
val key_list = List("my_id").asJava
kuduContext.createTable(tableName, df.schema, key_seq, new
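The snippet above is cut off after `new`. A standalone check of the `asJava` conversion it relies on is below, together with a hypothetical completion of the call based on the `CreateTableOptions` chain quoted later in the thread. Note the completion assumes `addHashPartitions` takes a `java.util.List` of column names plus a bucket count (worth verifying against the Kudu client's Javadoc), and the bucket count is an illustrative placeholder:

```scala
import scala.collection.JavaConverters._

// Convert a Scala List to the java.util.List the Kudu client expects.
val key_list: java.util.List[String] = List("my_id").asJava

// Hypothetical completion of the truncated call above. kuduContext,
// tableName, df, and CreateTableOptions come from the thread's context;
// the bucket count (3) is a placeholder, not a recommendation:
// kuduContext.createTable(tableName, df.schema, Seq("my_id"),
//   new CreateTableOptions().setNumReplicas(1).addHashPartitions(key_list, 3))
```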
On Tue, Jun 14, 2016 at 4:20 PM, Benjamin Kim wrote:
Dan,
Thanks! It got further. Now, how do I set the primary key to be a column (or
columns) in the DataFrame, and set the partitioning? Is it like this?
kuduContext.createTable(tableName, df.schema, Seq("my_id"), new
CreateTableOptions().setNumReplicas(1).addHashPartitions("my_id"))
Looks like we're missing an import statement in that example. Could you
try:
import org.kududb.client._
and try again?
- Dan
On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim wrote:
> I encountered an error trying to create a table based on the documentation
> from a