Qutiba created KUDU-1533:
----------------------------

             Summary: Spark Kudu Rdd/Dataframe upsert 
                 Key: KUDU-1533
                 URL: https://issues.apache.org/jira/browse/KUDU-1533
             Project: Kudu
          Issue Type: Bug
         Environment: Spark
            Reporter: Qutiba


It is not clear how to upsert a Kudu RDD/DataFrame into an existing Kudu table.
The documentation, under "Kudu integration with Spark", lists some possible
operations:
***********************************************
// then we can insert data into the kudu table
df.write.options(Map("kudu.master"-> "your.kudu.master.here","kudu.table"-> 
"your.kudu.table.here")).mode("append").kudu

// to update existing data change the mode to 'overwrite'
df.write.options(Map("kudu.master"-> "your.kudu.master.here","kudu.table"-> 
"your.kudu.table.here")).mode("overwrite").kudu
****************************************************************
But there is no way to perform an upsert like this:
kuduDataFrame.write.options(Map("kudu.master"-> Kudu_Master,"kudu.table"-> 
TargetTable)).mode("upsert").kudu
***************************************************************
The current solution, which is quite slow, is:
Call DataFrame.foreachPartition
- open the table
- create a session
    -- For each row in this partition
          --- create an upsert operation
          --- get the row from the operation
          --- add all fields and values to this row
          --- apply the operation
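
For reference, the steps above can be sketched roughly as follows. This is a
minimal sketch, not the exact code in use: the object and method names
(UpsertWorkaround, upsertByPartition) are hypothetical, the package names assume
the org.apache.kudu naming of the Java client (older releases used
org.kududb), and only a few column types are handled. The session is left in
its default flush mode, where each apply() is sent synchronously, which is
probably one reason this approach is slow.

```scala
import org.apache.kudu.client.KuduClient
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper illustrating the per-partition upsert workaround.
object UpsertWorkaround {

  def upsertByPartition(df: DataFrame, kuduMaster: String, targetTable: String): Unit = {
    // Column names are read on the driver and shipped to the executors.
    val columns: Array[String] = df.columns

    df.rdd.foreachPartition { (rows: Iterator[Row]) =>
      // One client, table handle and session per partition.
      val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
      try {
        val table = client.openTable(targetTable)
        val session = client.newSession()
        try {
          rows.foreach { r =>
            // Create the upsert operation and get its row.
            val upsert = table.newUpsert()
            val kuduRow = upsert.getRow

            // Add all fields and values to this row.
            // Assumption: DataFrame column names match the Kudu column names.
            columns.indices.foreach { i =>
              val name = columns(i)
              if (r.isNullAt(i)) {
                kuduRow.setNull(name)
              } else {
                r.get(i) match {
                  case v: String  => kuduRow.addString(name, v)
                  case v: Int     => kuduRow.addInt(name, v)
                  case v: Long    => kuduRow.addLong(name, v)
                  case v: Double  => kuduRow.addDouble(name, v)
                  case v: Boolean => kuduRow.addBoolean(name, v)
                  case other =>
                    throw new IllegalArgumentException(
                      s"Unsupported type for column $name: ${other.getClass.getName}")
                }
              }
            }

            // Perform the operation (row errors are not checked in this sketch).
            session.apply(upsert)
          }
        } finally {
          session.close()
        }
      } finally {
        client.close()
      }
    }
  }
}
```

It would be called as
UpsertWorkaround.upsertByPartition(kuduDataFrame, Kudu_Master, TargetTable).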
----------------------------------
This solution is quite slow! Adding an upsert mode to the DataFrame writer for
Kudu tables would be better than opening sessions and creating operations by
hand, as in the solution above. For example:
kuduDataFrame.write.options(Map("kudu.master"-> Kudu_Master,"kudu.table"-> 
TargetTable)).mode("upsert").kudu




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
