[ 
https://issues.apache.org/jira/browse/PHOENIX-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443800#comment-17443800
 ] 

Istvan Toth commented on PHOENIX-6590:
--------------------------------------

I admit that I am not overly familiar with the Phoenix 
DataSourceWriter/DataWriter API.

Chiefly, I don't know how Spark will handle failures from the DataSource when 
writing.
I have skimmed the API docs, but I couldn't find APIs in DataSets to handle the 
failures, only some mentions of possibly automatically re-trying them.

How is this feature different from simply setting the batch size to 1?
I think that apart from an extra "commit" call, which in itself has negligible 
cost, the same thing would happen with the same (low) performance.

Looking at 
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/connector/write/DataWriter.html
even setting the batch size to 1 (or turning on the proposed, equivalent 
autocommit) won't achieve the transactional behaviour defined by Spark, as any 
successfully written rows will stay in the database, and abort cannot clean 
them up.
(Even saving the previous state wouldn't help, as the rows may have been 
modified by other actors in the meantime.)
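
To make the concern concrete, here is a minimal, self-contained sketch. It 
does not use the real Spark or Phoenix classes: ToyTable, 
BatchCommittingWriter and AbortDemo are hypothetical names that only imitate 
the write/commit/abort shape of Spark's DataWriter, under the assumption that 
every flushed batch becomes immediately visible in the table.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a non-transactional table: flushed rows are visible at once. */
final class ToyTable {
    final List<String> visibleRows = new ArrayList<>();
}

/** Hypothetical writer imitating the write/commit/abort shape of a Spark DataWriter. */
final class BatchCommittingWriter {
    private final ToyTable table;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchCommittingWriter(ToyTable table, int batchSize) {
        this.table = table;
        this.batchSize = batchSize;
    }

    void write(String row) {
        if (row == null) {
            // Simulated failure while writing a record.
            throw new IllegalStateException("simulated write failure");
        }
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush(); // batchSize == 1 behaves like autocommit
        }
    }

    void commit() {
        flush();
    }

    void abort() {
        // Only rows that were never flushed can be discarded here;
        // rows from earlier batches are already in the table.
        pending.clear();
    }

    private void flush() {
        table.visibleRows.addAll(pending);
        pending.clear();
    }
}

final class AbortDemo {
    /** Writes three rows, fails on the fourth, aborts, and returns what the table holds. */
    static List<String> runFailingTask(int batchSize) {
        ToyTable table = new ToyTable();
        BatchCommittingWriter writer = new BatchCommittingWriter(table, batchSize);
        try {
            writer.write("row-1");
            writer.write("row-2");
            writer.write("row-3");
            writer.write(null); // triggers the simulated failure
            writer.commit();
        } catch (IllegalStateException e) {
            writer.abort();
        }
        return table.visibleRows;
    }

    public static void main(String[] args) {
        System.out.println(runFailingTask(1));  // [row-1, row-2, row-3]
        System.out.println(runFailingTask(2));  // [row-1, row-2]
        System.out.println(runFailingTask(10)); // []
    }
}
```

In this toy model the aborted task leaves nothing behind only when the whole 
task fits in a single uncommitted batch. With batch size 1 every row written 
before the failure survives the abort.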

The only way I can think of that would get the Spark transactional behaviour is 
using Phoenix transactional tables and not using batch commit at all. 
(I don't know how well Phoenix handles transactions of many thousands of 
millions of rows, either via Tephra or Omid.)

Could you give an outline of the improved behaviour that you aim for, and how 
you plan to implement it (not necessarily in this ticket, but in the parent 
one)?

> Add autocommit option to enable/disable autocommit on phoenix connections 
> created in workers
> --------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6590
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6590
>             Project: Phoenix
>          Issue Type: Sub-task
>          Components: spark-connector
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)
