[ 
https://issues.apache.org/jira/browse/SPARK-47273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-47273:
-------------------------------
    Description: 
In order to support developing Spark streaming sinks in Python, we need to 
implement the following:

Reuse PythonPartitionWriter to implement the serialization and execution of the 
write callback on executors.
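
As a rough illustration, a user-defined sink writer could look like the sketch 
below. The class shape and the names (MyStreamWriter, SimpleCommitMessage, 
write/commit/abort) are assumptions modeled on the existing batch Python data 
source API, not the final interface:

{code:python}
# Illustrative sketch only: class and method names are assumptions
# modeled on the batch Python data source API, not the final interface.
from dataclasses import dataclass
from typing import Iterator, List

from pyspark import TaskContext


@dataclass
class SimpleCommitMessage:
    # Produced on each executor, collected by the driver-side committer.
    partition_id: int
    count: int


class MyStreamWriter:
    def write(self, iterator: Iterator) -> SimpleCommitMessage:
        # Runs on an executor: PythonPartitionWriter ships the pickled
        # write callback there and invokes it once per partition of the
        # microbatch.
        count = 0
        for row in iterator:
            count += 1  # e.g. buffer the row and flush it to the sink here
        return SimpleCommitMessage(TaskContext.get().partitionId(), count)

    def commit(self, messages: List[SimpleCommitMessage], batch_id: int) -> None:
        # Runs in the long-lived Python worker on the driver (see the
        # sketch after the next paragraph).
        pass

    def abort(self, messages: List[SimpleCommitMessage], batch_id: int) -> None:
        pass
{code}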

Implement a Python worker process to run the Python streaming data sink 
committer and communicate with the JVM through a socket on the Spark driver. 
For each Python streaming data sink instance, one long-lived Python worker 
process is created. Inside that process, the Python write committer receives 
commit or abort calls and sends the results back through the socket.
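
A minimal sketch of that worker loop, assuming a simple length-prefixed wire 
protocol (the function ids and status codes below are hypothetical; read_int, 
read_long and write_int are the helpers in pyspark.serializers):

{code:python}
# Illustrative worker loop only; the function ids, status codes and
# framing are assumptions, not the actual wire protocol.
import pickle

from pyspark.serializers import read_int, read_long, write_int

COMMIT_FUNC_ID = 0  # hypothetical
ABORT_FUNC_ID = 1   # hypothetical
SUCCESS = 0         # hypothetical
FAILURE = -1        # hypothetical


def committer_loop(committer, infile, outfile):
    # One long-lived loop per streaming data sink instance: block on the
    # socket, dispatch commit/abort to the user's committer, reply.
    while True:
        func_id = read_int(infile)
        batch_id = read_long(infile)
        messages = pickle.loads(infile.read(read_int(infile)))
        try:
            if func_id == COMMIT_FUNC_ID:
                committer.commit(messages, batch_id)
            elif func_id == ABORT_FUNC_ID:
                committer.abort(messages, batch_id)
            write_int(SUCCESS, outfile)
        except Exception:
            write_int(FAILURE, outfile)
        outfile.flush()
{code}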

> Implement python stream writer interface
> ----------------------------------------
>
>                 Key: SPARK-47273
>                 URL: https://issues.apache.org/jira/browse/SPARK-47273
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SS
>    Affects Versions: 4.0.0
>            Reporter: Chaoqin Li
>            Priority: Major


