Chaoqin Li created SPARK-48330:
----------------------------------

             Summary: Fix the python data source timeout issue for large trigger interval
                 Key: SPARK-48330
                 URL: https://issues.apache.org/jira/browse/SPARK-48330
             Project: Spark
          Issue Type: Task
          Components: PySpark, SS
    Affects Versions: 4.0.0
            Reporter: Chaoqin Li
Currently we run a long-running Python worker process for the Python streaming source and sink to perform planning, commit, and abort on the driver side. Testing indicates that the current implementation causes connection timeout errors when the streaming query has a large trigger interval.

For the Python streaming source, keep the long-running worker architecture but set the socket timeout to infinity to avoid the timeout error. For the Python streaming sink, since a StreamingWrite is also created per microbatch on the Scala side, a long-running worker cannot be attached to a StreamingWrite instance. Therefore we abandon the long-running worker architecture, simply call commit() or abort() and then exit the worker, letting Spark reuse workers for us.
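For context, a minimal sketch of the source-side fix, assuming a generic driver-side worker loop (the function name and message handling below are hypothetical placeholders, not Spark's actual worker internals): disabling the socket timeout lets the worker block indefinitely between microbatches of a large trigger interval.

{code:python}
import socket

def run_source_worker(conn: socket.socket) -> None:
    # Hypothetical long-running worker loop for the Python streaming source
    # (illustrative only; not the actual Spark worker code).
    # settimeout(None) puts the socket in blocking mode, i.e. an "infinite"
    # timeout, so the worker can sit idle across a large trigger interval
    # without raising socket.timeout.
    conn.settimeout(None)
    while True:
        request = conn.recv(1024)   # block until the next planning/commit request
        if not request:             # driver closed the connection; exit cleanly
            break
        # ... dispatch the request (e.g. latestOffset, commit) and reply ...
        conn.sendall(b"ok")
{code}

On the sink side, by contrast, the worker would simply perform commit() or abort() for its microbatch and exit, relying on Spark's worker reuse rather than a persistent connection.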