Hi all: When using PySpark, I want to read data from a MySQL database, so I looked at JdbcRDD, but it is not supported in PySpark.
For some reasons I can't use the DataFrame API and can only use the RDD API, even though I know the DataFrame API can read from a JDBC source fairly well. So I want to implement the ability to read from a JDBC source into an RDD in PySpark. I don't know whether this is actually needed in PySpark, so I'd like to discuss it here. If it is needed, I want to contribute it to Spark: I'd like to create a JIRA ticket and hope it can be assigned to me.

I am a big data engineer and like contributing to open source. I have already submitted two PRs to Apache Flink (FLINK-26609, FLINK-26728), and both were merged/closed. So I think if I get the JIRA ticket, I can implement this fairly well.

Thanks.

javaca...@163.com
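For context, here is a minimal sketch of the DataFrame-based route I mentioned. The connection details (host, database, table, credentials) are all hypothetical, and it assumes the MySQL Connector/J driver is on the Spark classpath; it illustrates that today the only way to end up with an RDD from a JDBC source in PySpark goes through the DataFrame reader first:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-to-rdd-demo").getOrCreate()

# The DataFrame reader handles JDBC sources well.
# url/dbtable/user/password below are placeholder values.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("dbtable", "users")
      .option("user", "root")
      .option("password", "secret")
      .load())

# .rdd converts the result to an RDD of Row objects, but this still
# goes through the DataFrame API, which is what I need to avoid.
rdd = df.rdd
print(rdd.take(5))
```

For comparison, Scala/Java users have `org.apache.spark.rdd.JdbcRDD` in Spark core, but it has no PySpark counterpart; that is the gap I am proposing to fill.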