[jira] [Commented] (SPARK-40502) Support dataframe API use jdbc data source in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607524#comment-17607524 ]

CaoYu commented on SPARK-40502:
---

When I designed a Python Flink course, I found that PyFlink lacked the sum/min/minBy/max/maxBy operators, so I submitted PRs to the Flink community with Python implementations of those operators (FLINK-26609, FLINK-26728).

So, again, if a JDBC data source is something PySpark needs, I would be glad to implement it and have the time to do so.

> Support dataframe API use jdbc data source in PySpark
> -----------------------------------------------------
>
>                 Key: SPARK-40502
>                 URL: https://issues.apache.org/jira/browse/SPARK-40502
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: CaoYu
>            Priority: Major
>
> When I use PySpark, I want to read data from a MySQL database, so I want to
> use JdbcRDD as in Java/Scala. But that is not supported in PySpark.
>
> For some reasons, I cannot use the DataFrame API and can only use the RDD
> (DataStream) API, even though I know the DataFrame API reads from JDBC
> sources fairly well.
>
> So I want to implement functionality that lets an RDD read data from a JDBC
> source in PySpark.
>
> *But I don't know whether this is necessary for PySpark, so we can discuss
> it.*
>
> *If it is necessary for PySpark, I want to contribute it to Spark. I hope
> this Jira task can be assigned to me, so I can start working on the
> implementation.*
>
> *If not, please close this Jira task.*
>
> *Thanks a lot.*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
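For reference, the DataFrame path the issue description mentions already exists in PySpark via `spark.read.format("jdbc")`. A minimal sketch of the options it takes; the host, database, table, and credentials below are placeholders, not values from this issue:

```python
# Sketch of the DataFrame-API JDBC read that PySpark already supports.
# All connection details here are hypothetical placeholders.

def jdbc_read_options(host, port, database, table, user, password):
    """Build the option dict passed to spark.read.format("jdbc")."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # MySQL Connector/J driver class; its jar must be on Spark's classpath.
        "driver": "com.mysql.cj.jdbc.Driver",
    }

opts = jdbc_read_options("localhost", 3306, "school", "students",
                         "root", "secret")
print(opts["url"])

# With a running SparkSession and a reachable database this becomes
# (not executed here):
#     df = spark.read.format("jdbc").options(**opts).load()
#     df.show()
```

The missing piece the issue asks for is an equivalent entry point at the RDD level, without going through a DataFrame first.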
[jira] [Commented] (SPARK-40502) Support dataframe API use jdbc data source in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607523#comment-17607523 ]

CaoYu commented on SPARK-40502:
---

I am a teacher. I recently designed a basic Python-language course oriented toward big data. PySpark is one of its practical cases, but it only uses simple RDD code to complete basic data-processing work, and reading from a JDBC data source is part of the course.

DataFrames (Spark SQL) will be used in a future advanced course. So I hope the RDD API can gain JDBC data-source capability.
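For context on what an RDD-level JDBC source involves: Scala's `JdbcRDD` parallelizes a read by splitting a numeric key range into per-partition bounds, each of which drives one query. A simplified illustration of that splitting (not Spark's exact algorithm; the table and column named in the comments are hypothetical):

```python
def partition_bounds(lower, upper, num_partitions):
    """Split the inclusive key range [lower, upper] into contiguous
    (lo, hi) pairs, one per partition, covering every key exactly once."""
    span = upper - lower + 1
    base, extra = divmod(span, num_partitions)
    bounds, lo = [], lower
    for i in range(num_partitions):
        # Spread any remainder over the first `extra` partitions.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        bounds.append((lo, hi))
        lo = hi + 1
    return bounds

# Each pair would back one partition's query, e.g.
#   SELECT * FROM students WHERE id BETWEEN lo AND hi
print(partition_bounds(1, 100, 4))  # → [(1, 25), (26, 50), (51, 75), (76, 100)]
```

A Python implementation of this feature would need to decide where the splitting runs: in the JVM (wrapping the existing Scala `JdbcRDD` through Py4J) or on the Python side with a per-partition database connection.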
[jira] [Commented] (SPARK-40502) Support dataframe API use jdbc data source in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607447#comment-17607447 ]

Hyukjin Kwon commented on SPARK-40502:
--

{quote}
For some reasons, i can't using DataFrame API, only can use RDD(datastream) API.
{quote}

What's the reason?