[ https://issues.apache.org/jira/browse/SPARK-45639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-45639.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 43630
[https://github.com/apache/spark/pull/43630]

> Support loading Python data sources in DataFrameReader
> ------------------------------------------------------
>
>                 Key: SPARK-45639
>                 URL: https://issues.apache.org/jira/browse/SPARK-45639
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Allow users to read from a Python data source using
> `spark.read.format(...).load()` in PySpark. For example,
> users can extend the DataSource and DataSourceReader classes to create
> their own Python data source reader and use it in PySpark:
> {code:python}
> class MyReader(DataSourceReader):
>     def read(self, partition):
>         yield (0, 1)
>
> class MyDataSource(DataSource):
>     def schema(self):
>         return "id INT, value INT"
>
>     def reader(self, schema):
>         return MyReader()
>
> df = spark.read.format("MyDataSource").load()
> df.show()
> +---+-----+
> | id|value|
> +---+-----+
> |  0|    1|
> +---+-----+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
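
The division of labour in the quoted description (a data source that declares a schema and hands out a reader, and a reader that yields tuples per partition) can be sketched without a Spark session. The snippet below is only an illustration of that calling pattern: the `DataSource` and `DataSourceReader` classes here are pure-Python stand-ins, not the PySpark classes, and the `load` helper is a hypothetical approximation of what `spark.read.format(...).load()` might do with a user-defined source, not Spark's actual implementation.

```python
from typing import Any, Iterator, List, Optional, Tuple


class DataSourceReader:
    """Hypothetical stand-in: produces rows for one partition."""

    def read(self, partition: Optional[Any]) -> Iterator[Tuple]:
        raise NotImplementedError


class DataSource:
    """Hypothetical stand-in: declares a schema and hands out a reader."""

    def schema(self) -> str:
        raise NotImplementedError

    def reader(self, schema: str) -> DataSourceReader:
        raise NotImplementedError


class MyReader(DataSourceReader):
    def read(self, partition):
        # One row matching the "id INT, value INT" schema declared below.
        yield (0, 1)


class MyDataSource(DataSource):
    def schema(self):
        return "id INT, value INT"

    def reader(self, schema):
        return MyReader()


def load(source_cls) -> Tuple[List[str], List[Tuple]]:
    """Assumed flow: ask the source for its schema, then drain its reader."""
    source = source_cls()
    schema = source.schema()
    reader = source.reader(schema)
    # Take the column name from each "name TYPE" field of the DDL string.
    columns = [field.split()[0] for field in schema.split(",")]
    # None stands in for "no partitioning" in this sketch.
    rows = list(reader.read(None))
    return columns, rows


columns, rows = load(MyDataSource)
print(columns, rows)
```

The user-written parts (`MyReader`, `MyDataSource`) are the same as in the issue description. With a real Spark 4.0 installation the base classes would be imported from PySpark rather than defined locally, and the DataFrame would come from the session rather than from this helper.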