[ https://issues.apache.org/jira/browse/SPARK-45639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allison Wang updated SPARK-45639:
---------------------------------

Description:

Allow users to read from a Python data source using `spark.read.format(...).load()` in PySpark. For example, users can extend the DataSource and DataSourceReader classes to create their own Python data source reader and use it in PySpark:

{code:python}
class MyReader(DataSourceReader):
    def read(self, partition):
        yield (0, 1)


class MyDataSource(DataSource):
    def schema(self):
        return "id INT, value INT"

    def reader(self, schema):
        return MyReader()


df = spark.read.format("MyDataSource").load()
df.show()
+---+-----+
| id|value|
+---+-----+
|  0|    1|
+---+-----+
{code}

> Support loading Python data sources in DataFrameReader
> ------------------------------------------------------
>
>                 Key: SPARK-45639
>                 URL: https://issues.apache.org/jira/browse/SPARK-45639
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
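Since the example in the description needs a running Spark session, a minimal standalone sketch of the same reader contract may help: the two base classes below are simplified stand-ins for the proposed PySpark classes (not the actual API), used only to show how the tuples yielded by read() line up with the schema string.

```python
# Standalone sketch of the reader contract from the description. The base
# classes are simplified stand-ins so the example runs without Spark; only
# the method names are assumed to mirror the proposal.

class DataSourceReader:
    """Stand-in base class: a reader yields rows for one partition."""
    def read(self, partition):
        raise NotImplementedError


class DataSource:
    """Stand-in base class: a source exposes a schema and builds readers."""
    def schema(self):
        raise NotImplementedError

    def reader(self, schema):
        raise NotImplementedError


class MyReader(DataSourceReader):
    def read(self, partition):
        # Each yielded tuple becomes one row; positions line up with the
        # schema string "id INT, value INT".
        yield (0, 1)


class MyDataSource(DataSource):
    def schema(self):
        return "id INT, value INT"

    def reader(self, schema):
        return MyReader()


# Drive the contract by hand, the way the engine would for one partition.
source = MyDataSource()
rows = list(source.reader(source.schema()).read(partition=None))
print(rows)  # [(0, 1)]
```

In the real proposal the engine, not user code, calls reader() and iterates read() per partition; this sketch only makes that data flow visible.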