Re: [DISCUSS] SPIP: Python Data Source API

2023-06-25 Thread Reynold Xin
Personally I'd love this, but I agree with some of the earlier comments that this should not be Python-specific (meaning I should be able to implement a data source in Python and then make it usable across all languages Spark supports). I think we should find a way to make this reusable beyond

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-25 Thread Maciej
Thanks for your feedback, Martin. However, if the primary intended purpose of this API is to provide an interface for endpoint querying, then I find this proposal even less convincing. Neither the Spark execution model nor the data source API (full or restricted as proposed here) is a good

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-24 Thread Martin Grund
Hey, I would like to express my strong support for Python Data Sources even though they might not be immediately as powerful as Scala-based data sources. One element that is easily lost in this discussion is how much faster the iteration speed is with Python compared to Scala. Due to the dynamic

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-24 Thread Maciej
With such limited scope (both language availability and features), do we have any representative examples of sources that could significantly benefit from providing this API, compared to other available options, such as batch imports, direct queries from vectorized UDFs or even interfacing

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-20 Thread Wenchen Fan
In an ideal world, every data source you want to connect to would already have a Spark data source implementation (either v1 or v2), and this Python API would be useless. But I feel it's common that people want to do quick data exploration, and the target data system is not popular enough to have an existing

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-20 Thread Maciej
Similarly to Jacek, I feel it fails to document an actual community need for such a feature. Currently, any data source implementation has the potential to benefit Spark users across all supported and third-party clients. For generally available sources, this is advantageous for the whole

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Cheng Pan
This API looks like it starts from scratch and has no relationship with the existing Java/Scala DataSourceV2 API. In particular, how can they support SQL? We have been back and forth on the DataSource V2 design since 2.3; I believe there are some things to learn when introducing the Python DataSource

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Hyukjin Kwon
I actually support this idea, in that Python developers would not have to learn Scala to write their own source (or deal with separate packaging). This is especially crucial when you want to write a simple data source that interacts with the Python ecosystem.

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Denny Lee
Slightly biased, but per my conversations - this would be awesome to have!

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Abdeali Kothari
I would definitely use it - if it's available :)

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Jacek Laskowski
Hi Allison and devs, Although I was against this idea at first sight (probably because I'm a Scala dev), I think it could work as long as there are people who'd be interested in such an API. Were there any? I'm just curious. I've seen no emails requesting it. I also doubt that Python devs would

[DISCUSS] SPIP: Python Data Source API

2023-06-15 Thread Allison Wang
Hi everyone, I would like to start a discussion on “Python Data Source API”. This proposal aims to introduce a simple API in Python for Data Sources. The idea is to enable Python developers to create data sources without having to learn Scala or deal with the complexities of the current data