Like Jacek, I feel the proposal fails to document an actual community need for such a feature.

Currently, any data source implementation has the potential to benefit Spark users across all supported and third-party clients. For generally available sources, this is advantageous for the whole Spark community and avoids creating first- and second-tier citizens. This is even more important now that new officially supported languages are being added through Spark Connect.

Instead, we could document in detail the process of implementing a new source using the current APIs and work towards easily extensible or customizable sources, should there be such a need.

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC


On 6/20/23 05:19, Hyukjin Kwon wrote:
Actually, I support this idea because it means Python developers don't have to learn Scala (or deal with separate packaging) to write their own source. This is especially important when you want to write a simple data source that interacts with the Python ecosystem.

On Tue, 20 Jun 2023 at 03:08, Denny Lee <denny.g....@gmail.com> wrote:

    Slightly biased, but per my conversations - this would be awesome
    to have!

    On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari
    <abdealikoth...@gmail.com> wrote:

        I would definitely use it - if it's available :)

        On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, <ja...@japila.pl>
        wrote:

            Hi Allison and devs,

            Although I was against this idea at first sight (probably
            because I'm a Scala dev), I think it could work as long as
            there are people who'd be interested in such an API. Were
            there any? I'm just curious. I've seen no emails
            requesting it.

            I also doubt that Python devs would like to work on new
            data sources but support their wishes wholeheartedly :)

            Best regards,
            Jacek Laskowski
            ----
            "The Internals Of" Online Books <https://books.japila.pl/>
            Follow me on https://twitter.com/jaceklaskowski



            On Fri, Jun 16, 2023 at 6:14 AM Allison Wang
            <allison.w...@databricks.com.invalid> wrote:

                Hi everyone,

                I would like to start a discussion on “Python Data
                Source API”.

                This proposal aims to introduce a simple API in Python
                for Data Sources. The idea is to enable Python
                developers to create data sources without having to
                learn Scala or deal with the complexities of the
                current data source APIs. The goal is to make a
                Python-based API that is simple and easy to use, thus
                making Spark more accessible to the wider Python
                developer community. This proposed approach is based
                on the recently introduced Python user-defined table
                functions with extensions to support data sources.

                *SPIP Doc*:
                https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing


                *SPIP JIRA*:
                https://issues.apache.org/jira/browse/SPARK-44076

                Looking forward to your feedback.

                Thanks,
                Allison


