+1; the pandas interfaces are very popular, and supporting them in PySpark looks promising. One question: what is the initial goal of the proposal? Is it to port all the pandas interfaces that Koalas has already implemented, or only a basic subset of them?

On Tue, Mar 16, 2021 at 1:44 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> +1
>
> Bringing a Pandas API for pyspark to upstream Spark will only bring
> benefits for everyone (more eyes to use/see/fix/improve the API) as
> well as better alignment with core Spark improvements, the extra
> weight looks manageable.
>
> On Mon, Mar 15, 2021 at 4:45 PM Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> >
> > On Mon, Mar 15, 2021 at 2:12 AM Reynold Xin <r...@databricks.com> wrote:
> >>
> >> I don't think we should deprecate existing APIs.
> >
> > +1
> >
> > I strongly prefer Spark's immutable DataFrame API to the Pandas API. I
> > could be wrong, but I wager most people who have worked with both Spark and
> > Pandas feel the same way.
> >
> > For the large community of current PySpark users, or users switching to
> > PySpark from another Spark language API, it doesn't make sense to deprecate
> > the current API, even by convention.

--
---
Takeshi Yamamuro