Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-13 Thread Liang-Chi Hsieh
From a Python developer's perspective, this direction makes sense to me. As pandas is almost the standard library in this area, if PySpark supports the pandas API out of the box, usability would be at a higher level. As for maintenance cost, IIUC, there are some Spark committers in the

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-13 Thread Holden Karau
I think having pandas support inside of Spark makes sense. One of my questions is who the major contributors to this effort are: is the community developing the pandas API layer for Spark interested in being part of Spark, or do they prefer having their own release cycle? On Sat, Mar 13, 2021 at

[DISCUSS] Support pandas API layer on PySpark

2021-03-13 Thread Hyukjin Kwon
Hi all, I would like to start the discussion on supporting a pandas API layer on Spark. If we have a general consensus on having it in PySpark, I will initiate and drive an SPIP with a detailed explanation of the implementation’s overview and structure. I would appreciate it if I can know