Re: [DISCUSS] PySpark Window UDF

2018-09-20 Thread Felix Cheung
Definitely! Numba numbers are amazing.

From: Wes McKinney
Sent: Saturday, September 8, 2018 7:46 AM
To: Li Jin
Cc: dev@spark.apache.org
Subject: Re: [DISCUSS] PySpark Window UDF

hi Li, These results are very cool. I'm excited to see you continuing to push

Re: [DISCUSS] PySpark Window UDF

2018-09-08 Thread Wes McKinney
hi Li,

These results are very cool. I'm excited to see you continuing to push this effort forward.

- Wes

On Wed, Sep 5, 2018 at 5:52 PM Li Jin wrote:
> Hello again!
>
> I recently implemented a proof of concept of the proposal above. I
> think the results are pretty exciting so I

Re: [DISCUSS] PySpark Window UDF

2018-09-05 Thread Li Jin
Hello again!

I recently implemented a proof of concept of the proposal above. I think the results are pretty exciting, so I want to share my findings with the community. I have implemented two variants of the pandas window UDF: one that takes pandas.Series as input and one that takes

[DISCUSS] PySpark Window UDF

2018-05-16 Thread Li Jin
Hi All,

I have been looking into leveraging the Arrow and Pandas UDF work we have done so far for window UDFs in PySpark. I have done some investigation and believe there is a way to do PySpark window UDFs efficiently. The basic idea is that, instead of passing each window to Python separately, we can
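
For context, the baseline that the first message contrasts against ("passing each window to Python separately") might look roughly like the plain-pandas sketch below. This is not the proposal's API and does not use Spark; `window_mean`, `apply_per_window`, and the trailing window of `size` rows are hypothetical names and choices made only for illustration. The alternative the thread goes on to propose is cut off in this listing and is not shown here.

```python
import pandas as pd


def window_mean(values: pd.Series) -> float:
    """Hypothetical window UDF: receives one window's values as a pandas.Series."""
    return float(values.mean())


def apply_per_window(column: pd.Series, size: int) -> pd.Series:
    """Naive baseline: call the UDF once for every trailing window of `size` rows."""
    results = []
    for end in range(1, len(column) + 1):
        start = max(0, end - size)  # windows near the start are shorter
        results.append(window_mean(column.iloc[start:end]))
    return pd.Series(results, index=column.index)


prices = pd.Series([1.0, 2.0, 3.0, 4.0])
rolled = apply_per_window(prices, size=2)
print(rolled.tolist())
```

Each row triggers its own Python call and its own slice of the column, which is the per-window overhead the thread is trying to avoid.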