Re: [build system] github fetches timing out

2021-03-17 shane knapp ☠
it's been happening a lot again recently... i'm investigating.

On Wed, Mar 10, 2021 at 10:23 AM Liang-Chi Hsieh wrote:
> Thanks Shane for looking at it!
>
> shane knapp ☠ wrote
> > ...and just like that, overnight the builds started successfully git
> > fetching!
> >
> > --
> > Shane Knapp
>

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Hyukjin Kwon
Thanks Nicholas for the pointer :-).

On Thu, 18 Mar 2021, 00:11 Nicholas Chammas wrote:
> On Tue, Mar 16, 2021 at 9:15 PM Hyukjin Kwon wrote:
>
>> I am currently thinking we will have to convert the Koalas tests to use
>> unittests to match with PySpark for now.
>>
> Keep in mind that

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Nicholas Chammas
On Tue, Mar 16, 2021 at 9:15 PM Hyukjin Kwon wrote:
> I am currently thinking we will have to convert the Koalas tests to use
> unittests to match with PySpark for now.
>
Keep in mind that pytest supports unittest-based tests out of the box, so
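To illustrate the point about pytest and unittest, here is a minimal sketch of a plain `unittest.TestCase` module. The function `add_one` and the class `AddOneTest` are hypothetical stand-ins, not Koalas or PySpark code. pytest collects `TestCase` subclasses without modification, so a suite written in unittest style can be run by either runner.

```python
import unittest


# Hypothetical function under test, standing in for a real Koalas/PySpark helper.
def add_one(x: int) -> int:
    return x + 1


class AddOneTest(unittest.TestCase):
    """A plain unittest-style test case. pytest collects TestCase
    subclasses as-is, so the same file works under either runner."""

    def test_positive(self) -> None:
        self.assertEqual(add_one(1), 2)

    def test_negative(self) -> None:
        self.assertEqual(add_one(-1), 0)
```

Saved as a hypothetical `test_add_one.py`, the module can be run with `python -m unittest test_add_one` or with `pytest test_add_one.py`. Porting tests to unittest style therefore does not rule out running them with pytest later.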

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Hyukjin Kwon
Yeah, that's a good point, Georg. I think we will port it as-is first and then discuss that indexing system further. We should probably either add a non-index mode or switch it to a distributed default index type that minimizes the side effects in the query plan. We still have some months left. I will

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Georg Heiler
Would you plan to keep the existing indexing mechanism then?
https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html#use-distributed-or-distributed-sequence-default-index
For me, even when trying to use the distributed version, it always resulted in various window functions being
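The following is a rough pure-Python illustration of the default-index trade-off discussed in this thread. It is not Koalas code; it is a simplified model under stated assumptions. Koalas selects its default index through the `compute.default_index_type` option. The "distributed" style modeled here follows the documented layout of Spark's `monotonically_increasing_id`, which puts the partition ID in the upper 31 bits and the record number in the lower 33 bits. Each partition can number its own rows independently, so the values increase but are not consecutive. A consecutive 0..n-1 index needs every earlier partition's row count before numbering can start, which costs extra work such as a global window or an additional pass over the data. The partition lists and function names are hypothetical.

```python
from itertools import accumulate
from typing import List

# Hypothetical toy data: each inner list stands in for one Spark partition.
Partitions = List[List[str]]


def distributed_ids(parts: Partitions) -> List[int]:
    """Model of monotonically_increasing_id: partition ID shifted into the
    upper bits plus the record number within that partition.
    No partition needs information from any other partition."""
    ids: List[int] = []
    for pid, part in enumerate(parts):
        ids.extend((pid << 33) + i for i in range(len(part)))
    return ids


def sequence_ids(parts: Partitions) -> List[int]:
    """Consecutive 0..n-1 index: each partition's starting offset is the
    total row count of all earlier partitions (an exclusive prefix sum),
    so the counts must be gathered before any IDs are assigned."""
    counts = [len(p) for p in parts]
    offsets = [0] + list(accumulate(counts))[:-1]
    ids: List[int] = []
    for offset, part in zip(offsets, parts):
        ids.extend(offset + i for i in range(len(part)))
    return ids


parts = [["a", "b"], ["c"], ["d", "e", "f"]]
print(distributed_ids(parts))  # [0, 1, 8589934592, 17179869184, 17179869185, 17179869186]
print(sequence_ids(parts))     # [0, 1, 2, 3, 4, 5]
```

Both schemes produce increasing IDs. Only the second produces consecutive ones, and it does so by introducing a dependency across partitions, which is why a consecutive index tends to show up as extra steps in the query plan.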

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Hyukjin Kwon
> Just out of curiosity, does Koalas pretty much implement all of the Pandas APIs now? If there are some that are yet to be implemented or others that have differences, are these documented so users won't be caught off-guard?

It's roughly 75% done so far (in Series, DataFrame and Index). Yeah,