Exposing functions to pyspark

2019-09-30 Thread Andrew Melo
Hello, I'm working on a DSv2 implementation with a userbase that is 100% pyspark-based. There's some interesting additional DS-level functionality I'd like to expose from the Java side to pyspark -- e.g. I/O metrics, which source site provided the data, etc. Does someone have an example of

Re: [build system] our colo is having power issues again. there will be a few 'events' this week

2019-09-30 Thread Shane Knapp
alright, they switched over from generator power back to the grid about an hour ago... all the workers are back up and building! shane

On Tue, Sep 24, 2019 at 3:23 PM Takeshi Yamamuro wrote:
> Shane, thanks for the hard work!
>
> Bests,
> Takeshi
>
> On Wed, Sep 25, 2019 at 6:07 AM Jungtaek

Re: UDAFs have an inefficiency problem

2019-09-30 Thread Erik Erlandson
On the PR review, there were questions about adding a new aggregating class, and whether or not Aggregator[IN,BUF,OUT] could be used. I added a proof-of-concept solution, based on enhancing Aggregator, to the pull request: https://github.com/apache/spark/pull/25024/ I wrote up my findings on the PR