Re: Correlated subqueries in the DataFrame API

2018-04-19 Thread Reynold Xin
Perhaps we can just have a function that turns a DataFrame into a Column? That'd work for both correlated and uncorrelated case, although in the correlated case we'd need to turn off eager analysis (otherwise there is no way to construct a valid DataFrame). On Thu, Apr 19, 2018 at 4:08 PM, Ryan

Re: Correlated subqueries in the DataFrame API

2018-04-19 Thread Ryan Blue
Nick, thanks for raising this. It looks useful to have something in the DF API that behaves like sub-queries, but I’m not sure that passing a DF works. Making every method accept a DF that may contain matching data seems like it puts a lot of work on the API — which now has to accept a DF all

[DISCUSS] SPIP: Standardize SQL logical plans

2018-04-19 Thread Ryan Blue
Hi everyone, A few weeks ago, I wrote up a proposal to standardize SQL logical plans and a supporting design doc for data source catalog APIs

Re: Will higher order functions in spark SQL be pushed upstream?

2018-04-19 Thread Michael Davies
Hi Herman, That’s great, and thanks for the quick reply. The JIRA has an example of transform and refers to a Presto doc with lots of functions. Do you know which functions will be supported? I am interested in using filter, for example. Cheers Mick > On 19 Apr 2018, at 10:46, Herman van Hövell

Scala 2.12 support

2018-04-19 Thread Reynold Xin
Forking the thread to focus on Scala 2.12. Dean, There are a couple of different issues with Scala 2.12 (closure cleaner, API breaking changes). Which one do you think we can address with a Scala upgrade? (I haven't spent a lot of time looking at the closure cleaner one, but it might involve more

Re: GLM Poisson Model - Deviance calculations

2018-04-19 Thread Sean Owen
I see, this was handled for binomial deviance by the 'ylogy' method, which computes y log (y / mu), defining this to be 0 when y = 0. It's not necessary to add a delta or anything; 0 is the limit as y goes to 0 so it's fine. The same change is appropriate for Poisson deviance. Gamma deviance
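
The zero-handling described above can be sketched in plain Python. This is only an illustration, not Spark's own code: the names `ylogy` and `poisson_deviance` and the optional `weights` argument are assumptions made for the example, and the unit deviance is the standard Poisson form 2 * (y log(y / mu) - (y - mu)).

```python
import math


def ylogy(y, mu):
    """Return y * log(y / mu), defined as 0 when y == 0.

    0 is the limit of y * log(y / mu) as y goes to 0, so no delta
    needs to be added to y to avoid log(0).
    """
    if y == 0:
        return 0.0
    return y * math.log(y / mu)


def poisson_deviance(ys, mus, weights=None):
    """Sum of weighted Poisson unit deviances (hypothetical helper).

    Assumes every mu is strictly positive and every y is non-negative.
    """
    if weights is None:
        weights = [1.0] * len(ys)
    return sum(
        2.0 * w * (ylogy(y, mu) - (y - mu))
        for y, mu, w in zip(ys, mus, weights)
    )
```

With this definition an observation of y = 0 contributes 2 * mu to the deviance instead of producing NaN or infinity.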

Re: time for Apache Spark 3.0?

2018-04-19 Thread Sean Owen
That certainly sounds beneficial, to maybe several other projects. If there's no downside and it takes away API issues, seems like a win. On Thu, Apr 19, 2018 at 5:28 AM Dean Wampler wrote: > I spoke with Martin Odersky and Lightbend's Scala Team about the known API >

Re: time for Apache Spark 3.0?

2018-04-19 Thread Dean Wampler
I spoke with Martin Odersky and Lightbend's Scala Team about the known API issue with method disambiguation. They offered to implement a small patch in a new release of Scala 2.12 to handle the issue without requiring a Spark API change. They would cut a 2.12.6 release for it. I'm told that Scala

Re: Will higher order functions in spark SQL be pushed upstream?

2018-04-19 Thread Mick Davies
Hi, Regarding higher order functions > Yes, we intend to contribute this to open source. It doesn't look like this is in 2.3.0; at least, I can't find it. Do you know when it might reach open source? Thanks Mick