Re: Time for 2.3.1?

2018-05-10 Thread Henry Robinson
+1, I'd like to get a release out with SPARK-23852 fixed. The Parquet community are about to release 1.8.3 - the voting period closes tomorrow - and I've tested it with Spark 2.3 and confirmed the bug is fixed. Hopefully it is released and I can post the version change to branch-2.3 before you

Re: Time for 2.3.1?

2018-05-10 Thread Ryan Blue
Parquet has a Java patch release, 1.8.3, that should pass tomorrow morning. I think the plan is to get that in to fix a bug with Parquet data written by Impala.
On Thu, May 10, 2018 at 11:09 AM, Marcelo Vanzin wrote:
> Hello all,
>
> It's been a while since we shipped 2.3.0

Time for 2.3.1?

2018-05-10 Thread Marcelo Vanzin
Hello all, It's been a while since we shipped 2.3.0 and lots of important bug fixes have gone into the branch since then. I took a look at Jira and it seems there aren't many things explicitly targeted at 2.3.1 - the only potential blocker (a parquet issue) is being worked on since a new

Re: Revisiting Online serving of Spark models?

2018-05-10 Thread Felix Cheung
Huge +1 on this!
From: holden.ka...@gmail.com on behalf of Holden Karau
Sent: Thursday, May 10, 2018 9:39:26 AM
To: Joseph Bradley
Cc: dev
Subject: Re: Revisiting Online serving of Spark models?
On Thu, May 10,

Re: Revisiting Online serving of Spark models?

2018-05-10 Thread Holden Karau
On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley wrote:
> Thanks for bringing this up Holden! I'm a strong supporter of this.
>
> Awesome! I'm glad other folks think something like this belongs in Spark.
> This was one of the original goals for mllib-local: to have local

Re: eager execution and debuggability

2018-05-10 Thread Ryan Blue
> it would be fantastic if we could make it easier to debug Spark programs without needing to rely on eager execution.
I agree, it would be great if we could make the errors clearer about where they happened (in user code or in Spark code) and what assumption was violated. The problem is

Re: Revisiting Online serving of Spark models?

2018-05-10 Thread Joseph Bradley
Thanks for bringing this up Holden! I'm a strong supporter of this. This was one of the original goals for mllib-local: to have local versions of MLlib models which could be deployed without the big Spark JARs and without a SparkContext or SparkSession. There are related commercial offerings

Re: eager execution and debuggability

2018-05-10 Thread Lalwani, Jayesh
If they are struggling to find bugs in their program because of Spark's lazy execution model, they are going to struggle to debug issues when the program runs into problems in production. Learning how to debug Spark is part of learning Spark. It’s better that they run into issues in the