Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Steve Loughran
On 19 Nov 2015, at 22:14, Reynold Xin wrote: I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I think everybody is for that. https://issues.apache.org/jira/browse/SPARK-11807 Sean suggested also dropping support for

Re: Support for local disk columnar storage for DataFrames

2015-11-20 Thread Cristian O
Raised this for checkpointing; hopefully it gets some priority, as it's very useful and relatively straightforward to implement: https://issues.apache.org/jira/browse/SPARK-11879 On 18 November 2015 at 16:31, Cristian O wrote: > Hi, > > While these OSS efforts
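
For readers following the JIRA: until native DataFrame checkpointing exists, a common workaround is to materialize the DataFrame and read it back, which truncates the lineage at the cost of a full write. A minimal sketch, assuming sqlContext is in scope (the helper name and path handling are illustrative, not Spark API):

    import org.apache.spark.sql.{DataFrame, SQLContext}

    // Hypothetical helper: write the DataFrame out and reload it, so the
    // reloaded plan no longer carries the full lineage -- what SPARK-11879
    // asks Spark to do natively and more cheaply.
    def checkpointToDisk(df: DataFrame, path: String, sqlContext: SQLContext): DataFrame = {
      df.write.parquet(path)        // full materialization to disk
      sqlContext.read.parquet(path) // fresh DataFrame with a flat plan
    }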

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Steve Loughran
On 20 Nov 2015, at 14:28, ches...@alpinenow.com wrote: Assuming we have 1.6 and 1.7 releases, then Spark 2.0 is about 9 months away. Customers will need to upgrade their Hadoop clusters to Apache 2.6 or later to leverage the new Spark 2.0 within one year. I think this

Re: Removing the Mesos fine-grained mode

2015-11-20 Thread Iulian Dragoș
This is a good point. We should probably document this better in the migration notes. In the meantime: http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos Roughly, dynamic allocation lets Spark add and kill executors based on the scheduling delay.
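
For concreteness, a minimal sketch of the settings involved (values are illustrative; on Mesos the external shuffle service must also be running on each agent):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // required for dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      // request executors once tasks have been backlogged this long
      .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")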

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Sandy Ryza
To answer your fourth question from Cloudera's perspective, we would never support a customer running Spark 2.0 on a Hadoop version < 2.6. -Sandy On Fri, Nov 20, 2015 at 1:39 PM, Reynold Xin wrote: > OK I'm not exactly asking for a vote here :) > > I don't think we should

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Reynold Xin
OK I'm not exactly asking for a vote here :) I don't think we should look at it from only a maintenance point of view -- because in that case the answer is clearly to support as few versions as possible (or just rm -rf the Spark source code and call it a day). It is a tradeoff between the number of

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Saisai Shao
+1. Hadoop 2.6 would be a good choice, with many features added (like support for long-running services and label-based scheduling). Currently there's a lot of reflection code to support multiple versions of YARN, so upgrading to a newer version will really ease the pain :). Thanks Saisai On Fri, Nov
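
To illustrate the pain point (the method name and types here are illustrative, not the actual Spark code): supporting several YARN client versions from one build typically means probing for newer APIs by reflection and falling back when they are absent.

    // Illustrative sketch of version-probing by reflection: call a setter
    // that only exists in newer YARN clients, degrade gracefully on older ones.
    def setMaxAttemptsIfSupported(appContext: AnyRef, attempts: Int): Unit = {
      try {
        val m = appContext.getClass.getMethod("setMaxAppAttempts", classOf[Int])
        m.invoke(appContext, attempts.asInstanceOf[AnyRef])
      } catch {
        case _: NoSuchMethodException => // older YARN client: feature unavailable
      }
    }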

Unhandled case in VectorAssembler

2015-11-20 Thread BenFradet
Hey there, I noticed that there is an unhandled case in the transform method of VectorAssembler if one of the input columns doesn't have one of the supported types: DoubleType, NumericType, BooleanType, or VectorUDT. So, if you try to transform a column of StringType you get a cryptic
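
A minimal repro sketch of the report (the column names and data are made up; sqlContext is assumed to be in scope):

    import org.apache.spark.ml.feature.VectorAssembler

    // "city" is a StringType column, which VectorAssembler does not support,
    // so transform() fails with an unhelpful low-level error.
    val df = sqlContext.createDataFrame(Seq(
      (0.5, "london"),
      (1.5, "paris")
    )).toDF("score", "city")

    val assembler = new VectorAssembler()
      .setInputCols(Array("score", "city"))
      .setOutputCol("features")

    assembler.transform(df) // cryptic failure instead of a clear type error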

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Chester Chen
For #1-3, the answer is likely no. Recently we upgraded to Spark 1.5.1, with CDH5.3, CDH5.4, HDP2.2, and others. We were using the CDH5.3 client to talk to CDH5.4. We were doing this to see if we could support many different Hadoop cluster versions without changing the build. This was ok for

Re: Unhandled case in VectorAssembler

2015-11-20 Thread Joseph Bradley
Yes, please, could you send a JIRA (and PR)? A custom error message would be better. Thank you! Joseph On Fri, Nov 20, 2015 at 2:39 PM, BenFradet wrote: > Hey there, > > I noticed that there is an unhandled case in the transform method of > VectorAssembler if one of
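
A sketch of the kind of check being suggested (not the actual patch that went in): validate input column types up front and throw a descriptive error.

    import org.apache.spark.sql.types.{BooleanType, DataType, NumericType}

    // Fail fast with a readable message instead of an opaque error deep
    // inside transform(); vector columns are omitted here for brevity.
    def checkInputType(colName: String, dataType: DataType): Unit = dataType match {
      case _: NumericType | BooleanType => // supported, nothing to do
      case other => throw new IllegalArgumentException(
        s"VectorAssembler does not support column '$colName' of type $other; " +
          "expected a numeric, boolean, or vector column.")
    }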

Re: Removing the Mesos fine-grained mode

2015-11-20 Thread Adam McElwee
I used fine-grained mode on our Mesos Spark clusters until this week, mostly because it was the default. I started trying coarse-grained because of the recent chatter on the mailing list about wanting to move the Mesos execution path to coarse-grained only. The odd thing is, coarse-grained vs
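
For reference, switching between the two Mesos modes at the time was a single setting (the core cap value is illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.mesos.coarse", "true") // coarse-grained: long-lived executors
      .set("spark.cores.max", "40")      // cap the total cores claimed from Mesos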