Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Krishna Sankar
Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop Distributions X ... Maybe one option is to have a minimum basic set (which I know is what we are discussing) and move the rest to spark-packages.org. There the vendors can add the latest downloads - for example when 1.4 is

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Sean Owen
Yeah, interesting question of what is the better default for the single set of artifacts published to Maven. I think there's an argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros and cons discussed more at https://issues.apache.org/jira/browse/SPARK-5134
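For downstream Maven/sbt users the default matters less than it might seem, since a build can pin its own Hadoop client. A minimal build.sbt sketch of that pattern; the version numbers below are only placeholders, not a statement of what the published poms default to:

libraryDependencies ++= Seq(
  // published Spark artifact (Scala 2.10 build) from Maven Central
  "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  // pin hadoop-client explicitly so the cluster's Hadoop 2 version wins
  // over whatever the Spark pom pulls in transitively
  "org.apache.hadoop" % "hadoop-client" % "2.4.0" % "provided"
)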

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Patrick Wendell
We probably want to revisit the way we do binaries in general for 1.4+. IMO, something worth forking a separate thread for. I've been hesitating to add new binaries because people (understandably) complain if you ever stop packaging older ones, but on the other hand the ASF has complained that we

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Matei Zaharia
+1 Tested it on Mac OS X. One small issue I noticed is that the Scala 2.11 build is using Hadoop 1 without Hive, which is kind of weird because people are more likely to want Hadoop 2 with Hive. So it would be good to publish a build for that configuration instead. We can do it if we do a new
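For reference, producing a Scala 2.11 tarball with Hadoop 2 and Hive would look roughly like the following with the 1.3-era build scripts (the script name, profiles, and versions here are a best guess, to be checked against make-distribution.sh rather than the exact release procedure):

./dev/change-version-to-2.11.sh            # switch the poms to Scala 2.11
./make-distribution.sh --tgz --name hadoop2.4 \
  -Dscala-2.11 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive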

Re: Loading previously serialized object to Spark

2015-03-08 Thread Akhil Das
Can you paste the complete code? Thanks Best Regards On Sat, Mar 7, 2015 at 2:25 AM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I've implemented class MyClass in MLlib that does some operation on LabeledPoint. MyClass extends Serializable, so I can map this operation on data of
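The pattern being described is roughly the following; this is a hypothetical reconstruction (the real MyClass wasn't posted), showing a Serializable class whose method is mapped over an RDD of LabeledPoint:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the MyClass from the question; it extends
// Serializable because instances are captured in closures shipped to executors.
class MyClass extends Serializable {
  def transform(p: LabeledPoint): LabeledPoint =
    LabeledPoint(p.label, Vectors.dense(p.features.toArray.map(_ * 2.0)))
}

object MyClassExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("myclass-example"))
    val data: RDD[LabeledPoint] = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.5, 1.5)),
      LabeledPoint(0.0, Vectors.dense(2.0, 3.0))))
    val op = new MyClass
    // op is serialized with the closure and applied on the workers
    data.map(op.transform).collect().foreach(println)
    sc.stop()
  }
}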

Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Sean Owen
Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the Maven artifacts. Patrick, I see you just commented on SPARK-5134 and will follow up there. Sounds like this may accidentally not be a problem. On binary

Re: Block Transfer Service encryption support

2015-03-08 Thread Patrick Wendell
I think that yes, longer term we want to have encryption of all communicated data. However Jeff, can you open a JIRA to discuss the design before opening a pull request (it's fine to link to a WIP branch if you'd like)? I'd like to better understand the performance and operational complexity of

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Matei Zaharia
Yeah, my concern is that people should get Apache Spark from *Apache*, not from a vendor. It helps everyone use the latest features no matter where they are. In the Hadoop distro case, Hadoop made all this effort to have standard APIs (e.g. YARN), so it should be easy. But it is a problem if

Re: Block Transfer Service encryption support

2015-03-08 Thread Andrew Ash
I'm interested in seeing this data transfer occurring over encrypted communication channels as well. Many customers require that all network transfer occur encrypted to prevent the soft underbelly that's often found inside a corporate network. On Fri, Mar 6, 2015 at 4:20 PM, turp1twin

Re: Block Transfer Service encryption support

2015-03-08 Thread Jeff Turpin
I have already written most of the code, just finishing up the unit tests right now... Jeff On Sun, Mar 8, 2015 at 5:39 PM, Andrew Ash and...@andrewash.com wrote: I'm interested in seeing this data transfer occurring over encrypted communication channels as well. Many customers require that

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Patrick Wendell
I think it's important to separate the goals from the implementation. I agree with Matei on the goal - I think the goal needs to be to allow people to download Apache Spark and use it with CDH, HDP, MapR, whatever... This is the whole reason why HDFS and YARN have stable APIs, so that other
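Concretely, the idea is that a stock Apache tarball can be pointed at the vendor cluster's configuration and submitted to YARN. A rough sketch (the config path and example jar name are placeholders for whatever the distribution and cluster actually ship):

export HADOOP_CONF_DIR=/etc/hadoop/conf    # cluster's Hadoop/YARN config, placeholder path
./bin/spark-submit \
  --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples-1.3.0-hadoop2.4.0.jar 100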

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Matei Zaharia
Our goal is to let people use the latest Apache release even if vendors fall behind or don't want to package everything, so that's why we put out releases for vendors' versions. It's fairly low overhead. Matei On Mar 8, 2015, at 5:56 PM, Sean Owen so...@cloudera.com wrote: Ah. I

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Sean Owen
Yeah it's not much overhead, but here's an example of where it causes a little issue. I like that reasoning. However, the released builds don't track the later versions of Hadoop that vendors would be distributing -- there's no Hadoop 2.6 build for example. CDH4 is here, but not the far-more-used