Re: what is the difference between org.spark-project.hive and org.apache.hadoop.hive
There are two differences:

1. We publish Hive with a shaded protobuf dependency, to avoid conflicts with some Hadoop versions.
2. We publish a proper hive-exec jar that only includes Hive packages. The upstream version of hive-exec bundles a number of unrelated dependencies, which makes it really hard for third-party projects to use.

On Thu, Jul 10, 2014 at 11:29 PM, kingfly wangf...@huawei.com wrote:

--
Best Regards
Frank Wang | Software Engineer
Mobile: +86 18505816792 | Phone: +86 571 63547
Email: wangf...@huawei.com
Huawei Technologies Co., Ltd., Hangzhou R&D Center
NO.410, JiangHong Road, Binjiang Area, Hangzhou, 310052, P. R. China
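For a third-party build, the practical difference is simply which artifact you depend on. A minimal sbt sketch (the version number here is illustrative, not prescriptive):

```scala
// Build sketch: depend on Spark's republished hive-exec, which contains
// only Hive classes (with protobuf shaded), instead of upstream hive-exec,
// which bundles many transitive dependencies into one jar.
libraryDependencies += "org.spark-project.hive" % "hive-exec" % "0.12.0"
```

Because the republished jar contains only Hive packages, the usual flood of exclusion rules needed with upstream hive-exec can be avoided.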
How pySpark works?
Hi, I want to use pySpark, but I can't understand how it works. The documentation doesn't provide enough information.

1) How is Python shipped to the cluster? Should machines in the cluster already have Python?
2) What happens when I write some Python code in a map function - is it shipped to the cluster and just executed there? How does it work out all the dependencies my code needs and ship them there? If I use Math in my map code, does it mean I would ship the Math class, or would some Python Math already on the cluster be used?
3) I have compiled C++ code. Can I ship this executable with addPyFile and just use the exec function from Python? Would that work?

--
Sincerely yours
Egor Pakhomov
Scala Developer, Yandex
Random forest - is it under implementation?
Hi, I have an intern who wants to implement some ML algorithm for Spark. Which algorithm would be a good idea to implement (it should not be very difficult)? I heard someone is already working on random forest, but I couldn't find proof of that. I'm aware of the new policy that we should implement stable, good-quality, popular ML algorithms or not implement them at all.

--
Sincerely yours
Egor Pakhomov
Scala Developer, Yandex
Re: Random forest - is it under implementation?
Sung Chung from Alpine Data Labs presented the random forest implementation at Spark Summit 2014. The work will be open sourced and contributed back to MLlib. Stay tuned.

Sent from my iPad

On Jul 11, 2014, at 6:02 AM, Egor Pahomov pahomov.e...@gmail.com wrote: [...]
Re: Random forest - is it under implementation?
Great. Then one question remains: what would you recommend for implementation?

2014-07-11 17:43 GMT+04:00 Chester At Work ches...@alpinenow.com: [...]

--
Sincerely yours
Egor Pakhomov
Scala Developer, Yandex
Calling Scala/Java methods which operates on RDD
Hi, I want to write some common utility functions in Scala and call them from the Java/Python Spark APIs (maybe with some wrapper code around the Scala calls). Calling Scala functions from Java works fine. Reading the pyspark RDD code, I found that pyspark is able to call JavaRDD functions like union/zip to get the same for a pyspark RDD, deserializing the output, and everything works fine. But somehow I am not able to get a really simple example working. I think I am missing some serialization/deserialization step. Can someone confirm whether this is even possible? Or would it be much easier to pass RDD data files around instead of RDDs directly (from pyspark to Java/Scala)?

For example, the code below just adds 1 to each element of an RDD of integers.

package flukebox.test

object TestClass {
  def testFunc(data: RDD[Int]) = {
    data.map(x => x + 1)
  }
}

Calling from Python:

from pyspark import RDD
from py4j.java_gateway import java_import

java_import(sc._gateway.jvm, "flukebox.test")
data = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
sc._jvm.flukebox.test.TestClass.testFunc(data._jrdd.rdd())

This fails because testFunc gets an RDD of byte arrays.

Any help/pointers would be highly appreciated.

Thanks & Regards,
Jai K Singh
Re: [VOTE] Release Apache Spark 1.0.1 (RC2)
Unless you can diagnose the problem quickly, Gary, I think we need to go ahead with this release as is. This release didn't touch the Mesos support as far as I know, so the problem might be a nondeterministic issue with your application. But on the other hand, the release does fix some critical bugs that affect all users. We can always do 1.0.2 later if we discover a problem.

Matei

On Jul 10, 2014, at 9:40 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey Gary, the vote technically doesn't close until I send the vote summary e-mail, but I was planning to close and package this tonight. It's too bad if there is a regression; it might be worth holding the release, but it really requires narrowing down the issue to get more information about the scope and severity. Could you fork another thread for this?

- Patrick

On Thu, Jul 10, 2014 at 6:28 PM, Gary Malouf malouf.g...@gmail.com wrote:

-1. I honestly do not know the voting rules for the Spark community, so please excuse me if I am out of line or if Mesos compatibility is not a concern at this point. We just tried to run this version built against 2.3.0-cdh5.0.2 on Mesos 0.18.2. All of our jobs with data above a few gigabytes hung indefinitely. Downgrading back to the 1.0.0 stable release of Spark built the same way worked for us.

On Mon, Jul 7, 2014 at 5:17 PM, Tom Graves tgraves...@yahoo.com.invalid wrote:

+1. Ran some Spark on YARN jobs on a Hadoop 2.4 cluster with authentication on.

Tom

On Friday, July 4, 2014 2:39 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.0.1!

The tag to be voted on is v1.0.1-rc1 (commit 7d1043c):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7d1043c99303b87aef8ee19873629c2bfba4cc78

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.0.1-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1021/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.0.1-rc2-docs/

Please vote on releasing this package as Apache Spark 1.0.1! The vote is open until Monday, July 07, at 20:45 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

=== Differences from RC1 ===
This release includes only one blocking patch from rc1: https://github.com/apache/spark/pull/1255
There are also smaller fixes which came in over the last week.

=== About this release ===
This release fixes a few high-priority bugs in 1.0 and has a variety of smaller fixes. The full list is here: http://s.apache.org/b45. Some of the more visible patches are:

SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys
SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame size
SPARK-1790: Support r3 instance types on EC2

This is the first maintenance release on the 1.0 line. We plan to make additional maintenance releases as new fixes come in.
Re: [VOTE] Release Apache Spark 1.0.1 (RC2)
Hi Matei,

We have not had time to re-deploy the rc today, but one thing that jumps out is the shrinking of the default akka frame size from 10MB to around 128KB. That is my first suspect for our issue - I could imagine that biting others as well. I'll try to re-test today - either way, I understand moving forward at this point.

Gary

On Fri, Jul 11, 2014 at 12:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: [...]
Re: [VOTE] Release Apache Spark 1.0.1 (RC2)
Hey Gary,

Why do you think the akka frame size changed? It didn't change - we added some fixes for cases where users were setting non-default values.

On Fri, Jul 11, 2014 at 9:31 AM, Gary Malouf malouf.g...@gmail.com wrote: [...]
Re: [VOTE] Release Apache Spark 1.0.1 (RC2)
Okay, just FYI - I'm closing this vote since many people are waiting on the release and I was hoping to package it today. If we find a reproducible Mesos issue here, we can definitely spin the fix into a subsequent release.

On Fri, Jul 11, 2014 at 9:37 AM, Patrick Wendell pwend...@gmail.com wrote: [...]
[RESULT] [VOTE] Release Apache Spark 1.0.1 (RC2)
This vote has passed with 9 +1 votes (5 binding) and 1 -1 vote (0 binding).

+1:
Patrick Wendell*
Mark Hamstra*
DB Tsai
Krishna Sankar
Soren Macbeth
Andrew Or
Matei Zaharia*
Xiangrui Meng*
Tom Graves*

0:

-1:
Gary Malouf
Re: How pySpark works?
Hi Egor,

Here are a few answers to your questions:

1) Python needs to be installed on all machines, but not pyspark. The way the executors get the pyspark code depends on which cluster manager you use. In standalone mode, your executors need to have the actual Python files in their working directory. In YARN mode, Python files are included in the assembly jar, which is then shipped to your executor containers through a distributed cache.

2) Pyspark is just a thin wrapper around Spark. When you write a closure in Python, it is shipped to the executors within the task itself, the same way Scala closures are shipped. If you use a special library, then all of the nodes will need to have that library pre-installed.

3) Are you trying to run your C++ code inside the map function? If so, you need to make sure the compiled code is present in the working directory of all the executors beforehand for Python to exec it. I haven't done this before, so maybe there are a few gotchas in doing it. Maybe others can add more information?

Andrew

2014-07-11 5:50 GMT-07:00 Egor Pahomov pahomov.e...@gmail.com: [...]
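Point 2 can be illustrated with plain Python. The sketch below mimics the "ship the closure inside the task" mechanism using the standard pickle module; note that PySpark actually uses a cloudpickle-based serializer precisely so that lambdas and nested closures (which plain pickle cannot handle) can be shipped too, so this is only a simplified illustration:

```python
import pickle

# Driver side: a function defined at module level can be pickled directly.
def add_one(x):
    return x + 1

payload = pickle.dumps(add_one)   # the bytes that would travel with the task

# "Executor" side: rebuild the function and apply it to the partition's data.
restored = pickle.loads(payload)
print([restored(x) for x in [1, 2, 3]])  # [2, 3, 4]
```

Any module the function refers to (e.g. math) is not serialized with it; the executor's Python simply imports it locally, which is why libraries must be pre-installed on every node.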
Re: How pySpark works?
Also take a look at this: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

On Fri, Jul 11, 2014 at 10:29 AM, Andrew Or and...@databricks.com wrote: [...]
Re: Calling Scala/Java methods which operates on RDD
Hi Jai,

Your suspicion is correct. In general, Python RDDs are pickled into byte arrays and stored in Java land as RDDs of byte arrays. union/zip operate on the byte arrays directly without deserializing. Currently, Python byte arrays only get unpickled into Java objects in special cases, like SQL functions or saving to SequenceFiles (upcoming).

Hope it helps,
Kan

On Fri, Jul 11, 2014 at 5:04 AM, Jai Kumar Singh fluke...@flukebox.in wrote: [...]
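The mismatch can be seen with plain pickle: what the JVM receives from Python is not an RDD[Int] but pickled byte arrays. A simplified sketch of that representation (PySpark actually batches several elements into each pickled payload, so the real layout differs):

```python
import pickle

data = [1, 2, 3, 4]

# What Python hands across to the JVM is, conceptually, pickled bytes --
# so a Scala method expecting RDD[Int] sees byte arrays instead.
shipped = [pickle.dumps(x) for x in data]
print(all(isinstance(b, bytes) for b in shipped))  # True

# Only an explicit unpickling step on the JVM side (which Spark performs
# just for special cases like SQL) would recover the original integers.
recovered = [pickle.loads(b) for b in shipped]
print(recovered)  # [1, 2, 3, 4]
```

This is why a Scala testFunc(data: RDD[Int]) cannot consume data._jrdd.rdd() directly: either the Scala side must unpickle the byte arrays itself, or the data has to be exchanged through a neutral format such as text files.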
Announcing Spark 1.0.1
I am happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark's (alpha) SQL library, including support for JSON data and performance and stability fixes.

Visit the release notes [1] to read about this release, or download [2] the release today.

[1] http://spark.apache.org/releases/spark-release-1-0-1.html
[2] http://spark.apache.org/downloads.html
Re: Announcing Spark 1.0.1
Congrats to the Spark community!

On Friday, July 11, 2014, Patrick Wendell pwend...@gmail.com wrote: [...]