Enable delegation token transfer for Spark on Mesos security
Hi,

Since Spark 1.0 has security integration with YARN, it can transfer credentials, including delegation tokens, from the scheduler to the executor side. This is done in the startContainerRequest RPC call: the credentials are passed to the executor side so that the executor's UserGroupInformation can load them and authenticate with secured HDFS. Since Hadoop's RPC can be configured to use encryption, Spark on YARN's security story is good.

With Spark on Mesos, however, credentials cannot be transferred to the executor side, so we cannot integrate secured HDFS in a Mesos deployment. To do the credential transfer, my solution is:
1) Add a credentials field to Mesos's TaskInfo proto structure.
2) Modify the Spark scheduler's code to read the credentials from UserGroupInformation and store them in the field mentioned in 1).
3) Modify the Spark executor's code to load the credentials before the executor starts.

This way, Mesos can transfer the credentials in the launchTask message. But still, the libprocess messages in Mesos are not encrypted, so the credentials are not protected in transit. There are two solutions:
1) Make the libprocess communication layer support encryption, perhaps by adding SSL support to libprocess.
2) Encrypt only the credential part, using some pre-deployed secret key in Mesos.

Currently we choose the second. This work will affect both the Spark and Mesos layers, and will change one interface between them. I don't have much dev experience on Spark and Mesos, so if you have any ideas/suggestions, please let me know.

Thanks.
Peter Shi
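To make the second option concrete, here is a minimal sketch of encrypting the serialized credential bytes with a pre-deployed symmetric key before they ride in the (unencrypted) launchTask message, and decrypting them on the executor side. This is a hypothetical illustration using AES-GCM from the JDK only; the class name, the key-distribution mechanism, and the placeholder token bytes are all assumptions, not actual Spark or Mesos code:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class CredentialCrypto {
    // Scheduler side: encrypt the serialized credentials with the
    // pre-deployed key. Output layout: 12-byte random IV || ciphertext+tag.
    static byte[] encrypt(byte[] key, byte[] plain) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Executor side: split off the IV, decrypt, and (in the real design)
    // load the resulting bytes into UserGroupInformation before starting.
    static byte[] decrypt(byte[] key, byte[] blob) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, Arrays.copyOfRange(blob, 0, 12)));
        return c.doFinal(blob, 12, blob.length - 12);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // stands in for the pre-deployed secret key
        byte[] token = "hdfs-delegation-token-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] wire = encrypt(key, token);   // what would go into TaskInfo
        byte[] back = decrypt(key, wire);    // recovered on the executor
        System.out.println(Arrays.equals(token, back));
    }
}
```

In the real proposal the plaintext would be the byte form of Hadoop's Credentials (tokens included), and AES-GCM is used here because it authenticates as well as encrypts, so a tampered launchTask payload fails decryption instead of yielding garbage credentials.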
Re: [VOTE] Release Apache Spark 1.0.0 (RC11)
Testing... Resending as it appears my message didn't go through last week.

Tom

On Wednesday, May 28, 2014 4:12 PM, Tom Graves tgraves...@yahoo.com wrote:

+1. Tested spark on yarn (cluster mode, client mode, pyspark, spark-shell) on hadoop 0.23 and 2.4.

Tom

On Wednesday, May 28, 2014 3:07 PM, Sean McNamara sean.mcnam...@webtrends.com wrote:

Pulled down, compiled, and tested examples on OS X and ubuntu. Deployed app we are building on spark and poured data through it.

+1

Sean

On May 26, 2014, at 8:39 AM, Tathagata Das tathagata.das1...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.0.0!

This has a few important bug fixes on top of rc10:
SPARK-1900 and SPARK-1918: https://github.com/apache/spark/pull/853
SPARK-1870: https://github.com/apache/spark/pull/848
SPARK-1897: https://github.com/apache/spark/pull/849

The tag to be voted on is v1.0.0-rc11 (commit c69d97cd):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=c69d97cdb42f809cb71113a1db4194c21372242a

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~tdas/spark-1.0.0-rc11/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1019/

The documentation corresponding to this release can be found at:
http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/

Please vote on releasing this package as Apache Spark 1.0.0! The vote is open until Thursday, May 29, at 16:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== API Changes ==
We welcome users to compile Spark applications against 1.0. There are a few API changes in this release. Here are links to the associated upgrade guides - user-facing changes have been kept as small as possible.

Changes to ML vector specification:
http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/mllib-guide.html#from-09-to-10

Changes to the Java API:
http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark

Changes to the streaming API:
http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x

Changes to the GraphX API:
http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091

Other changes:
coGroup and related functions now return Iterable[T] instead of Seq[T]
== Call toSeq on the result to restore the old behavior
SparkContext.jarOfClass returns Option[String] instead of Seq[String]
== Call toSeq on the result to restore old behavior
Re: [VOTE] Release Apache Spark 1.0.0 (RC11)
Received!

On Wed, Jun 4, 2014 at 10:47 AM, Tom Graves tgraves...@yahoo.com.invalid wrote:
Testing... Resending as it appears my message didn't go through last week. Tom
Re: [VOTE] Release Apache Spark 1.0.0 (RC11)
Hi Patrick,

We maintain an internal Spark mirror in sync with the Spark GitHub master. What's the way to get the 1.0.0 stable release from GitHub to deploy on our production cluster? Is there a tag for 1.0.0 that I should use to deploy?

Thanks.
Deb

On Wed, Jun 4, 2014 at 10:49 AM, Patrick Wendell pwend...@gmail.com wrote:
Received!
Re: [VOTE] Release Apache Spark 1.0.0 (RC11)
Hey There,

The best way is to use the v1.0.0 tag: https://github.com/apache/spark/releases/tag/v1.0.0

- Patrick

On Wed, Jun 4, 2014 at 12:19 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi Patrick, We maintain internal Spark mirror in sync with Spark github master... What's the way to get the 1.0.0 stable release from github to deploy on our production cluster ? Is there a tag for 1.0.0 that I should use to deploy ? Thanks. Deb
Re: Add my JIRA username (hsaputra) to Spark's contributor's list
Uh, I wrote my name wrong; it should be Guoqiang Li rather than Guoquiang Li.

------------------ Original ------------------
From: Kan Zhang; kzh...@apache.org;
Date: Wed, Jun 4, 2014 03:00 AM
To: dev; dev@spark.apache.org;
Subject: Re: Add my JIRA username (hsaputra) to Spark's contributor's list

Same here please, username (kzhang). Thanks!

On Tue, Jun 3, 2014 at 11:39 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Thanks Matei!
- Henry

On Tue, Jun 3, 2014 at 11:36 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Done. Looks like this was lost in the JIRA import.
Matei

On Jun 3, 2014, at 11:33 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Hi, could someone with the right karma kindly add my username (hsaputra) to Spark's contributor list? I was added before, but somehow I can no longer assign tickets to myself or update tickets I am working on.
Thanks,
- Henry
Re: Announcing Spark 1.0.0
Could someone please clarify my confusion, or is this not an issue that we should be concerned about?

Thanks,
Rahul Singhal

On 30/05/14 5:28 PM, Rahul Singhal rahul.sing...@guavus.com wrote:
Is it intentional/ok that the tag v1.0.0 is behind tag v1.0.0-rc11?

Thanks,
Rahul Singhal

On 30/05/14 3:43 PM, Patrick Wendell pwend...@gmail.com wrote:
I'm thrilled to announce the availability of Spark 1.0.0!

Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank everyone involved in this release - it was truly a community effort, with fixes, features, and optimizations contributed from dozens of organizations.

This release expands Spark's standard libraries, introducing a new SQL package (SparkSQL) which lets users integrate SQL queries into existing Spark workflows. MLlib, Spark's machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark's core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements. Finally, Spark adds support for Java 8 lambda syntax and improves coverage of the Java and Python APIs.

Those features only scratch the surface - check out the release notes here:
http://spark.apache.org/releases/spark-release-1-0-0.html

Note that since release artifacts were posted recently, certain mirrors may not have working downloads for a few hours.

- Patrick
Re: Announcing Spark 1.0.0
Hey Rahul,

The v1.0.0 tag is correct. When we release Spark we create multiple candidates. One of the candidates is promoted to the full release. So rc11 is also the same as the official v1.0.0 release.

- Patrick

On Wed, Jun 4, 2014 at 8:29 PM, Rahul Singhal rahul.sing...@guavus.com wrote:
Is it intentional/ok that the tag v1.0.0 is behind tag v1.0.0-rc11?
Re: What is the correct Spark version of master/branch-1.0?
Thank you for your reply. I've sent pull requests.

Thanks.

2014-06-05 3:16 GMT+09:00 Patrick Wendell pwend...@gmail.com:
It should be 1.1-SNAPSHOT. Feel free to submit a PR to clean up any inconsistencies.

On Tue, Jun 3, 2014 at 8:33 PM, Takuya UESHIN ues...@happy-camper.st wrote:
Hi all,

I'm wondering what the correct Spark version is at each HEAD of master and branch-1.0.

Current master HEAD (e8d93ee5284cb6a1d4551effe91ee8d233323329):
- pom.xml: 1.0.0-SNAPSHOT
- SparkBuild.scala: 1.1.0-SNAPSHOT
Should it be 1.1.0-SNAPSHOT?

Current branch-1.0 HEAD (d96794132e37cf57f8dd945b9d11f8adcfc30490):
- pom.xml: 1.0.1-SNAPSHOT
- SparkBuild.scala: 1.0.0
Should it be 1.0.1-SNAPSHOT?

Thanks.

--
Takuya UESHIN
Tokyo, Japan
http://twitter.com/ueshin