enable Spark on Mesos security delegation token transfer

2014-06-04 Thread Shihaoliang (Shihaoliang)
Hi,

Since Spark 1.0 has security integration with YARN, it can transfer 
credentials, including delegation tokens, from the scheduler to the executor 
side. This is done in the startContainerRequest RPC call: the credentials are 
passed to the executor side, so that the executor's UserGroupInformation can 
load them and authenticate against secured HDFS. Since Hadoop's RPC can be 
configured to be encrypted, Spark on YARN's security is good.

For Spark on Mesos, however, credentials cannot be transferred to the executor 
side, so we cannot integrate with secured HDFS in a Mesos deployment.

To do the credential transfer, my solution is:

1)   Add a credential field to Mesos's TaskInfo protobuf structure.

2)   Modify the Spark scheduler's code to read the credentials from 
UserGroupInformation and store them in the field added in 1).

3)   Modify the Spark executor's code to load the credentials before the 
executor starts.

In this way, Mesos can carry the credentials in the launchTask message.
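
To make the flow concrete, here is a minimal stand-in sketch in plain Java of 
how the scheduler side could flatten a token map into one byte blob for the 
proposed TaskInfo field, and how the executor side could restore it. The 
class, the hand-rolled format, and the field itself are hypothetical; a real 
implementation would use Hadoop's Credentials serialization instead.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: flatten a map of token-alias -> token-bytes into a
// single blob (what the scheduler would store in the proposed TaskInfo
// credential field) and restore it on the executor side before user code
// starts. A real implementation would delegate to Hadoop's Credentials
// stream methods rather than this hand-rolled length-prefixed format.
public class CredentialBlob {
    // Scheduler side: write each (alias, token) pair with length prefixes.
    public static byte[] serialize(Map<String, byte[]> tokens) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(tokens.size());
        for (Map.Entry<String, byte[]> e : tokens.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().length);
            out.write(e.getValue());
        }
        out.close();
        return bos.toByteArray();
    }

    // Executor side: rebuild the token map from the blob.
    public static Map<String, byte[]> deserialize(byte[] blob) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        Map<String, byte[]> tokens = new HashMap<>();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String alias = in.readUTF();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            tokens.put(alias, value);
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> tokens = new HashMap<>();
        tokens.put("hdfs-delegation-token", new byte[]{1, 2, 3});
        byte[] blob = serialize(tokens);
        Map<String, byte[]> restored = deserialize(blob);
        System.out.println(Arrays.equals(
            restored.get("hdfs-delegation-token"), new byte[]{1, 2, 3})); // prints "true"
    }
}
```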

But still, libprocess messages in Mesos are not encrypted, so the credentials 
are not protected in transit.

There are two solutions:

1)   Make the libprocess communication layer support encryption, e.g. by 
adding SSL support to libprocess.

2)   Encrypt only the credential part, using a pre-deployed secret key 
in Mesos.

Currently we choose the second.
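
As a rough sketch of the second option: the credential blob could be sealed 
with a pre-deployed symmetric key before being placed in the task message, so 
it stays opaque even though the libprocess channel is unencrypted. The cipher 
choice (AES-128/GCM) and key handling below are my assumptions for 
illustration, not anything Mesos actually ships.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of solution 2: encrypt only the credential bytes with
// a pre-deployed secret key. AES/GCM also authenticates the ciphertext, so
// tampering in transit is detected at decryption time.
public class CredentialSealer {
    private static final int IV_LEN = 12; // 96-bit nonce, standard for GCM

    // Scheduler side: encrypt the credential blob; prepend the fresh nonce.
    public static byte[] seal(byte[] key, byte[] credential) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(credential);
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    // Executor side: split off the nonce and decrypt with the shared key.
    public static byte[] open(byte[] key, byte[] sealed) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, Arrays.copyOfRange(sealed, 0, IV_LEN)));
        return c.doFinal(Arrays.copyOfRange(sealed, IV_LEN, sealed.length));
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // pre-deployed 128-bit key (all zeros only for this demo)
        byte[] credential = "hdfs-delegation-token-bytes".getBytes("UTF-8");
        byte[] sealed = seal(key, credential);
        System.out.println(Arrays.equals(open(key, sealed), credential)); // prints "true"
    }
}
```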

This work will affect both the Spark and Mesos layers, and will change one 
interface between them.

I don't have much development experience with Spark and Mesos, so if you have 
any ideas or suggestions, please let me know.

Thanks.
Peter Shi



Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Tom Graves
Testing... Resending as it appears my message didn't go through last week.

Tom


On Wednesday, May 28, 2014 4:12 PM, Tom Graves tgraves...@yahoo.com wrote:
 


+1. Tested spark on yarn (cluster mode, client mode, pyspark, spark-shell) on 
hadoop 0.23 and 2.4. 

Tom


On Wednesday, May 28, 2014 3:07 PM, Sean McNamara sean.mcnam...@webtrends.com 
wrote:
 


Pulled down, compiled, and tested examples on OS X and ubuntu.
Deployed app we are building on spark and poured data through it.

+1

Sean



On May 26, 2014, at 8:39 AM, Tathagata Das tathagata.das1...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version 
 1.0.0!
 
 This has a few important bug fixes on top of rc10:
 SPARK-1900 and SPARK-1918: https://github.com/apache/spark/pull/853
 SPARK-1870: https://github.com/apache/spark/pull/848
 SPARK-1897: https://github.com/apache/spark/pull/849
 
 The tag to be voted on is v1.0.0-rc11 (commit c69d97cd):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=c69d97cdb42f809cb71113a1db4194c21372242a
 
 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~tdas/spark-1.0.0-rc11/
 
 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/tdas.asc
 
 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1019/
 
 The documentation corresponding to this release can be found at:
 http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/
 
 Please vote on releasing this package as Apache Spark 1.0.0!
 
 The vote is open until Thursday, May 29, at 16:00 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.
 
 [ ] +1 Release this package as Apache Spark 1.0.0
 [ ] -1 Do not release this package because ...
 
 To learn more about Apache Spark, please see
 http://spark.apache.org/
 
 == API Changes ==
 We welcome users to compile Spark applications against 1.0. There are
 a few API changes in this release. Here are links to the associated
 upgrade guides - user facing changes have been kept as small as
 possible.
 
 Changes to ML vector specification:
 http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/mllib-guide.html#from-09-to-10
 
 Changes to the Java API:
 http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
 
 Changes to the streaming API:
 http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
 
 Changes to the GraphX API:
 http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
 
 Other changes:
 coGroup and related functions now return Iterable[T] instead of Seq[T]
 == Call toSeq on the result to restore the old behavior
 
 SparkContext.jarOfClass returns Option[String] instead of
 Seq[String]
 == Call toSeq on the result to restore old behavior

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Patrick Wendell
Received!

On Wed, Jun 4, 2014 at 10:47 AM, Tom Graves
tgraves...@yahoo.com.invalid wrote:
 Testing... Resending as it appears my message didn't go through last week.

 Tom




Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Debasish Das
Hi Patrick,

We maintain internal Spark mirror in sync with Spark github master...

What's the way to get the 1.0.0 stable release from github to deploy on our
production cluster ? Is there a tag for 1.0.0 that I should use to deploy ?

Thanks.
Deb



On Wed, Jun 4, 2014 at 10:49 AM, Patrick Wendell pwend...@gmail.com wrote:

 Received!



Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Patrick Wendell
Hey There,

The best way is to use the v1.0.0 tag:
https://github.com/apache/spark/releases/tag/v1.0.0

- Patrick

On Wed, Jun 4, 2014 at 12:19 PM, Debasish Das debasish.da...@gmail.com wrote:
 Hi Patrick,

 We maintain internal Spark mirror in sync with Spark github master...

 What's the way to get the 1.0.0 stable release from github to deploy on our
 production cluster ? Is there a tag for 1.0.0 that I should use to deploy ?

 Thanks.
 Deb






Re: Add my JIRA username (hsaputra) to Spark's contributor's list

2014-06-04 Thread witgo
Ah, my name was written wrong; it should be Guoqiang Li rather than Guoquiang Li.




-- Original --
From: Kan Zhang kzh...@apache.org
Date: Wed, Jun 4, 2014 03:00 AM
To: dev@spark.apache.org

Subject:  Re: Add my JIRA username (hsaputra) to Spark's contributor's list



Same here please, username (kzhang). Thanks!


On Tue, Jun 3, 2014 at 11:39 AM, Henry Saputra henry.sapu...@gmail.com
wrote:

 Thanks Matei!

 - Henry

 On Tue, Jun 3, 2014 at 11:36 AM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Done. Looks like this was lost in the JIRA import.
 
  Matei
 
  On Jun 3, 2014, at 11:33 AM, Henry Saputra henry.sapu...@gmail.com
 wrote:
 
  Hi,
 
  Could someone with right karma kindly add my username (hsaputra) to
  Spark's contributor list?
 
  I was added before but somehow now I can no longer assign ticket to
  myself nor update tickets I am working on.
 
 
  Thanks,
 
  - Henry
 


Re: Announcing Spark 1.0.0

2014-06-04 Thread Rahul Singhal
Could someone please clarify my confusion or is this not an issue that we
should be concerned about?

Thanks,
Rahul Singhal





On 30/05/14 5:28 PM, Rahul Singhal rahul.sing...@guavus.com wrote:

Is it intentional/ok that the tag v1.0.0 is behind tag v1.0.0-rc11?


Thanks,
Rahul Singhal





On 30/05/14 3:43 PM, Patrick Wendell pwend...@gmail.com wrote:

I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
is a milestone release as the first in the 1.0 line of releases,
providing API stability for Spark's core interfaces.

Spark 1.0.0 is Spark's largest release ever, with contributions from
117 developers. I'd like to thank everyone involved in this release -
it was truly a community effort with fixes, features, and
optimizations contributed from dozens of organizations.

This release expands Spark's standard libraries, introducing a new SQL
package (SparkSQL) which lets users integrate SQL queries into
existing Spark workflows. MLlib, Spark's machine learning library, is
expanded with sparse vector support and several new algorithms. The
GraphX and Streaming libraries also introduce new features and
optimizations. Spark's core engine adds support for secured YARN
clusters, a unified tool for submitting Spark applications, and
several performance and stability improvements. Finally, Spark adds
support for Java 8 lambda syntax and improves coverage of the Java and
Python API's.

Those features only scratch the surface - check out the release notes
here:
http://spark.apache.org/releases/spark-release-1-0-0.html

Note that since release artifacts were posted recently, certain
mirrors may not have working downloads for a few hours.

- Patrick




Re: Announcing Spark 1.0.0

2014-06-04 Thread Patrick Wendell
Hey Rahul,

The v1.0.0 tag is correct. When we release Spark we create multiple
candidates. One of the candidates is promoted to the full release. So
rc11 is also the same as the official v1.0.0 release.

- Patrick

On Wed, Jun 4, 2014 at 8:29 PM, Rahul Singhal rahul.sing...@guavus.com wrote:
 Could someone please clarify my confusion or is this not an issue that we
 should be concerned about?

 Thanks,
 Rahul Singhal





 On 30/05/14 5:28 PM, Rahul Singhal rahul.sing...@guavus.com wrote:

Is it intentional/ok that the tag v1.0.0 is behind tag v1.0.0-rc11?


Thanks,
Rahul Singhal









Re: What is the correct Spark version of master/branch-1.0?

2014-06-04 Thread Takuya UESHIN
Thank you for your reply.

I've sent pull requests.


Thanks.


2014-06-05 3:16 GMT+09:00 Patrick Wendell pwend...@gmail.com:
 It should be 1.1-SNAPSHOT. Feel free to submit a PR to clean up any
 inconsistencies.

 On Tue, Jun 3, 2014 at 8:33 PM, Takuya UESHIN ues...@happy-camper.st wrote:
 Hi all,

 I'm wondering what is the correct Spark version of each HEAD of master
 and branch-1.0.

 current master HEAD (e8d93ee5284cb6a1d4551effe91ee8d233323329):
 - pom.xml: 1.0.0-SNAPSHOT
 - SparkBuild.scala: 1.1.0-SNAPSHOT

 It should be 1.1.0-SNAPSHOT?


 current branch-1.0 HEAD (d96794132e37cf57f8dd945b9d11f8adcfc30490):
 - pom.xml: 1.0.1-SNAPSHOT
 - SparkBuild.scala: 1.0.0

 It should be 1.0.1-SNAPSHOT?


 Thanks.

 --
 Takuya UESHIN
 Tokyo, Japan

 http://twitter.com/ueshin



-- 
Takuya UESHIN
Tokyo, Japan

http://twitter.com/ueshin