[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/611#issuecomment-35474696
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12775/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Mridul Muralidharan
You are right.
A degenerate case would be:

def createFoo = new FooImpl()

vs

def createFoo: Foo = new FooImpl()

The former will cause API instability. Reynold, maybe this is already
avoided - and I understood it wrong?

Thanks,
Mridul
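
A minimal sketch of this instability, using the hypothetical Foo/FooImpl
types from the example above (illustrative only):

trait Foo { def run(): Unit }
class FooImpl extends Foo { def run(): Unit = () }
class FooImpl2 extends Foo { def run(): Unit = () }

object Factory {
  // Inferred result type is FooImpl: switching the body to new FooImpl2()
  // later silently changes the public signature and breaks binary
  // compatibility for already-compiled callers.
  def createFoo = new FooImpl()

  // Explicit result type pins the API to Foo: the implementation can
  // change to FooImpl2 without affecting any caller.
  def createFooStable: Foo = new FooImpl()
}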



On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com wrote:
 Mridul, IIUUC, what you've mentioned did come to mind, but I deemed it
 orthogonal to the stylistic issue Reynold is talking about.

 I believe you're referring to the case where there is a specific desired
 return type by API design, but the implementation does not, in which case,
 of course, one must define the return type. That's an API requirement and
 not just a matter of readability.

 We could add this as an NB in the proposed guideline.

 --
 Christopher T. Nguyen
 Co-founder & CEO, Adatao http://adatao.com
 linkedin.com/in/ctnguyen



 On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com wrote:

 +1 Christopher's suggestion.

 Mridul,

 How would that happen? Case 3 requires the method to be invoking the
 constructor directly. It was implicit in my email, but the return type
 should be the same as the class itself.




 On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan mri...@gmail.com
 wrote:

  Case 3 can be a potential issue.
  Current implementation might be returning a concrete class which we
  might want to change later - making it a type change.
  The intention might be to return an RDD (for example), but the
  inferred type might be a subclass of RDD - and future changes will
  cause signature change.
 
 
  Regards,
  Mridul
 
 
  On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com
 wrote:
   Hi guys,
  
   Want to bring to the table this issue to see what other members of the
   community think and then we can codify it in the Spark coding style
  guide.
   The topic is about declaring return types explicitly in public APIs.
  
   In general I think we should favor explicit type declaration in public
   APIs. However, I do think there are 3 cases we can avoid the public API
    definition because in these 3 cases the types are self-evident & repetitive.
  
   Case 1. toString
  
   Case 2. A method returning a string or a val defining a string
  
    def name = "abcd" // this is so obvious that it is a string
    val name = "edfg" // this too
  
   Case 3. The method or variable is invoking the constructor of a class
 and
   return that immediately. For example:
  
   val a = new SparkContext(...)
   implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new
   AsyncRDDActions(rdd)
  
  
   Thoughts?
 


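For reference, a sketch of Case 3 with and without the explicit annotation.
The rddToAsyncRDDActions conversion is the real Spark example quoted above;
the enclosing object and the second method name are illustrative, and only
one of the two conversions would exist in practice:

import scala.reflect.ClassTag
import org.apache.spark.rdd.{AsyncRDDActions, RDD}

object Conversions {
  // Case 3: the result type AsyncRDDActions[T] is self-evident from the
  // constructor call, so the proposal would allow omitting it:
  implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) =
    new AsyncRDDActions(rdd)

  // The conservative form spells the same type out explicitly:
  implicit def rddToAsyncRDDActionsExplicit[T: ClassTag](rdd: RDD[T]): AsyncRDDActions[T] =
    new AsyncRDDActions(rdd)
}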

[GitHub] incubator-spark pull request: MLLIB-24: url of Collaborative Filt...

2014-02-19 Thread CrazyJvm
Github user CrazyJvm commented on the pull request:

https://github.com/apache/incubator-spark/pull/619#issuecomment-35476826
  
There seems to be no problem using the Yahoo link. Or are you worried that the
link might become invalid again?
@mengxr


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: Spark 1095 : Adding explicit return ...

2014-02-19 Thread NirmalReddy
Github user NirmalReddy commented on the pull request:

https://github.com/apache/incubator-spark/pull/610#issuecomment-35480180
  
@aarondav With this last commit I suppose I have completed the issue
(SPARK-1095).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


Spark 0.9.0

2014-02-19 Thread Gino Mathews
Hi,

I am trying to use Apache Spark on a standalone cluster.
After downloading Spark I tried to build the package. However, I am getting the
following error for the normal build using the default Hadoop version:

gino@gino008:~/Downloads/spark-0.9.0-incubating$ sbt assembly
Loading /usr/share/sbt/bin/sbt-launch-lib.bash
[info] Loading project definition from 
/home/gino/Downloads/spark-0.9.0-incubating/project/project
[info] Updating 
{file:/home/gino/Downloads/spark-0.9.0-incubating/project/project/}default-5f2b58...
[info] Resolving org.scala-lang#scala-library;2.9.2 ...
[error] Server access Error: Connection reset 
url=http://repo.typesafe.com/typesafe/ivy-releases/org.scala-lang/scala-library/2.9.2/jars/scala-library.jar
[error] Server access Error: Connection reset 
url=http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases/org.scala-lang/scala-library/2.9.2/jars/scala-library.jar
[error] Server access Error: Connection reset 
url=http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.9.2/scala-library-2.9.2.jar
[info] Resolving org.scala-sbt#control;0.12.4 ...

truncated-

I am getting the following error for the build against Hadoop 2.2.0:



gino@gino008:~/Downloads/spark-0.9.0-incubating$ SPARK_HADOOP_VERSION=2.2.0 sbt 
assembly
Loading /usr/share/sbt/bin/sbt-launch-lib.bash
[info] Loading project definition from 
/home/gino/Downloads/spark-0.9.0-incubating/project/project
[info] Updating 
{file:/home/gino/Downloads/spark-0.9.0-incubating/project/project/}default-5f2b58...
[info] Resolving org.scala-lang#scala-compiler;2.9.2 ...
[error] Server access Error: Connection reset 
url=http://repo.typesafe.com/typesafe/ivy-releases/org.scala-lang/scala-compiler/2.9.2/jars/scala-compiler.jar
[error] Server access Error: Connection reset 
url=http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases/org.scala-lang/scala-compiler/2.9.2/jars/scala-compiler.jar
[error] Server access Error: Connection reset 
url=http://repo1.maven.org/maven2/org/scala-lang/scala-compiler/2.9.2/scala-compiler-2.9.2.jar
[info] Resolving org.sonatype.oss#oss-parent;7 ...
[error] Server access Error: Connection reset 
url=http://repo.typesafe.com/typesafe/ivy-releases/org.sonatype.oss/oss-parent/7/jars/oss-parent.jar
[error] Server access Error: Connection reset 
url=http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases/org.sonatype.oss/oss-parent/7/jars/oss-parent.jar
[error] Server access Error: Connection reset 
url=http://repo1.maven.org/maven2/org/sonatype/oss/oss-parent/7/oss-parent-7.jar
[error] Server access Error: Connection reset 
url=http://repo.typesafe.com/typesafe/ivy-releases/jline/jline/1.0/jars/jline.jar
[error] Server access Error: Connection reset 
url=http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases/jline/jline/1.0/jars/jline.jar
[error] Server access Error: Connection reset 
url=http://repo1.maven.org/maven2/jline/jline/1.0/jline-1.0.jar
[info] Resolving org.scala-sbt#api;0.12.4 ...

--truncated-


Please advise on how to download the required Maven dependencies.

Thanks in Advance

Gino Mathews K
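
The repeated "Connection reset" errors usually point to a proxy or firewall
between the machine and the repositories rather than a Spark problem. A sketch
of one workaround, assuming the standard sbt launcher: list reachable mirrors
in ~/.sbt/repositories (the repository hosts below are the defaults from the
log above):

[repositories]
  local
  maven-central: http://repo1.maven.org/maven2/
  typesafe-ivy-releases: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext]

If a proxy is required, the standard JVM flags can also be passed, e.g.
sbt -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 assembly
(proxy.example.com is a hypothetical placeholder).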






[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35491849
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/585#discussion_r9863061
  
--- Diff: project/MimaBuild.scala ---
@@ -0,0 +1,115 @@
+import com.typesafe.tools.mima.plugin.MimaKeys.{binaryIssueFilters, previousArtifact}
+import com.typesafe.tools.mima.plugin.MimaPlugin.mimaDefaultSettings
+
+object MimaBuild {
+
+  val ignoredABIProblems = {
+    import com.typesafe.tools.mima.core._
+    import com.typesafe.tools.mima.core.ProblemFilters._
+    /**
+     * A: Detections likely to become semi private at some point.
+     */
+    Seq(exclude[MissingClassProblem]("org.apache.spark.util.XORShiftRandom"),
+      exclude[MissingClassProblem]("org.apache.spark.util.XORShiftRandom$"),
+      exclude[MissingMethodProblem]("org.apache.spark.util.Utils.cloneWritables"),
+      exclude[MissingMethodProblem]("org.apache.spark.util.collection.ExternalAppendOnlyMap#DiskMapIterator.nextItem_="),
--- End diff --

exclude for a class does not work, I suppose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: Deprecated and added a few java api ...

2014-02-19 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/incubator-spark/pull/402#issuecomment-35494734
  
@pwendell are you okay with the changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35496756
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35496755
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35496818
  
Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35496819
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12778/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35496946
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35496947
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: MLLIB-24: url of Collaborative Filt...

2014-02-19 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/incubator-spark/pull/619#issuecomment-35497321
  
@mengxr the DOI link may not be accessible to non-paying users; I think the
Yahoo Research link is relatively stable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35501417
  
Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35501415
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12780/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: Add Security to Spark - Akka, Http, ...

2014-02-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/332#discussion_r9868173
  
--- Diff: core/src/main/java/org/apache/spark/SparkSaslServer.java ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark;
+
+import org.apache.commons.net.util.Base64;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Map;
+import java.util.TreeMap;
+
+import javax.security.auth.callback.Callback;
+import javax.security.auth.callback.CallbackHandler;
+import javax.security.auth.callback.NameCallback;
+import javax.security.auth.callback.PasswordCallback;
+import javax.security.auth.callback.UnsupportedCallbackException;
+import javax.security.sasl.AuthorizeCallback;
+import javax.security.sasl.RealmCallback;
+import javax.security.sasl.Sasl;
+import javax.security.sasl.SaslException;
+import javax.security.sasl.SaslServer;
+import java.io.IOException;
+
+/**
+ * Encapsulates SASL server logic for Server
+ */
+public class SparkSaslServer {
+  /** Logger */
+  private static Logger LOG = LoggerFactory.getLogger(SparkSaslServer.class);
+
+  /**
+   * Actual SASL work done by this object from javax.security.sasl.
+   * Initialized below in constructor.
+   */
+  private SaslServer saslServer;
+
+  public static final String SASL_DEFAULT_REALM = "default";
--- End diff --

Yeah, that code was specifically copied from Hadoop 0.23. I'll leave it for
now and we can make it configurable in the next round of changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: SPARK-1059. Now that we submit core ...

2014-02-19 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/incubator-spark/pull/555#issuecomment-35513822
  
@sryza can this be closed then? I think the important note you added to the
running-on-YARN docs about the cores will suffice, along with my security PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35514615
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1105] fix site scala version ...

2014-02-19 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/618#discussion_r9870730
  
--- Diff: docs/index.md ---
@@ -19,7 +19,7 @@ Spark uses [Simple Build Tool](http://www.scala-sbt.org), 
which is bundled with
 
 sbt/sbt assembly
 
-For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala 
{{site.SCALA_VERSION}}. If you write applications in Scala, you will need to 
use this same version of Scala in your own program -- newer major versions may 
not work. You can get the right version of Scala from 
[scala-lang.org](http://www.scala-lang.org/download/).
+For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala 
{{site.SCALA_BINARY_VERSION}}. If you write applications in Scala, you will 
need to use this same version of Scala in your own program -- newer major 
versions may not work. You can get the right version of Scala from 
[scala-lang.org](http://www.scala-lang.org/download/).
--- End diff --

To make this more clear, it might be good to say:

"If you write applications in Scala, you will need to use a compatible 
Scala version (e.g. {{site.SCALA_BINARY_VERSION}}.X) -- newer major versions 
may not work."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1105] fix site scala version ...

2014-02-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/618#issuecomment-35516252
  
LGTM pending a small fix -- @aarondav want to take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Patrick Wendell
+1 overall.

Christopher - I agree that once the number of rules becomes large it's
more efficient to pursue a "use your judgement" approach. However,
since this is only 3 cases I'd prefer to wait to see if it grows.

The concern with this approach is that for newer people, contributors,
etc it's hard for them to understand what good judgement is. Many are
new to Scala, so explicit rules are generally better.

- Patrick

On Wed, Feb 19, 2014 at 12:19 AM, Reynold Xin r...@databricks.com wrote:
 Yes, the case you brought up is not a matter of readability or style. If it
 returns a different type, it should be declared (otherwise it is just
 wrong).


 On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.comwrote:

 You are right.
 A degenerate case would be :

 def createFoo = new FooImpl()

 vs

 def createFoo: Foo = new FooImpl()

 Former will cause api instability. Reynold, maybe this is already
 avoided - and I understood it wrong ?

 Thanks,
 Mridul



 On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com
 wrote:
  Mridul, IIUUC, what you've mentioned did come to mind, but I deemed it
  orthogonal to the stylistic issue Reynold is talking about.
 
  I believe you're referring to the case where there is a specific desired
  return type by API design, but the implementation does not, in which
 case,
  of course, one must define the return type. That's an API requirement and
  not just a matter of readability.
 
  We could add this as an NB in the proposed guideline.
 
  --
  Christopher T. Nguyen
  Co-founder & CEO, Adatao http://adatao.com
  linkedin.com/in/ctnguyen
 
 
 
  On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com
 wrote:
 
  +1 Christopher's suggestion.
 
  Mridul,
 
  How would that happen? Case 3 requires the method to be invoking the
  constructor directly. It was implicit in my email, but the return type
  should be the same as the class itself.
 
 
 
 
  On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
 
   Case 3 can be a potential issue.
   Current implementation might be returning a concrete class which we
   might want to change later - making it a type change.
   The intention might be to return an RDD (for example), but the
   inferred type might be a subclass of RDD - and future changes will
   cause signature change.
  
  
   Regards,
   Mridul
  
  
   On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com
  wrote:
Hi guys,
   
Want to bring to the table this issue to see what other members of
 the
community think and then we can codify it in the Spark coding style
   guide.
The topic is about declaring return types explicitly in public APIs.
   
In general I think we should favor explicit type declaration in
 public
APIs. However, I do think there are 3 cases we can avoid the public
 API
 definition because in these 3 cases the types are self-evident & repetitive.
   
Case 1. toString
   
Case 2. A method returning a string or a val defining a string
   
 def name = "abcd" // this is so obvious that it is a string
 val name = "edfg" // this too
   
Case 3. The method or variable is invoking the constructor of a
 class
  and
return that immediately. For example:
   
val a = new SparkContext(...)
implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new
AsyncRDDActions(rdd)
   
   
Thoughts?
  
 



[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35521749
  
Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/611#discussion_r9875482
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/master/ZooKeeperLeaderElectionAgent.scala
 ---
@@ -18,105 +18,73 @@
 package org.apache.spark.deploy.master
 
 import akka.actor.ActorRef
-import org.apache.zookeeper._
-import org.apache.zookeeper.Watcher.Event.EventType
 
 import org.apache.spark.{SparkConf, Logging}
 import org.apache.spark.deploy.master.MasterMessages._
+import org.apache.curator.framework.CuratorFramework
+import org.apache.curator.framework.recipes.leader.{LeaderLatchListener, 
LeaderLatch}
 
 private[spark] class ZooKeeperLeaderElectionAgent(val masterActor: 
ActorRef,
 masterUrl: String, conf: SparkConf)
-  extends LeaderElectionAgent with SparkZooKeeperWatcher with Logging  {
+  extends LeaderElectionAgent with LeaderLatchListener with Logging  {
 
   val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/leader_election"
 
-  private val watcher = new ZooKeeperWatcher()
-  private val zk = new SparkZooKeeperSession(this, conf)
+  private var zk: CuratorFramework = _
+  private var leaderLatch: LeaderLatch = _
   private var status = LeadershipStatus.NOT_LEADER
-  private var myLeaderFile: String = _
-  private var leaderUrl: String = _
 
   override def preStart() {
+
  logInfo("Starting ZooKeeper LeaderElection agent")
-zk.connect()
-  }
+zk = SparkCuratorUtil.newClient(conf)
+leaderLatch = new LeaderLatch(zk, WORKING_DIR)
+leaderLatch.addListener(this)
 
-  override def zkSessionCreated() {
-synchronized {
-  zk.mkdirRecursive(WORKING_DIR)
-  myLeaderFile =
-zk.create(WORKING_DIR + "/master_", masterUrl.getBytes, CreateMode.EPHEMERAL_SEQUENTIAL)
-  self ! CheckLeader
-}
+leaderLatch.start()
   }
 
   override def preRestart(reason: scala.Throwable, message: 
scala.Option[scala.Any]) {
-logError("LeaderElectionAgent failed, waiting " + zk.ZK_TIMEOUT_MILLIS + "...", reason)
-Thread.sleep(zk.ZK_TIMEOUT_MILLIS)
+logError("LeaderElectionAgent failed...", reason)
 super.preRestart(reason, message)
   }
 
-  override def zkDown() {
-logError("ZooKeeper down! LeaderElectionAgent shutting down Master.")
-System.exit(1)
-  }
-
   override def postStop() {
+leaderLatch.close()
 zk.close()
   }
 
   override def receive = {
-case CheckLeader => checkLeader()
+case _ =>
   }
 
-  private class ZooKeeperWatcher extends Watcher {
-def process(event: WatchedEvent) {
-  if (event.getType == EventType.NodeDeleted) {
-logInfo("Leader file disappeared, a master is down!")
-self ! CheckLeader
+  override def isLeader() {
+// In case leadership is gained and lost in a short time.
+Thread.sleep(1000)
--- End diff --

Ah, sorry if I was unclear, but I was just joking about putting a 
sleep(1000) in here. The real solution is to add a synchronized block to 
isLeader and notLeader -- I was just making a point that we're not concerned 
with the overhead of synchronization in this code path. (The synchronized block 
is not needed with the current implementation and use of Curator, but I think 
it makes the code clearer without a real downside.)
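
A sketch of the suggested shape (the updateLeadershipStatus helper is a
hypothetical name for illustration, not necessarily the PR's actual method):

override def isLeader(): Unit = synchronized {
  // Guard against leadership being gained and lost in quick succession:
  // re-check the latch before acting on the callback.
  if (leaderLatch.hasLeadership) {
    updateLeadershipStatus(true)  // hypothetical helper
  }
}

override def notLeader(): Unit = synchronized {
  if (!leaderLatch.hasLeadership) {
    updateLeadershipStatus(false)
  }
}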


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1105] fix site scala version ...

2014-02-19 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/618#discussion_r9875739
  
--- Diff: docs/scala-programming-guide.md ---
@@ -17,12 +17,12 @@ This guide shows each of these features and walks 
through some samples. It assum
 
 # Linking with Spark
 
-Spark {{site.SPARK_VERSION}} uses Scala {{site.SCALA_VERSION}}. If you 
write applications in Scala, you'll need to use this same version of Scala in 
your program -- newer major versions may not work.
+Spark {{site.SPARK_VERSION}} uses Scala {{site.SCALA_BINARY_VERSION}}. If 
you write applications in Scala, you'll need to use this same version of Scala 
in your program -- newer major versions may not work.
--- End diff --

I suppose we should repeat Patrick's comment here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: Add Security to Spark - Akka, Http, ...

2014-02-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/332#discussion_r9875857
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.UserGroupInformation
+
+import org.apache.spark.deploy.SparkHadoopUtil
+
+/** 
+ * Spark class responsible for security.  
+ */
+private[spark] class SecurityManager extends Logging {
+
+  private val isAuthOn = System.getProperty("spark.authenticate", "false").toBoolean
+  private val isUIAuthOn = System.getProperty("spark.authenticate.ui", "false").toBoolean
+  private val viewAcls = System.getProperty("spark.ui.view.acls", "").split(',').map(_.trim()).toSet
+  private val secretKey = generateSecretKey()
+  logDebug("is auth enabled = " + isAuthOn + " is uiAuth enabled = " + isUIAuthOn)
+ 
+  /**
+   * In Yarn mode it uses the Hadoop UGI to pass the secret, as that
+   * will keep it protected.  For a standalone Spark cluster,
+   * use the environment variable SPARK_SECRET to specify the secret.
+   * This probably isn't ideal, but only the user who starts the process
+   * should have access to view the variable (at least on Linux).
+   * Since we can't set the environment variable, we set the
+   * Java system property SPARK_SECRET so a secret will automatically
+   * be generated if one is not specified.  This definitely is not
+   * ideal since users can see it.  We should switch to putting it in
+   * a config.
+   */
+  private def generateSecretKey(): String = {
+
+    if (!isAuthenticationEnabled) return null
+    // first check to see if secret already set, else generate it
+    if (SparkHadoopUtil.get.isYarnMode) {
+      val credentials = SparkHadoopUtil.get.getCurrentUserCredentials()
+      if (credentials != null) {
+        val secretKey = credentials.getSecretKey(new Text("akkaCookie"))
+        if (secretKey != null) {
+          logDebug("in yarn mode, getting secret from credentials")
+          return new Text(secretKey).toString
+        } else {
+          logDebug("getSecretKey: yarn mode, secret key from credentials is null")
+        }
+      } else {
+        logDebug("getSecretKey: yarn mode, credentials are null")
+      }
+    }
+    val secret = System.getProperty("SPARK_SECRET", System.getenv("SPARK_SECRET"))
+    if (secret != null && !secret.isEmpty()) return secret
+    // generate one
+    val sCookie = akka.util.Crypt.generateSecureCookie
+
+    // if we generate we must be the first so lets set it so its used by everyone else
+    if (SparkHadoopUtil.get.isYarnMode) {
+      val creds = new Credentials()
+      creds.addSecretKey(new Text("akkaCookie"), sCookie.getBytes())
+      SparkHadoopUtil.get.addCurrentUserCredentials(creds)
+      logDebug("adding secret to credentials yarn mode")
+    } else {
+      System.setProperty("SPARK_SECRET", sCookie)
+      logDebug("adding secret to java property")
+    }
+    return sCookie
+  }
+
+  def isUIAuthenticationEnabled(): Boolean = isUIAuthOn
+
+  // allow anyone in the acl list and the application owner 
+  def checkUIViewPermissions(user: String): Boolean = {
+    if (isUIAuthenticationEnabled() && (user != null)) {
+      if ((!viewAcls.contains(user)) && (user != System.getProperty("user.name"))) {
--- End diff --

Good idea to just prepopulate it.  I assume it's safer to add both 
user.name and SPARK_USER to the ACL list if they are set?
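
A sketch of what that prepopulation could look like (illustrative only;
names follow the diff above, and defaultAclUsers is a hypothetical value):

    // Seed the ACL set with the process owner and, if set, SPARK_USER.
    private val defaultAclUsers =
      Set(System.getProperty("user.name", ""), Option(System.getenv("SPARK_USER")).getOrElse(""))
    private val viewAcls =
      (defaultAclUsers ++ System.getProperty("spark.ui.view.acls", "").split(','))
        .map(_.trim()).filter(_.nonEmpty)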


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: Add Security to Spark - Akka, Http, ...

2014-02-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/332#discussion_r9875875
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.UserGroupInformation
+
+import org.apache.spark.deploy.SparkHadoopUtil
+
+/** 
+ * Spark class responsible for security.  
+ */
+private[spark] class SecurityManager extends Logging {
+
+  private val isAuthOn = System.getProperty("spark.authenticate", "false").toBoolean
+  private val isUIAuthOn = System.getProperty("spark.authenticate.ui", "false").toBoolean
+  private val viewAcls = System.getProperty("spark.ui.view.acls", "").split(',').map(_.trim()).toSet
+  private val secretKey = generateSecretKey()
+  logDebug("is auth enabled = " + isAuthOn + " is uiAuth enabled = " + isUIAuthOn)
+ 
+  /**
+   * In Yarn mode it uses the Hadoop UGI to pass the secret, as that
+   * will keep it protected.  For a standalone Spark cluster,
+   * use the environment variable SPARK_SECRET to specify the secret.
+   * This probably isn't ideal, but only the user who starts the process
+   * should have access to view the variable (at least on Linux).
+   * Since we can't set the environment variable, we set the
+   * Java system property SPARK_SECRET so a secret will automatically
+   * be generated if one is not specified.  This definitely is not
+   * ideal since users can see it.  We should switch to putting it in
+   * a config.
+   */
+  private def generateSecretKey(): String = {
+
+    if (!isAuthenticationEnabled) return null
+    // first check to see if secret already set, else generate it
+    if (SparkHadoopUtil.get.isYarnMode) {
+      val credentials = SparkHadoopUtil.get.getCurrentUserCredentials()
+      if (credentials != null) {
+        val secretKey = credentials.getSecretKey(new Text("akkaCookie"))
+        if (secretKey != null) {
+          logDebug("in yarn mode, getting secret from credentials")
+          return new Text(secretKey).toString
+        } else {
+          logDebug("getSecretKey: yarn mode, secret key from credentials is null")
+        }
+      } else {
+        logDebug("getSecretKey: yarn mode, credentials are null")
+      }
+    }
+    val secret = System.getProperty("SPARK_SECRET", System.getenv("SPARK_SECRET"))
+    if (secret != null && !secret.isEmpty()) return secret
+    // generate one
+    val sCookie = akka.util.Crypt.generateSecureCookie
+
+    // if we generate we must be the first so lets set it so its used by everyone else
+    if (SparkHadoopUtil.get.isYarnMode) {
+      val creds = new Credentials()
+      creds.addSecretKey(new Text("akkaCookie"), sCookie.getBytes())
--- End diff --

yep, I'll update.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: Add Security to Spark - Akka, Http, ...

2014-02-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/332#discussion_r9875896
  
--- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala ---
@@ -431,6 +466,7 @@ private[spark] class ReceivingConnection(channel_ : 
SocketChannel, selector_ : S
 val newMessage = Message.create(header).asInstanceOf[BufferMessage]
 newMessage.started = true
 newMessage.startTime = System.currentTimeMillis
+newMessage.isSecurityNeg = if (header.securityNeg == 1) true else false
--- End diff --

ah, ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


[GitHub] incubator-spark pull request: [SPARK-1105] fix site scala version ...

2014-02-19 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/incubator-spark/pull/618#issuecomment-35530650
  
thank you very much for your comments @pwendell @aarondav 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---


Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Christopher Nguyen
Patrick, I sympathize with your sensibility here, and at face value,
there's very little daylight between (a) a rule comprising small set of
enumerated items and (b) a guideline followed by the same set as examples.

My suggestion had a non-obvious tl;dr thesis behind it, so allow me to show
my cards :)

First, rules can be costly for the rule makers to create and maintain to
ensure necessity and sufficiency, and can unintentionally encourage
mischievous, often tedious arguments to work around those rules. In the
area of coding style, even at Google (at least when I was there), we had
guides rather than rules. It turns out that guidelines are also easier to
socialize and enforce than enumerated rules. <strawman_humor>A Google
search for "coding style guide" returns 15.7 million results, while that
for "coding style rule" has 6.6M, and most of *those* are articles about
coding style guide.</strawman_humor>

More importantly, I've found Spark's to be one of the best
socially-engineered communities I've participated in. It is quite helpful
and welcoming to newcomers while (not paradoxically) comprising one of the
highest median quality of participants, per my calibration of, e.g., the
various meetups I've gone to in the SF Bay Area. This community
friendliness and mutual regard are not accidental and have contributed in
part to Spark's success to date. It seems quite tolerant of newbies and
implicitly recognizes that there may be a lot of valuable expertise and
interesting use cases we can learn from the person behind that
idiotic-sounding question, who might go on to contribute valuable PRs. I've
yet to see the acronym RTFM used in anger here. Now, rules don't
automatically negate that, but they can be discouraging to navigate ("Have
I broken some rule?") and misused as devices to shoot others ("You've just
broken our rule #178.S4.P2"). I'd rather see those things kept to a minimum,
in locked cabinets.

For the above reasons, I would suggest, for Spark, guidelines over rules
whenever feasible & tolerable, certainly in the area of coding style.

Cheers,
--
Christopher T. Nguyen
Co-founder & CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen



On Wed, Feb 19, 2014 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote:

 +1 overall.

 Christopher - I agree that once the number of rules becomes large it's
 more efficient to pursue a "use your judgement" approach. However,
 since this is only 3 cases I'd prefer to wait to see if it grows.

 The concern with this approach is that for newer people, contributors,
 etc it's hard for them to understand what good judgement is. Many are
 new to scala, so explicit rules are generally better.

 - Patrick

 On Wed, Feb 19, 2014 at 12:19 AM, Reynold Xin r...@databricks.com wrote:
  Yes, the case you brought up is not a matter of readability or style. If
 it
  returns a different type, it should be declared (otherwise it is just
  wrong).
 
 
  On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.com
 wrote:
 
  You are right.
  A degenerate case would be :
 
  def createFoo = new FooImpl()
 
  vs
 
  def createFoo: Foo = new FooImpl()
 
  Former will cause api instability. Reynold, maybe this is already
  avoided - and I understood it wrong ?
 
  Thanks,
  Mridul
 
 
 
  On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com
  wrote:
   Mridul, IIUUC, what you've mentioned did come to mind, but I deemed it
   orthogonal to the stylistic issue Reynold is talking about.
  
   I believe you're referring to the case where there is a specific
 desired
   return type by API design, but the implementation does not, in which
  case,
   of course, one must define the return type. That's an API requirement
 and
   not just a matter of readability.
  
   We could add this as an NB in the proposed guideline.
  
   --
   Christopher T. Nguyen
    Co-founder & CEO, Adatao http://adatao.com
   linkedin.com/in/ctnguyen
  
  
  
   On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com
  wrote:
  
   +1 Christopher's suggestion.
  
   Mridul,
  
   How would that happen? Case 3 requires the method to be invoking the
   constructor directly. It was implicit in my email, but the return
 type
   should be the same as the class itself.
  
  
  
  
   On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan 
 mri...@gmail.com
   wrote:
  
Case 3 can be a potential issue.
Current implementation might be returning a concrete class which we
might want to change later - making it a type change.
The intention might be to return an RDD (for example), but the
inferred type might be a subclass of RDD - and future changes will
cause signature change.
   
   
Regards,
Mridul
   
   
On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com
 
   wrote:
 Hi guys,

 Want to bring to the table this issue to see what other members
 of
  the
 community think and then we can codify it in the Spark coding
 style
guide.
 The 

Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Mridul Muralidharan
Without bikeshedding this too much ... It is likely incorrect (not "wrong") -
and rules like this potentially cause things to slip through.

Explicit return type strictly specifies what is being exposed (think in
face of impl change - createFoo changes in future from Foo to Foo1 or Foo2)
.. being conservative about how to specify exposed interfaces, imo,
outweighs potential gains in brevity of code.
Btw this is a degenerate contrived example already stretching its use ...

Regards
Mridul

Regards
Mridul
On Feb 19, 2014 1:49 PM, Reynold Xin r...@databricks.com wrote:

 Yes, the case you brought up is not a matter of readability or style. If it
 returns a different type, it should be declared (otherwise it is just
 wrong).


 On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.com
 wrote:

  You are right.
  A degenerate case would be :
 
  def createFoo = new FooImpl()
 
  vs
 
  def createFoo: Foo = new FooImpl()
 
  Former will cause api instability. Reynold, maybe this is already
  avoided - and I understood it wrong ?
 
  Thanks,
  Mridul
 
 
 
  On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com
  wrote:
   Mridul, IIUUC, what you've mentioned did come to mind, but I deemed it
   orthogonal to the stylistic issue Reynold is talking about.
  
   I believe you're referring to the case where there is a specific
 desired
   return type by API design, but the implementation does not, in which
  case,
   of course, one must define the return type. That's an API requirement
 and
   not just a matter of readability.
  
   We could add this as an NB in the proposed guideline.
  
   --
   Christopher T. Nguyen
    Co-founder & CEO, Adatao http://adatao.com
   linkedin.com/in/ctnguyen
  
  
  
   On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com
  wrote:
  
   +1 Christopher's suggestion.
  
   Mridul,
  
   How would that happen? Case 3 requires the method to be invoking the
   constructor directly. It was implicit in my email, but the return type
   should be the same as the class itself.
  
  
  
  
   On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan 
 mri...@gmail.com
   wrote:
  
Case 3 can be a potential issue.
Current implementation might be returning a concrete class which we
might want to change later - making it a type change.
The intention might be to return an RDD (for example), but the
inferred type might be a subclass of RDD - and future changes will
cause signature change.
   
   
Regards,
Mridul
   
   
On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com
   wrote:
 Hi guys,

 Want to bring to the table this issue to see what other members of
  the
 community think and then we can codify it in the Spark coding
 style
guide.
 The topic is about declaring return types explicitly in public
 APIs.

 In general I think we should favor explicit type declaration in
  public
 APIs. However, I do think there are 3 cases we can avoid the
 public
  API
 definition because in these 3 cases the types are self-evident & repetitive.

 Case 1. toString

 Case 2. A method returning a string or a val defining a string

  def name = "abcd" // this is so obvious that it is a string
  val name = "edfg" // this too

 Case 3. The method or variable is invoking the constructor of a
  class
   and
 return that immediately. For example:

 val a = new SparkContext(...)
 implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new
 AsyncRDDActions(rdd)


 Thoughts?
   
  
 



Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Aaron Davidson
One slight concern regarding primitive types -- in particular, Ints and
Longs can have semantic differences when it comes to overflow, so it's
often good to know what type of variable you're returning. Perhaps it is
sufficient to say that Int is the default numeric type, and that other
types should be specified explicitly.
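
A small sketch of the overflow point (hypothetical method names):

// Inferred Int result: multiplication wraps around silently.
def totalBytes(blocks: Int) = blocks * 1000000
totalBytes(3000)   // -1294967296, not 3000000000

// Explicit Long result makes the intended width part of the API.
def totalBytesL(blocks: Int): Long = blocks.toLong * 1000000L
totalBytesL(3000)  // 3000000000L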


On Wed, Feb 19, 2014 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote:

 +1 overall.

 Christopher - I agree that once the number of rules becomes large it's
  more efficient to pursue a "use your judgement" approach. However,
 since this is only 3 cases I'd prefer to wait to see if it grows.

 The concern with this approach is that for newer people, contributors,
 etc it's hard for them to understand what good judgement is. Many are
 new to scala, so explicit rules are generally better.

 - Patrick

 On Wed, Feb 19, 2014 at 12:19 AM, Reynold Xin r...@databricks.com wrote:
  Yes, the case you brought up is not a matter of readability or style. If
 it
  returns a different type, it should be declared (otherwise it is just
  wrong).
 
 
  On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.com
 wrote:
 
  You are right.
  A degenerate case would be :
 
  def createFoo = new FooImpl()
 
  vs
 
  def createFoo: Foo = new FooImpl()
 
  Former will cause api instability. Reynold, maybe this is already
  avoided - and I understood it wrong ?
 
  Thanks,
  Mridul
 
 
 
  On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com
  wrote:
   Mridul, IIUUC, what you've mentioned did come to mind, but I deemed it
   orthogonal to the stylistic issue Reynold is talking about.
  
   I believe you're referring to the case where there is a specific
 desired
   return type by API design, but the implementation does not, in which
  case,
   of course, one must define the return type. That's an API requirement
 and
   not just a matter of readability.
  
   We could add this as an NB in the proposed guideline.
  
   --
   Christopher T. Nguyen
   Co-founder & CEO, Adatao http://adatao.com
   linkedin.com/in/ctnguyen
  
  
  
   On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com
  wrote:
  
   +1 Christopher's suggestion.
  
   Mridul,
  
   How would that happen? Case 3 requires the method to be invoking the
   constructor directly. It was implicit in my email, but the return
 type
   should be the same as the class itself.
  
  
  
  
   On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan 
 mri...@gmail.com
   wrote:
  
Case 3 can be a potential issue.
Current implementation might be returning a concrete class which we
might want to change later - making it a type change.
The intention might be to return an RDD (for example), but the
inferred type might be a subclass of RDD - and future changes will
cause signature change.
   
   
Regards,
Mridul
   
   
On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com
 
   wrote:
 Hi guys,

 Want to bring to the table this issue to see what other members
 of
  the
 community think and then we can codify it in the Spark coding
 style
guide.
 The topic is about declaring return types explicitly in public
 APIs.

 In general I think we should favor explicit type declaration in
  public
 APIs. However, I do think there are 3 cases we can avoid the
 public
  API
  definition because in these 3 cases the types are self-evident & repetitive.

 Case 1. toString

 Case 2. A method returning a string or a val defining a string

  def name = "abcd" // this is so obvious that it is a string
  val name = "edfg" // this too

 Case 3. The method or variable is invoking the constructor of a
  class
   and
 return that immediately. For example:

 val a = new SparkContext(...)
 implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new
 AsyncRDDActions(rdd)


 Thoughts?
   
  
 



Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Reynold Xin
Mridul,

Can you be more specific in the createFoo example?

def myFunc = createFoo

is disallowed in my guideline. It is invoking a function createFoo, not the
constructor of Foo.




On Wed, Feb 19, 2014 at 10:39 AM, Mridul Muralidharan mri...@gmail.comwrote:

 Without bikeshedding this too much ... It is likely incorrect (not "wrong") -
 and rules like this potentially cause things to slip through.

 Explicit return type strictly specifies what is being exposed (think in
 face of impl change - createFoo changes in future from Foo to Foo1 or Foo2)
 .. being conservative about how to specify exposed interfaces, imo,
 outweighs potential gains in brevity of code.
 Btw this is a degenerate contrived example already stretching its use ...

 Regards
 Mridul

 Regards
 Mridul
 On Feb 19, 2014 1:49 PM, Reynold Xin r...@databricks.com wrote:

  Yes, the case you brought up is not a matter of readability or style. If
 it
  returns a different type, it should be declared (otherwise it is just
  wrong).
 
 
  On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.com
  wrote:
 
   You are right.
   A degenerate case would be :
  
   def createFoo = new FooImpl()
  
   vs
  
   def createFoo: Foo = new FooImpl()
  
   Former will cause api instability. Reynold, maybe this is already
   avoided - and I understood it wrong ?
  
   Thanks,
   Mridul
  
  
  
   On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com
   wrote:
Mridul, IIUUC, what you've mentioned did come to mind, but I deemed
 it
orthogonal to the stylistic issue Reynold is talking about.
   
I believe you're referring to the case where there is a specific
  desired
return type by API design, but the implementation does not, in which
   case,
of course, one must define the return type. That's an API requirement
  and
not just a matter of readability.
   
We could add this as an NB in the proposed guideline.
   
--
Christopher T. Nguyen
Co-founder  CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen
   
   
   
On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com
   wrote:
   
+1 Christopher's suggestion.
   
Mridul,
   
How would that happen? Case 3 requires the method to be invoking the
constructor directly. It was implicit in my email, but the return
 type
should be the same as the class itself.
   
   
   
   
On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan 
  mri...@gmail.com
wrote:
   
 Case 3 can be a potential issue.
 Current implementation might be returning a concrete class which
 we
 might want to change later - making it a type change.
 The intention might be to return an RDD (for example), but the
 inferred type might be a subclass of RDD - and future changes will
 cause signature change.


 Regards,
 Mridul


 On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin 
 r...@databricks.com
wrote:
  Hi guys,
 
  Want to bring to the table this issue to see what other members
 of
   the
  community think and then we can codify it in the Spark coding
  style
 guide.
  The topic is about declaring return types explicitly in public
  APIs.
 
  In general I think we should favor explicit type declaration in
   public
  APIs. However, I do think there are 3 cases we can avoid the
  public
   API
  definition because in these 3 cases the types are self-evident 
 repetitive.
 
  Case 1. toString
 
  Case 2. A method returning a string or a val defining a string
 
  def name = abcd // this is so obvious that it is a string
  val name = edfg // this too
 
  Case 3. The method or variable is invoking the constructor of a
   class
and
  return that immediately. For example:
 
  val a = new SparkContext(...)
  implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) =
 new
  AsyncRDDActions(rdd)
 
 
  Thoughts?

   
  
 



Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Andrew Ash
I found Haskell's convention of including type signatures as documentation
to be worthwhile.

http://www.haskell.org/haskellwiki/Type_signatures_as_good_style

I'd support a guideline to include type signatures where they're unclear
but would prefer to leave it quite vague.  In my experience, the lightest
process is the best process for contributions.  Strict rules here _will_
drive away contributors.
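
As an illustration of signatures-as-documentation in Scala (wordCounts is a
made-up example under stated assumptions, not Spark code):

    // Without a signature, a reader must trace groupBy/map to learn that
    // this returns Map[String, Int].
    def wordCounts(words: Seq[String]) =
      words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

    // With the signature, the definition documents itself:
    def wordCountsTyped(words: Seq[String]): Map[String, Int] =
      words.groupBy(identity).map { case (w, ws) => (w, ws.size) }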




Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Mridul Muralidharan
My initial mail had it listed; adding more details here since I assume I am
missing something or not being clear. Please note, this is just
illustrative and my scala knowledge is bad :-) (I am trying to draw
parallels from mistakes in the java world.)

def createFoo = new Foo()

changing to

def createFoo = new Foo1()

and later to

def createFoo = new Foo2()

(appropriate inheritance applied - parent Foo).

I am thinking from an API evolution and binary compatibility point of view.

Regards,
Mridul
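
A minimal Scala sketch of the evolution Mridul describes (Foo, Foo1 and Foo2
are hypothetical classes; this is illustrative, not actual Spark code):

    class Foo
    class Foo1 extends Foo
    class Foo2 extends Foo

    object Factory {
      // v1: the inferred return type is Foo1, so the compiled signature is
      // "def createFoo: Foo1" and callers link against Foo1.
      def createFoo = new Foo1

      // v2: changing the body to "new Foo2" silently changes the compiled
      // signature to "def createFoo: Foo2", breaking binary compatibility
      // for callers compiled against v1, even though both are still Foos.

      // Declaring the intended supertype keeps the signature stable across
      // both implementations:
      def createFooStable: Foo = new Foo1
    }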



Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Mridul Muralidharan
I agree, makes sense.
Please note I was referring only to exposed user API in my comments - not
other code!

Regards,
Mridul

[GitHub] incubator-spark pull request: Add Security to Spark - Akka, Http, ...

2014-02-19 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/332#discussion_r9878293
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.UserGroupInformation
+
+import org.apache.spark.deploy.SparkHadoopUtil
+
+/**
+ * Spark class responsible for security.
+ */
+private[spark] class SecurityManager extends Logging {
+
+  private val isAuthOn = System.getProperty("spark.authenticate", "false").toBoolean
+  private val isUIAuthOn = System.getProperty("spark.authenticate.ui", "false").toBoolean
+  private val viewAcls = System.getProperty("spark.ui.view.acls", "").split(',').map(_.trim()).toSet
+  private val secretKey = generateSecretKey()
+  logDebug("is auth enabled = " + isAuthOn + " is uiAuth enabled = " + isUIAuthOn)
+
+  /**
+   * In Yarn mode it uses Hadoop UGI to pass the secret as that
+   * will keep it protected.  For a standalone SPARK cluster
+   * use a environment variable SPARK_SECRET to specify the secret.
+   * This probably isn't ideal but only the user who starts the process
+   * should have access to view the variable (at least on Linux).
+   * Since we can't set the environment variable we set the
+   * java system property SPARK_SECRET so it will automatically
+   * generate a secret is not specified.  This definitely is not
+   * ideal since users can see it. We should switch to put it in
+   * a config.
+   */
+  private def generateSecretKey(): String = {
+
+    if (!isAuthenticationEnabled) return null
+    // first check to see if secret already set, else generate it
+    if (SparkHadoopUtil.get.isYarnMode) {
+      val credentials = SparkHadoopUtil.get.getCurrentUserCredentials()
+      if (credentials != null) {
+        val secretKey = credentials.getSecretKey(new Text("akkaCookie"))
+        if (secretKey != null) {
+          logDebug("in yarn mode, getting secret from credentials")
+          return new Text(secretKey).toString
+        } else {
+          logDebug("getSecretKey: yarn mode, secret key from credentials is null")
+        }
+      } else {
+        logDebug("getSecretKey: yarn mode, credentials are null")
+      }
+    }
+    val secret = System.getProperty("SPARK_SECRET", System.getenv("SPARK_SECRET"))
+    if (secret != null && !secret.isEmpty()) return secret
+    // generate one
+    val sCookie = akka.util.Crypt.generateSecureCookie
+
+    // if we generate we must be the first so lets set it so its used by everyone else
+    if (SparkHadoopUtil.get.isYarnMode) {
+      val creds = new Credentials()
+      creds.addSecretKey(new Text("akkaCookie"), sCookie.getBytes())
+      SparkHadoopUtil.get.addCurrentUserCredentials(creds)
+      logDebug("adding secret to credentials yarn mode")
+    } else {
+      System.setProperty("SPARK_SECRET", sCookie)
+      logDebug("adding secret to java property")
+    }
+    return sCookie
+  }
+
+  def isUIAuthenticationEnabled(): Boolean = return isUIAuthOn
+
+  // allow anyone in the acl list and the application owner
+  def checkUIViewPermissions(user: String): Boolean = {
+    if (isUIAuthenticationEnabled() && (user != null)) {
+      if ((!viewAcls.contains(user)) && (user != System.getProperty("user.name"))) {
--- End diff --

Agree, that sounds fine.

Regards,
Mridul

[GitHub] incubator-spark pull request: MLLIB-24: url of Collaborative Filt...

2014-02-19 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/incubator-spark/pull/619#issuecomment-35535759
  
DOI links are permanent so we don't need to worry about the link becoming 
invalid again. People will do a search and find the pdf easily if they don't 
have access to IEEE.




[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-19 Thread fommil
Github user fommil commented on the pull request:

https://github.com/apache/incubator-spark/pull/575#issuecomment-35546981
  
@mengxr consider this message to be proof that jniloader is distributed 
under the Apache license. I'll update the build files next time I need a code 
change. If you want it quicker, issue a PR (and add it as a dual license) ;-)




[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-19 Thread fommil
Github user fommil commented on the pull request:

https://github.com/apache/incubator-spark/pull/575#issuecomment-35547276
  
@srowen "The LGPL is ineligible primarily due to the restrictions it places 
on larger works, violating the third license criterion. Therefore, 
LGPL-licensed works must not be included in Apache products." where the "third 
license criterion" is "The license must not place restrictions on the 
distribution of larger works, other than to require that the covered component 
still complies with the conditions of its license." I do not see any violation 
here.




[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-19 Thread fommil
Github user fommil commented on the pull request:

https://github.com/apache/incubator-spark/pull/575#issuecomment-35548061
  
@srowen I've asked the question. I'm interested to see the response: 
https://issues.apache.org/jira/browse/LEGAL-192




[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-19 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/incubator-spark/pull/575#issuecomment-35557645
  
@fommil Thanks a lot! The license JIRA is also interesting to follow ~ :)




[GitHub] incubator-spark pull request: [SPARK-1105] fix site scala version ...

2014-02-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/618#issuecomment-35567163
  
Thanks guys, I put this in master and 0.9.




[GitHub] incubator-spark pull request: [SPARK-1105] fix site scala version ...

2014-02-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-spark/pull/618




[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/585#discussion_r9890431
  
--- Diff: project/MimaBuild.scala ---
@@ -0,0 +1,105 @@
+import com.typesafe.tools.mima.plugin.MimaKeys.{binaryIssueFilters, previousArtifact}
+import com.typesafe.tools.mima.plugin.MimaPlugin.mimaDefaultSettings
+
+object MimaBuild {
+
+  val ignoredABIProblems = {
+    import com.typesafe.tools.mima.core._
+    import com.typesafe.tools.mima.core.ProblemFilters._
+    /**
+     * A: Detections are semi private or likely to become semi private at some point.
+     */
+    Seq(exclude[MissingClassProblem]("org.apache.spark.util.XORShiftRandom"),
+      exclude[MissingClassProblem]("org.apache.spark.util.XORShiftRandom$"),
+      exclude[MissingMethodProblem]("org.apache.spark.util.Utils.cloneWritables"),
+      // Scheduler is not considered a public API.
+      excludePackage("org.apache.spark.deploy"),
+      // Was made private in 1.0
--- End diff --

Ah darn, seems like this doesn't work.




[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread colorant
Github user colorant commented on the pull request:

https://github.com/apache/incubator-spark/pull/611#issuecomment-35570472
  
ah, so the sleep is removed ;) and the synchronization block is already there, 
is that ok?




[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/611#issuecomment-35572288
  
Build started.




[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/611#issuecomment-35572287
  
 Build triggered.




[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

2014-02-19 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/615#discussion_r9891770
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -165,19 +165,20 @@ class SparkContext(
     jars.foreach(addJar)
   }
 
+  def warnSparkMem(value: String): String = {
+    logWarning("Using SPARK_MEM to set amount of memory to use per executor process is " +
+      "deprecated, please use instead spark.executor.memory")
--- End diff --

Small nit of the warning wording:

"deprecated, please use spark.executor.memory instead."




[GitHub] incubator-spark pull request: MLLIB-24: url of Collaborative Filt...

2014-02-19 Thread CrazyJvm
Github user CrazyJvm commented on the pull request:

https://github.com/apache/incubator-spark/pull/619#issuecomment-35575435
  
Taking permanently valid URLs into consideration, I changed the url from yahoo 
to ieee. thx @mengxr.
http://dx.doi.org/10.1109/ICDM.2008.22




[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/611#issuecomment-35579235
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12782/




[GitHub] incubator-spark pull request: For SPARK-1082, Use Curator for ZK i...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/611#issuecomment-35579234
  
Build finished.




[GitHub] incubator-spark pull request: Add a environment variable that allo...

2014-02-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/192#issuecomment-35580209
  
See SPARK-1110... I took down some notes there relevant to this:
https://spark-project.atlassian.net/browse/SPARK-1110




[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

2014-02-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/615#issuecomment-35581271
  
@sryza - I don't think this is relevant to the YARN codepath. AFAIK YARN 
doesn't use the ./spark-class script to launch the YARN application master 
(which embeds the driver program). I'm not totally sure how that JVM is 
actually launched though... couldn't figure it out on a quick glance at that 
code.




[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

2014-02-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/615#issuecomment-35581371
  
It looks like there is a separate variable called `amMemory` that deals 
with this in YARN. The command for launching that JVM gets set up in:

common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala




[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/468#issuecomment-35582252
  
 Build triggered.




[GitHub] incubator-spark pull request: [java8API] SPARK-964 Investigate the...

2014-02-19 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/incubator-spark/pull/539#issuecomment-35586860
  
Hey Matei, 

I feel this is better than before overall. The one thing I was not 
very sure about is putting a couple of implicits in JavaPairRDD, but this was 
already being done. As far as I know, there is no way our users from previous 
versions can avoid a recompile. 




[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/468#issuecomment-35587409
  
Build finished.




[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/incubator-spark/pull/585#discussion_r9895285
  
--- Diff: project/MimaBuild.scala ---
@@ -0,0 +1,105 @@
+import com.typesafe.tools.mima.plugin.MimaKeys.{binaryIssueFilters, previousArtifact}
+import com.typesafe.tools.mima.plugin.MimaPlugin.mimaDefaultSettings
+
+object MimaBuild {
+
+  val ignoredABIProblems = {
+    import com.typesafe.tools.mima.core._
+    import com.typesafe.tools.mima.core.ProblemFilters._
+    /**
+     * A: Detections are semi private or likely to become semi private at some point.
+     */
+    Seq(exclude[MissingClassProblem]("org.apache.spark.util.XORShiftRandom"),
+      exclude[MissingClassProblem]("org.apache.spark.util.XORShiftRandom$"),
+      exclude[MissingMethodProblem]("org.apache.spark.util.Utils.cloneWritables"),
+      // Scheduler is not considered a public API.
+      excludePackage("org.apache.spark.deploy"),
+      // Was made private in 1.0
--- End diff --

you are right.




[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35593470
  
Hey @pwendell, I am not sure how: I cleared the ivy and m2 caches for spark, 
but it is still not possible to get rid of these. I am trying it with jenkins 
once, since you could remove them w/o errors.




[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35594451
  
 Merged build triggered.




[GitHub] incubator-spark pull request: [SPARK-1094] Support MiMa for report...

2014-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/585#issuecomment-35594452
  
Merged build started.




[GitHub] incubator-spark pull request: MLLIB-22. Support negative implicit ...

2014-02-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-spark/pull/500

