[GitHub] spark pull request: [SPARK-1194] Fix the same-RDD rule for cache r...

2014-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/96


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Allow sbt to use more than 1G of heap.

2014-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/103




[GitHub] spark pull request: Allow sbt to use more than 1G of heap.

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/103#issuecomment-37092461
  
Merged build finished.




[GitHub] spark pull request: Allow sbt to use more than 1G of heap.

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/103#issuecomment-37092462
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13067/




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37092479
  
 Merged build triggered.




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37092654
  
Merged build finished.




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37092655
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13068/




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37093324
  
retest this please




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37093423
  
Merged build started.




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37093422
  
 Merged build triggered.




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/104

Update junitxml plugin to the latest version to avoid recompilation in 
every SBT command.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark junitxml

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #104


commit 67ef7bffd92a30b8d81c072ad1c504eb3a53d264
Author: Reynold Xin r...@apache.org
Date:   2014-03-08T09:41:06Z

Update junitxml plugin to the latest version to avoid recompilation in 
every SBT command.






[GitHub] spark pull request: [SPARK-1194] Fix the same-RDD rule for cache r...

2014-03-08 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/96#issuecomment-37093759
  
Thanks for this fix - excellent catch!


On Sat, Mar 8, 2014 at 1:53 PM, asfgit notificati...@github.com wrote:

 Closed #96 (https://github.com/apache/spark/pull/96) via commit 0b7b7fd
 (https://github.com/apache/spark/commit/0b7b7fd45cd9037d23cb090e62be3ff075214fe7).

 --
 Reply to this email directly or view it on GitHub:
 https://github.com/apache/spark/pull/96





[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37094469
  
Merged build finished.




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37094472
  
Merged build started.




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37094471
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13069/




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37094470
  
 Merged build triggered.




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37094612
  
Jenkins, retest this please




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37094638
  
Merged build started.




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37095429
  
Merged build finished.




[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/21#issuecomment-37095433
  
Merged build finished.




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37096375
  
Very cool, finally we have this!




[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/101#issuecomment-37098779
  
Merged build started.




[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/101#issuecomment-37098778
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12#issuecomment-37098783
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12#issuecomment-37098784
  
Merged build started.




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37100319
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13074/




[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12#issuecomment-37100324
  
Merged build finished.




[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/101#issuecomment-37100327
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13072/




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37104948
  
LGTM. @rxin, is there an equivalent to this in Maven, or no? Seems to me like 
maybe this is SBT-only.




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/105#discussion_r10410220
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -666,6 +666,7 @@ abstract class RDD[T: ClassTag](
   /**
    * Return an array that contains all of the elements in this RDD.
    */
+  @deprecated
--- End diff --

Would you mind adding a message here that explains it's deprecated as of 
1.0.0 and the solution is to use collect()? Take a look at other places where 
we deprecate stuff.




[GitHub] spark pull request: SPARK-782 Clean up for ASM dependency.

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/100#issuecomment-37105043
  
@mateiz this works fine in Java 8 unit tests.




[GitHub] spark pull request: Updated the formatting of code blocks using Gi...

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/68#issuecomment-37106963
  
Do you mind closing this? The solution proposed here isn't going to work 
because of the way our docs are compiled. If there is a way to make this work 
well in both our compiled docs and the website then definitely open a new 
request, but AFAIK this isn't so easy.




[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/16#issuecomment-37107006
  
@holdenk mind bumping this now that #17 is in? You'll have to change 
`extends` to `with`... since the function classes are now interfaces rather 
than abstract classes.




[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions

2014-03-08 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/92#discussion_r10410791
  
--- Diff: python/pyspark/rdd.py ---
@@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False):
         jrdd = self._jrdd.coalesce(numPartitions)
         return RDD(jrdd, self.ctx, self._jrdd_deserializer)
 
+    def name(self):
+        """
+        Return the name of this RDD.
+        """
+        name_ = self._jrdd.name()
+        if not name_:
+            return None
+        return name_.encode('utf-8')
+
+    def setName(self, name):
+        """
+        Assign a name to this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setName('RDD1')
+        >>> rdd1.name()
+        'RDD1'
+        """
+        self._jrdd.setName(name)
+
+    def generator(self):
--- End diff --

@mateiz - are you sure we want this function? It might be good to delay 
adding this to pyspark pending a clean-up of the `generator` stuff which I 
think is mostly redundant with the callsite/origin.




[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions

2014-03-08 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/92#discussion_r10410792
  
--- Diff: python/pyspark/rdd.py ---
@@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False):
         jrdd = self._jrdd.coalesce(numPartitions)
         return RDD(jrdd, self.ctx, self._jrdd_deserializer)
 
+    def name(self):
+        """
+        Return the name of this RDD.
+        """
+        name_ = self._jrdd.name()
+        if not name_:
+            return None
+        return name_.encode('utf-8')
+
+    def setName(self, name):
+        """
+        Assign a name to this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setName('RDD1')
+        >>> rdd1.name()
+        'RDD1'
+        """
+        self._jrdd.setName(name)
+
+    def generator(self):
--- End diff --

both this and `setGenerator`




[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-08 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/16#issuecomment-37107763
  
Sure, I'll give this a shot today :)


On Sat, Mar 8, 2014 at 11:24 AM, Patrick Wendell
notificati...@github.comwrote:

 @holdenk (https://github.com/holdenk) mind bumping this now that #17
 (https://github.com/apache/spark/pull/17) is in? You'll have to change
 `extends` to `with`... since the function classes are now interfaces rather
 than abstract classes.

 --
 Reply to this email directly or view it on GitHub:
 https://github.com/apache/spark/pull/16#issuecomment-37107006




-- 
Cell : 425-233-8271




[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions

2014-03-08 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/92#discussion_r10410905
  
--- Diff: python/pyspark/rdd.py ---
@@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False):
         jrdd = self._jrdd.coalesce(numPartitions)
         return RDD(jrdd, self.ctx, self._jrdd_deserializer)
 
+    def name(self):
+        """
+        Return the name of this RDD.
+        """
+        name_ = self._jrdd.name()
+        if not name_:
+            return None
+        return name_.encode('utf-8')
+
+    def setName(self, name):
+        """
+        Assign a name to this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setName('RDD1')
+        >>> rdd1.name()
+        'RDD1'
+        """
+        self._jrdd.setName(name)
+
+    def generator(self):
+        """
+        Return the generator of this RDD.
+        """
+        generator_ = self._jrdd.generator()
+        if not generator_:
+            return None
+        return generator_.encode('utf-8')
+
+    def setGenerator(self, generator):
+        """
+        Reset generator of this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setGenerator('dummyRDDgenerator')
+        >>> rdd1.generator()
+        'dummyRDDgenerator'
+        """
+        self._jrdd.setGenerator(generator)
+
+    def toDebugString(self):
+        """
+        A description of this RDD and its recursive dependencies for
+        debugging.
+        """
+        debug_string = self._jrdd.toDebugString()
+        if not debug_string:
+            return None
+        return debug_string.encode('utf-8')
+
+    def getStorageLevel(self):
--- End diff --

If you add this it should also be possible to write a doctest.
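A doctest along the lines pwendell suggests could be sketched as follows; `FakeRDD` and its storage-level string are hypothetical stand-ins (not PySpark's actual API or output), used only to show the doctest shape:

```python
# Sketch of a doctest for a getStorageLevel-style method. FakeRDD and the
# returned string are hypothetical stand-ins, not the real PySpark RDD.
import doctest


class FakeRDD(object):
    """Minimal stand-in for an RDD exposing getStorageLevel.

    >>> rdd = FakeRDD()
    >>> rdd.getStorageLevel()
    'StorageLevel(False, False, False, 1)'
    """

    def getStorageLevel(self):
        # Real PySpark would delegate to the JVM-side RDD here.
        return 'StorageLevel(False, False, False, 1)'


if __name__ == "__main__":
    # Running the module executes the embedded doctest above.
    results = doctest.testmod()
    assert results.failed == 0
```

The doctest doubles as both documentation and a regression test, which is presumably why it is being requested here.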




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/30#discussion_r10410916
  
--- Diff: sbin/spark-config.sh ---
@@ -34,3 +34,6 @@ this="$config_bin/$script"
 export SPARK_PREFIX=`dirname "$this"`/..
 export SPARK_HOME=${SPARK_PREFIX}
 export SPARK_CONF_DIR="$SPARK_HOME/conf"
+# Add the PySpark classes to the PYTHONPATH:
+export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
--- End diff --

Good point; I think we should move these lines to `spark-class` to make 
sure that workers use the right PYTHONPATH even if they're started manually 
through `spark-class`.




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/30#discussion_r10410942
  
--- Diff: python/Makefile ---
@@ -0,0 +1,7 @@
+assembly: clean
+   python setup.py build --build-lib build/lib
+   unzip lib/py4j*.zip -d build/lib
+   cd build/lib && zip -r ../pyspark-assembly.zip .
+
--- End diff --

Are you envisioning including the PySpark dependencies in the Spark 
assembly jar?  I think that could work, since we need to build that jar anyways 
when running under YARN.

I'm not sure how easy it will be to modify the Maven or SBT builds to 
include those files.




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/30#discussion_r10410971
  
--- Diff: python/pyspark/java_gateway.py ---
@@ -66,3 +71,30 @@ def run(self):
     java_import(gateway.jvm, "org.apache.spark.mllib.api.python.*")
     java_import(gateway.jvm, "scala.Tuple2")
     return gateway
+
+def set_env_vars_for_yarn(pyspark_zip):
+    if "SPARK_YARN_DIST_FILES" in os.environ:
+        os.environ["SPARK_YARN_DIST_FILES"] += ("," + pyspark_zip)
+    else:
+        os.environ["SPARK_YARN_DIST_FILES"] = pyspark_zip
+
+    # Add the pyspark zip to the python path
+    env_map = parse_env(os.environ.get("SPARK_YARN_USER_ENV", ""))
+    if "PYTHONPATH" in env_map:
+        env_map["PYTHONPATH"] += (":" + os.path.basename(pyspark_zip))
+    else:
+        env_map["PYTHONPATH"] = os.path.basename(pyspark_zip)
+
+    os.environ["SPARK_YARN_USER_ENV"] = ",".join(map(lambda v: v[0] + "=" + v[1],
+        env_map.items()))
+
+def parse_env(env_str):
+    # Turns a comma-separated list of env settings into a dict that maps env
+    # vars to their values.
+    env = {}
+    for var_str in env_str.split(","):
+        parts = var_str.split("=")
+        if len(parts) == 2:
--- End diff --

Do you think it would be worth it to crash or throw an error when passed an 
invalid env string?
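A stricter variant of the kind JoshRosen asks about might look like the following; this is a sketch of one possible behavior (raising `ValueError` on malformed entries), not the code the PR actually adopts:

```python
def parse_env(env_str):
    """Turn a comma-separated list of KEY=VALUE settings into a dict.

    Sketch of a stricter parser: instead of silently skipping malformed
    entries, raise a ValueError so bad input fails loudly.
    """
    env = {}
    if not env_str:
        # An unset/empty SPARK_YARN_USER_ENV simply yields no settings.
        return env
    for var_str in env_str.split(","):
        # Split on the first '=' only, so values may themselves contain '='.
        parts = var_str.split("=", 1)
        if len(parts) == 2:
            env[parts[0]] = parts[1]
        else:
            raise ValueError(
                "Invalid env entry %r in %r" % (var_str, env_str))
    return env
```

The trade-off is between robustness to sloppy user input and surfacing configuration mistakes early; raising here makes a typo in `SPARK_YARN_USER_ENV` fail at startup rather than being silently dropped.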




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/30#discussion_r10410987
  
--- Diff: python/pyspark/java_gateway.py ---
@@ -66,3 +71,30 @@ def run(self):
     java_import(gateway.jvm, "org.apache.spark.mllib.api.python.*")
     java_import(gateway.jvm, "scala.Tuple2")
     return gateway
+
+def set_env_vars_for_yarn(pyspark_zip):
+    if "SPARK_YARN_DIST_FILES" in os.environ:
+        os.environ["SPARK_YARN_DIST_FILES"] += ("," + pyspark_zip)
+    else:
+        os.environ["SPARK_YARN_DIST_FILES"] = pyspark_zip
+
+    # Add the pyspark zip to the python path
+    env_map = parse_env(os.environ.get("SPARK_YARN_USER_ENV", ""))
+    if "PYTHONPATH" in env_map:
+        env_map["PYTHONPATH"] += (":" + os.path.basename(pyspark_zip))
+    else:
+        env_map["PYTHONPATH"] = os.path.basename(pyspark_zip)
+
+    os.environ["SPARK_YARN_USER_ENV"] = ",".join(map(lambda v: v[0] + "=" + v[1],
--- End diff --

I think you can write this a little more clearly as

```
 os.environ["SPARK_YARN_USER_ENV"] = ",".join(k + '=' + v for (k, v) in 
env_map.items())
```




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/30#discussion_r10410991
  
--- Diff: python/pyspark/java_gateway.py ---
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 
+from glob import glob
--- End diff --

I added this import in my original patch, but it's unused now and can be 
removed.




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-37108992
  
I left a few minor comments in the diff, but overall this looks good to me.

It might be worth adding build/run instructions in either the PySpark 
Programming Guide or YARN guide.

It also occurred to me that the Makefile-based build for the PySpark fat 
zip might be a problem for Windows users; Scala/Java Spark works fine under 
Cygwin, but PySpark only works in cmd.exe / powershell (the main difficulty is 
that in some cases the Java and Python halves of the PySpark driver expect 
different types of paths, so we'd have to replicate parts of the cygpath logic 
in Java and Python).  I suppose we could use the Python 
[`zipfile`](http://docs.python.org/2/library/zipfile) library and implement the 
build script in Python.  Or, as @ahirreddy suggested, maybe we could package 
the Python libraries into a JAR.
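
As a rough sketch of the `zipfile`-based idea (the function name and file layout are illustrative, not the actual build script):

```python
import os
import zipfile

def build_pyspark_zip(source_dir, zip_path):
    """Zip every .py file under source_dir, preserving relative paths."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store entries relative to source_dir so the resulting
                    # zip can be placed on PYTHONPATH directly.
                    zf.write(full, os.path.relpath(full, source_dir))
```

Since it uses only the standard library, it would run identically under cmd.exe, PowerShell, or Cygwin, which is the portability concern above.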




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37109170
  
Ok I merged this.

Not sure about Maven off the top of my head. All these build plugins are 
pretty arcane to me. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-37109537
  
@sryza another thing here is, whatever the make target ends up being we 
should add it to the `make_release` script and the `make-distribution` script 
(those two need to be merged soon but for now they both exist).




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/104




[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/16#issuecomment-37110554
  
Merged build started.




[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-08 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/106

SPARK-1205: Clean up callSite/origin/generator.

This patch removes the `generator` field and simplifies + documents
the tracking of callsites.

There are two places where we care about call sites, when a job is
run and when an RDD is created. This patch retains both of those
features but does a slight refactoring and renaming to make things
less confusing.

There was another feature of an RDD called the `generator` which was
by default the user class in which the RDD was created. This is
used exclusively in the JobLogger. It has been subsumed by the ability
to name a job group. The job logger can later be refactored to
read the job group directly (this will require some work), but for now
this just preserves the default logged value of the user class.
I'm not sure any users ever used the ability to override this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark callsite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #106


commit 576e60bb204b1caedb7696a3365b4f4f2b2c6a81
Author: Patrick Wendell pwend...@gmail.com
Date:   2014-03-08T21:43:16Z

SPARK-1205: Clean up callSite/origin/generator.

This patch removes the `generator` field and simplifies + documents
the tracking of callsites.

There are two places where we care about call sites, when a job is
run and when an RDD is created. This patch retains both of those
features but does a slight refactoring and renaming to make things
less confusing.

There was another feature of an RDD called the `generator` which was
by default the user class in which the RDD was created. This is
used exclusively in the JobLogger. It has been subsumed by the ability
to name a job group. The job logger can later be refactored to
read the job group directly (this will require some work), but for now
this just preserves the default logged value of the user class.
I'm not sure any users ever used the ability to override this.






[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...

2014-03-08 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/107

SPARK-1190: Do not initialize log4j if slf4j log4j backend is not being used



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #107


commit be21c11f1764540bb649d5b7400c92acfbc51511
Author: Patrick Wendell pwend...@gmail.com
Date:   2014-02-07T23:22:29Z

Logging fix






[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37111528
  
This doesn't deprecate it in Java (I think). Mind adding that too?




[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/16#issuecomment-37111991
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13075/




[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/107#issuecomment-37111996
  
Merged build started.




[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/106#issuecomment-37111998
  
Merged build started.




[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/106#issuecomment-37113363
  
Merged build finished.




[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/106#issuecomment-37113364
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13077/




Re: Spark 0.9.0 and log4j

2014-03-08 Thread Patrick Wendell
Evan, I actually remembered that Paul Brown (who also reported this
issue) tested it and found that it worked. I'm going to merge this
into master and branch 0.9, so please give it a spin when you have a
chance.

- Patrick

On Sat, Mar 8, 2014 at 2:00 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Evan,

 This is being tracked here:
 https://spark-project.atlassian.net/browse/SPARK-1190

 That patch didn't get merged but I've just opened a new one here:
 https://github.com/apache/spark/pull/107/files

 Would you have any interest in testing this? I want to make sure it
 works for users who are using logback.

 I'd like to get this merged quickly since it's one of the only
 remaining blockers for Spark 0.9.1.

 - Patrick



 On Fri, Mar 7, 2014 at 11:04 AM, Evan Chan e...@ooyala.com wrote:
 Hey guys,

 This is a follow-up to this semi-recent thread:
 http://apache-spark-developers-list.1001551.n3.nabble.com/0-9-0-forces-log4j-usage-td532.html

 0.9.0 final is causing issues for us as well because we use Logback as
 our backend and Spark requires Log4j now.

 I see Patrick has a PR #560 to incubator-spark, was that merged in or
 left out?

 Also I see references to a new PR that might fix this, but I can't
 seem to find it in the github open PR page.   Anybody have a link?

 As a last resort we can switch to Log4j, but would rather not have to
 do that if possible.

 thanks,
 Evan

 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |


[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...

2014-03-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/107#issuecomment-37114184
  
I've merged this and put it into 0.9. Thanks @prb who tested an earlier 
version of this patch.




Re: 0.9.0 forces log4j usage

2014-03-08 Thread Patrick Wendell
The fix for this was just merged into branch 0.9 (will be in 0.9.1+) and master.

On Sun, Feb 9, 2014 at 11:44 PM, Patrick Wendell pwend...@gmail.com wrote:
 Thanks Paul - it isn't meant to be a full solution but just a fix for
 the 0.9 branch - for the full solution there is another PR by Sean
 Owen.

 On Sun, Feb 9, 2014 at 11:35 PM, Paul Brown p...@mult.ifario.us wrote:
 Hi, Patrick --

 I gave that a go locally, and it works as desired.

 Best.
 -- Paul

 --
 p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


 On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell pwend...@gmail.com wrote:

 Ah okay sounds good. This is what I meant earlier by "you have
 some other application that directly calls log4j", i.e. you have
 for historical reasons installed the log4j-over-slf4j.

 Would you mind trying out this fix and seeing if it works? This is
 designed to be a hotfix for 0.9, not a general solution where we rip
 out log4j from our published dependencies:

 https://github.com/apache/incubator-spark/pull/560/files

 - Patrick

 On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown p...@mult.ifario.us wrote:
  Hi, Patrick --
 
  I forget which other component is responsible, but we're using the
  log4j-over-slf4j as part of an overall requirement to centralize logging,
  i.e., *someone* else is logging over log4j and we're pulling that in.
   (There's also some jul logging from Jersey, etc.)
 
  Goals:
 
  - Fully control/capture all possible logging.  (God forbid we have to
 grab
  System.out/err, but we'd do it if needed.)
  - Use the backend we like best at the moment.  (Happens to be logback.)
 
  Possible cases:
 
  - If Spark used Log4j at all, we would pull in that logging via
  log4j-over-slf4j.
  - If Spark used only slf4j and referenced no backend, we would use it
 as-is
  although we'd still have the log4j-over-slf4j because of other libraries.
  - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
  would exclude that one dependency (via our POM).
 
  Best.
  -- Paul
 
 
  --
  p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
 
 
  On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hey Paul,
 
  So if your goal is ultimately to output to logback. Then why don't you
  just use slf4j and logback-classic.jar as described here [1]. Why
  involve log4j-over-slf4j at all?
 
  Let's say we refactored the spark build so it didn't advertise
  slf4j-log4j12 as a dependency. Would you still be using
  log4j-over-slf4j... or is this just a fix to deal with the fact that
  Spark is somewhat log4j dependent at this point.
 
  [1] http://www.slf4j.org/manual.html
 
  - Patrick
 
  On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown p...@mult.ifario.us wrote:
   Hi, Patrick --
  
   That's close but not quite it.
  
   The issue that occurs is not the delegation loop mentioned in slf4j
   documentation.  The stack overflow is entirely within the code in the
  Spark
   trait:
  
   at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
   at
 org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
   at org.apache.spark.Logging$class.log(Logging.scala:36)
   at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
  
  
   And then that repeats.
  
   As for our situation, we exclude the slf4j-log4j12 dependency when we
   import the Spark library (because we don't want to use log4j) and have
   log4j-over-slf4j already in place to ensure that all of the logging in
  the
   overall application runs through slf4j and then out through logback.
  (We
   also, as another poster already mentioned, also force jcl and jul
 through
   slf4j.)
  
   The zen of slf4j for libraries is that the library uses the slf4j API
 and
   then the enclosing application can route logging as it sees fit.
  Spark
   master CLI would log via slf4j and include the slf4j-log4j12 backend;
  same
   for Spark worker CLI.  Spark as a library (versus as a container)
 would
  not
   include any backend to the slf4j API and leave this up to the
  application.
(FWIW, this would also avoid your log4j warning message.)
  
   But as I was saying before, I'd be happy with a situation where I can
  avoid
   log4j being enabled or configured, and I think you'll find an existing
   choice of logging framework to be a common scenario for those
 embedding
   Spark in other systems.
  
   Best.
   -- Paul
  
   --
   p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
  
  
   On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  
   Paul,
  
   Looking back at your problem. I think it's the one here:
   http://www.slf4j.org/codes.html#log4jDelegationLoop
  
   So let me just be clear what you are doing so I understand. You have
   some other application that directly calls log4j. So you have to
   include log4j-over-slf4j to route those logs through slf4j to
 logback.
  
   At the same time you embed Spark in this application. In the 

[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...

2014-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/107




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37115130
  
Hi, @pwendell , thank you for the comments

I just fixed that




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37115142
  
Merged build started.




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37115141
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37116110
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13078/




[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/105#issuecomment-37116109
  
Merged build finished.




[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/108#issuecomment-37116233
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/108

SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues...

This patch removes Ganglia integration from the default build. It
allows users willing to link against LGPL code to use Ganglia
by adding build flags or linking against a new Spark artifact called
spark-ganglia-lgpl.

This brings Spark in line with the Apache policy on LGPL code
enumerated here:

https://www.apache.org/legal/3party.html#options-optional

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark ganglia

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #108


commit d3d9d6f062b732eb08d3ccd75fedb02602a4eb97
Author: Patrick Wendell pwend...@gmail.com
Date:   2014-03-09T00:29:35Z

SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues.

This patch removes Ganglia integration from the default build. It
allows users willing to link against LGPL code to use Ganglia
by adding build flags or linking against a new Spark artifact called
spark-ganglia-lgpl.

This brings Spark in line with the Apache policy on LGPL code
enumerated here:

https://www.apache.org/legal/3party.html#options-optional






[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/108#issuecomment-37116234
  
Merged build started.




[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-08 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/12#issuecomment-37116374
  
@mateiz I have rebased the code, any further comments?




[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/108#issuecomment-37117121
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13079/




[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/108#issuecomment-37117125
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/108#issuecomment-37117126
  
Merged build started.




[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/108#issuecomment-37118026
  
Merged build finished.




[GitHub] spark pull request: Updated the formatting of code blocks using Gi...

2014-03-08 Thread jyotiska
Github user jyotiska closed the pull request at:

https://github.com/apache/spark/pull/68




[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/42#issuecomment-37119593
  
Build started.




[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/42#issuecomment-37119592
  
 Build triggered.




[GitHub] spark pull request: Ability to initialize Spark-Shell with command...

2014-03-08 Thread kellrott
GitHub user kellrott opened a pull request:

https://github.com/apache/spark/pull/109

Ability to initialize Spark-Shell with command script

This patch allows a user to define a script file with code that will be 
executed when the spark-shell starts up.
This initialization script file can be set either by setting the 
SPARK_SHELL_RC environment variable to its path, or by placing a file at 
$HOME/.spark_shell_rc (the environment variable takes precedence over the 
home directory file).

There are two main usage scenarios:
1) The user has a set of commands they want run automatically whenever they 
open spark-shell.
2) Other software packages that depend on Spark, and want to provide easy 
access to their code in a way similar to spark-shell, can provide a wrapper 
for spark-shell that adds the tool jars with ADD_JARS and then executes an 
initialization script via SPARK_SHELL_RC to do all the import calls and variable 
initialization. 
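
The precedence rule described above can be sketched as follows (a hypothetical helper written for illustration, not code from the patch; names mirror the PR):

```python
import os

def find_shell_rc(env=None):
    """Return the init-script path: SPARK_SHELL_RC wins over ~/.spark_shell_rc."""
    env = os.environ if env is None else env
    path = env.get("SPARK_SHELL_RC")
    if path:
        # Environment variable takes precedence when set.
        return path
    default = os.path.join(os.path.expanduser("~"), ".spark_shell_rc")
    # Fall back to the home-directory file only if it actually exists.
    return default if os.path.isfile(default) else None
```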

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kellrott/spark shell-rc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #109


commit 6432e0c30d3605498582313bd8728e7b9cc5413b
Author: Kyle Ellrott kellr...@gmail.com
Date:   2014-03-09T01:59:21Z

Adding code to execute rc file at start of spark-shell. Either defined via 
environmental variable SPARKSHELL_RC or by file at $HOME/.spark_shell_rc

commit 6a30cfdfe946017748620ae7fb89daa3a2dc5eae
Author: Kyle Ellrott kellr...@gmail.com
Date:   2014-03-09T05:02:29Z

Changing SPARKSHELL_RC to SPARK_SHELL_RC






[GitHub] spark pull request: Ability to initialize Spark-Shell with command...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/109#issuecomment-37120323
  
 Merged build triggered.




[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/42#issuecomment-37120318
  
Build finished.




[GitHub] spark pull request: Ability to initialize Spark-Shell with command...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/109#issuecomment-37121124
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13082/




[GitHub] spark pull request: Ability to initialize Spark-Shell with command...

2014-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/109#issuecomment-37121123
  
Merged build finished.




[GitHub] spark pull request: SPARK-1099:Spark's local mode should probably ...

2014-03-08 Thread qqsun8819
GitHub user qqsun8819 opened a pull request:

https://github.com/apache/spark/pull/110

SPARK-1099:Spark's local mode should probably respect spark.cores.max by 
default

This is for JIRA: https://spark-project.atlassian.net/browse/SPARK-1099
And this is what I do in this patch (also commented in the JIRA) @aarondav

This is really a behavioral change, so I make it with great caution and
welcome any review advice:

1. I change how the MASTER=local pattern creates the LocalBackEnd. In the
past we passed 1 core to it; now it uses a default core count.
The reason is that when someone uses spark-shell to start local mode, the
REPL uses this MASTER=local pattern as the default.
So if one also specifies cores on the spark-shell command line, it will all
go through here, and hard-coding 1 core is no longer suitable given this
change.
2. In the LocalBackEnd, the totalCores variable is now computed by a
different rule (in the past it just took the user-passed core count: 1 in
the MASTER=local pattern, 2 in the MASTER=local[2] pattern). The rules are:
a. The second argument of LocalBackEnd's constructor, indicating cores, has
a default value of Int.MaxValue. If the user didn't pass it, it stays at
Int.MaxValue.
b. In getMaxCores, we first compare that value to Int.MaxValue. If it's not
equal, we assume the user has passed their desired value, so we just use
it.
c. If b is not satisfied, we read cores from spark.cores.max and get the
real logical core count from Runtime. If the cores specified by
spark.cores.max exceed the logical cores, we use the logical cores;
otherwise we use spark.cores.max.
3. In SparkContextSchedulerCreationSuite's test(local) case, the assertion
is changed from 1 to the logical core count, because the MASTER=local
pattern now uses the default values.
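The three rules above can be condensed into a small Scala sketch
(getMaxCores and its parameters here are illustrative, not the PR's exact
code; passedCores defaults to Int.MaxValue when the user supplies nothing):

```scala
// Resolve the effective core count for local mode.
def getMaxCores(passedCores: Int, coresMaxConf: Option[Int], logicalCores: Int): Int =
  if (passedCores != Int.MaxValue) {
    passedCores                                    // rule b: user passed an explicit value
  } else coresMaxConf match {
    case Some(max) => math.min(max, logicalCores)  // rule c: cap spark.cores.max at logical cores
    case None      => logicalCores                 // no config either: use all logical cores
  }
```

In the real code, logicalCores would come from
Runtime.getRuntime.availableProcessors and coresMaxConf from the
spark.cores.max configuration entry.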

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/qqsun8819/spark local-cores

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #110


commit 6ae1ee82f49e10166c29c538f452503236d06531
Author: qqsun8819 jin@alibaba-inc.com
Date:   2014-03-09T06:19:10Z

Add a static function in LocalBackEnd to let it use spark.cores.max 
specified cores when no cores are passed to it

commit 78b9c60ce8279189e486479fbb211410c1a1b73c
Author: qqsun8819 jin@alibaba-inc.com
Date:   2014-03-09T07:28:23Z

1 SparkContext MASTER=local pattern use default cores instead of 1 to 
construct LocalBackEnd , for use of spark-shell and cores specified in cmd line 
2 some test case change from local to local[1]. 3 
SparkContextSchedulerCreationSuite test spark.cores.max config in local pattern



