[GitHub] spark pull request: [SPARK-1194] Fix the same-RDD rule for cache r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/96 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Allow sbt to use more than 1G of heap.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/103
[GitHub] spark pull request: Allow sbt to use more than 1G of heap.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/103#issuecomment-37092461 Merged build finished.
[GitHub] spark pull request: Allow sbt to use more than 1G of heap.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/103#issuecomment-37092462 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13067/
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37092479 Merged build triggered.
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37092654 Merged build finished.
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37092655 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13068/
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37093324 retest this please
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37093423 Merged build started.
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37093422 Merged build triggered.
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/104

Update junitxml plugin to the latest version to avoid recompilation in every SBT command.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark junitxml

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/104.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #104

commit 67ef7bffd92a30b8d81c072ad1c504eb3a53d264
Author: Reynold Xin r...@apache.org
Date: 2014-03-08T09:41:06Z

    Update junitxml plugin to the latest version to avoid recompilation in every SBT command.
[GitHub] spark pull request: [SPARK-1194] Fix the same-RDD rule for cache r...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/96#issuecomment-37093759

Thanks for this fix - excellent catch!

On Sat, Mar 8, 2014 at 1:53 PM, asfgit notificati...@github.com wrote:

> Closed #96 (https://github.com/apache/spark/pull/96) via 0b7b7fd
> (https://github.com/apache/spark/commit/0b7b7fd45cd9037d23cb090e62be3ff075214fe7).
>
> Reply to this email directly or view it on GitHub:
> https://github.com/apache/spark/pull/96
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37094469 Merged build finished.
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37094472 Merged build started.
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37094471 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13069/
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37094470 Merged build triggered.
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37094612 Jenkins, retest this please
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37094638 Merged build started.
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37095429 Merged build finished.
[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-37095433 Merged build finished.
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37096375 Very cool, finally we have this!
[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/101#issuecomment-37098779 Merged build started.
[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/101#issuecomment-37098778 Merged build triggered.
[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-37098783 Merged build triggered.
[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-37098784 Merged build started.
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37100319 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13074/
[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-37100324 Merged build finished.
[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/101#issuecomment-37100327 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13072/
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37104948 LGTM. @rxin is there an equivalent thing to this in maven or no? Seems to me like maybe this is sbt only.
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/105#discussion_r10410220

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -666,6 +666,7 @@ abstract class RDD[T: ClassTag](
   /**
    * Return an array that contains all of the elements in this RDD.
    */
+  @deprecated
--- End diff --

Would you mind adding a message here that explains it's deprecated as of 1.0.0 and that the solution is to use collect()? Take a look at other places where we deprecate stuff.
[GitHub] spark pull request: SPARK-782 Clean up for ASM dependency.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/100#issuecomment-37105043 @mateiz this works fine in Java 8 unit tests.
[GitHub] spark pull request: Updated the formatting of code blocks using Gi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/68#issuecomment-37106963 Do you mind closing this? The solution proposed here isn't going to work because of the way our docs are compiled. If there is a way to make this work well in both our compiled docs and the website then definitely open a new request, but AFAIK this isn't so easy.
[GitHub] spark pull request: Spark 615 map partitions with index callable f...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37107006 @holdenk mind bumping this now that #17 is in? You'll have to change `extends` to `with`... since the function classes are now interfaces rather than abstract classes.
[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/92#discussion_r10410791

--- Diff: python/pyspark/rdd.py ---
@@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False):
         jrdd = self._jrdd.coalesce(numPartitions)
         return RDD(jrdd, self.ctx, self._jrdd_deserializer)

+    def name(self):
+        """
+        Return the name of this RDD.
+        """
+        name_ = self._jrdd.name()
+        if not name_:
+            return None
+        return name_.encode('utf-8')
+
+    def setName(self, name):
+        """
+        Assign a name to this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setName('RDD1')
+        >>> rdd1.name()
+        'RDD1'
+        """
+        self._jrdd.setName(name)
+
+    def generator(self):
--- End diff --

@mateiz - are you sure we want this function? It might be good to delay adding this to pyspark pending a clean-up of the `generator` stuff, which I think is mostly redundant with the callsite/origin.
[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/92#discussion_r10410792

--- Diff: python/pyspark/rdd.py ---
(same hunk as the previous comment, ending at `+    def generator(self):`)
--- End diff --

both this and `setGenerator`
[GitHub] spark pull request: Spark 615 map partitions with index callable f...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37107763

Sure, I'll give this a shot today :)

On Sat, Mar 8, 2014 at 11:24 AM, Patrick Wendell notificati...@github.com wrote:

> @holdenk mind bumping this now that #17 (https://github.com/apache/spark/pull/17)
> is in? You'll have to change `extends` to `with`... since the function classes
> are now interfaces rather than abstract classes.
>
> Reply to this email directly or view it on GitHub:
> https://github.com/apache/spark/pull/16#issuecomment-37107006

--
Cell : 425-233-8271
[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/92#discussion_r10410905

--- Diff: python/pyspark/rdd.py ---
@@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False):
         jrdd = self._jrdd.coalesce(numPartitions)
         return RDD(jrdd, self.ctx, self._jrdd_deserializer)

+    def name(self):
+        """
+        Return the name of this RDD.
+        """
+        name_ = self._jrdd.name()
+        if not name_:
+            return None
+        return name_.encode('utf-8')
+
+    def setName(self, name):
+        """
+        Assign a name to this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setName('RDD1')
+        >>> rdd1.name()
+        'RDD1'
+        """
+        self._jrdd.setName(name)
+
+    def generator(self):
+        """
+        Return the generator of this RDD.
+        """
+        generator_ = self._jrdd.generator()
+        if not generator_:
+            return None
+        return generator_.encode('utf-8')
+
+    def setGenerator(self, generator):
+        """
+        Reset generator of this RDD.
+        >>> rdd1 = sc.parallelize([1,2])
+        >>> rdd1.setGenerator('dummyRDDgenerator')
+        >>> rdd1.generator()
+        'dummyRDDgenerator'
+        """
+        self._jrdd.setGenerator(generator)
+
+    def toDebugString(self):
+        """
+        A description of this RDD and its recursive dependencies for debugging.
+        """
+        debug_string = self._jrdd.toDebugString()
+        if not debug_string:
+            return None
+        return debug_string.encode('utf-8')
+
+    def getStorageLevel(self):
--- End diff --

If you add this it should also be possible to write a doctest.
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r10410916

--- Diff: sbin/spark-config.sh ---
@@ -34,3 +34,6 @@
 this=$config_bin/$script
 export SPARK_PREFIX=`dirname $this`/..
 export SPARK_HOME=${SPARK_PREFIX}
 export SPARK_CONF_DIR=$SPARK_HOME/conf
+# Add the PySpark classes to the PYTHONPATH:
+export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
--- End diff --

Good point; I think we should move these lines to `spark-class` to make sure that workers use the right PYTHONPATH even if they're started manually through `spark-class`.
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r10410942

--- Diff: python/Makefile ---
@@ -0,0 +1,7 @@
+assembly: clean
+	python setup.py build --build-lib build/lib
+	unzip lib/py4j*.zip -d build/lib
+	cd build/lib && zip -r ../pyspark-assembly.zip .
+
--- End diff --

Are you envisioning including the PySpark dependencies in the Spark assembly jar? I think that could work, since we need to build that jar anyways when running under YARN. I'm not sure how easy it will be to modify the Maven or SBT builds to include those files.
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r10410971

--- Diff: python/pyspark/java_gateway.py ---
@@ -66,3 +71,30 @@ def run(self):
     java_import(gateway.jvm, "org.apache.spark.mllib.api.python.*")
     java_import(gateway.jvm, "scala.Tuple2")
     return gateway
+
+def set_env_vars_for_yarn(pyspark_zip):
+    if "SPARK_YARN_DIST_FILES" in os.environ:
+        os.environ["SPARK_YARN_DIST_FILES"] += ("," + pyspark_zip)
+    else:
+        os.environ["SPARK_YARN_DIST_FILES"] = pyspark_zip
+
+    # Add the pyspark zip to the python path
+    env_map = parse_env(os.environ.get("SPARK_YARN_USER_ENV", ""))
+    if "PYTHONPATH" in env_map:
+        env_map["PYTHONPATH"] += (":" + os.path.basename(pyspark_zip))
+    else:
+        env_map["PYTHONPATH"] = os.path.basename(pyspark_zip)
+
+    os.environ["SPARK_YARN_USER_ENV"] = ",".join(map(lambda v: v[0] + "=" + v[1],
+                                                     env_map.items()))
+
+def parse_env(env_str):
+    # Turns a comma-separated list of env settings into a dict that maps env vars to
+    # their values.
+    env = {}
+    for var_str in env_str.split(","):
+        parts = var_str.split("=")
+        if len(parts) == 2:
--- End diff --

Do you think it would be worth it to crash or throw an error when passed an invalid env string?
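One way to act on the question above (purely a sketch, not code from the PR) is to have `parse_env` reject malformed entries outright rather than silently skipping them:

```python
def parse_env(env_str):
    # Turn a comma-separated list of KEY=VALUE settings into a dict,
    # raising instead of silently skipping malformed entries.
    env = {}
    if not env_str:
        return env
    for var_str in env_str.split(","):
        # maxsplit=1 so values may themselves contain '=' characters.
        parts = var_str.split("=", 1)
        if len(parts) != 2 or not parts[0]:
            raise ValueError("Malformed env setting: %r" % var_str)
        env[parts[0]] = parts[1]
    return env
```

Failing loudly here would surface typos in `SPARK_YARN_USER_ENV` at launch time instead of producing a silently incomplete environment on the executors.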
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r10410987

--- Diff: python/pyspark/java_gateway.py ---
(same hunk as the previous comment, ending at the `os.environ["SPARK_YARN_USER_ENV"] = ",".join(...)` statement)
--- End diff --

I think you can write this a little more clearly as:

```
os.environ["SPARK_YARN_USER_ENV"] = ",".join(k + '=' + v for (k, v) in env_map.items())
```
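The suggested generator-expression rewrite produces the same string as the original `map`/`lambda` formulation. A quick self-contained comparison (using a stand-in `env_map` rather than touching `os.environ`):

```python
env_map = {"PYTHONPATH": "pyspark-assembly.zip", "SPARK_TESTING": "1"}

# Original formulation from the diff:
joined_map = ",".join(map(lambda v: v[0] + "=" + v[1], env_map.items()))

# Suggested, clearer generator-expression formulation:
joined_genexp = ",".join(k + "=" + v for (k, v) in env_map.items())
```

Both iterate `env_map.items()` in the same order, so the outputs are identical; the generator expression just avoids the lambda and the positional `v[0]`/`v[1]` indexing.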
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r10410991

--- Diff: python/pyspark/java_gateway.py ---
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
+from glob import glob
--- End diff --

I added this import in my original patch, but it's unused now and can be removed.
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/30#issuecomment-37108992 I left a few minor comments in the diff, but overall this looks good to me. It might be worth adding build/run instructions in either the PySpark Programming Guide or YARN guide. It also occurred to me that the Makefile-based build for the PySpark fat zip might be a problem for Windows users; Scala/Java Spark works fine under Cygwin, but PySpark only works in cmd.exe / powershell (the main difficulty is that in some cases the Java and Python halves of the PySpark driver expect different types of paths, so we'd have to replicate parts of the cygpath logic in Java and Python). I suppose we could use the Python [`zipfile`](http://docs.python.org/2/library/zipfile) library and implement the build script in Python. Or, as @ahirreddy suggested, maybe we could package the Python libraries into a JAR.
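A portable `zipfile`-based replacement for the Makefile's `zip -r` step, as suggested above, might look like the following sketch (the `zip_tree` function name is hypothetical; it assumes the whole build tree should be archived with paths stored relative to it):

```python
import os
import zipfile


def zip_tree(src_dir, dest_zip):
    # Recursively archive every file under src_dir, storing paths relative
    # to src_dir -- roughly what `cd src_dir && zip -r dest.zip .` does,
    # but without depending on an external `zip` binary (so it also works
    # on Windows, where the Makefile approach is a problem).
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, src_dir))
```

`ZipFile.write` normalizes the archive name's path separators, so the resulting zip layout matches what the shell-based build would produce.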
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37109170 Ok I merged this. Not sure about Maven off the top of my head. All these build plugins are pretty arcane to me.
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/30#issuecomment-37109537 @sryza another thing here is, whatever the make target ends up being we should add it to the `make_release` script and the `make-distribution` script (those two need to be merged soon but for now they both exist).
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/104
[GitHub] spark pull request: Spark 615 map partitions with index callable f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37110554 Merged build started.
[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/106 SPARK-1205: Clean up callSite/origin/generator. This patch removes the `generator` field and simplifies + documents the tracking of callsites. There are two places where we care about call sites: when a job is run and when an RDD is created. This patch retains both of those features but does a slight refactoring and renaming to make things less confusing. There was another feature of an RDD called the `generator`, which was by default the user class in which the RDD was created. This is used exclusively in the JobLogger. It has been subsumed by the ability to name a job group. The job logger can later be refactored to read the job group directly (this will require some work), but for now this just preserves the default logged value of the user class. I'm not sure any users ever used the ability to override this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark callsite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/106.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #106 commit 576e60bb204b1caedb7696a3365b4f4f2b2c6a81 Author: Patrick Wendell pwend...@gmail.com Date: 2014-03-08T21:43:16Z SPARK-1205: Clean up callSite/origin/generator. This patch removes the `generator` field and simplifies + documents the tracking of callsites. There are two places where we care about call sites: when a job is run and when an RDD is created. This patch retains both of those features but does a slight refactoring and renaming to make things less confusing. There was another feature of an RDD called the `generator`, which was by default the user class in which the RDD was created. This is used exclusively in the JobLogger. It has been subsumed by the ability to name a job group.
The job logger can later be refactored to read the job group directly (this will require some work), but for now this just preserves the default logged value of the user class. I'm not sure any users ever used the ability to override this.
[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/107 SPARK-1190: Do not initialize log4j if slf4j log4j backend is not being used You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark logging Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/107.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #107 commit be21c11f1764540bb649d5b7400c92acfbc51511 Author: Patrick Wendell pwend...@gmail.com Date: 2014-02-07T23:22:29Z Logging fix
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37111528 This doesn't deprecate it in Java (I think). Mind adding that too?
[GitHub] spark pull request: Spark 615 map partitions with index callable f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37111991 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13075/
[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/107#issuecomment-37111996 Merged build started.
[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/106#issuecomment-37111998 Merged build started.
[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/106#issuecomment-37113363 Merged build finished.
[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/106#issuecomment-37113364 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13077/
Re: Spark 0.9.0 and log4j
Evan I actually remembered that Paul Brown (who also reported this issue) tested it and found that it worked. I'm going to merge this into master and branch 0.9, so please give it a spin when you have a chance. - Patrick On Sat, Mar 8, 2014 at 2:00 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Evan, This is being tracked here: https://spark-project.atlassian.net/browse/SPARK-1190 That patch didn't get merged but I've just opened a new one here: https://github.com/apache/spark/pull/107/files Would you have any interest in testing this? I want to make sure it works for users who are using logback. I'd like to get this merged quickly since it's one of the only remaining blockers for Spark 0.9.1. - Patrick On Fri, Mar 7, 2014 at 11:04 AM, Evan Chan e...@ooyala.com wrote: Hey guys, This is a follow-up to this semi-recent thread: http://apache-spark-developers-list.1001551.n3.nabble.com/0-9-0-forces-log4j-usage-td532.html 0.9.0 final is causing issues for us as well because we use Logback as our backend and Spark requires Log4j now. I see Patrick has a PR #560 to incubator-spark, was that merged in or left out? Also I see references to a new PR that might fix this, but I can't seem to find it in the github open PR page. Anybody have a link? As a last resort we can switch to Log4j, but would rather not have to do that if possible. thanks, Evan -- -- Evan Chan Staff Engineer e...@ooyala.com |
[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/107#issuecomment-37114184 I've merged this and put it into 0.9. Thanks @prb who tested an earlier version of this patch.
Re: 0.9.0 forces log4j usage
The fix for this was just merged into branch 0.9 (will be in 0.9.1+) and master. On Sun, Feb 9, 2014 at 11:44 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks Paul - it isn't meant to be a full solution, just a fix for the 0.9 branch - for the full solution there is another PR by Sean Owen. On Sun, Feb 9, 2014 at 11:35 PM, Paul Brown p...@mult.ifario.us wrote: Hi, Patrick -- I gave that a go locally, and it works as desired. Best. -- Paul -- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell pwend...@gmail.com wrote: Ah okay sounds good. This is what I meant earlier by "You have some other application that directly calls log4j", i.e. you have for historical reasons installed the log4j-over-slf4j. Would you mind trying out this fix and seeing if it works? This is designed to be a hotfix for 0.9, not a general solution where we rip out log4j from our published dependencies: https://github.com/apache/incubator-spark/pull/560/files - Patrick On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown p...@mult.ifario.us wrote: Hi, Patrick -- I forget which other component is responsible, but we're using log4j-over-slf4j as part of an overall requirement to centralize logging, i.e., *someone* else is logging over log4j and we're pulling that in. (There's also some jul logging from Jersey, etc.) Goals: - Fully control/capture all possible logging. (God forbid we have to grab System.out/err, but we'd do it if needed.) - Use the backend we like best at the moment. (Happens to be logback.) Possible cases: - If Spark used Log4j at all, we would pull in that logging via log4j-over-slf4j. - If Spark used only slf4j and referenced no backend, we would use it as-is, although we'd still have log4j-over-slf4j because of other libraries. - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we would exclude that one dependency (via our POM). Best. -- Paul -- p...@mult.ifario.us | Multifarious, Inc.
| http://mult.ifario.us/ On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Paul, So if your goal is ultimately to output to logback, then why don't you just use slf4j and logback-classic.jar as described here [1]? Why involve log4j-over-slf4j at all? Let's say we refactored the Spark build so it didn't advertise slf4j-log4j12 as a dependency. Would you still be using log4j-over-slf4j... or is this just a fix to deal with the fact that Spark is somewhat log4j dependent at this point? [1] http://www.slf4j.org/manual.html - Patrick On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown p...@mult.ifario.us wrote: Hi, Patrick -- That's close but not quite it. The issue that occurs is not the delegation loop mentioned in the slf4j documentation. The stack overflow is entirely within the code in the Spark trait: at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112) at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97) at org.apache.spark.Logging$class.log(Logging.scala:36) at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94) And then that repeats. As for our situation, we exclude the slf4j-log4j12 dependency when we import the Spark library (because we don't want to use log4j) and have log4j-over-slf4j already in place to ensure that all of the logging in the overall application runs through slf4j and then out through logback. (We also, as another poster mentioned, force jcl and jul through slf4j.) The zen of slf4j for libraries is that the library uses the slf4j API and the enclosing application can route logging as it sees fit. The Spark master CLI would log via slf4j and include the slf4j-log4j12 backend; same for the Spark worker CLI. Spark as a library (versus as a container) would not include any backend to the slf4j API and would leave this up to the application. (FWIW, this would also avoid your log4j warning message.)
But as I was saying before, I'd be happy with a situation where I can avoid log4j being enabled or configured, and I think you'll find an existing choice of logging framework to be a common scenario for those embedding Spark in other systems. Best. -- Paul -- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell pwend...@gmail.com wrote: Paul, Looking back at your problem. I think it's the one here: http://www.slf4j.org/codes.html#log4jDelegationLoop So let me just be clear what you are doing so I understand. You have some other application that directly calls log4j. So you have to include log4j-over-slf4j to route those logs through slf4j to logback. At the same time you embed Spark in this application. In the
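For readers following the thread, the POM exclusion Paul describes (dropping Spark's transitive slf4j-log4j12 binding so logback-classic can back slf4j instead) might look something like this Maven fragment. The coordinates and version are illustrative for the Spark 0.9 era and should be checked against the actual published artifacts:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.1</version>
  <exclusions>
    <!-- Exclude the log4j binding so the application's own
         slf4j backend (e.g. logback-classic) is used instead -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```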
[GitHub] spark pull request: SPARK-1190: Do not initialize log4j if slf4j l...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/107
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37115130 Hi @pwendell, thank you for the comments. I just fixed that.
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37115142 Merged build started.
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37115141 Merged build triggered.
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37116110 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13078/
[GitHub] spark pull request: SPARK-1160: Deprecate toArray in RDD
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/105#issuecomment-37116109 Merged build finished.
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37116233 Merged build triggered.
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/108 SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues... This patch removes Ganglia integration from the default build. It allows users willing to link against LGPL code to use Ganglia by adding build flags or linking against a new Spark artifact called spark-ganglia-lgpl. This brings Spark in line with the Apache policy on LGPL code enumerated here: https://www.apache.org/legal/3party.html#options-optional You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark ganglia Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/108.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #108 commit d3d9d6f062b732eb08d3ccd75fedb02602a4eb97 Author: Patrick Wendell pwend...@gmail.com Date: 2014-03-09T00:29:35Z SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues. This patch removes Ganglia integration from the default build. It allows users willing to link against LGPL code to use Ganglia by adding build flags or linking against a new Spark artifact called spark-ganglia-lgpl. This brings Spark in line with the Apache policy on LGPL code enumerated here: https://www.apache.org/legal/3party.html#options-optional
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37116234 Merged build started.
[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-37116374 @mateiz I have rebased the code, any further comments?
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37117121 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13079/
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37117125 Merged build triggered.
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37117126 Merged build started.
[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37118026 Merged build finished.
[GitHub] spark pull request: Updated the formatting of code blocks using Gi...
Github user jyotiska closed the pull request at: https://github.com/apache/spark/pull/68
[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-37119593 Build started.
[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-37119592 Build triggered.
[GitHub] spark pull request: Ability to initialize Spark-Shell with command...
GitHub user kellrott opened a pull request: https://github.com/apache/spark/pull/109 Ability to initialize Spark-Shell with command script This allows a user to define a script file with code that will be executed when the spark-shell starts up. This initialization script file can be set either by setting the SPARK_SHELL_RC environment variable to the path, or by placing a file at $HOME/.spark_shell_rc (the environment variable takes precedence over the home directory file). There are two main usage scenarios: 1) The user has a set of commands they want run automatically whenever they open spark-shell. 2) Other software packages that depend on Spark and want to provide easy access to their code in a way similar to spark-shell can provide a wrapper for spark-shell that adds the tool jars with ADD_JARS and then executes an initialization with SPARK_SHELL_RC to do all the import calls and variable initialization. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kellrott/spark shell-rc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/109.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #109 commit 6432e0c30d3605498582313bd8728e7b9cc5413b Author: Kyle Ellrott kellr...@gmail.com Date: 2014-03-09T01:59:21Z Adding code to execute rc file at start of spark-shell. Either defined via environmental variable SPARKSHELL_RC or by file at $HOME/.spark_shell_rc commit 6a30cfdfe946017748620ae7fb89daa3a2dc5eae Author: Kyle Ellrott kellr...@gmail.com Date: 2014-03-09T05:02:29Z Changing SPARKSHELL_RC to SPARK_SHELL_RC
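The precedence rule described in the PR (SPARK_SHELL_RC overrides $HOME/.spark_shell_rc) amounts to something like the following Python sketch. This is illustrative only: the actual PR implements the lookup in the shell launcher, and the helper name here is made up:

```python
# Hypothetical sketch of the init-script lookup: the SPARK_SHELL_RC
# environment variable wins; otherwise fall back to ~/.spark_shell_rc.
import os

def resolve_spark_shell_rc(env=None):
    """Return the path of the init script to run, or None if there isn't one."""
    env = os.environ if env is None else env
    explicit = env.get("SPARK_SHELL_RC")
    if explicit:
        return explicit
    default = os.path.join(os.path.expanduser("~"), ".spark_shell_rc")
    return default if os.path.isfile(default) else None
```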
[GitHub] spark pull request: Ability to initialize Spark-Shell with command...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/109#issuecomment-37120323 Merged build triggered.
[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-37120318 Build finished.
[GitHub] spark pull request: Ability to initialize Spark-Shell with command...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/109#issuecomment-37121124 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13082/
[GitHub] spark pull request: Ability to initialize Spark-Shell with command...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/109#issuecomment-37121123 Merged build finished.
[GitHub] spark pull request: SPARK-1099:Spark's local mode should probably ...
GitHub user qqsun8819 opened a pull request: https://github.com/apache/spark/pull/110 SPARK-1099: Spark's local mode should probably respect spark.cores.max by default This is for JIRA: https://spark-project.atlassian.net/browse/SPARK-1099 And this is what I do in this patch (also commented in the JIRA). @aarondav This is really a behavioral change, so I do this with great caution, and welcome any review advice: 1. I change the MASTER=local pattern of creating LocalBackEnd. In the past, we passed 1 core to it; now it uses a default core count. The reason is that when someone uses spark-shell to start local mode, the REPL will use this MASTER=local pattern as the default. So if one also specifies cores on the spark-shell command line, it will all go in here, and passing 1 core is no longer suitable given this change. 2. In the LocalBackEnd, the totalCores variable is fetched following a different rule (in the past it just took a user-passed core count, like 1 in the MASTER=local pattern or 2 in the MASTER=local[2] pattern). The rules: a) The second argument of LocalBackEnd's constructor, indicating cores, has a default value of Int.MaxValue; if the user didn't pass it, it stays Int.MaxValue. b) In getMaxCores, we first compare that value to Int.MaxValue; if it's not equal, we assume the user has passed their desired value, so we just use it. c) If b is not satisfied, we then get cores from spark.cores.max, and the real logical core count from the Runtime. If the count specified by spark.cores.max is bigger than the logical core count, we use the logical cores; otherwise we use spark.cores.max. 3. In SparkContextSchedulerCreationSuite's test(local) case, the assertion is modified from 1 to the logical core count, because the MASTER=local pattern now uses default values.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/qqsun8819/spark local-cores Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/110.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #110 commit 6ae1ee82f49e10166c29c538f452503236d06531 Author: qqsun8819 jin@alibaba-inc.com Date: 2014-03-09T06:19:10Z Add a static function in LocalBackEnd to let it use spark.cores.max specified cores when no cores are passed to it commit 78b9c60ce8279189e486479fbb211410c1a1b73c Author: qqsun8819 jin@alibaba-inc.com Date: 2014-03-09T07:28:23Z 1 SparkContext MASTER=local pattern use default cores instead of 1 to construct LocalBackEnd , for use of spark-shell and cores specified in cmd line 2 some test case change from local to local[1]. 3 SparkContextSchedulerCreationSuite test spark.cores.max config in local pattern
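The sentinel-based resolution in rules a-c of the PR description can be summarized in a small Python sketch (illustrative only; the real change lives in the Scala LocalBackEnd, and the names below are hypothetical):

```python
# Hedged sketch of the core-resolution rule: a sentinel default detects
# whether the caller passed an explicit core count.
import sys

_UNSET = sys.maxsize  # stands in for Scala's Int.MaxValue default argument

def get_max_cores(passed_cores, conf_cores_max, logical_cores):
    """Resolve the local-mode core count per rules a-c above."""
    # (b) A non-sentinel value means an explicit count was passed
    # (e.g. MASTER=local[2]); honor it as-is.
    if passed_cores != _UNSET:
        return passed_cores
    # (c) Otherwise fall back to spark.cores.max, capped at the
    # machine's logical core count.
    return min(conf_cores_max, logical_cores)
```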