[GitHub] spark pull request: [SPARK-1429] Spark shell fails to start after ...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/337 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/322#issuecomment-39699481 Okay, I think the broader issue is that @ueshin you said before that the `Class.forName()` method with `null` as the 3rd argument tries to load the class from the bootstrap class loader, which doesn't know the class `org.apache.spark.serializer.JavaSerializer`. But I think in this case we'd expect the bootstrap classloader to know about `JavaSerializer` (this should be on the classpath when the executor starts), right? I'm still not sure why it would fail in this case. I don't see why `MesosExecutorDriver` could be on the java classpath but `JavaSerializer` isn't. @manku-timma I looked more, and the reason this doesn't work is that other parts of the code don't directly use the `classLoader` from the executor. I can look more tomorrow and see how we can best clean this up. The current approach works but it's a bit of a hack. There might be a nicer way to clean this up.
[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/322#issuecomment-39699792 Ah I see, @ueshin you are right. It's the bootstrap class loader and it won't have any Spark definitions. I was mixing this up with the system class loader.

```
./bin/spark-shell
scala> Class.forName("org.apache.spark.serializer.JavaSerializer")
res7: Class[_] = class org.apache.spark.serializer.JavaSerializer

scala> Class.forName("org.apache.spark.serializer.JavaSerializer", true, null)
java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:11)
        at $iwC$$iwC$$iwC.<init>(<console>:16)
        at $iwC$$iwC.<init>(<console>:18)
        at $iwC.<init>(<console>:20)
```

We should definitely clean this up. The behavior we want in every case is to use the context class loader (if present), and if not, the class loader that loads Spark classes (e.g. the system class loader).
[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/204#issuecomment-39699932 Merged build started.
[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/204#issuecomment-39699926 Merged build triggered.
[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/204#discussion_r11332376

--- Diff: sbin/start-history-server.sh ---
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Starts the history server on the machine this script is executed on.
+#
+# Usage: start-history-server.sh <base-log-dir> [<web-ui-port>]
+# Example: ./start-history-server.sh --dir /tmp/spark-events --port 18080
+#
+
+sbin=`dirname "$0"`
+sbin=`cd "$sbin"; pwd`
+
+if [ $# -lt 1 ]; then
+  echo "Usage: ./start-history-server.sh <base-log-dir> [<web-ui-port>]"
+  echo "Example: ./start-history-server.sh /tmp/spark-events 18080"
--- End diff --

In the latest commit, the history server reads from SPARK_DAEMON_MEMORY the same way Masters and Workers do. This may or may not be subject to change with #299.
[GitHub] spark pull request: Spark logger moving to use scala-logging
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/332#issuecomment-39700055 Is performance really a problem here? I don't think we log stuff in the critical path (we probably shouldn't). If we do, maybe we can just get rid of that logging.
[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/322#issuecomment-39700076 @pwendell Yes, the bootstrap class loader knows only core Java APIs; the Spark classes (specified by the `-cp` java command-line argument) are loaded by the system class loader.
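To make the point above concrete, here is a small standalone Java sketch (the class name is hypothetical; this is not Spark code): `Class.forName` with a `null` loader delegates to the bootstrap class loader, which resolves core JDK classes but cannot see anything on the application classpath.

```java
// Hypothetical demo: Class.forName with a null loader delegates to the
// bootstrap class loader, which only knows core Java classes.
public class BootstrapLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Core JDK classes are visible to the bootstrap loader.
        Class<?> core = Class.forName("java.lang.String", true, null);
        System.out.println("bootstrap sees: " + core.getName());

        // This class itself lives on the application classpath, so the
        // bootstrap loader cannot find it.
        try {
            Class.forName("BootstrapLoaderDemo", true, null);
            System.out.println("unexpectedly found");
        } catch (ClassNotFoundException e) {
            System.out.println("bootstrap cannot see classpath classes");
        }
    }
}
```

This mirrors the spark-shell session in the comment above: replacing the JDK class with a classpath-only class (like `JavaSerializer`) makes the same call throw `ClassNotFoundException`.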
[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r11332469

--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -108,8 +108,7 @@ class JsonProtocolSuite extends FunSuite {
     // BlockId
     testBlockId(RDDBlockId(1, 2))
     testBlockId(ShuffleBlockId(1, 2, 3))
-    testBlockId(BroadcastBlockId(1L))
-    testBlockId(BroadcastHelperBlockId(BroadcastBlockId(2L), "Spark"))
+    testBlockId(BroadcastBlockId(1L, "Insert words of wisdom here"))
--- End diff --

Hey @tdas, looks like the `` and `` here are causing the test failure (in case you haven't investigated it yet).
[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-39700608 Right, (b) is different from the original intent of the PR. The reason for copying Spark's log4j instead of Hadoop's was the concern brought up by @tgravescs earlier: "The one downside to making the yarn one default is that we now get different looking logs if the user just uses the spark one. In the default most things go to the syslog file, and if I just put in the conf/log4j.properties by copying the template, I won't get a syslog file and most things will be in stdout, right? This applies to the master too."
[GitHub] spark pull request: Spark logger moving to use scala-logging
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/332#issuecomment-39701040 I did not find a call that affects performance. One possibility is in Spark Catalyst, where `logger.debug` is called many times, for example at [RuleExecutor.scala#L64](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L64).
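For context on the performance question in this thread: the cost of a disabled log statement is usually the eager construction of the message, not the log call itself. scala-logging avoids that construction with macros; the minimal Java sketch below (illustrative names only, not Spark code) shows the same effect using `java.util.logging`'s `Supplier` overload.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: eager vs. lazy log-message construction when the level is disabled.
public class LazyLoggingDemo {
    static String expensivePlanDump(AtomicInteger counter) {
        counter.incrementAndGet(); // count how often the message is built
        return "...big tree...";
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo");
        logger.setLevel(Level.INFO); // FINE (debug-level) is disabled

        AtomicInteger built = new AtomicInteger();

        // Eager: the message string is built even though FINE is off.
        logger.fine("plan after rule: " + expensivePlanDump(built));

        // Lazy: the Supplier is never invoked while FINE is disabled.
        logger.fine(() -> "plan after rule: " + expensivePlanDump(built));

        System.out.println("messages built: " + built.get()); // prints "messages built: 1"
    }
}
```

Only the eager call pays the construction cost; in a hot loop like a rule executor, that difference is what a macro-based (or guard-based) logger eliminates.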
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/343 [SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark toStringFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/343.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #343 commit 37198fe79e98b3e123b8a5ddd6093dc7516513dc Author: Michael Armbrust mich...@databricks.com Date: 2014-04-07T07:06:54Z Fix toString for SchemaRDD NativeCommands.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39701471 Ok I pushed another change for generators.
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/343#issuecomment-39701613 Merged build started.
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/343#issuecomment-39701598 lgtm
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39701605 Merged build triggered.
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/343#issuecomment-39701602 Merged build triggered.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39701614 Merged build started.
[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/204#issuecomment-39701639 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13837/
[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/204#issuecomment-39701638 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SQL] SPARK-1371 Hash Aggregation Improvements
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/295#issuecomment-39701693 ok merged this. test failures are unrelated to this change.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39701865 Merged build triggered.
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39701867 Merged build triggered.
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39701875 Merged build started.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39701876 Merged build started.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39701962 The default eval method is for things like `UnresolvedAttribute` or `AttributeReference`, though we could probably special case the failure there.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39702050 Yea, I think it makes sense to throw UnsupportedOperationException in those rather than having a generic implementation of eval. It is less error-prone that way. I can make that change after this one goes in.
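A simplified sketch of the design being discussed (the class names echo Catalyst but the code below is illustrative, not the actual implementation): unresolved leaf expressions fail loudly in `eval` instead of inheriting a catch-all default from the base class.

```java
// Illustrative sketch: unresolved expressions throw from eval rather than
// relying on a generic base-class implementation.
public class EvalDemo {
    interface Expression {
        Object eval(Object[] row);
    }

    // A resolved expression evaluates normally.
    static final class Literal implements Expression {
        private final Object value;
        Literal(Object value) { this.value = value; }
        public Object eval(Object[] row) { return value; }
    }

    // An unresolved expression has no meaningful eval, so it throws
    // explicitly; a silent default here would hide analysis bugs.
    static final class UnresolvedAttribute implements Expression {
        private final String name;
        UnresolvedAttribute(String name) { this.name = name; }
        public Object eval(Object[] row) {
            throw new UnsupportedOperationException(
                "Cannot evaluate unresolved attribute: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Literal(42).eval(new Object[0])); // 42
        try {
            new UnresolvedAttribute("a").eval(new Object[0]);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The benefit is exactly the one stated above: an attempt to evaluate an expression the analyzer never resolved surfaces as an immediate, descriptive error instead of a wrong answer.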
[GitHub] spark pull request: [SQL] SPARK-1371 Hash Aggregation Improvements
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/295
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39703070 LGTM
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/343#issuecomment-39703707 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13838/
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/343#issuecomment-39703706 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39704394 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13839/
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39704393 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13840/
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39704392 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13841/
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39704389 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39704390 Merged build finished.
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39704388 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [WIP] Fix SPARK-1413: Parquet messes up stdout...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/325#issuecomment-39706859 @AndreSchumacher [parquet.Log](https://github.com/Parquet/parquet-mr/blob/master/parquet-common/src/main/java/parquet/Log.java) has a static block ("add a default handler in case there is none"). The following code resets `Logger.getLogger("parquet")`:

```
val parquetLogger = java.util.logging.Logger.getLogger("parquet")
parquetLogger.getHandlers.foreach(parquetLogger.removeHandler)
if (parquetLogger.getLevel != null) parquetLogger.setLevel(null)
if (!parquetLogger.getUseParentHandlers) parquetLogger.setUseParentHandlers(true)
```
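The same reset, rendered as a standalone Java program (an illustrative sketch: it simulates the handler that `parquet.Log`'s static block installs rather than depending on Parquet itself): remove any handlers on the `"parquet"` logger, clear its explicit level, and restore propagation to the parent logger.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;

// Sketch of the JUL reset described above (simulated setup, no Parquet dependency).
public class ParquetLogReset {
    public static void main(String[] args) {
        Logger parquetLogger = Logger.getLogger("parquet");

        // Simulate parquet.Log's static block installing a default handler
        // and cutting the logger off from its parent.
        parquetLogger.addHandler(new ConsoleHandler());
        parquetLogger.setUseParentHandlers(false);

        // The reset: drop handlers, unset the level, restore propagation.
        for (Handler h : parquetLogger.getHandlers()) {
            parquetLogger.removeHandler(h);
        }
        if (parquetLogger.getLevel() != null) {
            parquetLogger.setLevel(null);
        }
        if (!parquetLogger.getUseParentHandlers()) {
            parquetLogger.setUseParentHandlers(true);
        }

        System.out.println("handlers left: " + parquetLogger.getHandlers().length);
        System.out.println("uses parent handlers: " + parquetLogger.getUseParentHandlers());
    }
}
```

After the reset, records logged under `"parquet"` flow up to the root logger's handlers again instead of going to the handler the static block installed.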
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/343#issuecomment-39707631 merged.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39708005 Merged build triggered.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39708020 Merged build started.
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39710934 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13842/
[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/340#issuecomment-39710933 Merged build finished.
[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/343
[GitHub] spark pull request: SPARK-1127 Add spark-hbase.
Github user haosdent commented on the pull request: https://github.com/apache/spark/pull/194#issuecomment-39718976 Quite confused about `InputStreamsSuite`. It passes on my local machine. And this is a case from the `streaming` module; I don't think my pull request has any related code touching that module.
[GitHub] spark pull request: SPARK-1127 Add spark-hbase.
Github user haosdent commented on the pull request: https://github.com/apache/spark/pull/194#issuecomment-39719475 The error is from `https://travis-ci.org/apache/spark/builds/22424147`. I will trigger Travis again after others fix that bug on master.
[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...
Github user manku-timma commented on the pull request: https://github.com/apache/spark/pull/322#issuecomment-39720333 So the current fix looks fine?
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-39722501 Hi, any comments?
[GitHub] spark pull request: SPARK-1104: kill Process in workerThread of Ex...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/35#issuecomment-39726360 ping
[GitHub] spark pull request: SPARK-1387. Update build plugins, avoid plugin...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/291#issuecomment-39728466 Thanks @pwendell for finishing it off with the doc update -- would have done it if I weren't asleep here!
[GitHub] spark pull request: SPARK-1417: Spark on Yarn - spark UI link from...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/344#issuecomment-39738667 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13843/
[GitHub] spark pull request: In the context of SPARK-1337: Make sure that al...
Github user dgshep commented on the pull request: https://github.com/apache/spark/pull/338#issuecomment-39748924 Done: https://issues.apache.org/jira/browse/SPARK-1432
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39750729 Merged build triggered.
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39750745 Merged build started.
[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39755034 Merged build finished.
[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/345 [SPARK-1434] [MLLIB] change labelParser from anonymous function to trait This is a patch to address @mateiz's comment in https://github.com/apache/spark/pull/245 MLUtils#loadLibSVMData uses an anonymous function for the label parser, which Java users won't like, so I made `LabelParser` a trait and provided two implementations: binary and multiclass. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark label-parser Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/345.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #345 commit 7f8eb364f216c0e4e776f115192acc01c5e3d0f0 Author: Xiangrui Meng m...@databricks.com Date: 2014-04-07T16:45:48Z change labelParser from annoymous function to trait
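The trait-based design described in the PR can be sketched as follows. This is a hypothetical illustration based only on the PR description, not the code actually merged in #345: the object names and the exact parsing rules are assumptions.

```scala
// Hypothetical sketch of the LabelParser trait described in the PR; the
// names and parsing rules are assumptions, not the merged Spark code.
trait LabelParser extends Serializable {
  // Convert a label string from a LibSVM-style file into a Double label.
  def parse(labelString: String): Double
}

// Binary labels: any positive value maps to 1.0, everything else to 0.0.
object BinaryLabelParser extends LabelParser {
  override def parse(labelString: String): Double =
    if (labelString.toDouble > 0) 1.0 else 0.0
}

// Multiclass labels: pass the numeric value through unchanged.
object MulticlassLabelParser extends LabelParser {
  override def parse(labelString: String): Double = labelString.toDouble
}
```

Because the parser is a named type rather than an anonymous Scala function, Java callers can reference an implementation such as `BinaryLabelParser` directly instead of having to construct a Scala `Function1`.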
[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-39756329 Thanks Tom. I just rebased on master. Note the other option would be to change conf/log4j.properties.template to be more like Hadoop's. I don't have an opinion on this, but I'm happy to make the change if you think it's the right thing.
[GitHub] spark pull request: SPARK-1432: Make sure that all metadata fields...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/338#issuecomment-39756396 Thanks - merged this into master and 0.9 branch.
[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/305#issuecomment-39756422 Merged build triggered.
[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-39756423 Merged build triggered.
[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/345#issuecomment-39756426 Merged build started.
[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/305#issuecomment-39756435 Merged build started.
[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/345#issuecomment-39756416 Merged build triggered.
[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/305#issuecomment-39756459 Thanks for reviewing @markhamstra -- made the changes you suggested and will merge later today if you don't see any other issues!
[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-39756436 Merged build started.
[GitHub] spark pull request: Added validation check for parallelizing a seq
Github user bijaybisht commented on the pull request: https://github.com/apache/spark/pull/329#issuecomment-39758586 Sure, I'll close this. I presume the NumericRange change (which is part of the fix), resulting in more balanced partitions, is also not required.
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11354857 --- Diff: conf/spark-env.sh.template --- @@ -1,19 +1,36 @@ #!/usr/bin/env bash -# This file contains environment variables required to run Spark. Copy it as -# spark-env.sh and edit that to configure Spark for your site. -# -# The following variables can be set in this file: +# This file is sourced when running various Spark classes. +# Copy it as spark-env.sh and edit that to configure Spark for your site. + +# Options read when launching programs locally with +# ./bin/spark-example or ./bin/spark-submit +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read by executors and drivers running inside the cluster # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program --- End diff -- Looks like this is duplicated at the very end of the file.
[GitHub] spark pull request: Added validation check for parallelizing a seq
Github user bijaybisht closed the pull request at: https://github.com/apache/spark/pull/329
[GitHub] spark pull request: Make streaming/test pass.
GitHub user haosdent opened a pull request: https://github.com/apache/spark/pull/346 Make streaming/test pass. Since this [commit][1], `SparkBuild.scala` adds a new javaOption `-Dsun.io.serialization.extendedDebugInfo=true` in Test. This makes `org.apache.spark.streaming.InputStreamsSuite` fail. [1]: https://github.com/apache/spark/commit/accd0999f9cb6a449434d3fc5274dd469eeecab2 You can merge this pull request into a Git repository by running: $ git pull https://github.com/haosdent/spark travis-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/346.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #346 commit 99ce079e7c38f982d1dfd982aeeac2e4001be126 Author: haosdent haosd...@gmail.com Date: 2014-04-07T17:21:50Z Make streaming/test pass.
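For context, the setting the referenced commit added looks roughly like this in an sbt build definition (a sketch of the assumed shape, not the exact `SparkBuild.scala` diff):

```scala
// Sketch of the sbt test option referenced above (assumed shape, not the
// exact SparkBuild.scala contents). The flag makes NotSerializableException
// stack traces include the chain of objects that led to the failure, which
// can change behavior in serialization-sensitive test suites.
javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=true"
```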
[GitHub] spark pull request: SPARK-1432: Make sure that all metadata fields...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/338
[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/346#issuecomment-39759094 Can one of the admins verify this patch?
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355049 --- Diff: conf/spark-env.sh.template --- @@ -1,19 +1,36 @@ #!/usr/bin/env bash -# This file contains environment variables required to run Spark. Copy it as -# spark-env.sh and edit that to configure Spark for your site. -# -# The following variables can be set in this file: +# This file is sourced when running various Spark classes. +# Copy it as spark-env.sh and edit that to configure Spark for your site. + +# Options read when launching programs locally with +# ./bin/spark-example or ./bin/spark-submit +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read by executors and drivers running inside the cluster # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that -# we recommend setting app-wide options in the application's driver program. -# Examples of node-specific options : -Dspark.local.dir, GC options -# Examples of app-wide options : -Dspark.serializer -# -# If using the standalone deploy mode, you can also set variables for it here: +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read in YARN client mode +# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required) +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2) +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1). +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G) +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb) +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark) +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: "default") +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job. +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job. + +# Options for the daemons used in the standalone deploy mode: # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y) --- End diff -- What is the plan for SPARK_DAEMON_*? Do we plan to keep them around?
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355234 --- Diff: conf/spark-env.sh.template --- @@ -1,19 +1,36 @@ #!/usr/bin/env bash -# This file contains environment variables required to run Spark. Copy it as -# spark-env.sh and edit that to configure Spark for your site. -# -# The following variables can be set in this file: +# This file is sourced when running various Spark classes. +# Copy it as spark-env.sh and edit that to configure Spark for your site. + +# Options read when launching programs locally with +# ./bin/spark-example or ./bin/spark-submit +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read by executors and drivers running inside the cluster # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that -# we recommend setting app-wide options in the application's driver program. -# Examples of node-specific options : -Dspark.local.dir, GC options -# Examples of app-wide options : -Dspark.serializer -# -# If using the standalone deploy mode, you can also set variables for it here: +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read in YARN client mode +# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required) +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2) +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1). +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G) +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb) +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark) +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: "default") +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job. +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job. + +# Options for the daemons used in the standalone deploy mode: # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y) --- End diff -- Also, is SPARK_MASTER_MEMORY missing here?
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355250 --- Diff: conf/spark-env.sh.template --- @@ -1,19 +1,36 @@ #!/usr/bin/env bash -# This file contains environment variables required to run Spark. Copy it as -# spark-env.sh and edit that to configure Spark for your site. -# -# The following variables can be set in this file: +# This file is sourced when running various Spark classes. +# Copy it as spark-env.sh and edit that to configure Spark for your site. + +# Options read when launching programs locally with +# ./bin/spark-example or ./bin/spark-submit +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read by executors and drivers running inside the cluster # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that -# we recommend setting app-wide options in the application's driver program. -# Examples of node-specific options : -Dspark.local.dir, GC options -# Examples of app-wide options : -Dspark.serializer -# -# If using the standalone deploy mode, you can also set variables for it here: +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read in YARN client mode +# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required) +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2) +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1). +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G) +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb) +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark) +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: "default") +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job. +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job. + +# Options for the daemons used in the standalone deploy mode: # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y) # - SPARK_WORKER_CORES, to set the number of cores to use on this machine # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g) --- End diff -- What happens if both SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY are set, which one takes precedence?
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355327 --- Diff: conf/spark-env.sh.template --- @@ -1,19 +1,36 @@ #!/usr/bin/env bash -# This file contains environment variables required to run Spark. Copy it as -# spark-env.sh and edit that to configure Spark for your site. -# -# The following variables can be set in this file: +# This file is sourced when running various Spark classes. +# Copy it as spark-env.sh and edit that to configure Spark for your site. + +# Options read when launching programs locally with +# ./bin/spark-example or ./bin/spark-submit +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read by executors and drivers running inside the cluster # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that -# we recommend setting app-wide options in the application's driver program. -# Examples of node-specific options : -Dspark.local.dir, GC options -# Examples of app-wide options : -Dspark.serializer -# -# If using the standalone deploy mode, you can also set variables for it here: +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read in YARN client mode +# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required) +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2) +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1). +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G) +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb) +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark) +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: "default") +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job. +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job. + +# Options for the daemons used in the standalone deploy mode: # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y) # - SPARK_WORKER_CORES, to set the number of cores to use on this machine # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g) --- End diff -- those two are unrelated (unfortunately)
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355031 --- Diff: conf/spark-env.sh.template --- @@ -1,19 +1,36 @@ #!/usr/bin/env bash -# This file contains environment variables required to run Spark. Copy it as -# spark-env.sh and edit that to configure Spark for your site. -# -# The following variables can be set in this file: +# This file is sourced when running various Spark classes. +# Copy it as spark-env.sh and edit that to configure Spark for your site. + +# Options read when launching programs locally with +# ./bin/spark-example or ./bin/spark-submit +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read by executors and drivers running inside the cluster # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that -# we recommend setting app-wide options in the application's driver program. -# Examples of node-specific options : -Dspark.local.dir, GC options -# Examples of app-wide options : -Dspark.serializer -# -# If using the standalone deploy mode, you can also set variables for it here: +# - SPARK_CLASSPATH, default classpath entries to append + +# Options read in YARN client mode +# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required) +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2) +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1). +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G) +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb) --- End diff -- What happens if both SPARK_MASTER_MEMORY and SPARK_DAEMON_MEMORY are set, which one takes precedence?
[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...
Github user haosdent commented on the pull request: https://github.com/apache/spark/pull/346#issuecomment-39759846 After pull request #295 was merged, the Travis build failed: https://travis-ci.org/apache/spark/jobs/22424149
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355356

--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
-# This file contains environment variables required to run Spark. Copy it as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes.
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver program.
-#   Examples of node-specific options : -Dspark.local.dir, GC options
-#   Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)

--- End diff --

There isn't actually a SPARK_MASTER_MEMORY; SPARK_DAEMON_MEMORY is the only way to set this.
[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...
Github user haosdent commented on the pull request: https://github.com/apache/spark/pull/346#issuecomment-39759960

The complete failure log from Travis:

```
[info] - actor input stream *** FAILED *** (8 seconds, 991 milliseconds)
[info]   0 did not equal 9 (InputStreamsSuite.scala:193)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:318)
[info]   at org.apache.spark.streaming.InputStreamsSuite.newAssertionFailedException(InputStreamsSuite.scala:44)
[info]   at org.scalatest.Assertions$class.assert(Assertions.scala:401)
[info]   at org.apache.spark.streaming.InputStreamsSuite.assert(InputStreamsSuite.scala:44)
[info]   at org.apache.spark.streaming.InputStreamsSuite$$anonfun$3.apply$mcV$sp(InputStreamsSuite.scala:193)
[info]   at org.apache.spark.streaming.InputStreamsSuite$$anonfun$3.apply(InputStreamsSuite.scala:148)
[info]   at org.apache.spark.streaming.InputStreamsSuite$$anonfun$3.apply(InputStreamsSuite.scala:148)
[info]   at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
[info]   at org.apache.spark.streaming.InputStreamsSuite.withFixture(InputStreamsSuite.scala:44)
[info]   at org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
[info]   at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
[info]   at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
[info]   at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271)
[info]   at org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputStreamsSuite.scala:44)
[info]   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:171)
[info]   at org.apache.spark.streaming.InputStreamsSuite.runTest(InputStreamsSuite.scala:44)
[info]   at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
[info]   at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
[info]   at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
[info]   at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
[info]   at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304)
[info]   at org.apache.spark.streaming.InputStreamsSuite.runTests(InputStreamsSuite.scala:44)
[info]   at org.scalatest.Suite$class.run(Suite.scala:2303)
[info]   at org.apache.spark.streaming.InputStreamsSuite.org$scalatest$FunSuite$$super$run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
[info]   at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:362)
[info]   at org.scalatest.FunSuite$class.run(FunSuite.scala:1310)
[info]   at org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:208)
[info]   at org.apache.spark.streaming.InputStreamsSuite.run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214)
[info]   at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:220)
[info]   at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:233)
[info]   at sbt.ForkMain$Run.runTest(ForkMain.java:243)
[info]   at sbt.ForkMain$Run.runTestSafe(ForkMain.java:214)
[info]   at sbt.ForkMain$Run.runTests(ForkMain.java:190)
[info]   at sbt.ForkMain$Run.run(ForkMain.java:257)
[info]   at sbt.ForkMain.main(ForkMain.java:99)
```
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355444

--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
-# This file contains environment variables required to run Spark. Copy it as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes.
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver program.
-#   Examples of node-specific options : -Dspark.local.dir, GC options
-#   Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)

--- End diff --

Hm... so shouldn't we list that here as well?
[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/346#issuecomment-39760126

That test is flaky. I believe I already filed a JIRA for it.
[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/345#issuecomment-39760445 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13845/
[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...
Github user haosdent commented on the pull request: https://github.com/apache/spark/pull/346#issuecomment-39760589

"I believe I already filed a JIRA for it." @marmbrus Could you post the JIRA link? If that test case in `InputStreamsSuite` is flaky, maybe we should fix it instead of removing this option.
[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/305#issuecomment-39760644 Merged build finished.
[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/305#issuecomment-39760645 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13846/
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11355670

--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
-# This file contains environment variables required to run Spark. Copy it as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes.
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver program.
-#   Examples of node-specific options : -Dspark.local.dir, GC options
-#   Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application's JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
 # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)

--- End diff --

Oh I see, the former is the total amount of memory for all executors on one machine, but the latter is the memory given to the Worker daemon that launches these executors...
[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/204#issuecomment-39761686 This is ready for further review.
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11356283

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -221,6 +247,13 @@ object SparkSubmit {
     val url = localJarFile.getAbsoluteFile.toURI.toURL
     loader.addURL(url)
   }
+
+  private def getDefaultProperties(file: File): Seq[(String, String)] = {
+    val inputStream = new FileInputStream(file)
+    val properties = new Properties()
+    properties.load(inputStream)
+    properties.stringPropertyNames().toSeq.map(k => (k, properties(k)))
+  }

--- End diff --

It would be good to add a try/catch here.
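To illustrate the reviewer's suggestion, here is a sketch in Java (not the actual Spark code) of the same helper with the error handling added: a try-with-resources block guarantees the stream is closed even if `load()` throws, and the caller still sees the `IOException`.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class DefaultProps {
    // Sketch of the reviewed helper with the suggested resource handling,
    // translated to Java. try-with-resources closes the reader whether or
    // not Properties.load() throws; the IOException propagates to the caller.
    static List<Map.Entry<String, String>> getDefaultProperties(String path) throws IOException {
        Properties props = new Properties();
        try (Reader in = new FileReader(path)) {
            props.load(in);
        }
        // Mirror the Scala version: turn the Properties into (key, value) pairs.
        return props.stringPropertyNames().stream()
                .map(k -> new SimpleEntry<>(k, props.getProperty(k)))
                .collect(Collectors.toList());
    }
}
```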
[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/305#issuecomment-39762025 Merged build started.
[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-39762310 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13847/
[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11356743

--- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala ---
@@ -29,7 +29,7 @@ import org.apache.spark.util.Utils
  */
 private[spark] class YarnClientClusterScheduler(sc: SparkContext, conf: Configuration) extends TaskSchedulerImpl(sc) {
-  def this(sc: SparkContext) = this(sc, new Configuration())
+  def this(sc: SparkContext) = this(sc, sc.getConf)

--- End diff --

Maybe I'm missing something here, but doesn't sc.getConf return a `SparkConf`, not a Hadoop `Configuration`?
[GitHub] spark pull request: HOTFIX: Disable actor input stream.
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/347

HOTFIX: Disable actor input stream.

This test makes incorrect assumptions about the behavior of Thread.sleep().

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/spark stream-tests

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/347.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #347

commit 10e09e0bd001b64ee06c9e8bb9d8f6bb7f111666
Author: Patrick Wendell pwend...@gmail.com
Date: 2014-04-07T18:06:14Z

    HOTFIX: Disable actor input stream.

    This test makes incorrect assumptions about the behavior of Thread.sleep().
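The "incorrect assumptions about Thread.sleep()" mentioned in the PR are worth spelling out: Thread.sleep only guarantees a *minimum* pause, and a loaded CI machine can wake the thread much later than requested, so a test asserting "after sleep(N), exactly K events have arrived" is inherently racy. A small Java sketch (illustrative, not from the Spark test suite) demonstrates this by measuring actual elapsed time:

```java
public class SleepDrift {
    // Thread.sleep(millis) promises only that the thread pauses for at
    // least roughly `millis`; under scheduler load it may resume much
    // later. Tests that treat the requested duration as exact wall-clock
    // time (as the disabled actor-input-stream test did) become flaky.
    static long measure(long millis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(millis);
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsed = measure(100);
        // The only safe assertion is a lower bound, never an upper one.
        System.out.println("asked for 100 ms, actually slept " + elapsed + " ms");
    }
}
```

A more robust pattern is to wait on a condition (a latch or a polled predicate with a generous timeout) rather than on elapsed time.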
[GitHub] spark pull request: HOTFIX: Disable actor input stream test.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/347#issuecomment-39763890 Merged build triggered.
[GitHub] spark pull request: HOTFIX: Disable actor input stream test.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/347#issuecomment-39763908 Merged build started.
[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r11357428

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala ---
@@ -26,14 +29,60 @@ import org.apache.spark.storage.BlockManagerMessages._
  * this is used to remove blocks from the slave's BlockManager.
  */
 private[storage]
-class BlockManagerSlaveActor(blockManager: BlockManager) extends Actor {
-  override def receive = {
+class BlockManagerSlaveActor(
+    blockManager: BlockManager,
+    mapOutputTracker: MapOutputTracker)
+  extends Actor with Logging {
+
+  import context.dispatcher
+
+  // Operations that involve removing blocks may be slow and should be done asynchronously
+  override def receive = {
     case RemoveBlock(blockId) =>
-      blockManager.removeBlock(blockId)
+      doAsync[Boolean]("removing block", sender) {
+        blockManager.removeBlock(blockId)
+        true
+      }

     case RemoveRdd(rddId) =>
-      val numBlocksRemoved = blockManager.removeRdd(rddId)
-      sender ! numBlocksRemoved
+      doAsync[Int]("removing RDD", sender) {
+        blockManager.removeRdd(rddId)
+      }
+
+    case RemoveShuffle(shuffleId) =>
+      doAsync[Boolean]("removing shuffle", sender) {
+        if (mapOutputTracker != null) {
+          mapOutputTracker.unregisterShuffle(shuffleId)
+        }
+        blockManager.shuffleBlockManager.removeShuffle(shuffleId)
+      }
+
+    case RemoveBroadcast(broadcastId, tellMaster) =>
+      doAsync[Int]("removing RDD", sender) {
+        blockManager.removeBroadcast(broadcastId, tellMaster)
+      }
+
+    case GetBlockStatus(blockId, _) =>
+      sender ! blockManager.getStatus(blockId)
+
+    case GetMatchingBlockIds(filter, _) =>
+      sender ! blockManager.getMatchingBlockIds(filter)
+  }
+
+  private def doAsync[T](actionMessage: String, responseActor: ActorRef)(body: => T) {
+    val future = Future {
+      logDebug(actionMessage)
+      val response = body
+      response

--- End diff --

Why not just rename `body` to `response` in the first place?
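The `doAsync` helper under review runs a potentially slow block-removal action off the actor's message-processing thread and forwards the result to the sender when the future completes. A rough Java analogue of that pattern (illustrative only; a `Consumer` stands in for the Akka `ActorRef`, and the names are made up, not Spark's API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AsyncHelper {
    // Java sketch of the doAsync pattern: evaluate `body` asynchronously,
    // then deliver either its result or an error message to the caller,
    // the way the actor would reply via `responseActor ! response`.
    static <T> void doAsync(String actionMessage,
                            Consumer<Object> responseActor,
                            Supplier<T> body) {
        CompletableFuture
            .supplyAsync(body)                       // run off the caller's thread
            .whenComplete((response, error) -> {
                if (error == null) {
                    responseActor.accept(response);  // success: forward the result
                } else {
                    responseActor.accept("Error " + actionMessage + ": " + error);
                }
            });
    }
}
```

The key property, as in the Scala version, is that the receive loop returns immediately; only the completion callback touches the sender.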
[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r11357667

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -47,6 +47,7 @@ private[spark] class DiskBlockManager(shuffleManager: ShuffleBlockManager, rootD
   private val subDirs = Array.fill(localDirs.length)(new Array[File](subDirsPerLocalDir))

   private var shuffleSender : ShuffleSender = null
+

--- End diff --

nit: this was probably not intended
[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r11357471

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala ---
@@ -26,14 +29,60 @@ import org.apache.spark.storage.BlockManagerMessages._
  * this is used to remove blocks from the slave's BlockManager.
  */
 private[storage]
-class BlockManagerSlaveActor(blockManager: BlockManager) extends Actor {
-  override def receive = {
+class BlockManagerSlaveActor(
+    blockManager: BlockManager,
+    mapOutputTracker: MapOutputTracker)
+  extends Actor with Logging {
+
+  import context.dispatcher
+
+  // Operations that involve removing blocks may be slow and should be done asynchronously
+  override def receive = {
     case RemoveBlock(blockId) =>
-      blockManager.removeBlock(blockId)
+      doAsync[Boolean]("removing block", sender) {
+        blockManager.removeBlock(blockId)
+        true
+      }

     case RemoveRdd(rddId) =>
-      val numBlocksRemoved = blockManager.removeRdd(rddId)
-      sender ! numBlocksRemoved
+      doAsync[Int]("removing RDD", sender) {
+        blockManager.removeRdd(rddId)
+      }
+
+    case RemoveShuffle(shuffleId) =>
+      doAsync[Boolean]("removing shuffle", sender) {
+        if (mapOutputTracker != null) {
+          mapOutputTracker.unregisterShuffle(shuffleId)
+        }
+        blockManager.shuffleBlockManager.removeShuffle(shuffleId)
+      }
+
+    case RemoveBroadcast(broadcastId, tellMaster) =>
+      doAsync[Int]("removing RDD", sender) {
+        blockManager.removeBroadcast(broadcastId, tellMaster)
+      }
+
+    case GetBlockStatus(blockId, _) =>
+      sender ! blockManager.getStatus(blockId)
+
+    case GetMatchingBlockIds(filter, _) =>
+      sender ! blockManager.getMatchingBlockIds(filter)
+  }
+
+  private def doAsync[T](actionMessage: String, responseActor: ActorRef)(body: => T) {
+    val future = Future {
+      logDebug(actionMessage)
+      val response = body
+      response
+    }
+    future.onSuccess { case response =>
+      logDebug("Done " + actionMessage + ", response is " + response)

--- End diff --

We probably want to include the RDD/shuffle/broadcast ID in the action message.
[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/304#issuecomment-39765951 Ahh, makes sense. Posted a revision that uses LocalSparkContext.
[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/304#issuecomment-39766370 Merged build triggered.
[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/304#issuecomment-39766383 Merged build started.