[GitHub] spark pull request: [SPARK-1429] Spark shell fails to start after ...

2014-04-07 Thread liancheng
Github user liancheng closed the pull request at:

https://github.com/apache/spark/pull/337




[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...

2014-04-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/322#issuecomment-39699481
  
Okay, I think the broader issue is this:

@ueshin you said before that calling `Class.forName()` with `null` as the third 
argument tries to load the class from the bootstrap class loader, which doesn't 
know the class `org.apache.spark.serializer.JavaSerializer`.

But I think in this case we'd expect the bootstrap classloader to know 
about `JavaSerializer` (this should be on the classpath when the executor 
starts), right? I'm still not sure why it would fail in this case. I don't see 
why `MesosExecutorDriver` would be on the Java classpath but `JavaSerializer` 
wouldn't.

@manku-timma I looked more, and the reason this doesn't work is that 
other parts of the code don't directly use the `classLoader` from 
the executor. I can look more tomorrow and see how we can best clean this up. 
The current approach works, but it's a bit of a hack; there might be a nicer way 
to clean this up.






[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...

2014-04-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/322#issuecomment-39699792
  
Ah I see, @ueshin you are right. It's the bootstrap class loader, and it 
won't have any Spark definitions. I was mixing it up with the system class 
loader.

```
./bin/spark-shell
scala> Class.forName("org.apache.spark.serializer.JavaSerializer")
res7: Class[_] = class org.apache.spark.serializer.JavaSerializer

scala> Class.forName("org.apache.spark.serializer.JavaSerializer", true, null)
java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:11)
    at $iwC$$iwC$$iwC.<init>(<console>:16)
    at $iwC$$iwC.<init>(<console>:18)
    at $iwC.<init>(<console>:20)
```

We should definitely clean this up. The behavior we want in every case is 
to use the context class loader if present, and otherwise the class loader 
that loads the Spark classes (e.g. the system class loader).
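
Roughly, the lookup we want is something like the following (a sketch only, not the actual Spark code):

```
object ClassLoaderUtil {
  // Prefer the thread's context class loader; fall back to the loader that
  // loaded the Spark classes (typically the system/application class loader).
  def classLoaderToUse: ClassLoader =
    Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
}

// Then resolve classes against it explicitly instead of passing null:
// Class.forName("org.apache.spark.serializer.JavaSerializer", true, ClassLoaderUtil.classLoaderToUse)
```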




[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/204#issuecomment-39699932
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/204#issuecomment-39699926
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/204#discussion_r11332376
  
--- Diff: sbin/start-history-server.sh ---
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Starts the history server on the machine this script is executed on.
+#
+# Usage: start-history-server.sh <base-log-dir> [<web-ui-port>]
+#   Example: ./start-history-server.sh --dir /tmp/spark-events --port 18080
+#
+
+sbin=`dirname $0`
+sbin=`cd $sbin; pwd`
+
+if [ $# -lt 1 ]; then
+  echo "Usage: ./start-history-server.sh <base-log-dir> [<web-ui-port>]"
+  echo "Example: ./start-history-server.sh /tmp/spark-events 18080"
--- End diff --

In the latest commit, the history server reads from SPARK_DAEMON_MEMORY the 
same way Masters and Workers do. This may or may not be subject to change with 
#299.




[GitHub] spark pull request: Spark logger moving to use scala-logging

2014-04-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/332#issuecomment-39700055
  
Is performance really a problem here? I don't think we log anything on the 
critical path (and we probably shouldn't). If we do, maybe we can just get rid of 
that logging.






[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...

2014-04-07 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/322#issuecomment-39700076
  
@pwendell Yes, the bootstrap class loader knows only the core Java APIs; the 
Spark classes (specified via the `-cp` java command-line argument) are loaded by 
the system class loader.
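
This is easy to confirm from a REPL (a sketch; the exact loader printed varies by JVM and environment):

```
// Core classes come from the bootstrap loader, which is represented as null:
classOf[String].getClassLoader
// => null

// Spark classes come from the system/application class loader, e.g.:
Class.forName("org.apache.spark.serializer.JavaSerializer").getClassLoader
// => sun.misc.Launcher$AppClassLoader@...
```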




[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/126#discussion_r11332469
  
--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala 
---
@@ -108,8 +108,7 @@ class JsonProtocolSuite extends FunSuite {
 // BlockId
 testBlockId(RDDBlockId(1, 2))
 testBlockId(ShuffleBlockId(1, 2, 3))
-testBlockId(BroadcastBlockId(1L))
-testBlockId(BroadcastHelperBlockId(BroadcastBlockId(2L), "Spark"))
+testBlockId(BroadcastBlockId(1L, "Insert words of wisdom here"))
--- End diff --

Hey @tdas, looks like the `"` quote characters here are causing the test failure 
(in case you haven't investigated it yet).




[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-04-07 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/148#issuecomment-39700608
  
Right, (b) is different from the original intent of the PR.  The reason for 
copying Spark's log4j instead of Hadoop's was the concern brought up by 
@tgravescs earlier:

"The one downside to making the yarn one default is that we now get 
different looking logs if the user just uses the spark one. In the default most 
things go to syslog file, and if I just put in the conf/log4j.properties by 
copying the template I won't get a syslog file and most things will be in 
stdout right?"

This applies to the master too.




[GitHub] spark pull request: Spark logger moving to use scala-logging

2014-04-07 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/332#issuecomment-39701040
  
I did not find a call that affects performance. It is possibly here: Spark 
Catalyst calls `logger.debug` many times, maybe like this:

[RuleExecutor.scala#L64](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L64)
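
For context, a hot `logger.debug` call can cost even when DEBUG is disabled, because the message argument is built eagerly. A self-contained sketch, assuming an slf4j-style logger (scala-logging avoids the explicit guard by generating it with a macro):

```
import org.slf4j.LoggerFactory

object DebugLoggingCost {
  private val logger = LoggerFactory.getLogger(getClass)

  // Stand-in for something costly like plan.treeString in Catalyst.
  def expensiveTreeString: String =
    (1 to 1000).map(i => s"node-$i").mkString("\n")

  def main(args: Array[String]): Unit = {
    // Without a guard, the string is built even when DEBUG is off:
    logger.debug(s"After applying rule:\n$expensiveTreeString")

    // The guard skips that work when the level is disabled:
    if (logger.isDebugEnabled) {
      logger.debug(s"After applying rule:\n$expensiveTreeString")
    }
  }
}
```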





[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/343

[SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark toStringFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/343.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #343


commit 37198fe79e98b3e123b8a5ddd6093dc7516513dc
Author: Michael Armbrust mich...@databricks.com
Date:   2014-04-07T07:06:54Z

Fix toString for SchemaRDD NativeCommands.






[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39701471
  
Ok I pushed another change for generators.




[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/343#issuecomment-39701613
  
Merged build started. 




[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/343#issuecomment-39701598
  
lgtm




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39701605
  
 Merged build triggered. 




[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/343#issuecomment-39701602
  
 Merged build triggered. 




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39701614
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/204#issuecomment-39701639
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13837/




[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/204#issuecomment-39701638
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SQL] SPARK-1371 Hash Aggregation Improvements

2014-04-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/295#issuecomment-39701693
  
ok merged this. test failures are unrelated to this change.




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39701865
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39701867
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39701875
  
Merged build started. 




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39701876
  
Merged build started. 




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39701962
  
The default eval method is for things like `UnresolvedAttribute` or 
`AttributeReference`, though we could probably special-case the failure there.




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39702050
  
Yea, I think it makes sense to throw `UnsupportedOperationException` in those 
rather than having a generic implementation of `eval`. It is less error-prone 
that way. I can make that change after this one goes in.
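
A minimal sketch of that idea (simplified, hypothetical types, not Catalyst's actual classes):

```
object EvalSketch {
  // Stand-in for Catalyst's Row type, just for illustration.
  type Row = Seq[Any]

  abstract class Expression {
    // No generic default: every concrete expression defines eval itself.
    def eval(input: Row): Any
  }

  // Unresolved nodes can never be evaluated, so they fail loudly instead
  // of inheriting a meaningless default implementation.
  case class UnresolvedAttribute(name: String) extends Expression {
    override def eval(input: Row): Any =
      throw new UnsupportedOperationException(s"Cannot evaluate unresolved attribute: $name")
  }
}
```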




[GitHub] spark pull request: [SQL] SPARK-1371 Hash Aggregation Improvements

2014-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/295




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39703070
  
LGTM




[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/343#issuecomment-39703707
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13838/




[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/343#issuecomment-39703706
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39704394
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13839/




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39704393
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13840/




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39704392
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13841/




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39704389
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39704390
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39704388
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [WIP] Fix SPARK-1413: Parquet messes up stdout...

2014-04-07 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/325#issuecomment-39706859
  
@AndreSchumacher 

[parquet.Log](https://github.com/Parquet/parquet-mr/blob/master/parquet-common/src/main/java/parquet/Log.java) has a static block ("add a default handler in case there is none"). 
The following code resets `Logger.getLogger("parquet")`:

    val parquetLogger = java.util.logging.Logger.getLogger("parquet")
    parquetLogger.getHandlers.foreach(parquetLogger.removeHandler)
    if (parquetLogger.getLevel != null) parquetLogger.setLevel(null)
    if (!parquetLogger.getUseParentHandlers) parquetLogger.setUseParentHandlers(true)





[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/343#issuecomment-39707631
  
merged.




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39708005
  
 Merged build triggered. 




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39708020
  
Merged build started. 




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39710934
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13842/




[GitHub] spark pull request: [sql] Rename Expression.apply to eval for bett...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/340#issuecomment-39710933
  
Merged build finished. 




[GitHub] spark pull request: [SQL] SPARK-1427 Fix toString for SchemaRDD Na...

2014-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/343




[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-04-07 Thread haosdent
Github user haosdent commented on the pull request:

https://github.com/apache/spark/pull/194#issuecomment-39718976
  
Quite confused about `InputStreamsSuite`. It passes on my local machine, and it 
is a test case from the `streaming` module; I don't think my pull request has 
any related code in that module.




[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-04-07 Thread haosdent
Github user haosdent commented on the pull request:

https://github.com/apache/spark/pull/194#issuecomment-39719475
  
The error is from `https://travis-ci.org/apache/spark/builds/22424147`. I 
will trigger Travis again after others fix that bug on master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1403] Move the class loader creation ba...

2014-04-07 Thread manku-timma
Github user manku-timma commented on the pull request:

https://github.com/apache/spark/pull/322#issuecomment-39720333
  
So the current fix looks fine?




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-07 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-39722501
  
Hi, any comments?




[GitHub] spark pull request: SPARK-1104: kill Process in workerThread of Ex...

2014-04-07 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/35#issuecomment-39726360
  
ping




[GitHub] spark pull request: SPARK-1387. Update build plugins, avoid plugin...

2014-04-07 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/291#issuecomment-39728466
  
Thanks @pwendell for finishing it off with the doc update -- would have 
done it if I weren't asleep here!




[GitHub] spark pull request: SPARK-1417: Spark on Yarn - spark UI link from...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/344#issuecomment-39738667
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13843/




[GitHub] spark pull request: In the contex of SPARK-1337: Make sure that al...

2014-04-07 Thread dgshep
Github user dgshep commented on the pull request:

https://github.com/apache/spark/pull/338#issuecomment-39748924
  
Done: https://issues.apache.org/jira/browse/SPARK-1432




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39750729
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39750745
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1357] [MLLIB] [WIP] Annotate developer ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/298#issuecomment-39755034
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/345

[SPARK-1434] [MLLIB] change labelParser from anonymous function to trait

This is a patch to address @mateiz's comment in 
https://github.com/apache/spark/pull/245

`MLUtils#loadLibSVMData` uses an anonymous function for the label parser, which 
Java users won't like. So I made a `LabelParser` trait and provided two 
implementations: binary and multiclass.
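
A minimal sketch of the described design (the method name and parsing rules here are assumptions; see the PR for the actual API):

```
trait LabelParser extends Serializable {
  // Parse a label string into a Double label.
  def parse(labelString: String): Double
}

// Assumed rule: any positive value maps to 1.0, everything else to 0.0.
object BinaryLabelParser extends LabelParser {
  override def parse(labelString: String): Double =
    if (labelString.toDouble > 0) 1.0 else 0.0
}

// Uses the numeric value as-is for multiclass labels.
object MulticlassLabelParser extends LabelParser {
  override def parse(labelString: String): Double = labelString.toDouble
}
```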

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark label-parser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #345


commit 7f8eb364f216c0e4e776f115192acc01c5e3d0f0
Author: Xiangrui Meng m...@databricks.com
Date:   2014-04-07T16:45:48Z

change labelParser from annoymous function to trait






[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-04-07 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/148#issuecomment-39756329
  
Thanks Tom.  I just rebased on master.

"Note the other option would be to change the conf/log4j.properties.template 
to be more like hadoop's."

I don't have an opinion on this, but happy to make the change if you think 
it's the right thing.




[GitHub] spark pull request: SPARK-1432: Make sure that all metadata fields...

2014-04-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/338#issuecomment-39756396
  
Thanks - merged this into master and 0.9 branch.




[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/305#issuecomment-39756422
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/148#issuecomment-39756423
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/345#issuecomment-39756426
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/305#issuecomment-39756435
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/345#issuecomment-39756416
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...

2014-04-07 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/305#issuecomment-39756459
  
Thanks for reviewing @markhamstra -- made the changes you suggested and 
will merge later today if you don't see any other issues!




[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/148#issuecomment-39756436
  
Merged build started. 




[GitHub] spark pull request: Added validation check for parallelizing a seq

2014-04-07 Thread bijaybisht
Github user bijaybisht commented on the pull request:

https://github.com/apache/spark/pull/329#issuecomment-39758586
  
Sure, I'll close this. I presume the NumericRange change, which produces more 
balanced partitions (and is part of this fix), is also not required.




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11354857
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
--- End diff --

Looks like this is duplicated at the very end of the file.




[GitHub] spark pull request: Added validation check for parallelizing a seq

2014-04-07 Thread bijaybisht
Github user bijaybisht closed the pull request at:

https://github.com/apache/spark/pull/329




[GitHub] spark pull request: Make streaming/test pass.

2014-04-07 Thread haosdent
GitHub user haosdent opened a pull request:

https://github.com/apache/spark/pull/346

Make streaming/test pass.

In this [commit][1], `SparkBuild.scala` added a new javaOption in Test, 
`-Dsun.io.serialization.extendedDebugInfo=true`. This makes 
`org.apache.spark.streaming.InputStreamsSuite` fail.


[1]: https://github.com/apache/spark/commit/accd0999f9cb6a449434d3fc5274dd469eeecab2
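
For reference, the setting in question would look roughly like this in sbt (a sketch; see the linked commit for the exact line):

```
// Extended serialization debug info for tests; this is the flag the
// PR reports as breaking InputStreamsSuite.
javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=true"
```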

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/haosdent/spark travis-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #346


commit 99ce079e7c38f982d1dfd982aeeac2e4001be126
Author: haosdent haosd...@gmail.com
Date:   2014-04-07T17:21:50Z

Make streaming/test pass.






[GitHub] spark pull request: SPARK-1432: Make sure that all metadata fields...

2014-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/338




[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/346#issuecomment-39759094
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355049
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
--- End diff --

What is the plan for SPARK_DAEMON_*? Do we plan to keep them around?




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355234
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
--- End diff --

Also, is SPARK_MASTER_MEMORY missing here?




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355250
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it 
as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver 
program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it 
here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 
Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests 
(Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed 
with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be 
distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or 
hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
 # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
--- End diff --

If both SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY are set, which one takes 
precedence?




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355327
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it 
as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver 
program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it 
here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 
Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests 
(Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed 
with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be 
distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or 
hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
 # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
--- End diff --

Those two are unrelated (unfortunately).




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355031
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it 
as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver 
program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it 
here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 
Mb)
--- End diff --

If both SPARK_MASTER_MEMORY and SPARK_DAEMON_MEMORY are set, which one takes 
precedence?




[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...

2014-04-07 Thread haosdent
Github user haosdent commented on the pull request:

https://github.com/apache/spark/pull/346#issuecomment-39759846
  
After pull request #295 was merged, the Travis build failed: 
https://travis-ci.org/apache/spark/jobs/22424149




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355356
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it 
as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver 
program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it 
here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 
Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests 
(Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed 
with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be 
distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or 
hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
--- End diff --

There isn't actually a SPARK_MASTER_MEMORY; SPARK_DAEMON_MEMORY is the only 
way to set this.
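
In other words, a hypothetical spark-env.sh line (value purely illustrative) 
is the only knob for the daemons' heap:

```bash
# No SPARK_MASTER_MEMORY exists; the standalone Master and Worker daemon JVMs
# both take their heap size from SPARK_DAEMON_MEMORY.
export SPARK_DAEMON_MEMORY=512m
```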




[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...

2014-04-07 Thread haosdent
Github user haosdent commented on the pull request:

https://github.com/apache/spark/pull/346#issuecomment-39759960
  
The complete failure log from Travis:

```
[info] - actor input stream *** FAILED *** (8 seconds, 991 milliseconds)
[info]   0 did not equal 9 (InputStreamsSuite.scala:193)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:318)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.newAssertionFailedException(InputStreamsSuite.scala:44)
[info]   at org.scalatest.Assertions$class.assert(Assertions.scala:401)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.assert(InputStreamsSuite.scala:44)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite$$anonfun$3.apply$mcV$sp(InputStreamsSuite.scala:193)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite$$anonfun$3.apply(InputStreamsSuite.scala:148)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite$$anonfun$3.apply(InputStreamsSuite.scala:148)
[info]   at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.withFixture(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
[info]   at 
org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
[info]   at 
org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
[info]   at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:171)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.runTest(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
[info]   at 
org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
[info]   at 
org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
[info]   at 
org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
[info]   at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.runTests(InputStreamsSuite.scala:44)
[info]   at org.scalatest.Suite$class.run(Suite.scala:2303)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.org$scalatest$FunSuite$$super$run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
[info]   at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:362)
[info]   at org.scalatest.FunSuite$class.run(FunSuite.scala:1310)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:208)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.run(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214)
[info]   at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:220)
[info]   at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:233)
[info]   at sbt.ForkMain$Run.runTest(ForkMain.java:243)
[info]   at sbt.ForkMain$Run.runTestSafe(ForkMain.java:214)
[info]   at sbt.ForkMain$Run.runTests(ForkMain.java:190)
[info]   at sbt.ForkMain$Run.run(ForkMain.java:257)
[info]   at sbt.ForkMain.main(ForkMain.java:99)
```




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355444
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it 
as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver 
program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it 
here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 
Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests 
(Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed 
with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be 
distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or 
hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
--- End diff --

Hm... so shouldn't we list that here as well?




[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...

2014-04-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/346#issuecomment-39760126
  
That test is flaky. I believe I already filed a JIRA for it.





[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/345#issuecomment-39760445
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13845/




[GitHub] spark pull request: Remove extendedDebugInfo option in test build ...

2014-04-07 Thread haosdent
Github user haosdent commented on the pull request:

https://github.com/apache/spark/pull/346#issuecomment-39760589
  
I believe I already filed a JIRA for it.

@marmbrus Could you post the JIRA link? If that test case in 
`InputStreamsSuite` is flaky, maybe we should fix it instead of removing this 
option.




[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/305#issuecomment-39760644
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/305#issuecomment-39760645
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13846/




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11355670
  
--- Diff: conf/spark-env.sh.template ---
@@ -1,19 +1,36 @@
 #!/usr/bin/env bash
 
-# This file contains environment variables required to run Spark. Copy it 
as
-# spark-env.sh and edit that to configure Spark for your site.
-#
-# The following variables can be set in this file:
+# This file is sourced when running various Spark classes. 
+# Copy it as spark-env.sh and edit that to configure Spark for your site.
+
+# Options read when launching programs locally with 
+# ./bin/spark-example or ./bin/spark-submit
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
-# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
-#   we recommend setting app-wide options in the application's driver 
program.
-# Examples of node-specific options : -Dspark.local.dir, GC options
-# Examples of app-wide options : -Dspark.serializer
-#
-# If using the standalone deploy mode, you can also set variables for it 
here:
+# - SPARK_CLASSPATH, default classpath entries to append
+
+# Options read in YARN client mode
+# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
+# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
+# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
+# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
+# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 
Mb)
+# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
+# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests 
(Default: ‘default’)
+# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed 
with the job.
+# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be 
distributed with the job.
+
+# Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or 
hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_OPTS, to set config properties at the master (e.g -Dx=y)
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
 # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
--- End diff --

Oh I see, the former is the total amount of memory for all executors on one 
machine, but the latter is the memory given to the Worker daemon itself, the 
process that launches these executors...
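
To make that distinction concrete, a hypothetical spark-env.sh fragment 
(values purely illustrative):

```bash
# Total memory the Worker may hand out to executors on this machine.
export SPARK_WORKER_MEMORY=8g
# Heap of the standalone daemon JVMs themselves (Master and Worker);
# unrelated to SPARK_WORKER_MEMORY, per the discussion above.
export SPARK_DAEMON_MEMORY=1g
```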




[GitHub] spark pull request: [SPARK-1276] Add a HistoryServer to render per...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/204#issuecomment-39761686
  
This is ready for further review.




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11356283
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -221,6 +247,13 @@ object SparkSubmit {
 val url = localJarFile.getAbsoluteFile.toURI.toURL
 loader.addURL(url)
   }
+
+  private def getDefaultProperties(file: File): Seq[(String, String)] = {
+val inputStream = new FileInputStream(file)
+val properties = new Properties()
+properties.load(inputStream)
+properties.stringPropertyNames().toSeq.map(k => (k, properties(k)))
+  }
--- End diff --

Would be good to add a try catch here
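
For illustration, a minimal sketch (not the PR's code) of the same method 
with the stream closed in a `finally`, so a failing `Properties.load` cannot 
leak the file handle:

```scala
import java.io.{File, FileInputStream}
import java.util.Properties
import scala.collection.JavaConverters._

object DefaultPropertiesSketch {
  // Same logic as the diff above, but guarded: the stream is always closed,
  // even if load() throws on an unreadable or malformed file.
  def getDefaultProperties(file: File): Seq[(String, String)] = {
    val inputStream = new FileInputStream(file)
    try {
      val properties = new Properties()
      properties.load(inputStream)
      properties.stringPropertyNames().asScala.toSeq.map(k => (k, properties.getProperty(k)))
    } finally {
      inputStream.close()
    }
  }
}
```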




[GitHub] spark pull request: [SPARK-1396] Properly cleanup DAGScheduler on ...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/305#issuecomment-39762025
  
Merged build started. 




[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/148#issuecomment-39762310
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13847/




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11356743
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala
 ---
@@ -29,7 +29,7 @@ import org.apache.spark.util.Utils
  */
 private[spark] class YarnClientClusterScheduler(sc: SparkContext, conf: 
Configuration) extends TaskSchedulerImpl(sc) {
 
-  def this(sc: SparkContext) = this(sc, new Configuration())
+  def this(sc: SparkContext) = this(sc, sc.getConf)
--- End diff --

Maybe I'm missing something here, but doesn't sc.getConf return `SparkConf`, 
not a Hadoop `Configuration`?
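
For clarity, a small sketch (assuming a live `SparkContext` named `sc`) of 
the two distinct types involved; `sc.hadoopConfiguration` appears here only 
to contrast them:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}

// The reviewer's point: these are two different configuration classes.
def illustrate(sc: SparkContext): Unit = {
  val sparkConf: SparkConf = sc.getConf                  // Spark's own config
  val hadoopConf: Configuration = sc.hadoopConfiguration // Hadoop's config
}
```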




[GitHub] spark pull request: HOTFIX: Disable actor input stream.

2014-04-07 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/347

HOTFIX: Disable actor input stream.

This test makes incorrect assumptions about the behavior of Thread.sleep().
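
For context, a minimal standalone sketch (not the test itself; all names 
illustrative) of the kind of timing assumption that makes such tests flaky:

```scala
// Hypothetical repro: the assertion races the worker thread. Thread.sleep
// guarantees only a *minimum* delay, and the worker may be scheduled late on
// a loaded machine, so `count` can still be 0 when the assertion runs.
object SleepyTest extends App {
  @volatile var count = 0
  new Thread(new Runnable {
    override def run(): Unit = { Thread.sleep(50); count = 9 }
  }).start()
  Thread.sleep(100)
  assert(count == 9, s"$count did not equal 9")  // flaky by construction
}
```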

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark stream-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #347


commit 10e09e0bd001b64ee06c9e8bb9d8f6bb7f111666
Author: Patrick Wendell pwend...@gmail.com
Date:   2014-04-07T18:06:14Z

HOTFIX: Disable actor input stream.

This test makes incorrect assumptions about the behavior of Thread.sleep().






[GitHub] spark pull request: HOTFIX: Disable actor input stream test.

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/347#issuecomment-39763890
  
 Merged build triggered. 




[GitHub] spark pull request: HOTFIX: Disable actor input stream test.

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/347#issuecomment-39763908
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/126#discussion_r11357428
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala ---
@@ -26,14 +29,60 @@ import org.apache.spark.storage.BlockManagerMessages._
  * this is used to remove blocks from the slave's BlockManager.
  */
 private[storage]
-class BlockManagerSlaveActor(blockManager: BlockManager) extends Actor {
-  override def receive = {
+class BlockManagerSlaveActor(
+blockManager: BlockManager,
+mapOutputTracker: MapOutputTracker)
+  extends Actor with Logging {
+
+  import context.dispatcher
 
+  // Operations that involve removing blocks may be slow and should be done asynchronously
+  override def receive = {
 case RemoveBlock(blockId) =>
-  blockManager.removeBlock(blockId)
+  doAsync[Boolean]("removing block", sender) {
+blockManager.removeBlock(blockId)
+true
+  }
 
 case RemoveRdd(rddId) =>
-  val numBlocksRemoved = blockManager.removeRdd(rddId)
-  sender ! numBlocksRemoved
+  doAsync[Int]("removing RDD", sender) {
+blockManager.removeRdd(rddId)
+  }
+
+case RemoveShuffle(shuffleId) =>
+  doAsync[Boolean]("removing shuffle", sender) {
+if (mapOutputTracker != null) {
+  mapOutputTracker.unregisterShuffle(shuffleId)
+}
+blockManager.shuffleBlockManager.removeShuffle(shuffleId)
+  }
+
+case RemoveBroadcast(broadcastId, tellMaster) =>
+  doAsync[Int]("removing RDD", sender) {
+blockManager.removeBroadcast(broadcastId, tellMaster)
+  }
+
+case GetBlockStatus(blockId, _) =>
+  sender ! blockManager.getStatus(blockId)
+
+case GetMatchingBlockIds(filter, _) =>
+  sender ! blockManager.getMatchingBlockIds(filter)
+  }
+
+  private def doAsync[T](actionMessage: String, responseActor: ActorRef)(body: => T) {
+val future = Future {
+  logDebug(actionMessage)
+  val response = body
+  response
--- End diff --

Why not just rename `body` to `response` in the first place?




[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/126#discussion_r11357667
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -47,6 +47,7 @@ private[spark] class DiskBlockManager(shuffleManager: 
ShuffleBlockManager, rootD
   private val subDirs = Array.fill(localDirs.length)(new 
Array[File](subDirsPerLocalDir))
   private var shuffleSender : ShuffleSender = null
 
+
--- End diff --

nit: this was probably not intended




[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-04-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/126#discussion_r11357471
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala ---
@@ -26,14 +29,60 @@ import org.apache.spark.storage.BlockManagerMessages._
  * this is used to remove blocks from the slave's BlockManager.
  */
 private[storage]
-class BlockManagerSlaveActor(blockManager: BlockManager) extends Actor {
-  override def receive = {
+class BlockManagerSlaveActor(
+blockManager: BlockManager,
+mapOutputTracker: MapOutputTracker)
+  extends Actor with Logging {
+
+  import context.dispatcher
 
+  // Operations that involve removing blocks may be slow and should be done asynchronously
+  override def receive = {
 case RemoveBlock(blockId) =>
-  blockManager.removeBlock(blockId)
+  doAsync[Boolean]("removing block", sender) {
+blockManager.removeBlock(blockId)
+true
+  }
 
 case RemoveRdd(rddId) =>
-  val numBlocksRemoved = blockManager.removeRdd(rddId)
-  sender ! numBlocksRemoved
+  doAsync[Int]("removing RDD", sender) {
+blockManager.removeRdd(rddId)
+  }
+
+case RemoveShuffle(shuffleId) =>
+  doAsync[Boolean]("removing shuffle", sender) {
+if (mapOutputTracker != null) {
+  mapOutputTracker.unregisterShuffle(shuffleId)
+}
+blockManager.shuffleBlockManager.removeShuffle(shuffleId)
+  }
+
+case RemoveBroadcast(broadcastId, tellMaster) =>
+  doAsync[Int]("removing RDD", sender) {
+blockManager.removeBroadcast(broadcastId, tellMaster)
+  }
+
+case GetBlockStatus(blockId, _) =>
+  sender ! blockManager.getStatus(blockId)
+
+case GetMatchingBlockIds(filter, _) =>
+  sender ! blockManager.getMatchingBlockIds(filter)
+  }
+
+  private def doAsync[T](actionMessage: String, responseActor: ActorRef)(body: => T) {
+val future = Future {
+  logDebug(actionMessage)
+  val response = body
+  response
+}
+future.onSuccess { case response =>
+  logDebug("Done " + actionMessage + ", response is " + response)
--- End diff --

We probably want to include the RDD/shuffle/broadcast ID in the action 
message
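
A hypothetical version of two of the arms above with the ID folded into the 
message (a fragment of the same `receive` block, not standalone code):

```scala
// Sketch: include the ID in the action message so the "Done ..." debug log
// identifies which RDD or shuffle was removed.
case RemoveRdd(rddId) =>
  doAsync[Int]("removing RDD " + rddId, sender) {
    blockManager.removeRdd(rddId)
  }

case RemoveShuffle(shuffleId) =>
  doAsync[Boolean]("removing shuffle " + shuffleId, sender) {
    if (mapOutputTracker != null) {
      mapOutputTracker.unregisterShuffle(shuffleId)
    }
    blockManager.shuffleBlockManager.removeShuffle(shuffleId)
  }
```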




[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-07 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/304#issuecomment-39765951
  
Ahh, makes sense.  Posted a revision that uses LocalSparkContext.




[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/304#issuecomment-39766370
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/304#issuecomment-39766383
  
Merged build started. 



