[GitHub] spark pull request: [SQL] Attribute equality comparisons should be...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1414#issuecomment-48993578 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1398#issuecomment-48993635 Thanks for fixing this - I tested this locally and it worked (though I did have to do a clean build first).
[GitHub] spark pull request: [SQL] Attribute equality comparisons should be...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1414#issuecomment-48993731 QA tests have started for PR 1414. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16663/consoleFull
[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1398
[GitHub] spark pull request: SPARK-1536: multiclass classification support ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48994002 @manishamde - can you add `[MLlib]` to the title of this pull request? Otherwise it doesn't get picked up properly by our filters.
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-48994041 LGTM - I'll merge this.
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1413
[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/931#issuecomment-48994210 Jenkins, test this please. @xiajunluan actually I think the main issue now is that this isn't merging cleanly.
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1415 SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec This reduces shuffle compression memory usage by 3x. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark snappy Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1415.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1415 commit 06c1a01471cc5f6368062e88bd655ecc2634a8b7 Author: Reynold Xin r...@apache.org Date: 2014-07-15T06:16:45Z SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec. This reduces shuffle compression memory usage by 3x.
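For reference, the codec being changed here is controlled by a single setting; a minimal spark-defaults.conf sketch (assuming the Spark 1.x short codec names), not taken from the PR itself:

```
# Pin the compression codec explicitly instead of relying on the default,
# which this PR changes from LZF to Snappy. Short names resolve to the
# corresponding org.apache.spark.io.*CompressionCodec classes.
spark.io.compression.codec  snappy
```

Equivalently, an application can call `SparkConf.set("spark.io.compression.codec", "snappy")` in code.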
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48994982 Do we want to change the default for everything, or only for shuffle? (Only shuffle won't impact anything outside of Spark.) What would be the impact on user data if we change it for all? (It is a DeveloperApi after all, so there might be user data consuming this?)
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48995170 @andrewor14 I have created a JIRA issue, SPARK-2302. Yes, it is for reducing the Master's memory. Thank you.
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1412#issuecomment-48995408 QA results for PR 1412:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16657/consoleFull
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48995490 This is actually only used in shuffle.
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48995567 Actually I lied. Somebody else added some code to use the compression codec to compress event data ...
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48995664 cc @andrewor14 I guess you added the event code ...
[GitHub] spark pull request: Add/increase severity of warning in documentat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1380
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1416 [SPARK-2399] Add support for LZ4 compression. Based on Greg Bowyer's patch from JIRA https://issues.apache.org/jira/browse/SPARK-2399 You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark lz4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1416 commit 8a14d38523e2b35b7f38503bb70bb9934e229cf3 Author: Reynold Xin r...@apache.org Date: 2014-07-15T06:38:49Z [SPARK-2399] Add support for LZ4 compression.
[GitHub] spark pull request: [SPARK-2411] Add a history-not-found page to s...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1336#discussion_r14919256
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/ui/HistoryNotFoundPage.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.master.ui
+
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.spark.ui.{UIUtils, WebUIPage}
+
+private[spark] class HistoryNotFoundPage(parent: MasterWebUI)
+  extends WebUIPage("history/not-found") {
+
+  def render(request: HttpServletRequest): Seq[Node] = {
+    val content =
+      <div class="row-fluid">
+        <div class="span12" style="font-size:14px;font-weight:bold">
+          No event logs were found for this application. To enable event logging, please set
--- End diff --
I mean that if they are joining an existing cluster, they'll need to figure out what HDFS or FS path to set this to such that it's consistent with the path set by the person running the history server.
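The coordination problem raised above can be sketched as a spark-defaults.conf fragment (the HDFS path below is a hypothetical example, not from the PR):

```
# Application side: write event logs so the history server can find them.
spark.eventLog.enabled  true
# Must point at the same directory the history server is configured to read.
spark.eventLog.dir      hdfs://namenode:8021/user/spark/applicationHistory
```

Whoever runs the history server points it at the same directory, which is exactly the path every user joining the cluster has to discover.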
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-48995876 cc @davies
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-48995956 QA tests have started for PR 1416. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/1/consoleFull
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996149 I looked into the event logger code and it appears that the codec change should be fine. It figures out the codec for old data automatically anyway.
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-48996263 QA tests have started for PR 1416. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16667/consoleFull
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996256 Yes, we log the codec used in a separate file so we don't lock ourselves out of our old event logs. This change seems fine.
[GitHub] spark pull request: [SPARK-2390] Files in staging directory cannot...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1326#issuecomment-48996710 Sure - I guess we can do this. It seems strange to open a filesystem and never close it (what if someone creates a large number of FileLogger instances? After all, this is a generic class). I guess we'll rely on shutdown to close this.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996763 @andrewor14 do we also log the block size, etc. of the codec used? If yes, then at least for event data we should be fine. IIRC we use the codec to compress:
(a) RDDs (which could be in Tachyon - and so shared between Spark deployments?)
(b) shuffle (private to Spark)
(c) broadcast (private to Spark)
(d) event logging (discussed above)
(e) checkpoints (could be shared between runs?)
Other than (a) and (e), sharing data via the others would be non-trivial and something we don't need to support, IMO. I am not very sure of (a) and (e) - thoughts?
[GitHub] spark pull request: [SPARK-2390] Files in staging directory cannot...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1326
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1337#discussion_r14919602
--- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala ---
@@ -258,7 +258,7 @@ private[spark] class PartitionCoalescer(maxPartitions: Int, prev: RDD[_], balanc
     val pgroup = PartitionGroup(nxt_replica)
     groupArr += pgroup
     addPartToPGroup(nxt_part, pgroup)
-    groupHash += (nxt_replica -> ArrayBuffer(pgroup)) // list in case we have multiple
+    groupHash.put(nxt_replica, ArrayBuffer(pgroup)) // list in case we have multiple
--- End diff --
Is this just a stylistic change or does this operator somehow have different semantics?
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1337#issuecomment-48996905 LGTM pending one small question
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48997521 QA results for PR 1114:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16660/consoleFull
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1337#discussion_r14919798
--- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala ---
@@ -258,7 +258,7 @@ private[spark] class PartitionCoalescer(maxPartitions: Int, prev: RDD[_], balanc
     val pgroup = PartitionGroup(nxt_replica)
     groupArr += pgroup
     addPartToPGroup(nxt_part, pgroup)
-    groupHash += (nxt_replica -> ArrayBuffer(pgroup)) // list in case we have multiple
+    groupHash.put(nxt_replica, ArrayBuffer(pgroup)) // list in case we have multiple
--- End diff --
Strictly stylistic -- it made more sense when I was using put below; now there's no reason for it.
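To the question above: on `scala.collection.mutable.HashMap`, `+=` with a `key -> value` pair and `put` insert the same entry; the only observable difference is the return value (`+=` returns the map itself for chaining, `put` returns the previous binding as an `Option`). A self-contained sketch, independent of the Spark code:

```scala
import scala.collection.mutable

// Both forms insert the same (key, value) entry into a mutable HashMap.
val a = mutable.HashMap[Int, String]()
val b = mutable.HashMap[Int, String]()

a += (1 -> "x")          // returns the map, so calls can be chained
val prev = b.put(1, "x") // returns Option[String]: None, since 1 was unbound

assert(a == b)     // identical contents
assert(prev.isEmpty)
```

So in the diff above the two lines behave identically; the discarded return value is the map in one case and `None` in the other.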
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1412#issuecomment-48997884 LGTM, merging into master and branch-1.0.
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1412
[GitHub] spark pull request: [SPARK-2154] Schedule next Driver when one com...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1405#issuecomment-48998138 @pwendell The reporters of this issue have reported that this PR fixed the problem. Ideally it can go into 1.0.2.
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-48998126 @willb @aarondav my bad guys, I thought all outstanding issues were addressed here but I realize that's not the case. Feel free to submit another patch to clean up the brackets.
[GitHub] spark pull request: SPARK-1536: multiclass classification support ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48998279 QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16661/consoleFull
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48998515 Thanks. Merging this in master.
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1114
[GitHub] spark pull request: [MLlib] SPARK-1536: multiclass classification ...
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48998668 @pwendell I modified the title.
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1412#issuecomment-48998885 QA results for PR 1412:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16662/consoleFull
[GitHub] spark pull request: [RFC] Disable local execution of Spark jobs by...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1321#issuecomment-48999051 @rxin is there a case where you think local execution will yield a relevant performance improvement? I don't see why shipping a task for a few milliseconds is a big deal. The main use case I see for this is people running `take` in a repl... in this case the cluster scheduler is not backlogged because they can't access the repl at all until the prior command has finished anyways.
[GitHub] spark pull request: [RFC] Disable local execution of Spark jobs by...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1321#issuecomment-48999161 When the cluster is busy and backlogged ...
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48999523 I am trying to figure out why it happened. This might not be my final conclusion, but at the moment I feel that since this class has a private[mllib] constructor, there is an entry in the ignores file as follows: `org.apache.spark.mllib.recommendation.MatrixFactorizationModel.init`. This particular entry makes the whole class ignored by the MiMa tool.
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48999631 And to my surprise, it also has `org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict` - not sure why it has that.
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1351#issuecomment-49000138 QA tests have started for PR 1351. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16668/consoleFull
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49000211 Ahh, understood (please ignore my previous theory). It happened because we have a function in the same class, with the same name, which is `@DeveloperApi`. So this was added by our GenerateMimaIgnore tool to the ignores file, and as a result the MiMa check for all predict methods was disabled. Not sure if MiMa can disambiguate them; I will check that.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49001592 QA results for PR 1415:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16665/consoleFull
[GitHub] spark pull request: [RFC] Disable local execution of Spark jobs by...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1321#issuecomment-49002113 I think it makes more sense for a command to simply fail than for certain commands to happen to be runnable while there are no cluster resources. This sort of execution also puts more stress on the driver, and things like OutOfMemoryErrors on the driver are far more serious than on an Executor (for example, [this issue](https://groups.google.com/forum/#!msg/spark-users/eu9RJc3nQng/-T6wmcjMFiwJ)). My hypothesis is that this feature is rarely useful, and often leads to more confusion for users and potentially less stability.
[GitHub] spark pull request: [SPARK-1669 SPARK-1379][SQL][WIP] Made Schem...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/829
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-49003237 QA results for PR 1416:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class LZ4CompressionCodec(conf: SparkConf) extends CompressionCodec {
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/1/consoleFull
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-49003425 QA results for PR 1416:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class LZ4CompressionCodec(conf: SparkConf) extends CompressionCodec {
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16667/consoleFull
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-49005453 Ok merging this in master.
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1416
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005728 Weird test failures; they look unrelated to this change.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005818 Ah yes, block size is only used at compression time and inferred from the stream during decompression. Then the class name alone should be sufficient.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005883 Yea, the test failure isn't related. If there is no objection, I'm going to merge this tomorrow. I will file a JIRA ticket so we can prepend compression codec information to compressed data and then perhaps pick the compression codec during decompression based on that.
[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49006034 Hi @lirui-intel, looks good to me! Will merge when I get my laptop working again; unfortunate state of affairs :-) In the meantime, if @pwendell or someone else could merge this, that would be great too!
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49006312 Can't comment on Tachyon since we don't use it and unfortunately have no experience with it. I am fine with this change for the rest.
[GitHub] spark pull request: Added LZ4 to compression codec in configuratio...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1417 Added LZ4 to compression codec in configuration page. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark lz4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1417 commit 9cf0b2f389deeef898a6b75878aae1e3ec88 Author: Reynold Xin r...@apache.org Date: 2014-07-15T09:03:36Z Added LZ4 to compression codec in configuration page. commit 472f6a130c4454f2b0ae3716a811168b2d322e7b Author: Reynold Xin r...@apache.org Date: 2014-07-15T09:05:01Z Set the proper default.
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1351#issuecomment-49007980 QA results for PR 1351:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16668/consoleFull
[GitHub] spark pull request: [SPARK-2477][MLlib] Using appendBias for addin...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1410#issuecomment-49008039 LGTM. Thanks!
[GitHub] spark pull request: [SPARK-2477][MLlib] Using appendBias for addin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1410
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49009673 @mengxr I've addressed your comments. Thanks for pointing me to the Scala issue.
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49009683 QA tests have started for PR 1155. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16670/consoleFull
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49011680 @avulanov I made some minor updates and sent you a PR at https://github.com/avulanov/spark/pull/1 . If it looks good to you, please merge that PR and the changes should show up here. Thanks!
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49012039 @mengxr done!
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49012327 QA tests have started for PR 1155. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16671/consoleFull
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r14925960 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LocalKMeans.scala ---
@@ -59,6 +59,11 @@ private[mllib] object LocalKMeans extends Logging {
       cumulativeScore += weights(j) * KMeans.pointCost(curCenters, points(j))
       j += 1
     }
+    if (j == 0) {
+      logWarning("kMeansPlusPlus initialization ran out of distinct points for centers." +
+        s" Using duplicate point for center k = $i.")
+      j = 1
--- End diff --
The code may be clearer if written in this way
~~~
centers(i) = if (j == 0) {
  logWarning(...)
  points(0).toDense
} else {
  points(j - 1).toDense
}
~~~
or
~~~
if (j == 0) {
  logWarning(...)
  centers(i) = points(0).toDense
} else {
  centers(i) = points(j - 1).toDense
}
~~~
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14926032 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel private[mllib] (
    * @param usersProducts RDD of (user, product) pairs.
    * @return RDD of Ratings.
    */
-  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
+  def predict(usersProducts: RDD[(Long, Long)]): RDD[Rating] = {
--- End diff --
I had understood all of this to be an `@Experimental` API, though it is not consistently marked. For example, `Rating` is experimental but its API is actually bound to this API here.
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r14926024 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -61,6 +61,30 @@ class KMeansSuite extends FunSuite with LocalSparkContext {
     assert(model.clusterCenters.head === center)
   }
+
+  test("no distinct points") {
+    val data = sc.parallelize(Array(
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0)
+    ))
+    val center = Vectors.dense(1.0, 2.0, 3.0)
+
+    // Make sure code runs.
+    var model = KMeans.train(data, k = 2, maxIterations = 1)
+    assert(model.clusterCenters.size === 2)
+  }
+
+  test("more clusters than points") {
+    val data = sc.parallelize(Array(
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 3.0, 4.0)
+    ))
--- End diff --
ditto
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r14926012 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -61,6 +61,30 @@ class KMeansSuite extends FunSuite with LocalSparkContext {
     assert(model.clusterCenters.head === center)
   }
+
+  test("no distinct points") {
+    val data = sc.parallelize(Array(
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0)
+    ))
--- End diff --
add `, 2` to test two partitions
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1407#issuecomment-49012803 @jkbradley The fix looks good to me except some minor style issues. Thanks for fixing it! Btw, please add `[MLLIB]` to the title so this is easy to find.
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49013972 Yes, you could also tell callers to track their own user-ID mapping and maintain it consistently everywhere, but then callers have to share that state somehow. Hashing is easier, and 64 bits makes it work for practical purposes. A caller has to do something like this to deal with real-world identifiers, because an `Int` ID API by itself doesn't quite work. This is an instance of a meta-concern I have: an API which (from my perspective) is going to be problematic at scale is already unchangeable before battle-testing. (I actually thought all of MLlib was de facto `@Experimental`?) However, you can layer other APIs on top to fix it, or use `@deprecated` in cases like this to keep existing methods while adding new signatures; I think that would be the simplest solution to this particular concern. The question of serialized size is still out there and worth weighing in on.
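[Editor's note] A minimal sketch of the hashing approach srowen describes: callers with non-numeric identifiers hash them into 64-bit values instead of maintaining a shared `Int` ID mapping. `idToLong` is a hypothetical helper for illustration, not part of MLlib's API.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hash an arbitrary string identifier to a Long by folding the first
// 8 bytes of its MD5 digest. Deterministic, so every caller that hashes
// the same raw ID gets the same Long without sharing any mapping state.
def idToLong(id: String): Long = {
  val digest = MessageDigest.getInstance("MD5")
    .digest(id.getBytes(StandardCharsets.UTF_8))
  digest.take(8).foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))
}
```

With 64 bits, accidental collisions only become likely around 2^32 distinct IDs (birthday bound), which is why the comment argues hashing "works for practical purposes" where a 32-bit `Int` key would not.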
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49020152 QA results for PR 1155:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class MulticlassMetrics(predictionAndLabels: RDD[(Double, Double)]) {
  * (equals to precision for multiclass classifier
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16671/consoleFull
[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/1418 [SPARK-2490] Change recursive visiting on RDD dependencies to iterative approach
When performing some transformations on RDDs after many iterations, the dependency chains of RDDs can become very long, which easily causes a StackOverflowError when those dependencies are visited recursively in Spark core. For example:
var rdd = sc.makeRDD(Array(1))
for (i <- 1 to 1000) {
  rdd = rdd.coalesce(1).cache()
  rdd.collect()
}
This PR changes the recursive visiting of an RDD's dependencies to an iterative approach to avoid the StackOverflowError. In addition, the Java serializer has a known [bug](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4152790) that also causes a StackOverflowError when serializing/deserializing a large graph of objects, so applying this PR only solves part of the problem. Using KryoSerializer instead of the Java serializer might help; however, since KryoSerializer is not currently supported for `spark.closure.serializer`, I cannot test whether KryoSerializer solves the Java serializer's problem completely. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 remove_recursive_visit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1418 commit 900538bbcb61683bf1418534c2466463a630569f Author: Liang-Chi Hsieh vii...@gmail.com Date: 2014-07-15T10:58:45Z change recursive visiting on rdd's dependencies to iterative approach to avoid stackoverflowerror.
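[Editor's note] A minimal sketch (not the PR's actual code) of the technique the PR description proposes: walking a dependency graph with an explicit stack instead of recursion, so traversal depth is bounded by heap rather than the JVM call stack. `Node` is a stand-in for an RDD and its dependencies.

```scala
// Stand-in for an RDD: an id plus its dependencies.
case class Node(id: Int, children: Seq[Node])

// Iterative depth-first visit using an explicit stack. A recursive version
// would overflow the JVM call stack on a very deep dependency chain; this
// one only grows a heap-allocated stack.
def visitIteratively(root: Node): Seq[Int] = {
  val visited = scala.collection.mutable.ArrayBuffer.empty[Int]
  val stack = scala.collection.mutable.Stack(root)
  while (stack.nonEmpty) {
    val node = stack.pop()
    visited += node.id
    node.children.foreach(stack.push)
  }
  visited.toSeq
}

// A linear chain 100000 deep, like the coalesce/cache loop in the PR
// description: naive recursion would throw StackOverflowError here.
val deepChain = (1 to 100000).foldLeft(Node(0, Nil)) {
  (child, i) => Node(i, Seq(child))
}
assert(visitIteratively(deepChain).length == 100001)
```

As the PR notes, this only removes the traversal-side recursion; serializing such a deep object graph with the Java serializer can still overflow independently.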
[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49022775 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-49023679 @aarondav @pwendell Yes, with this patch I'm able to enable the YourKit features that were causing crashes before. I'll submit an update to fix the bracket style and cc you both. Thanks for the quick review!
[GitHub] spark pull request: Reformat multi-line closure argument.
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1419 Reformat multi-line closure argument. You can merge this pull request into a Git repository by running: $ git pull https://github.com/willb/spark reformat-2486 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1419.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1419 commit 26762310ddf0ea88a418c506ed0e86892fe6e4d5 Author: William Benton wi...@redhat.com Date: 2014-07-15T12:35:13Z Reformat multi-line closure argument.
[GitHub] spark pull request: Reformat multi-line closure argument.
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49024982 (See discussion on #1413; cc @aarondav and @pwendell.)
[GitHub] spark pull request: Reformat multi-line closure argument.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49025080 QA tests have started for PR 1419. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16672/consoleFull
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-49029010 QA results for PR 1269:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class Document(val tokens: SparseVector[Int], val alphabetSize: Int) extends Serializable
  class DocumentParameters(val document: Document, val theta: Array[Float],
  class GlobalCounters(val wordsFromTopics: Array[Array[Float]], val alphabetSize: Int)
  class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)
  class PLSA(@transient protected val sc: SparkContext,
  class RobustDocumentParameters(document: Document,
  class RobustGlobalCounters(wordsFromTopic: Array[Array[Float]],
  class RobustGlobalParameters(phi : Array[Array[Float]],
  class RobustPLSA(@transient protected val sc: SparkContext,
  trait SparseVectorFasterSum {
  trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification {
  trait MatrixInPlaceModification {
  class SymmetricDirichletDocumentOverTopicDistributionRegularizer(protected val alpha: Float)
  trait SymmetricDirichletHelper {
  class SymmetricDirichletTopicRegularizer(protected val alpha: Float) extends TopicsRegularizer
  trait TopicsRegularizer extends MatrixInPlaceModification {
  class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer {
  class UniformTopicRegularizer extends TopicsRegularizer {
  class TObjectIntHashMapSerializer extends Serializer[TObjectIntHashMap[Object]] {
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16673/consoleFull
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-49030237 QA tests have started for PR 1269. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16674/consoleFull
[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1420 [SPARK-2492][Streaming] kafkaReceiver minor changes to align with Kafka 0.8 This updates the KafkaReceiver's behavior when auto.offset.reset is set to smallest, aligning it with the Kafka 0.8 ConsoleConsumer, and replaces the previous code with Kafka's offered API. @tdas, would you please review this PR? Thanks a lot. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark kafka-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1420.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1420 commit ed2d54001f3cf9baa6d40023bd326df5ebd90f14 Author: jerryshao saisai.s...@intel.com Date: 2014-07-15T12:56:15Z Changes to align with Kafka 0.8
[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-49032122 QA tests have started for PR 1420. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16675/consoleFull
[GitHub] spark pull request: SPARK-2480: Resolve sbt warnings NOTE: SPARK_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1404#issuecomment-49032797 QA tests have started for PR 1404. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16676/consoleFull
[GitHub] spark pull request: [SPARK-1470,SPARK-1842] Use the scala-logging ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-49034870 QA tests have started for PR 1369. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16679/consoleFull
[GitHub] spark pull request: SPARK-2480: Resolve sbt warnings NOTE: SPARK_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1404#issuecomment-49033458 QA tests have started for PR 1404. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16677/consoleFull
[GitHub] spark pull request: SPARK-2480: Resolve sbt warnings NOTE: SPARK_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1404#issuecomment-49034147 QA tests have started for PR 1404. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16678/consoleFull
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1112#discussion_r14935222
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala ---
@@ -82,6 +84,9 @@ class ExecutorLauncher(args: ApplicationMasterArguments, conf: Configuration, sp
     case x: DisassociatedEvent =>
       logInfo(s"Driver terminated or disconnected! Shutting down. $x")
       driverClosed = true
+    case x: AddWebUIFilter =>
--- End diff --
Can you make the same changes for yarn alpha mode as well, please?
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-49036292 @witgo this looks good. Could you also add support for setting it in yarn alpha mode? Sorry I missed that in earlier reviews.
[GitHub] spark pull request: Reformat multi-line closure argument.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49037229 QA results for PR 1419:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16672/consoleFull
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1094#discussion_r14936355
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -132,4 +135,17 @@ object YarnSparkHadoopUtil {
   }
 }
+  def getUIHistoryAddress(sc: SparkContext, conf: SparkConf): String = {
+    val eventLogDir = sc.eventLogger match {
+      case Some(logger) => logger.logDir.split("/").last
--- End diff --
I think it would be better to add a routine to the eventLogger to just give us the name of the directory, rather than us splitting it and it possibly breaking in the future.
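The reviewer's suggestion — let the event logger expose its directory's base name instead of having callers split the path — could be sketched as follows. This is a hypothetical illustration: `EventLogger` and `logDirName` are placeholder names, not the actual Spark API.

```scala
// Hypothetical sketch of the suggested refactor: the logger owns the
// path-splitting logic, so callers never depend on its path layout.
class EventLogger(val logDir: String) {
  /** Base name of the log directory, e.g. "app-1234" for ".../eventlog/app-1234". */
  def logDirName: String = logDir.split("/").last
}

object EventLoggerSketch {
  def main(args: Array[String]): Unit = {
    val logger = new EventLogger("/user/spark/eventlog/app-1234")
    // A caller like getUIHistoryAddress would now use logger.logDirName
    // instead of splitting logger.logDir itself.
    println(logger.logDirName)
  }
}
```

The point of the design is encapsulation: if the logger later changes how it names or nests its directory, only `logDirName` needs updating, not every call site.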
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49039069 This PR conflicts with pr1112. I would like to put that one in first and then upmerge this.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1094#discussion_r14936811
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -172,6 +172,8 @@ class HistoryServer(
 object HistoryServer {
   private val conf = new SparkConf
+  val UI_PATH_PREFIX = "/history/"
--- End diff --
If we are adding this we should also use it in this file to set the path.
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-49044799 QA tests have started for PR 1112. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16680/consoleFull
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-49045150 @tgravescs The code has been submitted. Because I don't have a Hadoop 0.23.x cluster, the code hasn't been strictly tested.
[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r14940726 --- Diff: yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -416,19 +407,8 @@ object ApplicationMaster extends Logging { // This is to ensure that we have reasonable number of containers before we start --- End diff -- we can remove this whole comment block now.
[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r14940751 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -370,7 +359,6 @@ object ApplicationMaster extends Logging { // This is to ensure that we have reasonable number of containers before we start --- End diff -- same here, we can remove the comment block.
[GitHub] spark pull request: SPARK-1890 and SPARK-1891- add admin and modif...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1196#discussion_r14941021
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -169,18 +192,43 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
   )
 }
-  private[spark] def setViewAcls(defaultUsers: Seq[String], allowedUsers: String) {
-    viewAcls = (defaultUsers ++ allowedUsers.split(',')).map(_.trim()).filter(!_.isEmpty).toSet
+  /**
+   * Split a comma separated String, filter out any empty items, and return a Set of strings
+   */
+  private def stringToSet(list: String): Set[String] = {
+    list.split(',').map(_.trim()).filter(!_.isEmpty).toSet
--- End diff --
removed a couple.
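The `stringToSet` helper in the diff above is self-contained enough to sketch and exercise on its own. The body below reproduces the logic shown in the diff; the enclosing `AclUtils` object is just a placeholder for illustration (in the PR it lives inside `SecurityManager`).

```scala
// Standalone sketch of the stringToSet helper from the SecurityManager diff.
object AclUtils {
  /** Split a comma-separated String, drop empty items, and return a Set of strings. */
  def stringToSet(list: String): Set[String] =
    list.split(',').map(_.trim).filter(_.nonEmpty).toSet
}
```

Usage mirrors the ACL parsing in the PR: `AclUtils.stringToSet("alice, bob,,carol ")` yields `Set("alice", "bob", "carol")` — surrounding whitespace is trimmed and the empty entry from the double comma is dropped, so sloppy comma-separated config values still parse cleanly.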
[GitHub] spark pull request: SPARK-1890 and SPARK-1891- add admin and modif...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1196#discussion_r14941058 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -169,18 +192,43 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging { ... private[spark] def setViewAcls(defaultUsers: Set[String], allowedUsers: String) { --- End diff -- no I don't believe it is needed, I'll remove them.
[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-49046694 QA results for PR 1420:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16675/consoleFull