[GitHub] spark pull request: [SQL] Attribute equality comparisons should be...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1414#issuecomment-48993578 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1398#issuecomment-48993635 Thanks for fixing this - I tested this locally and it worked (though I did have to do a clean build first).
[GitHub] spark pull request: [SQL] Attribute equality comparisons should be...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1414#issuecomment-48993731 QA tests have started for PR 1414. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16663/consoleFull
[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1398
[GitHub] spark pull request: SPARK-1536: multiclass classification support ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48994002 @manishamde - can you add `[MLlib]` to the title of this pull request? Otherwise it doesn't get picked up properly by our filters.
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-48994041 LGTM - I'll merge this.
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1413
[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/931#issuecomment-48994210 Jenkins, test this please. @xiajunluan actually I think the main issue now is that this isn't merging cleanly.
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1415 SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec This reduces shuffle compression memory usage by 3x. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark snappy Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1415.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1415 commit 06c1a01471cc5f6368062e88bd655ecc2634a8b7 Author: Reynold Xin r...@apache.org Date: 2014-07-15T06:16:45Z SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec. This reduces shuffle compression memory usage by 3x.
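For reference, the codec being changed here is controlled by a single setting; a minimal spark-defaults.conf sketch (assuming the Spark 1.x short codec names), not taken from the PR itself:

```
# Pin the compression codec explicitly instead of relying on the default,
# which this PR changes from LZF to Snappy. Short names resolve to the
# corresponding org.apache.spark.io.*CompressionCodec classes.
spark.io.compression.codec  snappy
```

Equivalently, an application can call `SparkConf.set("spark.io.compression.codec", "snappy")` in code.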
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48994982 Do we want to change the default for everything, or only for shuffle? (Only shuffle won't impact anything outside of Spark.) What would be the impact on user data if we change it for all? (It is a DeveloperApi after all, so there might be user data consuming this?)
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48995170 @andrewor14 I have created a JIRA issue, SPARK-2302. Yes, it is for reducing the Master's memory. Thank you.
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1412#issuecomment-48995408 QA results for PR 1412:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16657/consoleFull
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48995490 This is actually only used in shuffle.
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48995567 Actually I lied. Somebody else added some code to use the compression codec to compress event data ...
[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48995664 cc @andrewor14 I guess you added the event code ...
[GitHub] spark pull request: Add/increase severity of warning in documentat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1380
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1416 [SPARK-2399] Add support for LZ4 compression. Based on Greg Bowyer's patch from JIRA https://issues.apache.org/jira/browse/SPARK-2399 You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark lz4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1416 commit 8a14d38523e2b35b7f38503bb70bb9934e229cf3 Author: Reynold Xin r...@apache.org Date: 2014-07-15T06:38:49Z [SPARK-2399] Add support for LZ4 compression.
[GitHub] spark pull request: [SPARK-2411] Add a history-not-found page to s...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1336#discussion_r14919256
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/ui/HistoryNotFoundPage.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.master.ui
+
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.spark.ui.{UIUtils, WebUIPage}
+
+private[spark] class HistoryNotFoundPage(parent: MasterWebUI)
+  extends WebUIPage("history/not-found") {
+
+  def render(request: HttpServletRequest): Seq[Node] = {
+    val content =
+      <div class="row-fluid">
+        <div class="span12" style="font-size:14px;font-weight:bold">
+          No event logs were found for this application. To enable event logging, please set
--- End diff --
I mean that if they are joining an existing cluster, they'll need to figure out what HDFS or FS path to set this to such that it's consistent with the path set by the person running the history server.
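The coordination problem raised above can be sketched as a spark-defaults.conf fragment (the HDFS path below is a hypothetical example, not from the PR):

```
# Application side: write event logs so the history server can find them.
spark.eventLog.enabled  true
# Must point at the same directory the history server is configured to read.
spark.eventLog.dir      hdfs://namenode:8021/user/spark/applicationHistory
```

Whoever runs the history server points it at the same directory, which is exactly the path every user joining the cluster has to discover.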
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-48995876 cc @davies
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-48995956 QA tests have started for PR 1416. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/1/consoleFull
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996149 I looked into the event logger code and it appears that the codec change should be fine. It figures out the codec for old data automatically anyway.
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-48996263 QA tests have started for PR 1416. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16667/consoleFull
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996256 Yes, we log the codec used in a separate file so we don't lock ourselves out of our old event logs. This change seems fine.
[GitHub] spark pull request: [SPARK-2390] Files in staging directory cannot...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1326#issuecomment-48996710 Sure - I guess we can do this. It seems strange to open a filesystem and never close it (what if someone creates a large number of FileLogger instances? After all, this is a generic class). I guess we'll rely on shutdown to close this.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996763 @andrewor14 do we also log the block size, etc. of the codec used? If yes, then at least for event data we should be fine. IIRC we use the codec to compress:
(a) RDDs (which could be in Tachyon - and so shared between Spark deployments?)
(b) shuffle (private to Spark)
(c) broadcast (private to Spark)
(d) event logging (discussed above)
(e) checkpoints (could be shared between runs?)
Other than (a) and (e), sharing data via the others would be non-trivial and something we don't need to support, IMO. I am not very sure of (a) and (e) - thoughts?
[GitHub] spark pull request: [SPARK-2390] Files in staging directory cannot...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1326
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1337#discussion_r14919602
--- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala ---
@@ -258,7 +258,7 @@ private[spark] class PartitionCoalescer(maxPartitions: Int, prev: RDD[_], balanc
     val pgroup = PartitionGroup(nxt_replica)
     groupArr += pgroup
     addPartToPGroup(nxt_part, pgroup)
-    groupHash += (nxt_replica -> ArrayBuffer(pgroup)) // list in case we have multiple
+    groupHash.put(nxt_replica, ArrayBuffer(pgroup)) // list in case we have multiple
--- End diff --
Is this just a stylistic change or does this operator somehow have different semantics?
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1337#issuecomment-48996905 LGTM pending one small question
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48997521 QA results for PR 1114:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16660/consoleFull
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1337#discussion_r14919798
--- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala ---
@@ -258,7 +258,7 @@ private[spark] class PartitionCoalescer(maxPartitions: Int, prev: RDD[_], balanc
     val pgroup = PartitionGroup(nxt_replica)
     groupArr += pgroup
     addPartToPGroup(nxt_part, pgroup)
-    groupHash += (nxt_replica -> ArrayBuffer(pgroup)) // list in case we have multiple
+    groupHash.put(nxt_replica, ArrayBuffer(pgroup)) // list in case we have multiple
--- End diff --
Strictly stylistic -- it made more sense when I was using put below; now there's no reason for it.
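To the question above: on `scala.collection.mutable.HashMap`, `+=` with a `key -> value` pair and `put` insert the same entry; the only observable difference is the return value (`+=` returns the map itself for chaining, `put` returns the previous binding as an `Option`). A self-contained sketch, independent of the Spark code:

```scala
import scala.collection.mutable

// Both forms insert the same (key, value) entry into a mutable HashMap.
val a = mutable.HashMap[Int, String]()
val b = mutable.HashMap[Int, String]()

a += (1 -> "x")          // returns the map, so calls can be chained
val prev = b.put(1, "x") // returns Option[String]: None, since 1 was unbound

assert(a == b)     // identical contents
assert(prev.isEmpty)
```

So in the diff above the two lines behave identically; the discarded return value is the map in one case and `None` in the other.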
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1412#issuecomment-48997884 LGTM, merging into master and branch-1.0.
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1412
[GitHub] spark pull request: [SPARK-2154] Schedule next Driver when one com...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1405#issuecomment-48998138 @pwendell The reporters of this issue have reported that this PR fixed the problem. Ideally it can go into 1.0.2.
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-48998126 @willb @aarondav my bad guys, I thought all outstanding issues were addressed here but I realize that's not the case. Feel free to submit another patch to clean up the brackets.
[GitHub] spark pull request: SPARK-1536: multiclass classification support ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48998279 QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16661/consoleFull
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48998515 Thanks. Merging this in master.
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1114
[GitHub] spark pull request: [MLlib] SPARK-1536: multiclass classification ...
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48998668 @pwendell I modified the title.
[GitHub] spark pull request: [SPARK-2485][SQL] Lock usage of hive client.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1412#issuecomment-48998885 QA results for PR 1412:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16662/consoleFull
[GitHub] spark pull request: [RFC] Disable local execution of Spark jobs by...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1321#issuecomment-48999051 @rxin is there a case where you think local execution will yield a relevant performance improvement? I don't see why shipping a task for a few milliseconds is a big deal. The main use case I see for this is people running `take` in a repl... in this case the cluster scheduler is not backlogged because they can't access the repl at all until the prior command has finished anyways.
[GitHub] spark pull request: [RFC] Disable local execution of Spark jobs by...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1321#issuecomment-48999161 When the cluster is busy and backlogged ...
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48999523 I am trying to figure out why it happened. This might not be my final conclusion, but at the moment I feel that since this class has a private[mllib] constructor, there is an entry in the ignores file as follows: `org.apache.spark.mllib.recommendation.MatrixFactorizationModel.init`. This particular entry makes the whole class ignored by the MiMa tool.
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48999631 And to my surprise, it also has `org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict` - not sure why it has that.
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1351#issuecomment-49000138 QA tests have started for PR 1351. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16668/consoleFull
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49000211 Ahh, understood (please ignore my previous theory). It happened because we have a function in the same class, with the same name, which is `@DeveloperApi`. So this was added by our GenerateMimaIgnore tool to the ignores file, and as a result the MiMa check for all predict methods was disabled. Not sure if MiMa can disambiguate them; I will check that.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49001592 QA results for PR 1415:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16665/consoleFull
[GitHub] spark pull request: [RFC] Disable local execution of Spark jobs by...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1321#issuecomment-49002113 I think it makes more sense for a command to simply fail than for certain commands to happen to be runnable while there are no cluster resources. This sort of execution also puts more stress on the driver, and things like OutOfMemoryErrors on the driver are far more serious than on an Executor (for example, [this issue](https://groups.google.com/forum/#!msg/spark-users/eu9RJc3nQng/-T6wmcjMFiwJ)). My hypothesis is that this feature is rarely useful, and often leads to more confusion for users and potentially less stability.
[GitHub] spark pull request: [SPARK-1669 SPARK-1379][SQL][WIP] Made Schem...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/829
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-49003237 QA results for PR 1416:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class LZ4CompressionCodec(conf: SparkConf) extends CompressionCodec {
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/1/consoleFull
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-49003425 QA results for PR 1416:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class LZ4CompressionCodec(conf: SparkConf) extends CompressionCodec {
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16667/consoleFull
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1416#issuecomment-49005453 Ok merging this in master.
[GitHub] spark pull request: [SPARK-2399] Add support for LZ4 compression.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1416
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005728 Weird test failures; they look unrelated to this change.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005818 Ah yes, block size is only used at compression time and inferred from the stream during decompression. Then the class name alone should be sufficient.
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005883 Yea, the test failure isn't related. If there is no objection, I'm going to merge this tomorrow. I will file a JIRA ticket so we can prepend compression codec information to compressed data and then perhaps pick the compression codec during decompression based on that.
[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49006034 Hi @lirui-intel, looks good to me! Will merge when I get my laptop working again; unfortunate state of affairs :-) In the meantime, if @pwendell or someone else could merge this, that would be great too!
[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49006312 Can't comment on Tachyon since we don't use it and unfortunately have no experience with it. I am fine with this change for the rest.
[GitHub] spark pull request: Added LZ4 to compression codec in configuratio...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1417 Added LZ4 to compression codec in configuration page. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark lz4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1417 commit 9cf0b2f389deeef898a6b75878aae1e3ec88 Author: Reynold Xin r...@apache.org Date: 2014-07-15T09:03:36Z Added LZ4 to compression codec in configuration page. commit 472f6a130c4454f2b0ae3716a811168b2d322e7b Author: Reynold Xin r...@apache.org Date: 2014-07-15T09:05:01Z Set the proper default.
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1351#issuecomment-49007980 QA results for PR 1351:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16668/consoleFull
[GitHub] spark pull request: [SPARK-2477][MLlib] Using appendBias for addin...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1410#issuecomment-49008039 LGTM. Thanks!
[GitHub] spark pull request: [SPARK-2477][MLlib] Using appendBias for addin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1410
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49009673 @mengxr I've addressed your comments. Thanks for pointing me to the Scala issue.
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49009683 QA tests have started for PR 1155. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16670/consoleFull
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49011680 @avulanov I made some minor updates and sent you a PR at https://github.com/avulanov/spark/pull/1 . If it looks good to you, please merge that PR and the changes should show up here. Thanks!
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49012039 @mengxr done!
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49012327 QA tests have started for PR 1155. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16671/consoleFull
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r14925960 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LocalKMeans.scala ---
@@ -59,6 +59,11 @@ private[mllib] object LocalKMeans extends Logging {
       cumulativeScore += weights(j) * KMeans.pointCost(curCenters, points(j))
       j += 1
     }
+    if (j == 0) {
+      logWarning("kMeansPlusPlus initialization ran out of distinct points for centers." +
+        s" Using duplicate point for center k = $i.")
+      j = 1
--- End diff --
The code may be clearer if written in this way
~~~
centers(i) = if (j == 0) {
  logWarning(...)
  points(0).toDense
} else {
  points(j - 1).toDense
}
~~~
or
~~~
if (j == 0) {
  logWarning(...)
  centers(i) = points(0).toDense
} else {
  centers(i) = points(j - 1).toDense
}
~~~
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14926032 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel private[mllib] (
    * @param usersProducts RDD of (user, product) pairs.
    * @return RDD of Ratings.
    */
-  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
+  def predict(usersProducts: RDD[(Long, Long)]): RDD[Rating] = {
--- End diff --
I had understood all of this to be an `@Experimental` API, though it is not consistently marked. For example, `Rating` is experimental but its API is actually bound to this API here.
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r14926024 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -61,6 +61,30 @@ class KMeansSuite extends FunSuite with LocalSparkContext {
     assert(model.clusterCenters.head === center)
   }
+
+  test("no distinct points") {
+    val data = sc.parallelize(Array(
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0)
+    ))
+    val center = Vectors.dense(1.0, 2.0, 3.0)
+
+    // Make sure code runs.
+    var model = KMeans.train(data, k = 2, maxIterations = 1)
+    assert(model.clusterCenters.size === 2)
+  }
+
+  test("more clusters than points") {
+    val data = sc.parallelize(Array(
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 3.0, 4.0)
+    ))
--- End diff --
ditto
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r14926012 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -61,6 +61,30 @@ class KMeansSuite extends FunSuite with LocalSparkContext {
     assert(model.clusterCenters.head === center)
   }
+
+  test("no distinct points") {
+    val data = sc.parallelize(Array(
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0),
+      Vectors.dense(1.0, 2.0, 3.0)
+    ))
--- End diff --
add `, 2` to test two partitions
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1407#issuecomment-49012803 @jkbradley The fix looks good to me except some minor style issues. Thanks for fixing it! Btw, please add `[MLLIB]` to the title so this is easy to find.
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49013972 Yes, you could also tell callers to track their own user-ID mapping and maintain it consistently everywhere, but then callers have to share that state somehow. Hashing is easier, and 64 bits makes it work for practical purposes. A caller has to do something like this to deal with real-world identifiers, because an `Int` ID API by itself doesn't quite work. This is an instance of a meta-concern I have: an API which (from my perspective) is going to be problematic at scale is already unchangeable before battle-testing. (I actually thought all of MLlib was de facto `@Experimental`?) However, you can layer other APIs on top to fix it, or use `@deprecated` in cases like this to keep existing methods while adding new signatures; I think that would be the simplest solution to this particular concern. The question of serialized size is still out there and worth weighing in on.
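[Editor's note] A minimal sketch of the hashing approach srowen describes: callers with non-numeric identifiers hash them into 64-bit values instead of maintaining a shared `Int` ID mapping. `idToLong` is a hypothetical helper for illustration, not part of MLlib's API.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hash an arbitrary string identifier to a Long by folding the first
// 8 bytes of its MD5 digest. Deterministic, so every caller that hashes
// the same raw ID gets the same Long without sharing any mapping state.
def idToLong(id: String): Long = {
  val digest = MessageDigest.getInstance("MD5")
    .digest(id.getBytes(StandardCharsets.UTF_8))
  digest.take(8).foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))
}
```

With 64 bits, accidental collisions only become likely around 2^32 distinct IDs (birthday bound), which is why the comment argues hashing "works for practical purposes" where a 32-bit `Int` key would not.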
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-49020152 QA results for PR 1155:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class MulticlassMetrics(predictionAndLabels: RDD[(Double, Double)]) {
  * (equals to precision for multiclass classifier
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16671/consoleFull
[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/1418 [SPARK-2490] Change recursive visiting on RDD dependencies to iterative approach
When performing some transformations on RDDs after many iterations, the dependency chains of RDDs can become very long, which easily causes a StackOverflowError when those dependencies are visited recursively in Spark core. For example:
var rdd = sc.makeRDD(Array(1))
for (i <- 1 to 1000) {
  rdd = rdd.coalesce(1).cache()
  rdd.collect()
}
This PR changes the recursive visiting of an RDD's dependencies to an iterative approach to avoid the StackOverflowError. In addition, the Java serializer has a known [bug](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4152790) that also causes a StackOverflowError when serializing/deserializing a large graph of objects, so applying this PR only solves part of the problem. Using KryoSerializer instead of the Java serializer might help; however, since KryoSerializer is not currently supported for `spark.closure.serializer`, I cannot test whether KryoSerializer solves the Java serializer's problem completely. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 remove_recursive_visit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1418 commit 900538bbcb61683bf1418534c2466463a630569f Author: Liang-Chi Hsieh vii...@gmail.com Date: 2014-07-15T10:58:45Z change recursive visiting on rdd's dependencies to iterative approach to avoid stackoverflowerror.
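[Editor's note] A minimal sketch (not the PR's actual code) of the technique the PR description proposes: walking a dependency graph with an explicit stack instead of recursion, so traversal depth is bounded by heap rather than the JVM call stack. `Node` is a stand-in for an RDD and its dependencies.

```scala
// Stand-in for an RDD: an id plus its dependencies.
case class Node(id: Int, children: Seq[Node])

// Iterative depth-first visit using an explicit stack. A recursive version
// would overflow the JVM call stack on a very deep dependency chain; this
// one only grows a heap-allocated stack.
def visitIteratively(root: Node): Seq[Int] = {
  val visited = scala.collection.mutable.ArrayBuffer.empty[Int]
  val stack = scala.collection.mutable.Stack(root)
  while (stack.nonEmpty) {
    val node = stack.pop()
    visited += node.id
    node.children.foreach(stack.push)
  }
  visited.toSeq
}

// A linear chain 100000 deep, like the coalesce/cache loop in the PR
// description: naive recursion would throw StackOverflowError here.
val deepChain = (1 to 100000).foldLeft(Node(0, Nil)) {
  (child, i) => Node(i, Seq(child))
}
assert(visitIteratively(deepChain).length == 100001)
```

As the PR notes, this only removes the traversal-side recursion; serializing such a deep object graph with the Java serializer can still overflow independently.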
[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49022775 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-49023679 @aarondav @pwendell Yes, with this patch I'm able to enable the YourKit features that were causing crashes before. I'll submit an update to fix the bracket style and cc you both. Thanks for the quick review!
[GitHub] spark pull request: Reformat multi-line closure argument.
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1419 Reformat multi-line closure argument. You can merge this pull request into a Git repository by running: $ git pull https://github.com/willb/spark reformat-2486 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1419.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1419 commit 26762310ddf0ea88a418c506ed0e86892fe6e4d5 Author: William Benton wi...@redhat.com Date: 2014-07-15T12:35:13Z Reformat multi-line closure argument.
[GitHub] spark pull request: Reformat multi-line closure argument.
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49024982 (See discussion on #1413; cc @aarondav and @pwendell.)
[GitHub] spark pull request: Reformat multi-line closure argument.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49025080 QA tests have started for PR 1419. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16672/consoleFull
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-49029010 QA results for PR 1269:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class Document(val tokens: SparseVector[Int], val alphabetSize: Int) extends Serializable
  class DocumentParameters(val document: Document, val theta: Array[Float],
  class GlobalCounters(val wordsFromTopics: Array[Array[Float]], val alphabetSize: Int)
  class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)
  class PLSA(@transient protected val sc: SparkContext,
  class RobustDocumentParameters(document: Document,
  class RobustGlobalCounters(wordsFromTopic: Array[Array[Float]],
  class RobustGlobalParameters(phi : Array[Array[Float]],
  class RobustPLSA(@transient protected val sc: SparkContext,
  trait SparseVectorFasterSum {
  trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification {
  trait MatrixInPlaceModification {
  class SymmetricDirichletDocumentOverTopicDistributionRegularizer(protected val alpha: Float)
  trait SymmetricDirichletHelper {
  class SymmetricDirichletTopicRegularizer(protected val alpha: Float) extends TopicsRegularizer
  trait TopicsRegularizer extends MatrixInPlaceModification {
  class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer {
  class UniformTopicRegularizer extends TopicsRegularizer {
  class TObjectIntHashMapSerializer extends Serializer[TObjectIntHashMap[Object]] {
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16673/consoleFull
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-49030237 QA tests have started for PR 1269. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16674/consoleFull
[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1420 [SPARK-2492][Streaming] kafkaReceiver minor changes to align with Kafka 0.8 This updates the KafkaReceiver's behavior when auto.offset.reset is set to smallest, aligning it with the Kafka 0.8 ConsoleConsumer, and replaces the previous code with Kafka's offered API. @tdas, would you please review this PR? Thanks a lot. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark kafka-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1420.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1420 commit ed2d54001f3cf9baa6d40023bd326df5ebd90f14 Author: jerryshao saisai.s...@intel.com Date: 2014-07-15T12:56:15Z Changes to align with Kafka 0.8
[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-49032122 QA tests have started for PR 1420. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16675/consoleFull
[GitHub] spark pull request: SPARK-2480: Resolve sbt warnings NOTE: SPARK_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1404#issuecomment-49032797 QA tests have started for PR 1404. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16676/consoleFull
[GitHub] spark pull request: [SPARK-1470,SPARK-1842] Use the scala-logging ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-49034870 QA tests have started for PR 1369. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16679/consoleFull
[GitHub] spark pull request: SPARK-2480: Resolve sbt warnings NOTE: SPARK_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1404#issuecomment-49033458 QA tests have started for PR 1404. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16677/consoleFull
[GitHub] spark pull request: SPARK-2480: Resolve sbt warnings NOTE: SPARK_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1404#issuecomment-49034147 QA tests have started for PR 1404. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16678/consoleFull
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1112#discussion_r14935222
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala ---
@@ -82,6 +84,9 @@ class ExecutorLauncher(args: ApplicationMasterArguments, conf: Configuration, sp
     case x: DisassociatedEvent =>
       logInfo(s"Driver terminated or disconnected! Shutting down. $x")
       driverClosed = true
+    case x: AddWebUIFilter =>
--- End diff --
Can you make the same changes for yarn alpha mode as well, please?
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-49036292 @witgo this looks good. Could you also add support for setting it in yarn alpha mode? Sorry I missed that in earlier reviews.
[GitHub] spark pull request: Reformat multi-line closure argument.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49037229 QA results for PR 1419:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16672/consoleFull
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1094#discussion_r14936355
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -132,4 +135,17 @@ object YarnSparkHadoopUtil {
   }
 }
+  def getUIHistoryAddress(sc: SparkContext, conf: SparkConf): String = {
+    val eventLogDir = sc.eventLogger match {
+      case Some(logger) => logger.logDir.split("/").last
--- End diff --
I think it would be better to add a routine to the eventLogger to just give us the name of the directory, rather than us splitting it and it possibly breaking in the future.
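The reviewer's suggestion — let the event logger expose its directory's base name instead of having callers split the path — could be sketched as follows. This is a hypothetical illustration: `EventLogger` and `logDirName` are placeholder names, not the actual Spark API.

```scala
// Hypothetical sketch of the suggested refactor: the logger owns the
// path-splitting logic, so callers never depend on its path layout.
class EventLogger(val logDir: String) {
  /** Base name of the log directory, e.g. "app-1234" for ".../eventlog/app-1234". */
  def logDirName: String = logDir.split("/").last
}

object EventLoggerSketch {
  def main(args: Array[String]): Unit = {
    val logger = new EventLogger("/user/spark/eventlog/app-1234")
    // A caller like getUIHistoryAddress would now use logger.logDirName
    // instead of splitting logger.logDir itself.
    println(logger.logDirName)
  }
}
```

The point of the design is encapsulation: if the logger later changes how it names or nests its directory, only `logDirName` needs updating, not every call site.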
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49039069 This PR conflicts with pr1112. I would like to put that one in first and then upmerge this.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1094#discussion_r14936811
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -172,6 +172,8 @@ class HistoryServer(
 object HistoryServer {
   private val conf = new SparkConf
+  val UI_PATH_PREFIX = "/history/"
--- End diff --
If we are adding this we should also use it in this file to set the path.
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-49044799 QA tests have started for PR 1112. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16680/consoleFull
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-49045150 @tgravescs The code has been submitted. Because I don't have a Hadoop 0.23.x cluster, the code hasn't been strictly tested.
[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r14940726 --- Diff: yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -416,19 +407,8 @@ object ApplicationMaster extends Logging { // This is to ensure that we have reasonable number of containers before we start --- End diff -- we can remove this whole comment block now.
[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r14940751 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -370,7 +359,6 @@ object ApplicationMaster extends Logging { // This is to ensure that we have reasonable number of containers before we start --- End diff -- same here, we can remove the comment block.
[GitHub] spark pull request: SPARK-1890 and SPARK-1891- add admin and modif...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1196#discussion_r14941021
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -169,18 +192,43 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
   )
 }
-  private[spark] def setViewAcls(defaultUsers: Seq[String], allowedUsers: String) {
-    viewAcls = (defaultUsers ++ allowedUsers.split(',')).map(_.trim()).filter(!_.isEmpty).toSet
+  /**
+   * Split a comma separated String, filter out any empty items, and return a Set of strings
+   */
+  private def stringToSet(list: String): Set[String] = {
+    list.split(',').map(_.trim()).filter(!_.isEmpty).toSet
--- End diff --
removed a couple.
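The `stringToSet` helper in the diff above is self-contained enough to sketch and exercise on its own. The body below reproduces the logic shown in the diff; the enclosing `AclUtils` object is just a placeholder for illustration (in the PR it lives inside `SecurityManager`).

```scala
// Standalone sketch of the stringToSet helper from the SecurityManager diff.
object AclUtils {
  /** Split a comma-separated String, drop empty items, and return a Set of strings. */
  def stringToSet(list: String): Set[String] =
    list.split(',').map(_.trim).filter(_.nonEmpty).toSet
}
```

Usage mirrors the ACL parsing in the PR: `AclUtils.stringToSet("alice, bob,,carol ")` yields `Set("alice", "bob", "carol")` — surrounding whitespace is trimmed and the empty entry from the double comma is dropped, so sloppy comma-separated config values still parse cleanly.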
[GitHub] spark pull request: SPARK-1890 and SPARK-1891- add admin and modif...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/1196#discussion_r14941058 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -169,18 +192,43 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging { ... private[spark] def setViewAcls(defaultUsers: Set[String], allowedUsers: String) { --- End diff -- no I don't believe it is needed, I'll remove them.
[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-49046694 QA results for PR 1420:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16675/consoleFull