[GitHub] spark pull request: update spark.default.parallelism
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/389#issuecomment-40448460 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: style fix
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/411#discussion_r11620940 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -678,7 +678,7 @@ private[spark] class BlockManager( case ArrayBufferValues(array) => tachyonStore.putValues(blockId, array, level, false) case ByteBufferValues(bytes) => { --- End diff -- can you remove the { and } here also? thanks
[GitHub] spark pull request: style fix
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40448496 Jenkins, add to whitelist.
[GitHub] spark pull request: update spark.default.parallelism
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/389#issuecomment-40448600 Merged build started.
[GitHub] spark pull request: update spark.default.parallelism
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/389#issuecomment-40448594 Merged build triggered.
[GitHub] spark pull request: style fix
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40448662 Jenkins, add to whitelist.
[GitHub] spark pull request: style fix
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40448808 Merged build started.
[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/391#issuecomment-40449078 Jenkins, test this please
[GitHub] spark pull request: SPARK-1456 Remove view bounds on Ordered in fa...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/410#discussion_r11621180 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -89,12 +89,14 @@ class HashPartitioner(partitions: Int) extends Partitioner { * A [[org.apache.spark.Partitioner]] that partitions sortable records by range into roughly * equal ranges. The ranges are determined by sampling the content of the RDD passed in. */ -class RangePartitioner[K <% Ordered[K]: ClassTag, V]( +class RangePartitioner[K : Ordering : ClassTag, V]( --- End diff -- Yes, when de-sugared there is an implicit Ordering -- that's why Michael can recover and bind it explicitly at line 98.
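The de-sugaring mentioned in this comment can be illustrated with a small standalone sketch. `SortedKeys` is a hypothetical class invented for this example and is not Spark's `RangePartitioner`; it only shows that a context bound `K : Ordering` becomes an implicit parameter, which `implicitly` can recover and bind to a name.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for illustration only; not Spark's RangePartitioner.
class SortedKeys[K : Ordering : ClassTag](keys: Seq[K]) {
  // The context bound desugars to an implicit Ordering[K] parameter,
  // so it can be recovered here and bound explicitly.
  private val ordering = implicitly[Ordering[K]]

  // Sorted copy of the keys; the ClassTag bound is what allows building an Array[K].
  def bounds: Array[K] = keys.sorted(ordering).toArray
}

object ContextBoundDemo {
  def main(args: Array[String]): Unit = {
    println(new SortedKeys(Seq(3, 1, 2)).bounds.mkString(","))
  }
}
```

With a view bound (`K <% Ordered[K]`) the implicit would instead be a conversion `K => Ordered[K]`, so there would be no `Ordering[K]` instance to recover this way.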
[GitHub] spark pull request: style fix
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/411#discussion_r11621270 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -678,7 +678,7 @@ private[spark] class BlockManager( case ArrayBufferValues(array) => tachyonStore.putValues(blockId, array, level, false) case ByteBufferValues(bytes) => { --- End diff -- if you don't mind changing those that'd be great!
[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40451066 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14137/
[GitHub] spark pull request: SPARK-1456 Remove view bounds on Ordered in fa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/410#issuecomment-40451062 Merged build finished. All automated tests passed.
[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40451060 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1456 Remove view bounds on Ordered in fa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/410#issuecomment-40451063 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14135/
[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40451067 Merged build started.
[GitHub] spark pull request: update spark.default.parallelism
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/389#issuecomment-40451065 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14136/
[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/391#issuecomment-40451261 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Include stack trace for exceptions thrown by u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/409#issuecomment-40451260 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/391#issuecomment-40451262 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14139/
[GitHub] spark pull request: Include stack trace for exceptions thrown by u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/409#issuecomment-40451263 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14138/
[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/363#issuecomment-40451813 I've merged this. Thanks @ahirreddy - cool stuff!
[GitHub] spark pull request: Make spark logo link refer to /.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/408#issuecomment-40451991 Jenkins, retest this please.
[GitHub] spark pull request: Make spark logo link refer to /.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/408#issuecomment-40452091 Merged build triggered.
[GitHub] spark pull request: Make spark logo link refer to /.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/408#issuecomment-40452101 Merged build started.
[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-40452444 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL
Github user ahirreddy commented on the pull request: https://github.com/apache/spark/pull/363#issuecomment-40452665 Awesome, thanks! On Tue, Apr 15, 2014 at 12:16 AM, asfgit notificati...@github.com wrote: Closed #363 via c99bcb7feaa761c5826f2e1d844d0502a3b79538.
[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/391#issuecomment-40452638 Thanks Sandeep! I've merged this in.
[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-40452733 Merged build started.
[GitHub] spark pull request: Make distribution
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/412 Make distribution You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark make_distribution Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/412.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #412 commit 709d71945a75a11fea6d91fd97ded30bd98b2950 Author: witgo wi...@qq.com Date: 2014-04-15T07:26:30Z add with-hive argument to make-distribution.sh commit 6d344c8e35f28a2bb1063bbd24057e256d3fa2f2 Author: witgo wi...@qq.com Date: 2014-04-15T07:29:29Z add with-hive argument to make-distribution.sh
[GitHub] spark pull request: add with-hive argument to make-distribution.sh
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/412#issuecomment-40453310 Can one of the admins verify this patch?
[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40453437 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14140/
[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/411#issuecomment-40453434 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40454904 Merged build triggered.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40460591 Merged build finished.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40467603 Merged build triggered.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40467611 Merged build started.
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40470083 Merged build triggered.
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40470091 Merged build started.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40473966 Merged build finished.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11631197 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -167,11 +169,24 @@ class ALS private ( this.numBlocks } -val partitioner = new HashPartitioner(numBlocks) +// Hash an integer to propagate random bits at all positions, similar to java.util.HashTable +def hash(x: Int): Int = { --- End diff -- That hash function was already there. I moved it up a few lines. I can change the hash function as well if you like...
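The body of the `hash` function is cut off in the quoted diff, so the following is only a sketch of the general idea the code comment describes: mixing an integer's bits before taking a modulus, so that keys sharing low-order bits do not all land in the same block. The shift constants follow the supplemental hash used by older `java.util.HashMap` implementations; treating that as the PR's actual function is an assumption, and `BlockHashDemo` and `blockFor` are names invented here.

```scala
object BlockHashDemo {
  // Bit-spreading hash in the style of older java.util.HashMap implementations.
  // Assumption for illustration: the PR's own function body is not shown in the thread.
  def hash(x: Int): Int = {
    val r = x ^ (x >>> 20) ^ (x >>> 12)
    r ^ (r >>> 7) ^ (r >>> 4)
  }

  // Map a key to a block index in [0, numBlocks), staying non-negative
  // even when the hashed value is negative.
  def blockFor(key: Int, numBlocks: Int): Int = {
    val raw = hash(key) % numBlocks
    if (raw < 0) raw + numBlocks else raw
  }

  def main(args: Array[String]): Unit = {
    val numBlocks = 4
    for (key <- Seq(0, 1, 16, 32, -7)) {
      println(s"key=$key -> block ${blockFor(key, numBlocks)}")
    }
  }
}
```

A plain `key % numBlocks` would send every multiple of `numBlocks` to block 0; mixing the higher bits in first is one common way to reduce that kind of skew.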
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40475281 Merged build finished.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11631405 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -96,6 +97,7 @@ class ALS private ( private var lambda: Double, private var implicitPrefs: Boolean, private var alpha: Double, +private var partitioner: Partitioner = null, --- End diff -- I'll do this, but note that this adds considerable functionality above what's described in the bug I'm supposed to address.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40478717 Different unrelated failures each time. One more roll of the dice. Jenkins, retest this please.
[GitHub] spark pull request: SPARK-1357 (addendum). More Experimental items...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/372#issuecomment-40482466 @mengxr Nice one, that is music to my ears. I just suggest, if you agree, marking a few more of these parts of MLlib as Experimental to give you the freedom to make these changes later.
[GitHub] spark pull request: improve the readability of SparkContext.scala
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/414 improve the readability of SparkContext.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SparkContext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/414.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #414 commit a9d7cd0b4e6cf9e2c08a22cdd9e4d61b86ec55bc Author: witgo wi...@qq.com Date: 2014-04-14T07:57:08Z improve the readability of SparkContext code
[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/415#issuecomment-40489083 Merged build started.
[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/415 SPARK-1501: Ensure assertions in Graph.apply are asserted. The Graph.apply test in GraphSuite had some assertions in a closure in a graph transformation. As a consequence, these assertions never actually executed. Furthermore, these closures had a reference to (non-serializable) test harness classes because they called assert(), which could be a problem if we proactively check closure serializability in the future. This commit simply changes the Graph.apply test to collect the graph triplets so it can assert about each triplet from a map method. You can merge this pull request into a Git repository by running: $ git pull https://github.com/willb/spark graphsuite-nop-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/415.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #415 commit 0b636586b797546ce0cf78dbbfbe7462712aeaa4 Author: William Benton wi...@redhat.com Date: 2014-03-14T16:40:56Z Ensure assertions in Graph.apply are asserted. The Graph.apply test in GraphSuite had some assertions in a closure in a graph transformation. As a consequence, these assertions never actually executed. Furthermore, these closures had a reference to (non-serializable) test harness classes because they called assert(), which could be a problem if we proactively check closure serializability in the future. This commit simply changes the Graph.apply test to collect the graph triplets so it can assert about each triplet from a map method.
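The failure mode described in this pull request, assertions placed inside a lazy transformation that is never forced, can be reproduced without Spark. The sketch below uses a plain Scala `Iterator` as a stand-in for a lazy graph transformation; `LazyAssertDemo` and its methods are hypothetical names, and the actual GraphSuite code is not shown in the thread.

```scala
object LazyAssertDemo {
  // Assertions live inside a lazy map whose result is never consumed,
  // so the closure never runs. Returns how many times it ran.
  def checksWithoutAction(pairs: Seq[(Int, Int)]): Int = {
    var ran = 0
    val mapped = pairs.iterator.map { case (src, dst) =>
      ran += 1
      assert(src < dst)
      (src, dst)
    }
    ran
  }

  // Materialize the elements first, then assert on each one eagerly.
  def checksAfterCollect(pairs: Seq[(Int, Int)]): Int = {
    var ran = 0
    val collected = pairs.iterator.map(p => p).toList
    collected.foreach { case (src, dst) =>
      ran += 1
      assert(src < dst)
    }
    ran
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, 2), (2, 3))
    println(checksWithoutAction(pairs)) // closure never invoked
    println(checksAfterCollect(pairs))  // one check per element
  }
}
```

Collecting first also keeps the assertion code on the caller's side, which sidesteps the closure-serializability concern raised in the description.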
[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/415#issuecomment-40489064 Merged build triggered.
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40491073 Merged build started.
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40491050 Merged build triggered.
[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/416 SPARK-1462: Examples of ML algorithms are using deprecated APIs This is a work in progress; any comments are welcome. This will also fix SPARK-1464: Update MLLib Examples to Use Breeze. You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark 1462 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #416 commit e7edc4af33cce17729176bf5f2270f38b15aad49 Author: Sandeep sand...@techaddict.me Date: 2014-04-15T14:53:15Z LocalLR uses breeze.linalg.Vector and DenseVector
[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/416#issuecomment-40495701 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/415#issuecomment-40499267 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/415#issuecomment-40499268 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14148/
[GitHub] spark pull request: Decision Tree documentation for MLlib programm...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/402#issuecomment-40501557 @manishamde Thanks for writing decision tree documentation! There are some minor issues, but not worth another iteration. Do you mind me merging this first and then making minor updates?
[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/396#discussion_r11642707
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -17,10 +17,19 @@
 package org.apache.spark.deploy.yarn

+import java.util.Map
+import java.util.regex.Matcher
+import java.util.regex.Pattern
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+
 import org.apache.hadoop.io.Text
 import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.security.Credentials
 import org.apache.hadoop.security.UserGroupInformation
+import org.apache.hadoop.util.Shell
--- End diff --
This doesn't really matter now, but this also doesn't compile for 0.23. Please make sure to try it on both 0.23 and 2.x builds. If you don't have those environments, let me know.
[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/416#issuecomment-40502368 @techaddict Thanks for working on this JIRA! Since we try to hide breeze types in MLlib, I'm not sure whether we should use breeze vectors directly in examples. We could either use breeze vectors in examples and leave a note about their usage in MLlib, or implement the necessary operations in MLlib's vectors so they can be used in examples. I prefer the former given the time frame. @mateiz what do you think?
[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...
Github user techaddict commented on the pull request: https://github.com/apache/spark/pull/416#issuecomment-40502741 @mengxr Yes, I was thinking that too, as eventually we'll need functions like squaredDist (in the KMeans examples) implemented in MLlib.
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40502679 Merged build finished.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11643032
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
       this.numBlocks
     }
-val partitioner = new HashPartitioner(numBlocks)
+// Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+def hash(x: Int): Int = {
--- End diff --
Yes, let's change it. We need a fast hash function but not necessarily a high-quality one. Using an existing implementation can simplify the code.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11643157
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
     private var lambda: Double,
     private var implicitPrefs: Boolean,
     private var alpha: Double,
+private var partitioner: Partitioner = null,
--- End diff --
Okay, let's make it simpler. Just a single partitioner for both users and products, customizable via a setter function.
[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...
Github user techaddict commented on the pull request: https://github.com/apache/spark/pull/416#issuecomment-40506793 @srowen I think we need to implement some additional functions on `linalg.Vector`, like squaredDist (supported by `util.Vector`).
[GitHub] spark pull request: Loads test tables when running sbt hive/conso...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/417 Loads test tables when running sbt hive/console without HIVE_DEV_HOME When running Hive tests, the working directory is `$SPARK_HOME/sql/hive`, while when running `sbt hive/console`, it becomes `$SPARK_HOME`, and test tables are not loaded if `HIVE_DEV_HOME` is not defined. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark loadTestTables Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #417 commit 7cea8d668248c0c39225931c52baa39d42217b23 Author: Cheng Lian lian.cs@gmail.com Date: 2014-04-15T16:53:43Z Loads test tables when running sbt hive/console without HIVE_DEV_HOME
[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/415#issuecomment-40510777 lgtm. merged. thanks!
[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-40514510 Merged build started.
[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-40514490 Merged build triggered.
[GitHub] spark pull request: Decision Tree documentation for MLlib programm...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/402#issuecomment-40515218 Looks good - thanks, I've merged this.
[GitHub] spark pull request: Loads test tables when running sbt hive/conso...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/417#issuecomment-40515442 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Loads test tables when running sbt hive/conso...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/417#issuecomment-40515443 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14150/
[GitHub] spark pull request: Generalize pattern for planning hash joins.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/418#issuecomment-40516161 Merged build triggered.
[GitHub] spark pull request: Generalize pattern for planning hash joins.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/418#issuecomment-40516180 Merged build started.
[GitHub] spark pull request: Generalize pattern for planning hash joins.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/418#issuecomment-40516364 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14152/
[GitHub] spark pull request: Generalize pattern for planning hash joins.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/418#issuecomment-40516363 Merged build finished.
[GitHub] spark pull request: Generalize pattern for planning hash joins.
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/418 Generalize pattern for planning hash joins. This will be helpful for [SPARK-1495](https://issues.apache.org/jira/browse/SPARK-1495) and other cases where we want to have custom hash join implementations but don't want to repeat the logic for finding the join keys. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark hashFilter Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #418 commit 165387dc03d23557a494d18992eb0f4c165fd20b Author: Michael Armbrust mich...@databricks.com Date: 2014-04-15T18:16:49Z Move common functions to PredicateHelper. commit d4ebf124921e838557eebb7eb2175c59865f1ffa Author: Michael Armbrust mich...@databricks.com Date: 2014-04-15T18:17:24Z Generalize pattern for planning hash joins.
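As a rough illustration of what "the logic for finding the join keys" might involve, the sketch below splits a conjunctive join condition into equi-join key pairs and residual predicates. It is not the code from this pull request: the function name, the `(op, lhs, rhs)` tuple encoding of predicates, and the column names are all assumptions made for the example.

```python
def split_join_condition(predicates, left_cols, right_cols):
    """Split conjuncts into equi-join keys and leftover predicates.

    predicates: list of (op, lhs, rhs) tuples, implicitly AND-ed together.
    Returns (left_keys, right_keys, residual).
    """
    left_keys, right_keys, residual = [], [], []
    for pred in predicates:
        op, lhs, rhs = pred
        if op == "=" and lhs in left_cols and rhs in right_cols:
            left_keys.append(lhs)
            right_keys.append(rhs)
        elif op == "=" and lhs in right_cols and rhs in left_cols:
            # Same equality written the other way round: normalize the sides.
            left_keys.append(rhs)
            right_keys.append(lhs)
        else:
            # Not an equality across the two sides; evaluate it after the join.
            residual.append(pred)
    return left_keys, right_keys, residual

# Made-up schema and condition: a.id = b.a_id AND b.day = a.day AND a.x < b.y
left_keys, right_keys, residual = split_join_condition(
    [("=", "a.id", "b.a_id"), ("=", "b.day", "a.day"), ("<", "a.x", "b.y")],
    left_cols={"a.id", "a.day", "a.x"},
    right_cols={"b.a_id", "b.day", "b.y"},
)
```

Factoring this step out once would let several hash join implementations share it, which is the motivation the description gives.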
[GitHub] spark pull request: Loads test tables when running sbt hive/conso...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/417#issuecomment-40517020 This change seems reasonable. I think we should leave `HIVE_DEV_HOME` though. The point is to make it easy to override the built-in tests when we want to upgrade or run against a different version of Hive.
[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/353
[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...
Github user xgong commented on the pull request: https://github.com/apache/spark/pull/396#issuecomment-40518846 @tgravescs Would you mind reviewing this again? Thanks
[GitHub] spark pull request: Decision Tree documentation for MLlib programm...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/402
[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-40523303 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14151/
[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-40523300 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-40523420 I merged master in and fixed the conflicts; it should be good to merge now.
[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/396#discussion_r11653067
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -73,4 +81,61 @@ object YarnSparkHadoopUtil {
   def getLoggingArgsForContainerCommandLine(): String = {
     "-Dlog4j.configuration=log4j-spark-container.properties"
   }
+
+  def addToEnvironment(
+    env: HashMap[String, String],
+    variable: String,
+    value: String,
+    classPathSeparator: String) = {
+    var envVariable = ""
+    if (env.get(variable) == None) {
+      envVariable = value
+    } else {
+      envVariable = env.get(variable).get + classPathSeparator + value
+    }
+    env put (StringInterner.weakIntern(variable), StringInterner.weakIntern(envVariable))
+  }
+
+  def setEnvFromInputString(
+    env: HashMap[String, String],
+    envString: String,
+    classPathSeparator: String) = {
+    if (envString != null && envString.length() > 0) {
+      var childEnvs = envString.split(",")
+      var p = Pattern.compile(getEnvironmentVariableRegex())
+      for (cEnv <- childEnvs) {
+        var parts = cEnv.split("=") // split on '='
--- End diff --
@sryza @tgravescs does Hadoop not support env variables that have `=` inside of quoted strings?
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r11653646
--- Diff: python/pyspark/context.py ---
@@ -130,6 +130,13 @@ def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
                 varName = k[len("spark.executorEnv."):]
                 self.environment[varName] = v
+        # Check if we're running on YARN:
+        if self.master == "yarn-client":
+            if not os.environ.get("SPARK_JAR"):
+                raise Exception("Must set SPARK_JAR when using yarn-client mode")
+            if not os.environ.get("PYSPARK_ZIP"):
--- End diff --
Rather than exposing this to the user, why not just export it in the `./bin/pyspark` script, and there you can fail with a message that says you need to run `make` if the user hasn't done it already.
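The suggestion above (resolve the zip in the launcher and fail with a helpful message, instead of asking the user to set a variable) could look roughly like the following. This is only a sketch: the real launcher is a shell script, and the function name and the `python/pyspark.zip` location are assumptions for illustration, since the thread does not say where the zip is built.

```python
import os

def resolve_pyspark_zip(spark_home, exists=os.path.exists):
    """Return the path to the PySpark zip, or fail with a build hint."""
    # Hypothetical location; the pull request does not state where the zip lives.
    candidate = os.path.join(spark_home, "python", "pyspark.zip")
    if not exists(candidate):
        raise RuntimeError(
            "%s not found; run `make` to build it before using yarn-client mode"
            % candidate
        )
    return candidate

def launcher_environment(spark_home, base_env, exists=os.path.exists):
    """Copy base_env and export PYSPARK_ZIP so user code never has to set it."""
    env = dict(base_env)
    env["PYSPARK_ZIP"] = resolve_pyspark_zip(spark_home, exists)
    return env
```

The `exists` parameter is only there so the check can be exercised without touching the filesystem.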
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40525624 Merged build triggered.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40525634 Merged build started.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11653978
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
       this.numBlocks
     }
-val partitioner = new HashPartitioner(numBlocks)
+// Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+def hash(x: Int): Int = {
--- End diff --
OK; I changed all instances of the hash function to byteswap32.
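To see why mixing an integer id before taking it modulo the number of blocks matters, here is a small sketch. It uses a multiply / byte-reverse / multiply mixer in the spirit of byteswap32; the multiplier constant, the `partition` helper, and the id pattern are assumptions for illustration, not the ALS code.

```python
MASK32 = 0xFFFFFFFF
MULTIPLIER = 0x9E3775CD  # assumed odd constant, chosen for illustration

def mix32(x):
    """Cheap 32-bit mixer: multiply, reverse the four bytes, multiply again."""
    h = (x * MULTIPLIER) & MASK32
    h = int.from_bytes(h.to_bytes(4, "little"), "big")  # byte reversal
    return (h * MULTIPLIER) & MASK32

def partition(key, num_blocks, mix):
    """Assign a key to a block, optionally mixing its bits first."""
    h = mix32(key) if mix else key
    return h % num_blocks

num_blocks = 10
ids = range(0, 1000, 10)  # made-up ids that are all multiples of num_blocks
plain = {partition(i, num_blocks, mix=False) for i in ids}
mixed = {partition(i, num_blocks, mix=True) for i in ids}
```

With plain modulo, every one of these ids lands in the same block; after mixing they are spread over several blocks. The mixer is fast rather than high quality, which matches the requirement stated in the review.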
[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/186#issuecomment-40526721 Merged build started.
[GitHub] spark pull request: SPARK-1004. PySpark on YARN
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/30#issuecomment-40527887 @sryza I think this is looking good. I played around with this on a local YARN install and it worked. I have only two points. Could we ditch requiring SPARK_JAR? I'm going to merge a patch shortly that removes that requirement. Also, could we just automatically create the pyspark zip file and not expose this to the user? Eventually we'll probably bundle this inside of the Spark assembly... but in the meantime, having something that just works for users, where they don't have to e.g. set environment variables, would be nice.
[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/396#discussion_r11655050
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -73,4 +81,61 @@ object YarnSparkHadoopUtil {
   def getLoggingArgsForContainerCommandLine(): String = {
     "-Dlog4j.configuration=log4j-spark-container.properties"
   }
+
+  def addToEnvironment(
+    env: HashMap[String, String],
+    variable: String,
+    value: String,
+    classPathSeparator: String) = {
+    var envVariable = ""
+    if (env.get(variable) == None) {
+      envVariable = value
+    } else {
+      envVariable = env.get(variable).get + classPathSeparator + value
+    }
+    env put (StringInterner.weakIntern(variable), StringInterner.weakIntern(envVariable))
+  }
+
+  def setEnvFromInputString(
+    env: HashMap[String, String],
+    envString: String,
+    classPathSeparator: String) = {
+    if (envString != null && envString.length() > 0) {
+      var childEnvs = envString.split(",")
+      var p = Pattern.compile(getEnvironmentVariableRegex())
+      for (cEnv <- childEnvs) {
+        var parts = cEnv.split("=") // split on '='
--- End diff --
I've noticed this as an issue as well. There's definitely room for improvement here.
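One possible direction for the `=` problem raised in this thread is to split each entry on the first `=` only, so that a value may itself contain `=`. The sketch below is a simplified illustration rather than the code under review: the function names are loosely modeled on the diff, the example strings are made up, and it neither expands variable references nor handles commas inside quoted values.

```python
def add_to_environment(env, variable, value, separator):
    """Set env[variable], appending with separator if it is already set."""
    if variable in env:
        env[variable] = env[variable] + separator + value
    else:
        env[variable] = value

def set_env_from_input_string(env, env_string, separator):
    """Parse 'K1=V1,K2=V2', splitting each entry on the first '=' only."""
    if not env_string:
        return
    for entry in env_string.split(","):
        name, sep, value = entry.partition("=")
        if not sep:
            continue  # no '=' in this entry; a stricter parser might raise instead
        add_to_environment(env, name.strip(), value, separator)

env = {}
set_env_from_input_string(env, 'OPTS="-Dkey=value",CP=/a', ":")
set_env_from_input_string(env, "CP=/b", ":")
```

A plain split on every `=` would break `OPTS="-Dkey=value"` into three parts; taking only the first `=` as the delimiter keeps the value intact.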
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40529664 Thanks for looking at this! To add this to the test harness you can augment `dev/scalastyle` with two additional checks:
```
SPARK_YARN=true sbt/sbt yarn/scalastyle scalastyle.txt
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt yarn/scalastyle scalastyle.txt
```
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40532258 Merged build triggered.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40532267 Merged build started.
[GitHub] spark pull request: Make spark logo link refer to /.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/408#issuecomment-40532862 Merged build triggered.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40533842 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40533843 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14153/
[GitHub] spark pull request: Loads test tables when running sbt hive/conso...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/417#issuecomment-40534041 @pwendell, this can be merged. No reason not to include in 1.0 as well.
[GitHub] spark pull request: Make spark logo link refer to /.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/408#issuecomment-40534045 Merged build finished.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/413#issuecomment-40539374 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/413#discussion_r11665116
--- Diff: dev/scalastyle ---
@@ -18,6 +18,10 @@
 #
 echo -e "q\n" | sbt/sbt clean scalastyle > scalastyle.txt
+# Check style with YARN alpha built too
--- End diff --
Any interest in doing the hive one here too? I edited my comment earlier to show how to do that.
[GitHub] spark pull request: [FIX] update sbt-idea to version 1.6.0
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/419#issuecomment-40551810 Merged build finished. All automated tests passed.