[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/550 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-56289232 It sounds like the conclusion here is to close this issue then.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-51286697 QA tests have started for PR 550. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-51286708 QA results for PR 550:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41483260 I'd rather not add the implicit conversion from Int to Partitioner; it would be very hard to discover on its own. Instead, maybe we can just leave this API as is. It's strange, but there's a good reason for it.
Github user techaddict commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41462407 @rxin +1
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41462077 I have one solution to this, although it is technically an API change, so I'm just throwing it out there for discussion: we can remove all the numPartitions: Int arguments and add an implicit conversion from Int to HashPartitioner.
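The implicit conversion proposed above could look roughly like the sketch below. Note the `Partitioner` and `HashPartitioner` definitions here are simplified stand-ins so the example compiles without a Spark dependency (in Spark these are `org.apache.spark.Partitioner` and `org.apache.spark.HashPartitioner`), and `reduceByKey` is a stub; this is a sketch of the idea, not the actual Spark code.

```scala
import scala.language.implicitConversions

// Simplified stand-ins for org.apache.spark.{Partitioner, HashPartitioner},
// so this sketch compiles without Spark on the classpath.
trait Partitioner { def numPartitions: Int }
case class HashPartitioner(partitions: Int) extends Partitioner {
  def numPartitions: Int = partitions
}

object PartitionerConversions {
  // The proposed implicit: an Int supplied where a Partitioner is expected
  // silently becomes a HashPartitioner with that many partitions.
  implicit def intToPartitioner(numPartitions: Int): Partitioner =
    new HashPartitioner(numPartitions)
}

object Demo {
  import PartitionerConversions._

  // A stub standing in for a reduceByKey that only has a Partitioner overload;
  // it returns the partition count so the conversion is observable.
  def reduceByKey[V](f: (V, V) => V, p: Partitioner): Int = p.numPartitions

  def main(args: Array[String]): Unit = {
    // Callers can still pass a plain Int; the implicit inserts HashPartitioner(10).
    println(reduceByKey[Int]((x, y) => x + y, 10))
  }
}
```

With only one `reduceByKey` signature left, the function literal's parameter types infer normally, which is the point of the proposal; the cost, as noted in the thread, is that the conversion is hard to discover.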
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/550#discussion_r12023303
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -267,7 +267,7 @@ class ReceiverTracker(ssc: StreamingContext) extends Logging {
// Run the dummy Spark job to ensure that all slaves have registered.
// This avoids all the receivers to be scheduled on the same node.
if (!ssc.sparkContext.isLocal) {
-ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()
+ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 20).collect()
--- End diff --
@rxin will fix this as soon as a decision is made on whether we want to do this or not.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41461521 I never even realized we had a version of reduceByKey where the first argument is not the closure ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/550#discussion_r12023299
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -267,7 +267,7 @@ class ReceiverTracker(ssc: StreamingContext) extends Logging {
// Run the dummy Spark job to ensure that all slaves have registered.
// This avoids all the receivers to be scheduled on the same node.
if (!ssc.sparkContext.isLocal) {
-ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()
+ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 20).collect()
--- End diff --
This line is over 100 chars wide
Github user techaddict commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41455521 @mateiz I think this only applies to anonymous functions, so it doesn't affect either cogroup or groupByKey.
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41454850 CC @rxin, @pwendell
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41454793 Ah, wow, I never knew that. So if one overload takes a Partitioner first and the other takes a function first, the types are inferred, but if both take a function first, they're not? In that case we *might* want to change our other methods too, like cogroup and groupByKey, to take a Partitioner first. Wouldn't this problem also affect them?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-4139 Can one of the admins verify this patch?
Github user techaddict commented on the pull request: https://github.com/apache/spark/pull/550#issuecomment-41388760 We'll need to specify the parameter types for the function passed to reduceByKey: `reduceByKey((x: Long, y: Long) => x + y, 10)` instead of `reduceByKey(_ + _, 10)`. For a detailed discussion of the compiler issue causing this, see https://groups.google.com/forum/#!topic/scala-user/Qhd3vJ2rAWM
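The inference problem above can be reproduced without Spark at all. The `PairOps` class below is a toy stand-in (not Spark's `PairRDDFunctions`): once `reduceByKey` is overloaded on its second parameter, the compiler can no longer infer the parameter types of a function literal in the first slot, so callers must write them out.

```scala
// Toy stand-in for the overloaded reduceByKey signatures under discussion.
trait Partitioner { def numPartitions: Int }

class PairOps[K, V] {
  def reduceByKey(func: (V, V) => V, numPartitions: Int): String = "numPartitions overload"
  def reduceByKey(func: (V, V) => V, partitioner: Partitioner): String = "partitioner overload"
}

object InferenceDemo {
  def main(args: Array[String]): Unit = {
    val ops = new PairOps[String, Long]

    // ops.reduceByKey(_ + _, 10)
    // ^ rejected with "missing parameter type" on the Scala versions Spark
    //   targeted at the time: with multiple overloads, the function literal
    //   is type-checked before a single expected type is available.

    // Writing the parameter types out makes the call compile:
    println(ops.reduceByKey((x: Long, y: Long) => x + y, 10))
  }
}
```

This is why the thread concludes that adding a `(func, Partitioner)` overload would force the verbose `(x: Int, y: Int) => x + y` style at existing call sites such as the one in ReceiverTracker.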
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/550
SPARK-1597: Add a version of reduceByKey that takes the Partitioner as a second argument
Most of our shuffle methods can take a Partitioner or a number of partitions as a second argument, but for some reason reduceByKey takes the Partitioner as a first argument: http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions. This patch deprecates that version and adds one where the Partitioner is the second argument.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/techaddict/spark SPARK-1597
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/550.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #550
commit bac490c6f7301c6472a72db8adfda8bd1d7d6817
Author: Sandeep
Date: 2014-04-25T07:59:04Z
SPARK-1597: Add a version of reduceByKey that takes the Partitioner as a second argument
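The shape of the proposed change can be sketched as below. Everything here is a simplified stand-in for illustration (the real signatures live in `org.apache.spark.rdd.PairRDDFunctions` and return an `RDD[(K, V)]`, and the deprecation message and version string are hypothetical, not taken from the patch):

```scala
// Simplified stand-ins so the sketch compiles without a Spark dependency.
trait Partitioner { def numPartitions: Int }
class HashPartitioner(val numPartitions: Int) extends Partitioner

// Sketch of PairRDDFunctions before/after the patch.
class PairRDDFunctionsSketch[K, V] {
  // Existing API: Partitioner comes first, unlike the other shuffle methods.
  @deprecated("use reduceByKey(func, partitioner) instead", "illustrative")
  def reduceByKey(partitioner: Partitioner, func: (V, V) => V): String =
    reduceByKey(func, partitioner)

  // Proposed API: Partitioner second, consistent with cogroup, groupByKey, etc.
  def reduceByKey(func: (V, V) => V, partitioner: Partitioner): String =
    "reduceByKey with partitioner second"
}
```

As the later discussion shows, adding this overload breaks type inference for `reduceByKey(_ + _, 20)`-style call sites, which is why the PR was ultimately closed and the Partitioner-first signature kept.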