[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-09-20 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/550


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-09-20 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-56289232
  
It sounds like the conclusion here is to close this issue then.


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-08-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-51286697
  
QA tests have started for PR 550. This patch merges cleanly. View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-08-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-51286708
  
QA results for PR 550:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17978/consoleFull


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-26 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41483260
  
I'd rather not add the implicit conversion from Int to Partitioner; it would be 
very hard to discover on its own. Instead, maybe we can just leave this API 
as is. It's strange, but there's a good reason for it.


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-26 Thread techaddict
Github user techaddict commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41462407
  
@rxin +1


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-26 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41462077
  
I have one solution to this, although it is technically an API change, so I'm 
just throwing it out there for discussion: we could remove all the 
`numPartitions: Int` arguments and add an implicit conversion from Int to 
HashPartitioner.
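
A minimal sketch of that idea, assuming it were done via a standalone implicit 
(the object and method names are hypothetical; this PR was ultimately closed, 
so nothing like this was merged):

```scala
import scala.language.implicitConversions
import org.apache.spark.{HashPartitioner, Partitioner}

object PartitionerImplicits {
  // Hypothetical implicit: any Int passed where a Partitioner is expected
  // gets wrapped in a HashPartitioner, so the `numPartitions: Int`
  // overloads could be dropped.
  implicit def intToPartitioner(numPartitions: Int): Partitioner =
    new HashPartitioner(numPartitions)
}
```

With this in scope, `reduceByKey(func, 20)` would silently build a 
`HashPartitioner(20)`, which is exactly the discoverability concern raised 
above.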



[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/550#discussion_r12023303
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -267,7 +267,7 @@ class ReceiverTracker(ssc: StreamingContext) extends Logging {
       // Run the dummy Spark job to ensure that all slaves have registered.
       // This avoids all the receivers to be scheduled on the same node.
       if (!ssc.sparkContext.isLocal) {
-        ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()
+        ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 20).collect()
--- End diff --

@rxin Will fix this as soon as a decision is made on whether we want to do 
this or not.


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41461521
  
I never even realized we had a version of reduceByKey where the first 
argument is not the closure ...


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/550#discussion_r12023299
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -267,7 +267,7 @@ class ReceiverTracker(ssc: StreamingContext) extends Logging {
       // Run the dummy Spark job to ensure that all slaves have registered.
       // This avoids all the receivers to be scheduled on the same node.
       if (!ssc.sparkContext.isLocal) {
-        ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()
+        ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 20).collect()
--- End diff --

This line is over 100 chars wide


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread techaddict
Github user techaddict commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41455521
  
@mateiz I think this only applies to anonymous functions, so it doesn't 
affect either cogroup or groupByKey.


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41454850
  
CC @rxin, @pwendell


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41454793
  
Ah, wow, I never knew that. So if one overload takes a Partitioner first and 
the other takes a function first, the types are inferred, but if both take a 
function first, they're not?

In that case we *might* want to change our other methods too, like cogroup 
and groupByKey, to take a Partitioner first. Wouldn't this problem also affect 
them?
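
For reference, here is the shape difference in stripped-down form, a sketch 
with signatures paraphrased from PairRDDFunctions (the `Pairs` class and 
`Unit` bodies are stand-ins):

```scala
import org.apache.spark.Partitioner

class Pairs[K, V] {
  // Existing shape: the overloads differ in their first parameter, so the
  // compiler can pick an alternative from the first argument alone and then
  // infer the closure's parameter types from the chosen signature.
  def reduceByKey(partitioner: Partitioner, func: (V, V) => V): Unit = ()
  def reduceByKey(func: (V, V) => V, numPartitions: Int): Unit = ()

  // Proposed shape (this PR): a second closure-first overload. Per the
  // discussion below, `_ + _` would then need explicit parameter types.
  // def reduceByKey(func: (V, V) => V, partitioner: Partitioner): Unit = ()
}
```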


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-4139
  
Can one of the admins verify this patch?


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread techaddict
Github user techaddict commented on the pull request:

https://github.com/apache/spark/pull/550#issuecomment-41388760
  
We'll need to specify the parameter types for the function passed to 
reduceByKey: `reduceByKey((x: Long, y: Long) => x + y, 10)` instead of 
`reduceByKey(_ + _, 10)`.
For a detailed discussion of the compiler issue causing this, see
https://groups.google.com/forum/#!topic/scala-user/Qhd3vJ2rAWM
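
Concretely, call sites would change as in the ReceiverTracker diff quoted 
earlier in this thread; a sketch, assuming `sc` is a SparkContext:

```scala
// Today: the closure's parameter types are inferred, since only one
// overload takes the closure first.
sc.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()

// With the proposed closure-first Partitioner overload added, the same
// call would need explicit parameter types:
sc.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 20).collect()
```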


[GitHub] spark pull request: SPARK-1597: Add a version of reduceByKey that ...

2014-04-25 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/550

SPARK-1597: Add a version of reduceByKey that takes the Partitioner as a second argument

Most of our shuffle methods can take a Partitioner or a number of partitions 
as a second argument, but for some reason reduceByKey takes the Partitioner 
as a first argument: 
http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions.
This PR deprecates that version and adds one where the Partitioner is the 
second argument.
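
A sketch of the change as described, paraphrased against PairRDDFunctions 
(the deprecation message and version string are illustrative; the actual diff 
is in the .patch link below):

```scala
// Inside PairRDDFunctions[K, V]:

@deprecated("use reduceByKey(func, partitioner) instead", "1.0.0")
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)] =
  reduceByKey(func, partitioner)

// New variant: Partitioner second, matching the other shuffle methods.
def reduceByKey(func: (V, V) => V, partitioner: Partitioner): RDD[(K, V)] =
  combineByKey[V]((v: V) => v, func, func, partitioner)
```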

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-1597

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #550


commit bac490c6f7301c6472a72db8adfda8bd1d7d6817
Author: Sandeep 
Date:   2014-04-25T07:59:04Z

SPARK-1597: Add a version of reduceByKey that takes the Partitioner as a second argument

Most of our shuffle methods can take a Partitioner or a number of partitions 
as a second argument, but for some reason reduceByKey takes the Partitioner 
as a first argument: 
http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions.
Deprecates that version and adds one where the Partitioner is the second 
argument.



