[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738648#comment-14738648 ] Sean Owen commented on SPARK-10493: OK, yes I see now that temp4 is count-ed. I'm out of ideas. I

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738829#comment-14738829 ] Sean Owen commented on SPARK-10493: Maybe union() tides you over; CDH 5.5 = Spark 1.5 is coming in ~2

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-10 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738794#comment-14738794 ] Glenn Strycker commented on SPARK-10493: Unfortunately we don't have anything past 1.3.0. We're

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736879#comment-14736879 ] Sean Owen commented on SPARK-10493: That much should be OK. zipPartitions only makes sense if you

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736869#comment-14736869 ] Glenn Strycker commented on SPARK-10493: The RDD I am using has the form ((String, String),

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737001#comment-14737001 ] Glenn Strycker commented on SPARK-10493: In this example, our RDDs are partitioned with a hash

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737051#comment-14737051 ] Sean Owen commented on SPARK-10493: I think you still have the same issue with zipPartitions, unless

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737050#comment-14737050 ] Sean Owen commented on SPARK-10493: I think you still have the same issue with zipPartitions, unless

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737730#comment-14737730 ] Sean Owen commented on SPARK-10493: checkpoint doesn't materialize the RDD, which is why it occurred

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737727#comment-14737727 ] Glenn Strycker commented on SPARK-10493: I already have that added in my code that I'm testing...

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737681#comment-14737681 ] Sean Owen commented on SPARK-10493: If the RDD is a result of reduceByKey, I agree that the keys

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737735#comment-14737735 ] Glenn Strycker commented on SPARK-10493: Of course. I have count statements everywhere in order

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737252#comment-14737252 ] Sean Owen commented on SPARK-10493: What do you mean that it's not collapsing key pairs? The output of

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737296#comment-14737296 ] Glenn Strycker commented on SPARK-10493: [~srowen], the code I attached did run correctly.

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735626#comment-14735626 ] Glenn Strycker commented on SPARK-10493: Thanks for the speedy follow-up, [~frosner]! I'm

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735653#comment-14735653 ] Glenn Strycker commented on SPARK-10493: Note: this only seems to be occurring "at scale" so far.

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735598#comment-14735598 ] Frank Rosner commented on SPARK-10493: Thanks for submitting the issue, [~glenn.strycker] :) Can

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735772#comment-14735772 ] Sean Owen commented on SPARK-10493: There are some key pieces of info missing, like what the key and