[jira] [Commented] (SPARK-3789) [GRAPHX] Python bindings for GraphX

2015-11-04 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990006#comment-14990006 ] Glenn Strycker commented on SPARK-3789: --- I posted a similar question on stackoverflow about a year

[jira] [Created] (SPARK-11387) minimize shuffles during joins by using existing partitions and bundling messages

2015-10-28 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-11387: -- Summary: minimize shuffles during joins by using existing partitions and bundling messages Key: SPARK-11387 URL: https://issues.apache.org/jira/browse/SPARK-11387

[jira] [Commented] (SPARK-11387) minimize shuffles during joins by using existing partitions and bundling messages

2015-10-28 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979070#comment-14979070 ] Glenn Strycker commented on SPARK-11387: This ticket may be a particular implementation idea that

[jira] [Commented] (SPARK-6235) Address various 2G limits

2015-10-14 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957554#comment-14957554 ] Glenn Strycker commented on SPARK-6235: --- I don't think so, but I can check. My RDD came from an RDD

[jira] [Updated] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker updated SPARK-11004: --- Description: Could a feature be added to Spark that would use disk-only MapReduce operations

[jira] [Created] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-11004: -- Summary: MapReduce Hive-like join operations for RDDs Key: SPARK-11004 URL: https://issues.apache.org/jira/browse/SPARK-11004 Project: Spark Issue Type:

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948943#comment-14948943 ] Glenn Strycker commented on SPARK-11004: True, fixing the 2GB will go a long way. However, this

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949076#comment-14949076 ] Glenn Strycker commented on SPARK-11004: Currently we could do the following from within a Linux

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949171#comment-14949171 ] Glenn Strycker commented on SPARK-11004: So maybe we can simplify this idea down to forcing

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949187#comment-14949187 ] Glenn Strycker commented on SPARK-11004: Awesome -- thanks, I'll try that out. Is there a way to

[jira] [Comment Edited] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949076#comment-14949076 ] Glenn Strycker edited comment on SPARK-11004 at 10/8/15 6:12 PM: -

[jira] [Comment Edited] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949076#comment-14949076 ] Glenn Strycker edited comment on SPARK-11004 at 10/8/15 6:13 PM: -

[jira] [Commented] (SPARK-10735) CatalystTypeConverters MatchError converting RDD with custom object to dataframe

2015-09-24 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906877#comment-14906877 ] Glenn Strycker commented on SPARK-10735: This appears very similar to a problem I had earlier

[jira] [Comment Edited] (SPARK-10735) CatalystTypeConverters MatchError converting RDD with custom object to dataframe

2015-09-24 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906877#comment-14906877 ] Glenn Strycker edited comment on SPARK-10735 at 9/24/15 7:40 PM: - This

[jira] [Commented] (SPARK-4489) JavaPairRDD.collectAsMap from checkpoint RDD may fail with ClassCastException

2015-09-23 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904572#comment-14904572 ] Glenn Strycker commented on SPARK-4489: --- My ticket SPARK-10762 may have just been a user error, but

[jira] [Closed] (SPARK-10762) GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table

2015-09-23 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker closed SPARK-10762. -- This probably isn't completely fixed, but should be a new ticket for casting ArrayBuffers

[jira] [Resolved] (SPARK-10762) GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table

2015-09-23 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker resolved SPARK-10762. Resolution: Not A Problem Instead of

[jira] [Commented] (SPARK-10762) GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table

2015-09-23 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904567#comment-14904567 ] Glenn Strycker commented on SPARK-10762: Please see the accepted solution to

[jira] [Commented] (SPARK-1040) Collect as Map throws a casting exception when run on a JavaPairRDD object

2015-09-23 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904571#comment-14904571 ] Glenn Strycker commented on SPARK-1040: --- My ticket SPARK-10762 may have just been a user error, but

[jira] [Created] (SPARK-10762) GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table

2015-09-22 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-10762: -- Summary: GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table Key: SPARK-10762 URL:

[jira] [Commented] (SPARK-2737) ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs

2015-09-22 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903542#comment-14903542 ] Glenn Strycker commented on SPARK-2737: --- I am getting a similar error in Spark 1.3.0... see a new

[jira] [Commented] (SPARK-10762) GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table

2015-09-22 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903551#comment-14903551 ] Glenn Strycker commented on SPARK-10762: Is this related?

[jira] [Commented] (SPARK-1040) Collect as Map throws a casting exception when run on a JavaPairRDD object

2015-09-22 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903541#comment-14903541 ] Glenn Strycker commented on SPARK-1040: --- I am getting a similar error in Spark 1.3.0... see a new

[jira] [Created] (SPARK-10636) RDD filter does not work after if..then..else RDD blocks

2015-09-16 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-10636: -- Summary: RDD filter does not work after if..then..else RDD blocks Key: SPARK-10636 URL: https://issues.apache.org/jira/browse/SPARK-10636 Project: Spark

[jira] [Commented] (SPARK-10636) RDD filter does not work after if..then..else RDD blocks

2015-09-16 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790681#comment-14790681 ] Glenn Strycker commented on SPARK-10636: I didn't "forget", I believed that "RDD = if {} else {}

[jira] [Closed] (SPARK-10636) RDD filter does not work after if..then..else RDD blocks

2015-09-16 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker closed SPARK-10636. -- > RDD filter does not work after if..then..else RDD blocks >

[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741486#comment-14741486 ] Glenn Strycker commented on SPARK-10569: Playing around with adding additional registrations, I

[jira] [Created] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-10569: -- Summary: Kryo serialization fails on sortByKey operation on registered RDDs Key: SPARK-10569 URL: https://issues.apache.org/jira/browse/SPARK-10569 Project:

[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741582#comment-14741582 ] Glenn Strycker commented on SPARK-10569: Is this issue related to HIVE-7540 or SPARK-2421? >

[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741638#comment-14741638 ] Glenn Strycker commented on SPARK-10569: Note that I am still using 1.3.0. I noticed that

[jira] [Commented] (SPARK-10251) Some internal spark classes are not registered with kryo

2015-09-11 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741639#comment-14741639 ] Glenn Strycker commented on SPARK-10251: I opened a ticket earlier today that might be related to

[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741554#comment-14741554 ] Glenn Strycker commented on SPARK-10569: I'm also seeing the occasional "User class threw

[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741627#comment-14741627 ] Glenn Strycker commented on SPARK-10569: It looks very similar to this thread:

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-10 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738794#comment-14738794 ] Glenn Strycker commented on SPARK-10493: Unfortunately we don't have anything past 1.3.0. We're

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736869#comment-14736869 ] Glenn Strycker commented on SPARK-10493: The RDD I am using has the form ((String, String),

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737001#comment-14737001 ] Glenn Strycker commented on SPARK-10493: In this example, our RDDs are partitioned with a hash

[jira] [Comment Edited] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737055#comment-14737055 ] Glenn Strycker edited comment on SPARK-10493 at 9/9/15 3:40 PM: I'm still

[jira] [Updated] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker updated SPARK-10493: --- Attachment: reduceByKey_example_001.scala I'm still working on checking unit tests and

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737727#comment-14737727 ] Glenn Strycker commented on SPARK-10493: I already have that added in my code that I'm testing...

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737735#comment-14737735 ] Glenn Strycker commented on SPARK-10493: Of course. I have count statements everywhere in order

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-09 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737296#comment-14737296 ] Glenn Strycker commented on SPARK-10493: [~srowen], the code I attached did run correctly.

[jira] [Created] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-10493: -- Summary: reduceByKey not returning distinct results Key: SPARK-10493 URL: https://issues.apache.org/jira/browse/SPARK-10493 Project: Spark Issue Type:

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735557#comment-14735557 ] Glenn Strycker commented on SPARK-2620: --- I am finding similar behavior for a non-case-class RDD...

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735626#comment-14735626 ] Glenn Strycker commented on SPARK-10493: Thanks for the speedy follow-up, [~frosner]! I'm

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735653#comment-14735653 ] Glenn Strycker commented on SPARK-10493: Note: this only seems to be occurring "at scale" so far.

[jira] [Closed] (SPARK-8666) checkpointing does not take advantage of persisted/cached RDDs

2015-06-29 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker closed SPARK-8666. - checkpointing does not take advantage of persisted/cached RDDs

[jira] [Updated] (SPARK-8666) checkpointing does not take advantage of persisted/cached RDDs

2015-06-26 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker updated SPARK-8666: -- Description: I have been noticing that when checkpointing RDDs, all operations are occurring

[jira] [Commented] (SPARK-8666) checkpointing does not take advantage of persisted/cached RDDs

2015-06-26 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603175#comment-14603175 ] Glenn Strycker commented on SPARK-8666: --- I added a stackoverflow question to

[jira] [Commented] (SPARK-8666) checkpointing does not take advantage of persisted/cached RDDs

2015-06-26 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603189#comment-14603189 ] Glenn Strycker commented on SPARK-8666: --- Looks like this ticket is a duplicate of

[jira] [Created] (SPARK-8666) checkpointing does not take advantage of persisted/cached RDDs

2015-06-26 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-8666: - Summary: checkpointing does not take advantage of persisted/cached RDDs Key: SPARK-8666 URL: https://issues.apache.org/jira/browse/SPARK-8666 Project: Spark

[jira] [Commented] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2015-06-26 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603195#comment-14603195 ] Glenn Strycker commented on SPARK-8582: --- I didn't see this ticket and made a

[jira] [Closed] (SPARK-1885) GraphX reduce function not working properly -- returns only 1 element

2014-05-20 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker closed SPARK-1885. - Resolution: Fixed User needed to use reduceByKey, not reduce GraphX reduce function not

[jira] [Updated] (SPARK-1883) spark graph.triplets does not return correct values

2014-05-19 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker updated SPARK-1883: -- Summary: spark graph.triplets does not return correct values (was: spark graphx triplets.map

[jira] [Created] (SPARK-1883) spark graphx triplets.map does not return correct values

2014-05-19 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-1883: - Summary: spark graphx triplets.map does not return correct values Key: SPARK-1883 URL: https://issues.apache.org/jira/browse/SPARK-1883 Project: Spark

[jira] [Commented] (SPARK-1883) spark graph.triplets does not return correct values

2014-05-19 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002358#comment-14002358 ] Glenn Strycker commented on SPARK-1883: --- Sorry, this has been fixed --

[jira] [Resolved] (SPARK-1883) spark graph.triplets does not return correct values

2014-05-19 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker resolved SPARK-1883. --- Resolution: Fixed already fixed -- user is running an old version of Spark spark

[jira] [Created] (SPARK-1885) GraphX reduce function not working properly -- returns only 1 element

2014-05-19 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-1885: - Summary: GraphX reduce function not working properly -- returns only 1 element Key: SPARK-1885 URL: https://issues.apache.org/jira/browse/SPARK-1885 Project: Spark