[jira] [Comment Edited] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417301#comment-15417301 ] Roi Reshef edited comment on SPARK-17020 at 8/11/16 2:09 PM:

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417301#comment-15417301 ] Roi Reshef commented on SPARK-17020: Nevertheless, any attempt to repartition the res

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417288#comment-15417288 ] Roi Reshef commented on SPARK-17020: The problem occurs only when calling **.rdd** on

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417254#comment-15417254 ] Roi Reshef commented on SPARK-17020: Also note that I have just called: *data.cache(

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417250#comment-15417250 ] Roi Reshef commented on SPARK-17020: val ab = SomeReader.read(...) //some reader fun

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417218#comment-15417218 ] Roi Reshef commented on SPARK-17020: [~srowen] Should there be any effect on this if

[jira] [Comment Edited] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417204#comment-15417204 ] Roi Reshef edited comment on SPARK-17020 at 8/11/16 1:13 PM:

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417204#comment-15417204 ] Roi Reshef commented on SPARK-17020: [~srowen] I have 2 DataFrames that are generated

[jira] [Updated] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roi Reshef updated SPARK-17020: Affects Version/s: 2.0.0 > Materialization of RDD via DataFrame.rdd forces a poor re-distribution of

[jira] [Updated] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roi Reshef updated SPARK-17020: Attachment: rdd_cache.PNG dataframe_cache.PNG > Materialization of RDD via DataFrame.

[jira] [Created] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
Roi Reshef created SPARK-17020: Summary: Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data Key: SPARK-17020 URL: https://issues.apache.org/jira/browse/SPARK-17020 Project: S

[jira] [Comment Edited] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-29 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073832#comment-15073832 ] Roi Reshef edited comment on SPARK-10789 at 12/29/15 11:56 AM:

[jira] [Commented] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-29 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073832#comment-15073832 ] Roi Reshef commented on SPARK-10789: Thanks [~jonathak]. That requires rebuilding spa

[jira] [Comment Edited] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-29 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073832#comment-15073832 ] Roi Reshef edited comment on SPARK-10789 at 12/29/15 11:55 AM:

[jira] [Commented] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-21 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066304#comment-15066304 ] Roi Reshef commented on SPARK-10789: Any resolution on that? Can you elaborate more o

[jira] [Issue Comment Deleted] (SPARK-5081) Shuffle write increases

2015-06-15 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roi Reshef updated SPARK-5081: Comment: was deleted (was: Hi Guys, Was this issue already solved by any chance? I'm using Spark 1.3.1 f

[jira] [Commented] (SPARK-5081) Shuffle write increases

2015-06-15 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585641#comment-14585641 ] Roi Reshef commented on SPARK-5081: Hi Guys, Was this issue already solved by any chanc

[jira] [Comment Edited] (SPARK-5081) Shuffle write increases

2015-06-15 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585641#comment-14585641 ] Roi Reshef edited comment on SPARK-5081 at 6/15/15 8:41 AM: Hi