[jira] [Commented] (SPARK-2737) ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
[ https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904141#comment-14904141 ]

Sean Owen commented on SPARK-2737:
----------------------------------

[~glenn.stryc...@gmail.com] you can use JIRA to link issues if you're pretty sure they're related. It's more visible than in a comment.

> ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
> ---------------------------------------------------------------------
>
>                 Key: SPARK-2737
>                 URL: https://issues.apache.org/jira/browse/SPARK-2737
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 0.8.0, 0.9.0, 1.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 1.1.0
>
> The Java API's use of fake ClassTags doesn't seem to cause any problems for Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs to Scala code (e.g. in the MLlib Java API wrapper code). If we call {{collect()}} on a Scala RDD with an incorrect ClassTag, this causes ClassCastExceptions when we try to allocate an array of the wrong type (for example, see SPARK-2197).
>
> There are a few possible fixes here. An API-breaking fix would be to completely remove the fake ClassTags and require Java API users to pass {{java.lang.Class}} instances to all {{parallelize()}} calls and add {{returnClass}} fields to all {{Function}} implementations. This would be extremely verbose.
>
> Instead, I propose that we add internal APIs to "repair" a Scala RDD with an incorrect ClassTag by wrapping it and overriding its ClassTag. This should be okay for cases where the Scala code that calls {{collect()}} knows what type of array should be allocated, which is the case in the MLlib wrappers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
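The failure the issue describes can be reproduced without Spark, because the root cause is a JVM-level property of erased generics: an array allocated from the wrong runtime class token fails the synthetic cast that the compiler inserts at the call site. The sketch below is illustrative only; `collectLike` and `repair` are hypothetical helpers standing in for `RDD.collect()`'s internal array allocation and the proposed ClassTag-override wrapper, not actual Spark APIs.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

public class FakeClassTagDemo {

    // Mimics what collect() does internally: allocate the result array from a
    // runtime class token, then cast to the statically expected element type.
    @SuppressWarnings("unchecked")
    static <T> T[] collectLike(Class<?> claimedElementClass, List<T> elems) {
        Object arr = Array.newInstance(claimedElementClass, elems.size());
        for (int i = 0; i < elems.size(); i++) {
            Array.set(arr, i, elems.get(i));
        }
        return (T[]) arr; // unchecked here; actually checked at the call site
    }

    // Rough sketch of the "repair" idea: supply the correct element class so
    // callers get an array whose runtime type matches what they expect.
    @SuppressWarnings("unchecked")
    static <T> T[] repair(Class<T> correctElementClass, Object[] wronglyTyped) {
        T[] fixed = (T[]) Array.newInstance(correctElementClass, wronglyTyped.length);
        System.arraycopy(wronglyTyped, 0, fixed, 0, wronglyTyped.length);
        return fixed;
    }

    public static void main(String[] args) {
        // With the correct class token everything works.
        String[] ok = collectLike(String.class, List.of("a", "b"));
        System.out.println(Arrays.toString(ok)); // prints [a, b]

        // With a "fake" token (Object.class standing in for String.class,
        // analogous to the Java API's fake ClassTags) the allocation produces
        // an Object[], and the compiler-inserted cast to String[] throws.
        try {
            String[] bad = collectLike(Object.class, List.of("a", "b"));
            System.out.println(bad.length); // never reached
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as described in the issue");
        }

        // Re-typing the result with the right class token fixes the problem.
        Object[] raw = collectLike(Object.class, List.<Object>of("a", "b"));
        String[] fixed = repair(String.class, raw);
        System.out.println(Arrays.toString(fixed)); // prints [a, b]
    }
}
```

Note that Spark's proposed fix wraps the RDD and overrides its ClassTag before the array is ever allocated, rather than copying arrays after the fact as `repair` does here; the copy is just the simplest way to show the same type-token principle in standalone Java.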
[jira] [Commented] (SPARK-2737) ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
[ https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903542#comment-14903542 ]

Glenn Strycker commented on SPARK-2737:
---------------------------------------

I am getting a similar error in Spark 1.3.0... see a new ticket I created: https://issues.apache.org/jira/browse/SPARK-10762
[jira] [Commented] (SPARK-2737) ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
[ https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078705#comment-14078705 ]

Joseph K. Bradley commented on SPARK-2737:
------------------------------------------

Relating to [SPARK-2197 Spark invoke DecisionTree by Java | https://issues.apache.org/jira/browse/SPARK-2197], this makes the Java DecisionTree test get farther, but does not fix it completely. Will examine logs more.
[jira] [Commented] (SPARK-2737) ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
[ https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078398#comment-14078398 ]

Apache Spark commented on SPARK-2737:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1639