[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r98218538 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag]( * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the * return value. Exposed for testing. */ - private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed + private[spark] def isCheckpointedAndMaterialized: Boolean = +checkpointData.exists(_.isCheckpointed) --- End diff -- looks good, but you forgot to make this final --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15396 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r97884515 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag]( * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the * return value. Exposed for testing. */ - private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed + private[spark] def isCheckpointedAndMaterialized: Boolean = +checkpointData.exists(_.isCheckpointed) --- End diff -- Addressed it. Can you check again? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r83968123 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag]( * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the * return value. Exposed for testing. */ - private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed + private[spark] def isCheckpointedAndMaterialized: Boolean = +checkpointData.exists(_.isCheckpointed) --- End diff -- what do you mean. if isCheckpointedAndMaterialized = isCheckpointed, and you override isCheckpointed, then that would change the behavior of isCheckpointedAndMaterialized as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user apivovarov commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r83111395 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag]( * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the --- End diff -- This method left as is in this PR - https://github.com/apache/spark/pull/15447 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r83043442 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag]( * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the * return value. Exposed for testing. */ - private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed + private[spark] def isCheckpointedAndMaterialized: Boolean = +checkpointData.exists(_.isCheckpointed) --- End diff -- One way to fix the duplicate code is to make this one final and have `isCheckpointed` call this one, then implementations of RDD can feel free to override `isCheckpointed` without affecting our internal implementation. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r83042522 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag]( * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the --- End diff -- oh wait I just realized this is exactly the same as `isCheckpointed`. It's kind of confusing why we have two methods that do the same thing. Is it because VertexRDD or something overrides one but not the other? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/15396 [SPARK-14804][Spark][Graphx] Fix checkpointing of VertexRDD/EdgeRDD ## What changes were proposed in this pull request? EdgeRDD/VertexRDD overrides checkpoint() and isCheckpointed() to forward these to the internal partitionRDD. So when checkpoint() is called on them, its the partitionRDD that actually gets checkpointed. However since isCheckpointed() also overridden to call partitionRDD.isCheckpointed, EdgeRDD/VertexRDD.isCheckpointed returns true even though this RDD is actually not checkpointed. This would have been fine except the RDD's internal logic for computing the RDD depends on isCheckpointed(). So for VertexRDD/EdgeRDD, since isCheckpointed is true, when computing Spark tries to read checkpoint data of VertexRDD/EdgeRDD even though they are not actually checkpointed. Through a crazy sequence of call forwarding, it reads checkpoint data of partitionsRDD and tries to cast it to types in Vertex/EdgeRDD. This leads to ClassCastException. The minimal fix that does not change any public behavior is to modify RDD internal to not use public override-able API for internal logic. ## How was this patch tested? New unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-14804 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15396.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15396 commit 22d16b46e8ea18ec7a1b585103aa72f77a7e78f7 Author: Tathagata Das Date: 2016-10-07T22:04:31Z Fixed checkpointing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org