[ https://issues.apache.org/jira/browse/SPARK-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das resolved SPARK-10071.
-----------------------------------
    Resolution: Fixed
      Assignee: Shixiong Zhu
 Fix Version/s: 1.5.1
                1.6.0
                1.4.2

> QueueInputDStream Should Allow Checkpointing
> --------------------------------------------
>
>                 Key: SPARK-10071
>                 URL: https://issues.apache.org/jira/browse/SPARK-10071
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.4.1, 1.5.0
>            Reporter: Asim Jalis
>            Assignee: Shixiong Zhu
>             Fix For: 1.4.2, 1.6.0, 1.5.1
>
>
> I would like for https://issues.apache.org/jira/browse/SPARK-8630 to be
> reverted and that issue resolved as won’t fix, and for QueueInputDStream to
> revert to its old behavior of not throwing an exception if checkpointing is
> enabled.
>
> Why? The reason is that this fix which throws an exception if the DStream is
> being checkpointed breaks the primary use case for QueueInputDStream, which
> is testing. For example, the Spark Streaming documentation recommends using
> QueueInputDStream for testing.
>
> Why does throwing an exception if checkpointing is used break this class? The
> reason is that if I use windowing operations or updateStateByKey then the
> StreamingContext requires that I enable checkpointing. It throws an exception
> if I don’t enable checkpointing. But then if I enable checkpointing this
> class throws an exception saying that I cannot use checkpointing with the
> queue stream. The end result of this is that I cannot use QueueInputDStream
> to test windowing operations and updateStateByKey. It can only be used for
> trivial stateless DStreams.
>
> But would removing the exception-throwing logic make this code fragile? It
> should not. In the testing scenario the RDD that is passed into the
> QueueInputDStream is created through parallelize and it is checkpointable.
>
> But what about people who are using QueueInputDStream in non-testing
> scenarios with non-recoverable RDDs?
> Perhaps a warning suffices here that
> checkpointing will not be able to recover state if their RDDs are
> non-recoverable. Then it is up to them how they resolve this situation.
> Since right now we have no good way of determining if a QueueInputDStream
> contains RDDs that are recoverable or not, why not err on the side of leaving
> it to the user of the class to not expect recoverability, rather than forcing
> checkpointing.
>
> In conclusion: my recommendation would be to revert to the old behavior and
> to resolve this bug as won’t fix.
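The testing scenario described in the report (queueStream input feeding updateStateByKey, which requires checkpointing) can be sketched as a small driver program. This is an illustrative sketch, not code from the issue: it assumes Spark 1.x spark-core and spark-streaming on the classpath, and the object name QueueStreamStatefulTest, the method runningCounts, the temporary checkpoint directory and the 8-second wait are all hypothetical choices. It enables checkpointing, pushes three parallelize-built RDDs through queueStream, and keeps the latest running word counts on the driver. Per the report, on the affected versions (1.4.1, 1.5.0) the checkpointing check in QueueInputDStream is expected to make this fail with an exception; on the fixed versions it is expected to run to completion.

```scala
import java.nio.file.Files

import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical test helper: runs three batches through queueStream and
// returns the last running word counts produced by updateStateByKey.
object QueueStreamStatefulTest {
  def runningCounts(): Map[String, Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamStatefulTest")
    val ssc = new StreamingContext(conf, Seconds(1))

    // updateStateByKey needs a checkpoint directory; a temp dir is assumed here.
    ssc.checkpoint(Files.createTempDirectory("queue-stream-ckpt").toString)

    // Each queued RDD is consumed as one batch (oneAtATime defaults to true).
    // RDDs built with parallelize can be recomputed, so they are checkpointable.
    val queue = mutable.Queue[RDD[String]]()
    Seq(Seq("a", "b"), Seq("a"), Seq("b", "b")).foreach { words =>
      queue += ssc.sparkContext.parallelize(words)
    }

    // Running count per word: add this batch's values to the previous state.
    val updateFunc = (newValues: Seq[Int], state: Option[Int]) =>
      Some(state.getOrElse(0) + newValues.sum)

    val counts = ssc.queueStream(queue)
      .map(word => (word, 1))
      .updateStateByKey[Int](updateFunc)

    // Copy each batch's state snapshot to the driver.
    val latest = mutable.Map[String, Int]()
    counts.foreachRDD { rdd =>
      val snapshot = rdd.collect()
      latest.synchronized { latest ++= snapshot }
    }

    ssc.start()
    // Timing assumption: 8 seconds is enough for the three input batches.
    ssc.awaitTerminationOrTimeout(8000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    latest.synchronized { latest.toMap }
  }

  def main(args: Array[String]): Unit =
    println(runningCounts())
}
```

Because the result depends on the batches finishing within the fixed wait, this check is timing-based; on a slow machine a longer wait may be needed.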