Asim Jalis created SPARK-10071:
----------------------------------

             Summary: QueueInputDStream Should Allow Checkpointing
                 Key: SPARK-10071
                 URL: https://issues.apache.org/jira/browse/SPARK-10071
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.4.1
            Reporter: Asim Jalis


I would like https://issues.apache.org/jira/browse/SPARK-8630 to be 
reverted and resolved as won’t fix, so that QueueInputDStream goes back to 
its old behavior of not throwing an exception when checkpointing is
enabled.

Why? The fix, which throws an exception if the DStream is being 
checkpointed, breaks the primary use case for QueueInputDStream: testing. 
The Spark Streaming documentation itself recommends using 
QueueInputDStream for testing.

Why does throwing an exception when checkpointing is used break this class? 
If I use windowing operations or updateStateByKey, the StreamingContext 
requires that I enable checkpointing and throws an exception if I don’t. 
But if I do enable checkpointing, this class throws an exception saying 
that I cannot use checkpointing with the queue stream. As a result, I 
cannot use QueueInputDStream to test windowing operations or 
updateStateByKey. It can only be used for trivial stateless DStreams.
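
The conflict described above can be sketched as a small test program. This 
is a minimal sketch, not code from the issue: the object name 
QueueStreamStateTest, the helper updateCount, the sample words, and the 
temporary checkpoint directory are all illustrative assumptions. It builds 
the queue from parallelize, enables checkpointing because updateStateByKey 
requires it, and then runs the stream briefly. On a release that includes 
the SPARK-8630 change, the checkpoint attempt is expected to fail as 
described above.

```scala
import java.nio.file.Files

import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative test harness; all names here are assumptions, not from the issue.
object QueueStreamStateTest {

  // Running count per key: add this batch's values to the previous state.
  def updateCount(values: Seq[Int], state: Option[Int]): Option[Int] =
    Some(values.sum + state.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamStateTest")
    val ssc = new StreamingContext(conf, Seconds(1))

    // updateStateByKey requires a checkpoint directory to be set.
    ssc.checkpoint(Files.createTempDirectory("queue-stream-checkpoint").toString)

    // Test input: each queued RDD becomes one batch of the stream.
    val queue = new mutable.Queue[RDD[String]]()
    queue += ssc.sparkContext.parallelize(Seq("a", "b", "a"))
    queue += ssc.sparkContext.parallelize(Seq("b", "c"))

    val counts = ssc.queueStream(queue, oneAtATime = true)
      .map(word => (word, 1))
      .updateStateByKey[Int](updateCount _)

    counts.print()

    ssc.start()
    // Let a few batches run, then shut down.
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Removing the ssc.checkpoint call trades one failure for the other: the 
stateful operation then rejects the context for lacking a checkpoint 
directory.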

But would removing the exception-throwing logic make this code fragile? It 
should not. In the testing scenario the RDD that is passed into the 
QueueInputDStream is created through parallelize and it is checkpointable.

But what about people who are using QueueInputDStream in non-testing scenarios 
with non-recoverable RDDs? Perhaps a warning suffices here that checkpointing 
will not be able to recover state if their RDDs are non-recoverable. Then it is 
up to them how they resolve this situation.
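
The "warn instead of throw" idea can be sketched outside Spark. The class 
below is a hypothetical stand-in, not Spark's QueueInputDStream, and it 
assumes the check sits in a Java serialization hook, since checkpointing 
serializes the DStream graph. QueueBackedSource, WarnInsteadOfThrow, and 
the warning text are all made up for illustration.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a queue-backed stream; NOT Spark's QueueInputDStream.
class QueueBackedSource(val name: String) extends Serializable {

  // Java serialization hook: warn the user, then serialize normally
  // instead of throwing an exception.
  private def writeObject(out: ObjectOutputStream): Unit = {
    System.err.println(
      s"WARN: $name is being checkpointed; queued RDDs may not be recoverable")
    out.defaultWriteObject()
  }
}

object WarnInsteadOfThrow {

  // Serializes the source the way a checkpoint writer might, returning the bytes.
  def serialize(source: QueueBackedSource): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(source) finally out.close()
    buffer.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(new QueueBackedSource("test-queue"))
    println(s"serialized ${bytes.length} bytes without throwing")
  }
}
```

With this shape, tests that only need checkpointing to be enabled keep 
working, while users with non-recoverable RDDs still get a visible signal.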

Since right now we have no good way of determining whether a 
QueueInputDStream contains recoverable RDDs, why not err on the side of 
letting checkpointing proceed and leaving it to the user of the class not to 
expect recoverability, rather than rejecting checkpointing outright?

In conclusion: my recommendation would be to revert to the old behavior and to 
resolve this bug as won’t fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

