[ 
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559911#comment-15559911
 ] 

Cody Koeninger commented on SPARK-17815:
----------------------------------------

The WAL cannot be the only source of truth, because it can be corrupted in a 
situation where the downstream results and offsets are not.  The downstream 
offsets, by contrast, can't be corrupted without also affecting the results; 
that's the whole point of transactions.  Even if you ignore the fact that the 
WAL can be corrupted, you still have to be careful about aligning boundaries 
of the WAL with boundaries of the downstream store.
The Kafka commit log can't be dismissed as merely for metric collection, 
either. A newly constructed Kafka consumer is going to use it in preference 
to auto.offset.reset as its starting point.
I'm not saying these issues are unsolvable, but you can't just handwave them 
away, and they are confusing to end users. There was already confusion with 
only two stores, ZK and the DStream checkpoint.
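
As an illustration of the transactional point above, here is a minimal sketch 
(not Spark or Kafka code) of why offsets stored alongside results cannot drift 
apart from them. It assumes an in-memory SQLite database standing in for the 
downstream store; the table names and the functions open_store, commit_batch, 
stored_offset and stored_total are all hypothetical.

```python
import sqlite3


def open_store():
    """Create a stand-in downstream store holding results and offsets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (word TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE offsets ("
        " topic TEXT, part INTEGER, until_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, part))"
    )
    conn.commit()
    return conn


def commit_batch(conn, topic, part, until_offset, counts, fail_midway=False):
    """Write a batch's results and its ending offset in one transaction.

    Returns True if the transaction committed, False if it rolled back.
    fail_midway simulates a crash after the results are written but
    before the offset is recorded.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            for word, n in counts.items():
                cur = conn.execute(
                    "UPDATE results SET total = total + ? WHERE word = ?",
                    (n, word),
                )
                if cur.rowcount == 0:
                    conn.execute(
                        "INSERT INTO results (word, total) VALUES (?, ?)",
                        (word, n),
                    )
            if fail_midway:
                raise RuntimeError("simulated failure before offset write")
            conn.execute(
                "INSERT OR REPLACE INTO offsets (topic, part, until_offset)"
                " VALUES (?, ?, ?)",
                (topic, part, until_offset),
            )
        return True
    except RuntimeError:
        return False


def stored_offset(conn, topic, part):
    """Return the last committed offset, or None if nothing is stored."""
    row = conn.execute(
        "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?",
        (topic, part),
    ).fetchone()
    return row[0] if row else None


def stored_total(conn, word):
    """Return the accumulated total for a word, or None if absent."""
    row = conn.execute(
        "SELECT total FROM results WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None
```

If a batch fails partway through, neither its results nor its offset become 
visible, so a restart from stored_offset reprocesses exactly the data whose 
results were discarded. A separate WAL or Kafka-side commit gets no such 
guarantee for free, which is the boundary-alignment concern raised above.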

> Report committed offsets
> ------------------------
>
>                 Key: SPARK-17815
>                 URL: https://issues.apache.org/jira/browse/SPARK-17815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Since we manage our own offsets, we have turned off auto-commit.  However, 
> this means that external tools are not able to report on how far behind a 
> given streaming job is.  When the user manually gives us a group.id, we 
> should report back to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
