Raghu Angadi created SPARK-45138:
------------------------------------

             Summary: Define a new error class and apply for the case where 
checkpointing state to DFS fails
                 Key: SPARK-45138
                 URL: https://issues.apache.org/jira/browse/SPARK-45138
             Project: Spark
          Issue Type: Task
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Raghu Angadi


[From Neil Ramswamy]:

When an exception occurs while storing state to DFS (Hadoop, object store, 
etc.), we just propagate the exception without any context. The context is 
there if customers look into the stack trace, but that is definitely not 
something we want customers to have to dig into by themselves.

For example, let’s say a file system related exception happened while 
checkpointing state. Since we just let the exception bubble up to the very 
top without adding any context, it is quite confusing for customers to quickly 
identify whether the issue is with the source/sink (if the source/sink data 
source is file-based), the offset/commit log, or state load/checkpoint. Too 
many operations can throw the same exception.

This ticket aims to wrap exceptions thrown during the commit of the state and 
to assign a proper error class to them. By assigning an error class, we can 
classify the errors, which helps us determine which errors customers are 
struggling with the most.

StateStore.commit() is the entry point. Each state store provider has its own 
implementation, but all implementations should use the same error class, as we 
want to categorize these failures as the same. It would be even better if we 
could handle this at a higher level in the caller, but it would be OK to handle 
it in the built-in implementations.
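
A minimal sketch of the idea, not Spark's actual code: every name below 
(StateStoreCommitException, the error class string 
"STATE_STORE_COMMIT_FAILED", SimpleStateStore, CommitWrapper, and the two toy 
providers) is hypothetical and exists only for illustration. It assumes a 
shared helper that each provider's commit() calls, so that any failure is 
wrapped with the same error class while the original exception is kept as the 
cause.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical exception type that carries an error class (not Spark's API).
class StateStoreCommitException extends RuntimeException {
    // Hypothetical error class name, chosen for illustration only.
    static final String ERROR_CLASS = "STATE_STORE_COMMIT_FAILED";
    private final String providerName;

    StateStoreCommitException(String providerName, Throwable cause) {
        super("[" + ERROR_CLASS + "] Failed to commit state in provider '"
                + providerName + "': " + cause.getMessage(), cause);
        this.providerName = providerName;
    }

    String getErrorClass() { return ERROR_CLASS; }
    String getProviderName() { return providerName; }
}

// Simplified stand-in for a state store; commit() returns the new version.
interface SimpleStateStore {
    long commit();
}

// Shared helper so that every provider reports the same error class.
final class CommitWrapper {
    private CommitWrapper() {}

    static long commitWithContext(String providerName, Callable<Long> doCommit) {
        try {
            return doCommit.call();
        } catch (StateStoreCommitException e) {
            throw e; // already classified, do not wrap twice
        } catch (Exception e) {
            // Add context and the error class; keep the original as the cause.
            throw new StateStoreCommitException(providerName, e);
        }
    }
}

// Toy provider whose commit always fails with a file system style error.
class FailingDfsStore implements SimpleStateStore {
    @Override
    public long commit() {
        return CommitWrapper.commitWithContext("FailingDfsStore", () -> {
            throw new IOException("simulated DFS write failure");
        });
    }
}

// Toy provider whose commit succeeds and bumps an in-memory version.
class InMemoryStore implements SimpleStateStore {
    private long version = 0;

    @Override
    public long commit() {
        return CommitWrapper.commitWithContext("InMemoryStore", () -> ++version);
    }
}
```

With a wrapper like this, a caller that catches StateStoreCommitException can 
tell from the error class alone that the failure came from the state commit, 
rather than from the source/sink or the offset/commit log, and the underlying 
IOException is still available through getCause().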



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

