[ 
https://issues.apache.org/jira/browse/SPARK-31931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128707#comment-17128707
 ] 

Jungtaek Lim edited comment on SPARK-31931 at 6/8/20, 11:12 PM:
----------------------------------------------------------------

Well, I looked into the attached file, and it doesn't show the "actual" cause. 
(It might not be a matter of atomic rename.) It'd be better to capture the 
place that shows "how" the task failed. The task listener for the state store 
may have thrown an exception.


was (Author: kabhwan):
Well, I looked into the attached file, and it doesn't show the "actual" cause. 
It'd be better to capture the place that shows "how" the task failed. 
The task listener for the state store may have thrown an exception.

> When using GCS as checkpoint location for Structured Streaming aggregation 
> pipeline, the Spark writing job is aborted
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-31931
>                 URL: https://issues.apache.org/jira/browse/SPARK-31931
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.5
>         Environment: GCP Dataproc 1.5 Debian 10 (Hadoop 2.10.0, Spark 2.4.5, 
> Cloud Storage Connector hadoop2.2.1.3, Scala 2.12.10)
>            Reporter: Adrian Jones
>            Priority: Major
>         Attachments: spark-structured-streaming-error
>
>
> Structured streaming checkpointing does not work with Google Cloud Storage 
> when there are aggregations included in the streaming pipeline.
> Using GCS as the external store works fine when there are no aggregations 
> present in the pipeline (i.e. groupBy); however, once an aggregation is 
> introduced, the attached error is thrown.
> The error is only thrown when aggregating and pointing checkpointLocation to 
> GCS. The exact code works fine when pointing checkpointLocation to HDFS.
> Is it expected for GCS to function as a checkpoint location for aggregated 
> pipelines? Are efforts currently in progress to enable this? Is it on a 
> roadmap?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
