[ https://issues.apache.org/jira/browse/SPARK-31931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-31931:
---------------------------------
    Priority: Major  (was: Blocker)

> When using GCS as the checkpoint location for a Structured Streaming 
> aggregation pipeline, the Spark writing job is aborted
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-31931
>                 URL: https://issues.apache.org/jira/browse/SPARK-31931
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.5
>         Environment: GCP Dataproc 1.5 Debian 10 (Hadoop 2.10.0, Spark 2.4.5, 
> Cloud Storage Connector hadoop2.2.1.3, Scala 2.12.10)
>            Reporter: Adrian Jones
>            Priority: Major
>         Attachments: spark-structured-streaming-error
>
>
> Structured Streaming checkpointing does not work with Google Cloud Storage 
> when the streaming pipeline includes aggregations.
> Using GCS as the external checkpoint store works fine when no aggregations 
> are present in the pipeline; as soon as an aggregation (e.g. groupBy) is 
> introduced, the attached error is thrown.
> The error occurs only when aggregating and pointing checkpointLocation at 
> GCS. The exact same code works fine when checkpointLocation points to HDFS. 
> A repro sketch follows below.
> Is GCS expected to work as a checkpoint location for aggregated pipelines? 
> Are efforts currently in progress to enable this? Is it on a roadmap?
>  
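>  
> The sketch below reproduces the setup described above. The bucket path is 
> hypothetical, the rate source and console sink stand in for the real input 
> and output only to keep the example self-contained, and the GCS connector is 
> assumed to be on the classpath as in the Dataproc 1.5 image listed in the 
> environment:
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions._
> 
> val spark = SparkSession.builder()
>   .appName("gcs-checkpoint-repro")
>   .getOrCreate()
> import spark.implicits._
> 
> // Any streaming source works; the rate source keeps the repro self-contained.
> val counts = spark.readStream
>   .format("rate")
>   .option("rowsPerSecond", "10")
>   .load()
>   .groupBy(window($"timestamp", "1 minute"))  // streaming aggregation (the failing case)
>   .count()
> 
> // Fails with the attached error when checkpointLocation points at GCS;
> // the identical query succeeds with an HDFS checkpoint location.
> val query = counts.writeStream
>   .outputMode("complete")
>   .format("console")
>   .option("checkpointLocation", "gs://my-bucket/checkpoints/agg-repro")  // hypothetical bucket
>   .start()
> 
> query.awaitTermination()
> {code}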



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
