[ https://issues.apache.org/jira/browse/SPARK-31931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated SPARK-31931:
---------------------------------
    Priority: Major  (was: Blocker)

> When using GCS as checkpoint location for Structured Streaming aggregation
> pipeline, the Spark writing job is aborted
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-31931
>                 URL: https://issues.apache.org/jira/browse/SPARK-31931
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.5
>         Environment: GCP Dataproc 1.5 Debian 10 (Hadoop 2.10.0, Spark 2.4.5,
>                      Cloud Storage Connector hadoop2.2.1.3, Scala 2.12.10)
>            Reporter: Adrian Jones
>            Priority: Major
>         Attachments: spark-structured-streaming-error
>
> Structured Streaming checkpointing does not work with Google Cloud Storage
> when the streaming pipeline includes aggregations.
>
> Using GCS as the external store works fine when no aggregations are present
> in the pipeline (i.e. no groupBy); however, once an aggregation is
> introduced, the attached error is thrown.
>
> The error is thrown only when aggregating and pointing checkpointLocation
> at GCS. The same code works fine when checkpointLocation points to HDFS.
>
> Is GCS expected to function as a checkpoint location for aggregated
> pipelines? Are efforts currently in progress to enable this? Is it on a
> roadmap?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
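For context, the report does not include the reporter's code, but the failing shape it describes can be sketched roughly as below. This is a hypothetical minimal reproduction, not the reporter's pipeline: the rate source, window, bucket name, and checkpoint path are all made up; only the pattern (a streaming aggregation whose checkpointLocation is a gs:// URI) comes from the report.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object GcsCheckpointRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gcs-checkpoint-repro")
      .getOrCreate()
    import spark.implicits._

    // Any streaming source would do; the built-in rate source keeps the
    // example self-contained.
    val counts = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 10)
      .load()
      .groupBy(window($"timestamp", "1 minute")) // the aggregation that triggers the failure
      .count()

    // Per the report: works with an HDFS checkpoint path, fails with gs://.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "gs://my-bucket/checkpoints/repro") // hypothetical bucket
      .start()

    query.awaitTermination()
  }
}
```

Removing the groupBy/count (or pointing checkpointLocation at an hdfs:// path) should, per the description, make the job run cleanly, which isolates the interaction between stateful aggregation checkpoints and the GCS connector.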