[ https://issues.apache.org/jira/browse/SPARK-31599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096483#comment-17096483 ]

Jungtaek Lim commented on SPARK-31599:
--------------------------------------

Your understanding of how the file stream sink and file source work is correct.

There's no official way to do this, so this is more of a question than a bug 
report. That's why I suggested asking it on the user@ mailing list.

If you're really adventurous, you can stop the query, update the metadata 
yourself, and rerun the query. The metadata is written as JSON, and the format 
is not complicated to fill out by hand.
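
To illustrate the "update the metadata yourself" step: the file sink log (the 
files under `_spark_metadata/`) is a `v1` header line followed by one JSON 
object per line describing each output file. The sketch below is a hypothetical 
helper (the name `rewrite_sink_log` and the exact set of fields are 
assumptions, not a Spark API) showing how one might repoint entries at 
compacted files before restarting downstream readers:

```python
import json

def rewrite_sink_log(log_text, replacements):
    """Rewrite a v1 file-sink metadata batch so entries point at compacted files.

    log_text:     raw contents of e.g. _spark_metadata/0 ("v1" header,
                  then one JSON entry per line)
    replacements: dict mapping an old file path to a dict of fields to
                  overwrite, e.g. {"path": ..., "size": ...}

    This is an illustrative sketch only -- back up the metadata directory
    before touching it, and stop the query first.
    """
    lines = log_text.splitlines()
    header, entries = lines[0], lines[1:]
    out = [header]
    for line in entries:
        entry = json.loads(line)
        if entry["path"] in replacements:
            # Overwrite the listed fields (path, size, ...) in place.
            entry.update(replacements[entry["path"]])
        out.append(json.dumps(entry))
    return "\n".join(out)
```

Note that if compaction merges many small files into one, several old entries 
would map to the same compacted file, so a real script would also need to 
deduplicate the resulting entries.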

I'll close this for now - if someone claims it should be done and provides a 
nice approach, then this can be reopened.

> Reading from S3 (Structured Streaming Bucket) Fails after Compaction
> --------------------------------------------------------------------
>
>                 Key: SPARK-31599
>                 URL: https://issues.apache.org/jira/browse/SPARK-31599
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Structured Streaming
>    Affects Versions: 2.4.5
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> I have an S3 bucket to which data is streamed (in Parquet format) by the 
> Spark Structured Streaming framework from Kafka. Periodically I run 
> compaction on this bucket (a separate Spark job), and on successful 
> compaction delete the non-compacted (Parquet) files. After that, Spark jobs 
> which read from that bucket fail with the following error:
>  *Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet*
> How do we run *compaction on Structured Streaming S3 buckets*? I also need 
> to delete the un-compacted files after successful compaction to save space.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
