[ https://issues.apache.org/jira/browse/SPARK-29217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936309#comment-16936309 ]
Thanida commented on SPARK-29217:
---------------------------------

I use the Spark streaming writer:
{code:java}
df.writeStream
  .trigger(Trigger.ProcessingTime(30000))
  .outputMode("append")
  .format("parquet")
  .option("path", "path/destination")
  .partitionBy("dt")
  .start();
{code}
The output directory contains:
{code:java}
- _spark_metadata/..
- dt=20190923/part-00000-....parquet
- dt=20190923/part-00001-....parquet
- dt=20190923/part-00002-....parquet
- dt=20190924/part-00000-....parquet
{code}
Then I delete one partition:
{code:java}
dt=20190923
{code}
After that, I read the data with:
{code:java}
spark.read.format("parquet").load("path/destination")
{code}
and get a java.io.FileNotFoundException:
{code:java}
java.io.FileNotFoundException: File file:path/destination/dt=20190923/part-00000-....parquet
{code}

> How to read streaming output path by ignoring metadata log files
> ----------------------------------------------------------------
>
>                 Key: SPARK-29217
>                 URL: https://issues.apache.org/jira/browse/SPARK-29217
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.4.3
>           Reporter: Thanida
>           Priority: Minor
>
> Because the output path of a Spark streaming query contains a `_spark_metadata` directory, reading it with
> {code:java}
> spark.read.format("parquet").load(filepath)
> {code}
> always depends on the file listing recorded in the metadata log.
> Moving or deleting files in the output path while streaming therefore causes the read to fail.
> So, how can data in the streaming output path be read while ignoring the metadata log files?
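One common workaround is to avoid loading the streaming root directory itself: if you load the partition directories through a glob (e.g. `path/destination/dt=*` together with the documented `basePath` read option to keep the partition column), Spark does not consult the `_spark_metadata` log. Another option is to build the file list yourself, skipping `_spark_metadata`. The sketch below is illustrative only, assuming a local filesystem; `ParquetLister` and `listDataFiles` are hypothetical helper names, not Spark API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParquetLister {

    // Collect all *.parquet files under root, skipping anything inside
    // the _spark_metadata directory that the streaming file sink writes.
    static List<String> listDataFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(p -> !p.toString().contains("_spark_metadata"))
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .map(Path::toString)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny fake streaming output layout to demonstrate the filter.
        Path root = Files.createTempDirectory("destination");
        Path part = Files.createDirectories(root.resolve("dt=20190924"));
        Files.createFile(part.resolve("part-00000.parquet"));
        Path meta = Files.createDirectories(root.resolve("_spark_metadata"));
        Files.createFile(meta.resolve("0"));

        // Only the data file is returned; the metadata log is skipped.
        System.out.println(listDataFiles(root));
    }
}
```

The resulting list could then be handed to `spark.read().parquet(...)` (which accepts multiple paths), so the read no longer goes through the metadata log at the root directory.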