[ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799931#comment-15799931 ]
Shixiong Zhu commented on SPARK-19013:
--------------------------------------

Thanks, [~zzztimbo]

That must be caused by the negative cache.

[~steve_l] FYI, HDFSMetadataLog uses this pattern in this code path: "check if a file exists" (done by the `create` method with "overwrite=false") -> "create a file" -> "check if a file exists again" (done by "rename")

> java.util.ConcurrentModificationException when using s3 path as checkpointLocation
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-19013
>                 URL: https://issues.apache.org/jira/browse/SPARK-19013
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.2
>            Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> {code}
> Multiple HDFSMetadataLog are using s3://mybucket/myapp
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> {code}
> There is only one instance of this stream job running.
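
For readers unfamiliar with negative caching, the "check -> create -> check again" pattern described in the comment can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `ToyObjectStore` and `commit_batch` are hypothetical names, not the Hadoop, EMRFS, or Spark API, and the model assumes that a "not found" answer is remembered and is not invalidated by a later write. It shows why the second existence check can fail even with a single writer; how that failure then surfaces as the reported exception is not modelled here.

```python
class ToyObjectStore:
    """Hypothetical in-memory store that can cache 'not found' lookups."""

    def __init__(self, negative_cache=True):
        self.negative_cache = negative_cache
        self._objects = {}           # path -> data actually stored
        self._known_missing = set()  # paths whose last lookup said 'not found'

    def exists(self, path):
        # A cached negative answer is served without looking at the real data.
        if self.negative_cache and path in self._known_missing:
            return False
        found = path in self._objects
        if not found and self.negative_cache:
            self._known_missing.add(path)
        return found

    def create(self, path, data, overwrite=False):
        # "check if a file exists": done by create when overwrite is False.
        if not overwrite and self.exists(path):
            raise FileExistsError(path)
        # "create a file": the write succeeds, but (by assumption in this
        # model) the cached negative entry for the path is left in place.
        self._objects[path] = data

    def rename(self, src, dst):
        # "check if a file exists again": rename looks up the source first.
        if not self.exists(src):
            return False
        self._objects[dst] = self._objects.pop(src)
        return True


def commit_batch(store, temp_path, final_path, data):
    """Write to a temporary path, then rename it into place."""
    store.create(temp_path, data, overwrite=False)
    return store.rename(temp_path, final_path)


if __name__ == "__main__":
    # With negative caching the rename cannot see the file just created.
    print(commit_batch(ToyObjectStore(negative_cache=True), "tmp/0", "log/0", b"x"))
    # Without it the same sequence succeeds.
    print(commit_batch(ToyObjectStore(negative_cache=False), "tmp/0", "log/0", b"x"))
```

In this model the first call returns `False` and the second returns `True`; the only difference between the two runs is whether the first lookup's "not found" result is remembered.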