[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

Shixiong Zhu (JIRA) Tue, 27 Dec 2016 10:45:06 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781019#comment-15781019
 ]


Shixiong Zhu commented on SPARK-19013:
--------------------------------------

This is probably caused by S3 negative caching.

"a negative GET may be cached, such that even if an object is immediately 
created, the fact that there "wasn't" an object is still remembered." See 
https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html#visible-s3-inconsistency
 for details.
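
To make the quoted behaviour concrete, here is a minimal toy model of a negative cache. It is only a sketch under stated assumptions: it is not S3 or Spark code, and the class name, the `negative_ttl` parameter, the logical clock, and the key `myapp/offsets/0` are all hypothetical and chosen for illustration.

```python
class NegativeCachingStore:
    """Toy object store that remembers "not found" answers for a while.

    Hypothetical model for illustration only; real S3 behaviour and
    timings are not reproduced here.
    """

    def __init__(self, negative_ttl):
        self._objects = {}
        # key -> logical time until which a "not found" answer is remembered
        self._missing_until = {}
        self._negative_ttl = negative_ttl
        self.now = 0  # logical clock, advanced manually by the caller

    def exists(self, key):
        # A cached negative answer wins, even if the object now exists.
        if self._missing_until.get(key, -1) > self.now:
            return False
        if key in self._objects:
            return True
        # Remember the miss for negative_ttl ticks.
        self._missing_until[key] = self.now + self._negative_ttl
        return False

    def put(self, key, value):
        # The write succeeds but does not invalidate the cached miss.
        self._objects[key] = value


store = NegativeCachingStore(negative_ttl=5)
key = "myapp/offsets/0"  # hypothetical checkpoint file name

before = store.exists(key)       # probe before the write; the miss is cached
store.put(key, b"batch-0")       # the object is created immediately afterwards
right_after = store.exists(key)  # the cached miss is still returned
store.now += 5                   # let the cached miss expire
later = store.exists(key)

print(before, right_after, later)
```

In this model a writer that probes for a file, creates it, and then probes again is still told the file is missing until the cached answer expires, which is the kind of stale view the quoted passage describes.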

> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-19013
>                 URL: https://issues.apache.org/jira/browse/SPARK-19013
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.2
>            Reporter: Tim Chan
>
> I have a Structured Streaming job running on EMR. The job fails with this error:
> ```
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> ```
> There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

