[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-08-13 bharath kumar avusherla (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579242#comment-16579242 ] bharath kumar avusherla commented on SPARK-23050: [~ste...@apache.org], I can start

2018-08-08 Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574276#comment-16574276 ] Steve Loughran commented on SPARK-23050: bq. Is there any way we can avoid happening this? With

2018-08-08 bharath kumar avusherla (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573700#comment-16573700 ] bharath kumar avusherla commented on SPARK-23050: Is there any way we can avoid

2018-01-21 Yash Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333809#comment-16333809 ] Yash Sharma commented on SPARK-23050: Hi [~ste...@apache.org], Thanks for bringing this great

2018-01-21 Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333477#comment-16333477 ] Steve Loughran commented on SPARK-23050: there's one thing which worries me here: the implication

2018-01-16 Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327649#comment-16327649 ] Steve Loughran commented on SPARK-23050: {quote} Is there an API to detect S3 like file systems?

2018-01-16 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327632#comment-16327632 ] Shixiong Zhu commented on SPARK-23050: [~ste...@apache.org] Yeah, that's a good improvement for S3.

2018-01-15 Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326330#comment-16326330 ] Steve Loughran commented on SPARK-23050: Quick review of the code. Yes, there's potentially a

2018-01-12 Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324745#comment-16324745 ] Steve Loughran commented on SPARK-23050: this s3n is the amazon EMR closed source impl; nothing

2018-01-12 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324717#comment-16324717 ] Shixiong Zhu commented on SPARK-23050: [~yash...@gmail.com] Spark SQL should handle it. Yeah,

2018-01-12 Yash Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324711#comment-16324711 ] Yash Sharma commented on SPARK-23050: [~marmbrus], [~shixi...@databricks.com] When we say read via

2018-01-12 Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323883#comment-16323883 ] Michael Armbrust commented on SPARK-23050: [~zsxwing] is correct. While it is possible for

2018-01-11 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323329#comment-16323329 ] Sean Owen commented on SPARK-23050: Also, have you read Steve's documentation on how S3 works with

2018-01-11 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323320#comment-16323320 ] Shixiong Zhu commented on SPARK-23050: How do you read the output? If you use Spark to read the