[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

steveloughran Mon, 23 Jan 2017 10:00:37 -0800

Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/14731
  
    @uncleGen I've updated it. Note that 
[HADOOP-13946](https://issues.apache.org/jira/browse/HADOOP-13946) tracks the 
changes in the Hadoop docs, which write down what HDFS actually does, then 
note that cloud object stores have no consistent behaviour w.r.t. 
timestamps. While I personally believe that direct PUT calls are the way to 
write data, there's still ambiguity as to when the objects get a timestamp (S3: 
when the PUT/multipart put is first initiated, and not updated on the close() 
if the put was started earlier), and so when they become visible. So I don't go 
into the details; I just say "look at the docs, then test on your system". 
That's about as authoritative as you can get.
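
As a rough illustration of the "test on your system" advice, here is a minimal sketch that writes a file in two steps and reports where its modification time falls between the start of the write and the close. It only uses the Python standard library against a local directory; the function name `probe_mtime` and the `write_delay` parameter are hypothetical, and probing a real object store would mean doing the same write/stat sequence through that store's own client, which is not shown here.

```python
import os
import tempfile
import time


def probe_mtime(directory, write_delay=0.2):
    """Write a file slowly and report when its timestamp was set.

    Hypothetical helper: it records the wall-clock time before the
    write starts and after the file is closed, then reads the mtime
    the filesystem assigned to the file.
    """
    path = os.path.join(directory, "mtime-probe.txt")
    started = time.time()
    with open(path, "w") as out:
        out.write("first chunk\n")
        out.flush()
        time.sleep(write_delay)  # stand-in for a slow upload
        out.write("second chunk\n")
    closed = time.time()
    mtime = os.path.getmtime(path)
    os.remove(path)
    return {"started": started, "closed": closed, "mtime": mtime}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = probe_mtime(tmp)
        print("mtime - started: %.3fs" % (result["mtime"] - result["started"]))
        print("closed - mtime:  %.3fs" % (result["closed"] - result["mtime"]))
```

If the timestamp sits near the start of the write rather than the close, a scan that selects files by modification-time window could miss files that take a long time to write, so the numbers are worth checking on the store you actually deploy against.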




