Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12004
  
    I have the impression that you can't really use Spark with S3 and only S3 
(at least not as an intermediate store), because it's too eventually 
consistent. Does the presence of additional integration libraries alone change 
that, or am I mistaken? That is, I'm wondering whether this really does what it 
says on the tin, which is to make Spark usable with just S3.
    
    My other question was indeed whether we need a different module if we're 
about to support only 2.7, or 2.6/2.7. That's more of an implementation detail. 
Does a build of Spark + Hadoop 2.7 right now have no ability at all to read 
from S3 out of the box, or just not full or ideal support?
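    For what it's worth, here is a minimal sketch (not from this PR) of what 
"just S3" usage looks like once the S3A connector and its AWS SDK dependency 
are on the classpath; the bucket, prefix, and credential setup below are 
placeholder assumptions.
    
    ```scala
    // Minimal sketch, assuming hadoop-aws and its aws-java-sdk dependency are
    // on the classpath. Bucket/prefix are placeholders; credentials are assumed
    // to come from fs.s3a.* configuration or the environment.
    import org.apache.spark.{SparkConf, SparkContext}
    
    object S3ReadSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("s3a-read-sketch"))
        val lines = sc.textFile("s3a://some-bucket/some-prefix/*.txt")
        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }
    ```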
    
    Finally, is the Spark build the best thing to provide these dependencies? 
Well, it already provides the core Hadoop FS support, so yes. But on the other 
hand, Hadoop farms this out as optional itself, it can be added by a user app 
(right?), and it would already be present when running against a cluster.
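    As an illustration of the "user app adds it" option, an application build 
could declare the connector itself; the versions below are assumptions for a 
Hadoop 2.7.x setup, not something this PR prescribes.
    
    ```scala
    // build.sbt fragment (illustrative versions, not from the PR): the app
    // pulls in hadoop-aws itself instead of relying on the Spark build to ship
    // it. hadoop-aws transitively brings in a matching aws-java-sdk.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core" % "2.0.0" % "provided",
      "org.apache.hadoop" %  "hadoop-aws" % "2.7.3"
    )
    ```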
    
    It looks like there's a little more to it than just adding one new 
dependency, but not much more. I'm trying to work out just how much the module 
is needed for users to get this right, versus the complexity it adds.
    
    The docs are valuable.


