GitHub user rxin commented on the issue:

    https://github.com/apache/spark/pull/17723
  
    I didn't read through the super long debate here, but I have a strong 
preference not to expose Hadoop APIs directly. I'm seeing more and more 
deployments out there that do not use Hadoop at all (e.g. connecting directly 
to cloud storage, to an on-premise object store, to Redis, to a NetApp 
appliance, to a message queue, or just running Spark on a laptop).
    
    Hadoop APIs were designed for a different world, pre-Spark. Serialization 
is painful to deal with (Configuration isn't even Serializable), API breaking 
changes are painful to deal with, and the size of the dependencies is painful 
to deal with (especially considering the single-node use cases, where ideally 
we'd just want a super trimmed-down jar).
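    
    To make the Configuration point concrete: Hadoop's Configuration does not 
implement java.io.Serializable, so capturing one in a closure that Spark ships 
to executors fails with a NotSerializableException. A minimal sketch of the 
usual workaround (Spark carries a similar private wrapper internally; the 
class here is illustrative, not a public API) marks the field @transient and 
delegates to Configuration's Writable-style write/readFields:
    
        import java.io.{ObjectInputStream, ObjectOutputStream}
        
        import org.apache.hadoop.conf.Configuration
        
        // Wraps a non-Serializable Hadoop Configuration so it can travel
        // inside a Java-serialized Spark closure. The wrapped value is
        // @transient and is rebuilt on the executor side via Configuration's
        // own Writable methods.
        class SerializableConfiguration(@transient var value: Configuration)
            extends Serializable {
        
          private def writeObject(out: ObjectOutputStream): Unit = {
            out.defaultWriteObject()
            value.write(out) // Configuration implements Writable
          }
        
          private def readObject(in: ObjectInputStream): Unit = {
            value = new Configuration(false)
            value.readFields(in)
          }
        }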
    
    As you can see (although most of you who have chimed in here don't know 
much about the new components), the newer components (e.g. Spark SQL) do not 
expose Hadoop APIs.


