Ameen Tayyebi created SPARK-23443:
-------------------------------------

             Summary: Spark with Glue as external catalog
                 Key: SPARK-23443
                 URL: https://issues.apache.org/jira/browse/SPARK-23443
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 2.4.0
            Reporter: Ameen Tayyebi


The AWS Glue Catalog is an external, Hive-compatible metastore backed by a web 
service. It provides persistent storage of catalog data for big data use cases.

To find out more information about AWS Glue, please consult:
 * AWS Glue - [https://aws.amazon.com/glue/]
 * Using Glue as a Metastore catalog for Spark - [https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html]

Today, Glue and Spark integrate through the Hive layer. Glue implements Hive's 
IMetaStoreClient interface, so installations of Spark that include Hive can use 
Glue as the metastore.
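
As a rough illustration of the Hive-based integration described above, the sketch below assembles the Spark settings that typically point the embedded Hive client at Glue. The factory class name is the one given in the EMR guide linked earlier; the helper names (glue_hive_conf, to_submit_args) are hypothetical, and the spark.hadoop. prefix is assumed as the way to pass a Hive property through Spark. This is a minimal sketch, not a definitive setup.

```python
# Sketch only: builds the configuration for the Hive-based Glue integration.
# Assumption: the Glue client factory jar is already on the Spark classpath
# (as it is on EMR); this code does not start Spark itself.

# Hive client factory class provided by AWS for the Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def glue_hive_conf():
    """Return Spark settings that route Hive metastore calls to Glue."""
    return {
        # Use the Hive-backed catalog rather than the in-memory one.
        "spark.sql.catalogImplementation": "hive",
        # Properties under spark.hadoop.* are copied into the Hadoop/Hive conf.
        "spark.hadoop.hive.metastore.client.factory.class": GLUE_FACTORY,
    }


def to_submit_args(conf):
    """Render a settings dict as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(glue_hive_conf())))
```

Every catalog call in this setup still passes through Hive's metastore client, which is the coupling this issue proposes to remove.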

The feature set that Glue supports does not align one-to-one with the set of 
features that the latest version of Spark supports. For example, the Glue 
interface supports more advanced partition pruning than the latest version of 
Hive embedded in Spark.

To enable a more natural integration with Spark and to make the latest Glue 
features available without coupling to Hive, a direct integration through 
Spark's own Catalog API is proposed. This Jira tracks that work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
