Hi all, I'd like to start a discussion thread about this topic, as it blocks an important feature we are targeting for Spark 3.1: unifying the CREATE TABLE SQL syntax.
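To ground the discussion, here is a hedged sketch of the syntax forms in question (table names and paths are illustrative, and the comments paraphrase the behavior described in the background below rather than quoting exact error messages):

```sql
-- Native Spark syntax: the parser rejects EXTERNAL here and reports
-- that EXTERNAL can't be specified.
CREATE EXTERNAL TABLE t1 (id INT) USING parquet
  LOCATION '/tmp/t1';

-- Hive syntax: EXTERNAL is accepted, but only together with a
-- LOCATION clause (or a path option).
CREATE EXTERNAL TABLE t2 (id INT) STORED AS parquet
  LOCATION '/tmp/t2';

-- Hive syntax without LOCATION: rejected by Spark, although Hive
-- itself allows this (the table would live under the default
-- warehouse directory) -- hence the compatibility gap.
CREATE EXTERNAL TABLE t3 (id INT) STORED AS parquet;
```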
A bit more background on CREATE EXTERNAL TABLE: it's kind of a hidden feature in Spark, kept for Hive compatibility. If you write the native CREATE TABLE syntax, such as `CREATE EXTERNAL TABLE ... USING parquet`, the parser fails and tells you that EXTERNAL can't be specified. If you write the Hive CREATE TABLE syntax, EXTERNAL can be specified, but only when a LOCATION clause or path option is present. For example, `CREATE EXTERNAL TABLE ... STORED AS parquet` is not allowed when there is no LOCATION clause or path option. This is not 100% Hive compatible.

As we are unifying the CREATE TABLE SQL syntax, one problem is how to deal with CREATE EXTERNAL TABLE. We can keep it as a hidden feature, as it is today, or we can officially support it. Please let us know your thoughts:

1. As an end user, what do you expect CREATE EXTERNAL TABLE to do? Have you used it in production before? For what use cases?

2. As a catalog developer, how would you implement EXTERNAL TABLE? It seems to me that it only makes sense for file sources, where the table directory can be managed. I'm not sure how to interpret EXTERNAL for catalogs like JDBC, Cassandra, etc.

For more details, please refer to the long discussion in https://github.com/apache/spark/pull/28026

Thanks,
Wenchen