[ https://issues.apache.org/jira/browse/SPARK-27617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832671#comment-16832671 ]
Sujith Chacko edited comment on SPARK-27617 at 5/3/19 5:06 PM:
---------------------------------------------------------------

CREATE TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'
CREATE EXTERNAL TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'

Both commands create an external table here, whereas Hive and Impala behave differently: the table is considered external only if the EXTERNAL keyword is used in the CREATE command; otherwise it is managed. Since Spark always treats the table as external when either command is executed (with or without the EXTERNAL keyword), the following use cases are blocked:
a) The user is not able to set an external location for a managed table.
b) There is a compatibility issue with Hive/Impala, where the system allows a managed table to specify a location URI if the user created the table without the EXTERNAL keyword.

I just wanted to know whether this behavior of Spark is intentional, or whether there is scope for improvement, which I think we should pursue: many of our customers want to provide an external location for a managed table so that the system can manage the table by itself and drop the data and metadata when the table is dropped. Also, while migrating Hive jobs, users expect closer compatibility with Hive, whereas in this scenario we deviate. Hope it is clear now; contact me for any clarifications.
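To make the asymmetry concrete, here is a minimal sketch of the observation in a Spark 2.4.x shell; the table names and the /tmp locations are illustrative placeholders, not taken from the report:

{code:scala}
// Minimal sketch, assuming spark-shell from a Spark 2.4.x build with Hive
// support enabled. Table names and locations are illustrative placeholders.

// CREATE TABLE without the EXTERNAL keyword, but with a LOCATION clause:
spark.sql("CREATE TABLE IF NOT EXISTS t_no_keyword (name STRING) LOCATION '/tmp/t_no_keyword'")

// CREATE TABLE with the EXTERNAL keyword:
spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS t_keyword (name STRING) LOCATION '/tmp/t_keyword'")

// Spark marks both tables EXTERNAL, while Hive/Impala would treat the first
// one as MANAGED because the EXTERNAL keyword was not given.
println(spark.catalog.getTable("t_no_keyword").tableType) // EXTERNAL
println(spark.catalog.getTable("t_keyword").tableType)    // EXTERNAL
{code}

Since both tables end up external, DROP TABLE t_no_keyword leaves the files under /tmp/t_no_keyword in place; the managed-table semantics requested above (drop removes data and metadata) cannot be obtained together with an explicit location.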
> Not able to specify LOCATION for internal table
> -----------------------------------------------
>
>                 Key: SPARK-27617
>                 URL: https://issues.apache.org/jira/browse/SPARK-27617
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0
>            Reporter: Sujith Chacko
>            Priority: Major
>
> In Spark, whenever the user specifies a location URI in CREATE TABLE without the EXTERNAL keyword, the table is treated as an external table. Because of this behavior, the following problems have been observed:
> a) The user is not able to set an external location for a managed table.
> b) There is a compatibility issue with Hive/Impala, where the system allows a managed table to specify a location URI if the user created the table without the EXTERNAL keyword.
> {code:java}
> scala> spark.sql("""CREATE TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'""");
> -chgrp: 'HTIPL-23270\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> res15: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sql("desc formatted ext2").show(false)
> +-----------------------------+-----------------------------------------------------------------+-------+
> |col_name                     |data_type                                                        |comment|
> +-----------------------------+-----------------------------------------------------------------+-------+
> |name                         |string                                                           |null   |
> |                             |                                                                 |       |
> |# Detailed Table Information |                                                                 |       |
> |Database                     |default                                                          |       |
> |Table                        |ext2                                                             |       |
> |Owner                        |Administrator                                                    |       |
> |Created Time                 |Wed May 01 21:52:57 IST 2019                                     |       |
> |Last Access                  |Thu Jan 01 05:30:00 IST 1970                                     |       |
> |Created By                   |Spark 2.4.1                                                      |       |
> |Type                         |EXTERNAL                                                         |       |
> |Provider                     |hive                                                             |       |
> |Table Properties             |[transient_lastDdlTime=1556727777]                               |       |
> |Location                     |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13 |       |
> |Serde Library                |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               |       |
> |InputFormat                  |org.apache.hadoop.mapred.TextInputFormat                         |       |
> |OutputFormat                 |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       |       |
> |Storage Properties           |[serialization.format=1]                                         |       |
> |Partition Provider           |Catalog                                                          |       |
> +-----------------------------+-----------------------------------------------------------------+-------+
> {code}