[ 
https://issues.apache.org/jira/browse/SPARK-27617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832671#comment-16832671
 ] 

Sujith Chacko edited comment on SPARK-27617 at 5/3/19 6:27 PM:
---------------------------------------------------------------

CREATE TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 
'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'

CREATE EXTERNAL TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 
'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'

Both commands create an external table in Spark, whereas Impala and Hive 
behave differently: there, the table is treated as external only if the 
'EXTERNAL' keyword is used in the CREATE command; otherwise it is managed. 
This behavior blocks the use cases mentioned below.

Use case 1: the user is not able to set an external location for a managed table.
Use case 2: compatibility issue with Hive/Impala, where the system allows a 
managed table to specify a location URI if the user created the table without 
the 'EXTERNAL' keyword.
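
To make the difference concrete, here is a minimal sketch in plain Scala of the two rules as described in this comment. It is only a toy model and does not call Spark or Hive; the names `CreateTable`, `TableTypeRules`, `sparkReported` and `hiveDescribed` are hypothetical and exist only for illustration.

```scala
// Toy model of the behavior described in this comment.
// It does not use any Spark or Hive API; all names are hypothetical.
sealed trait TableType
case object Managed extends TableType
case object External extends TableType

// Hypothetical, simplified view of a CREATE TABLE statement.
final case class CreateTable(name: String, external: Boolean, location: Option[String])

object TableTypeRules {
  // Behavior reported for Spark: a LOCATION clause alone makes the table external.
  def sparkReported(stmt: CreateTable): TableType =
    if (stmt.external || stmt.location.isDefined) External else Managed

  // Behavior described for Hive/Impala: only the EXTERNAL keyword matters.
  def hiveDescribed(stmt: CreateTable): TableType =
    if (stmt.external) External else Managed
}

object TableTypeDemo {
  def main(args: Array[String]): Unit = {
    val loc = Some("D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13")
    val withoutKeyword = CreateTable("ext2", external = false, location = loc)
    val withKeyword = CreateTable("ext2", external = true, location = loc)
    for (stmt <- Seq(withoutKeyword, withKeyword)) {
      println(
        s"external=${stmt.external}: " +
          s"spark=${TableTypeRules.sparkReported(stmt)}, " +
          s"hive=${TableTypeRules.hiveDescribed(stmt)}")
    }
  }
}
```

Under this model the two rules disagree only for the statement that has a LOCATION but no EXTERNAL keyword, which is exactly the case this issue is about.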

I just wanted to know whether this behavior of Spark is intentional, or whether 
there is scope for improvement. I think we should improve it, because many of 
our customers want to provide an external location for a managed table so that 
the system can manage the table by itself and drop the data/metadata when the 
table is dropped.

Also, when migrating Hive jobs to Spark, they expect more compatibility with 
Hive, whereas in this scenario we deviate.

 

*Expected Output:*

CREATE TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 
'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'

should create a managed table that can refer to any location specified by the 
user, and the DROP TABLE command should delete both the metadata and the data 
of that table.
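
The expected DROP TABLE semantics can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: `catalog` and `files` are hypothetical in-memory stand-ins for the metastore and the file system, and nothing here is Spark or Hive code.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of DROP TABLE; not Spark or Hive code.
object DropTableSketch {
  final case class TableEntry(location: String, managed: Boolean)

  // Stand-ins for the metastore and the file system (assumptions for illustration).
  val catalog: mutable.Map[String, TableEntry] = mutable.Map.empty
  val files: mutable.Set[String] = mutable.Set.empty

  def createTable(name: String, location: String, managed: Boolean): Unit = {
    catalog(name) = TableEntry(location, managed)
    files += location
  }

  // Metadata is always removed; data is removed only when the table is managed.
  def dropTable(name: String): Unit =
    catalog.remove(name).foreach { entry =>
      if (entry.managed) files -= entry.location
    }

  def main(args: Array[String]): Unit = {
    val location = "D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13"
    createTable("ext2", location, managed = true)
    dropTable("ext2")
    println(s"data still present after drop: ${files.contains(location)}")
  }
}
```

With `managed = false` the same drop would leave the data in place, which is what currently happens for a table created with LOCATION but without the EXTERNAL keyword.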

 

Hope it's clear now. Let me know if you need any clarification.



> Not able to specify LOCATION for internal table
> -----------------------------------------------
>
>                 Key: SPARK-27617
>                 URL: https://issues.apache.org/jira/browse/SPARK-27617
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0
>            Reporter: Sujith Chacko
>            Priority: Major
>
> In Spark, whenever the user specifies a location URI in CREATE TABLE without 
> the EXTERNAL keyword, the table is treated as an external table.
> Because of this behavior, the following problems have been observed:
> a) the user is not able to set an external location for a managed table.
> b) compatibility issue with Hive/Impala, where the system allows a managed 
> table to specify a location URI if the user created the table without the 
> 'EXTERNAL' keyword.
> {code:java}
> scala> spark.sql("""CREATE TABLE IF NOT EXISTS ext2 (name STRING) LOCATION 
> 'D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13'""");
>  -chgrp: 'HTIPL-23270\None' does not match expected pattern for group
>  Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
>  res15: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc formatted ext2").show(false)
>  
> +----------------------------+----------------------------------------------------------------+-------+
> |col_name                    |data_type                                                       |comment|
> +----------------------------+----------------------------------------------------------------+-------+
> |name                        |string                                                          |null   |
> |                            |                                                                |       |
> |# Detailed Table Information|                                                                |       |
> |Database                    |default                                                         |       |
> |Table                       |ext2                                                            |       |
> |Owner                       |Administrator                                                   |       |
> |Created Time                |Wed May 01 21:52:57 IST 2019                                    |       |
> |Last Access                 |Thu Jan 01 05:30:00 IST 1970                                    |       |
> |Created By                  |Spark 2.4.1                                                     |       |
> |Type                        |EXTERNAL                                                        |       |
> |Provider                    |hive                                                            |       |
> |Table Properties            |[transient_lastDdlTime=1556727777]                              |       |
> |Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/abc_orc13|       |
> |Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe              |       |
> |InputFormat                 |org.apache.hadoop.mapred.TextInputFormat                        |       |
> |OutputFormat                |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat      |       |
> |Storage Properties          |[serialization.format=1]                                        |       |
> |Partition Provider          |Catalog                                                         |       |
> +----------------------------+----------------------------------------------------------------+-------+
> {code}


