[ 
https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31709:
------------------------------------

    Assignee:     (was: Apache Spark)

> Proper base path for location when it is a relative path
> --------------------------------------------------------
>
>                 Key: SPARK-31709
>                 URL: https://issues.apache.org/jira/browse/SPARK-31709
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0, 3.1.0
>            Reporter: Kent Yao
>            Priority: Major
>
> Currently, the user home directory is used as the base path for the database 
> and table locations when their location is specified with a relative path, 
> e.g.
> {code:sql}
> spark-sql> set spark.sql.warehouse.dir;
> spark.sql.warehouse.dir       file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
> spark-sql> create database loctest location 'loctestdbdir';
> spark-sql> desc database loctest;
> Database Name loctest
> Comment
> Location      file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Owner kentyao
> spark-sql> create table loctest(id int) location 'loctestdbdir';
> spark-sql> desc formatted loctest;
> id    int     NULL
> # Detailed Table Information
> Database      default
> Table loctest
> Owner kentyao
> Created Time  Thu May 14 16:29:05 CST 2020
> Last Access   UNKNOWN
> Created By    Spark 3.1.0-SNAPSHOT
> Type  EXTERNAL
> Provider      parquet
> Location      file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> {code}
> The user home is not always warehouse-related, cannot be changed at runtime, 
> and is shared by both databases and tables as the parent directory. Meanwhile, 
> we use the table path as the parent directory for relative partition locations.
> The config `spark.sql.warehouse.dir` represents the default location for 
> managed databases and tables. For databases, the case above does not seem to 
> follow those semantics. For tables it is right, but I suggest extending its 
> meaning so that it also applies to external tables with relative paths for 
> locations.
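
To make the suggestion concrete, here is a minimal sketch of resolving a relative location against an explicit base instead of the process-wide default. It is not Spark code: `resolve_location`, the `file:/tmp/...` warehouse value, and the `loctesttbl` and `p=1` names are hypothetical. Using the warehouse directory as the base for the table is one reading of the description above, not something the issue spells out.

```python
import posixpath
from urllib.parse import urlparse


def resolve_location(location: str, base: str) -> str:
    """Return `location` unchanged if it is absolute, else join it onto `base`.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    # A location with a URI scheme (file:, hdfs:, ...) or a leading slash
    # is treated as absolute and left alone.
    if urlparse(location).scheme or location.startswith("/"):
        return location
    # Otherwise treat it as relative to the given base directory.
    return posixpath.join(base.rstrip("/"), location)


# Assumed warehouse value, standing in for spark.sql.warehouse.dir.
warehouse = "file:/tmp/spark-warehouse/"

# Database: relative location resolved against the warehouse directory.
db_location = resolve_location("loctestdbdir", warehouse)

# External table: also resolved against the warehouse directory here
# (an assumption; the description does not fix the base for tables).
table_location = resolve_location("loctesttbl", warehouse)

# Partition: resolved against the table path, as the description says
# is already the case for relative partition locations.
partition_location = resolve_location("p=1", table_location)
```

Because the base is a parameter, the same helper covers the database, table, and partition cases; only the choice of base differs.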



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

