Kent Yao created SPARK-31709:
--------------------------------

             Summary: Proper base path for location when it is a relative path
                 Key: SPARK-31709
                 URL: https://issues.apache.org/jira/browse/SPARK-31709
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.5, 3.0.0, 3.1.0
            Reporter: Kent Yao


Currently, when a database or table location is specified as a relative path, 
the user home directory is used as the base path to resolve it, e.g.
{code:sql}
spark-sql> set spark.sql.warehouse.dir;
spark.sql.warehouse.dir file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
spark-sql> create database loctest location 'loctestdbdir';

spark-sql> desc database loctest;
Database Name   loctest
Comment
Location        file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
Owner   kentyao

spark-sql> create table loctest(id int) location 'loctestdbdir';
spark-sql> desc formatted loctest;
id      int     NULL

# Detailed Table Information
Database        default
Table   loctest
Owner   kentyao
Created Time    Thu May 14 16:29:05 CST 2020
Last Access     UNKNOWN
Created By      Spark 3.1.0-SNAPSHOT
Type    EXTERNAL
Provider        parquet
Location        file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
Serde Library   org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat     org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat    org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
{code}


The user home directory is not necessarily related to the warehouse, cannot be 
changed at runtime, and serves as the parent directory for both databases and 
tables. In contrast, relative partition locations are resolved against the 
table path.

The config `spark.sql.warehouse.dir` represents the default location for 
managed databases and tables. For databases, the behavior above does not seem 
to follow these semantics. For tables it is consistent, but I suggest extending 
its meaning so that it also serves as the base path for external tables whose 
locations are relative paths.
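
To make the suggestion concrete, here is a minimal sketch of the proposed 
resolution rule, written in Python for illustration only. It is not Spark's 
implementation: the helper name `resolve_location` and its handling of schemes 
are assumptions, and only the warehouse path is taken from the session above.

```python
import posixpath
from urllib.parse import urlparse


def resolve_location(location: str, warehouse_dir: str) -> str:
    """Hypothetical helper: resolve a relative location against the warehouse dir.

    Locations that carry a scheme (e.g. hdfs://...) or start with '/' are
    treated as absolute and returned unchanged.
    """
    if urlparse(location).scheme or location.startswith("/"):
        return location

    base = urlparse(warehouse_dir)
    # Join against the path part of the warehouse dir and drop any '.' or '..'.
    joined = posixpath.normpath(posixpath.join(base.path, location))

    # Re-attach the scheme and authority of the warehouse dir, if it has any.
    authority = f"//{base.netloc}" if base.netloc else ""
    prefix = f"{base.scheme}:{authority}" if base.scheme else ""
    return prefix + joined


warehouse = (
    "file:/Users/kentyao/Downloads/spark/"
    "spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/"
)
print(resolve_location("loctestdbdir", warehouse))
```

Under this rule, `loctestdbdir` would end up under `spark-warehouse/` rather 
than next to it, and absolute locations would be left untouched.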





