[ 
https://issues.apache.org/jira/browse/SPARK-49825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asad Shaikh updated SPARK-49825:
--------------------------------
    Environment: 
macOS 15.0

Java 8 Update 421

  was:macOS 15.0


> default value of `spark.sql.warehouse.dir` is not decoded correctly when 
> saving table
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-49825
>                 URL: https://issues.apache.org/jira/browse/SPARK-49825
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.3
>         Environment: macOS 15.0
> Java 8 Update 421
>            Reporter: Asad Shaikh
>            Priority: Minor
>
> I haven't looked into how _general_ this problem is, but here's a very 
> specific scenario which I ran into last night.
>  
> When the `SparkSession` is created _without_ specifying the config 
> `spark.sql.warehouse.dir`, the default value is _cwd/spark-warehouse_, 
> and this path appears URL-encoded when printed via 
> `spark.conf.get('spark.sql.warehouse.dir')`.
> So, for instance, any spaces present in the path are replaced by "%20".
> If that is intended, the path should be decoded whenever it is used, but 
> instead the encoded path is taken literally, and consequently Spark 
> writes tables to a different location than intended.
>  
> Here's a minimal snippet to reproduce:
> ```py
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
>
> spark.conf.get('spark.sql.warehouse.dir')
> # 'file:/Users/user/cwd%20with%20space/spark-warehouse'
>
> df = ...
> df.write.saveAsTable('df')
> # table is written under /Users/user/cwd%20with%20space/spark-warehouse
> ```
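> The encoded value above is an ordinary percent-encoded file URI, so one 
> hypothetical workaround (not an official fix) is to decode it with 
> Python's standard `urllib.parse.unquote` and pass the decoded path 
> explicitly via `.config('spark.sql.warehouse.dir', ...)` when building 
> the session. A minimal sketch of the decoding step, using the example 
> path from above rather than a real machine's path:
>
> ```py
> from urllib.parse import unquote
>
> # Example value mirroring the report: the default warehouse dir
> # comes back percent-encoded when the cwd contains spaces.
> encoded = 'file:/Users/user/cwd%20with%20space/spark-warehouse'
>
> # Decoding recovers the intended filesystem path; this decoded value
> # could then be supplied explicitly as 'spark.sql.warehouse.dir'.
> decoded = unquote(encoded)
> print(decoded)  # file:/Users/user/cwd with space/spark-warehouse
> ```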
>  
> Interestingly, this doesn't happen if the path is specified manually 
> when creating the session, even when it is literally the same path 
> Spark would have used by default.
>  
> ```py
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder \
>     .config('spark.sql.warehouse.dir', 'spark-warehouse/') \
>     .getOrCreate()
>
> spark.conf.get('spark.sql.warehouse.dir')
> # 'file:/Users/user/cwd with space/spark-warehouse'
> ```
>  
> The above works fine.
>  
> PS: Apologies if this is intended behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
