[ https://issues.apache.org/jira/browse/SPARK-49825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-49825:
-----------------------------------
Labels: pull-request-available (was: )
> default value of `spark.sql.warehouse.dir` is not decoded correctly when
> saving table
> -------------------------------------------------------------------------------------
>
> Key: SPARK-49825
> URL: https://issues.apache.org/jira/browse/SPARK-49825
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.5.3
> Environment: macOS 15.0
> Java 8 Update 421
> Reporter: Asad Shaikh
> Priority: Minor
> Labels: pull-request-available
>
> I haven't looked into how _general_ this problem is, but here's a very
> specific scenario which I ran into last night.
>
> When the `SparkSession` is created _without_ specifying the config
> `spark.sql.warehouse.dir`, the default value is _cwd/spark-warehouse_,
> and this path appears to be URL-encoded when printed via
> `spark.conf.get('spark.sql.warehouse.dir')`.
> So, for instance, if the path contains any spaces, they are replaced by
> "%20".
> If this is the case, then the path should be decoded wherever it is used, but
> it turns out the encoded path is taken literally, and consequently Spark
> writes tables to a different location than intended.
>
> Here's a minimal snippet to reproduce:
> ```py
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
>
> spark.conf.get('spark.sql.warehouse.dir')
> # 'file:/Users/user/cwd%20with%20space/spark-warehouse'
>
> df = ...
> df.write.saveAsTable('df')
> # table will be saved at /Users/user/cwd%20with%20space/spark-warehouse
> ```
>
> Interestingly, this doesn't happen if the path is specified manually when
> creating the session, even if the path is literally the same as what Spark
> would have used by default.
>
> ```py
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.config('spark.sql.warehouse.dir', 'spark-warehouse/').getOrCreate()
>
> spark.conf.get('spark.sql.warehouse.dir')
> # 'file:/Users/user/cwd with space/spark-warehouse'
> ```
>
> The above works fine.
>
> PS. plz forgive me if this is supposed to happen by design
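
For readers following the report, the difference between taking the encoded path literally and decoding it first can be illustrated with a small sketch. It uses only Python's standard `urllib.parse` module and is not Spark's own code; the helper name `literal_and_decoded_path` is hypothetical, and the URI is the one quoted in the description above.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse


def literal_and_decoded_path(uri: str) -> Tuple[str, str]:
    """Return (path taken literally, percent-decoded path) for a file: URI.

    Hypothetical helper for illustration only; it does not reflect how
    Spark resolves the warehouse directory internally.
    """
    # urlparse splits the URI but leaves percent-escapes in the path untouched.
    raw_path = urlparse(uri).path
    # unquote turns escapes such as %20 back into the characters they encode.
    return raw_path, unquote(raw_path)


# URI as reported by spark.conf.get('spark.sql.warehouse.dir') in the issue.
uri = "file:/Users/user/cwd%20with%20space/spark-warehouse"
literal, decoded = literal_and_decoded_path(uri)

print(literal)  # the directory name still contains the literal text "%20"
print(decoded)  # the directory name contains real spaces
```

The two results name different directories on disk, which matches the reported symptom: the table ends up under a directory whose name contains a literal "%20" rather than under the intended working directory.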