[ https://issues.apache.org/jira/browse/SPARK-49825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Asad Shaikh updated SPARK-49825:
--------------------------------
    Environment: macOS 15.0
Java 8 Update 421

  was: macOS 15.0

> default value of `spark.sql.warehouse.dir` is not decoded correctly when saving table
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-49825
>                 URL: https://issues.apache.org/jira/browse/SPARK-49825
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.3
>         Environment: macOS 15.0
>                      Java 8 Update 421
>            Reporter: Asad Shaikh
>            Priority: Minor
>
> I haven't looked into how general this problem is, but here's a very specific scenario which I ran into last night.
>
> When the `SparkSession` is created without specifying the config `spark.sql.warehouse.dir`, the default value is _cwd/spark-warehouse_, and this path appears URL-encoded when printed via `spark.conf.get('spark.sql.warehouse.dir')`. So, for instance, any spaces present in the path are replaced by "%20". If this encoding is intentional, the path should be decoded wherever it is used; instead, the encoded path is taken literally, and Spark consequently writes tables to a different location than intended.
>
> Here's a minimal snippet to reproduce:
>
> ```py
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
>
> spark.conf.get('spark.sql.warehouse.dir')
> # 'file:/Users/user/cwd%20with%20space/spark-warehouse'
>
> df = ...
> df.write.saveAsTable('df')
> # table is saved under /Users/user/cwd%20with%20space/spark-warehouse
> ```
>
> Interestingly, this doesn't happen if the path is manually specified when creating the session, even if the path is literally the same as what Spark would have taken by default.
>
> ```py
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.config('spark.sql.warehouse.dir', 'spark-warehouse/').getOrCreate()
>
> spark.conf.get('spark.sql.warehouse.dir')
> # 'file:/Users/user/cwd with space/spark-warehouse'
> ```
>
> The above works fine.
>
> PS. Please forgive me if this is supposed to happen by design.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
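The decoding the report asks for can be sketched with the standard library alone: given the percent-encoded `file:` URI that `spark.conf.get('spark.sql.warehouse.dir')` returns in the report above, `urllib.parse` recovers the on-disk path. This is a reader-side illustration of the workaround, not Spark's internal handling; the example path is the one from the snippet above.

```python
from urllib.parse import unquote, urlparse

# Value as returned by spark.conf.get('spark.sql.warehouse.dir')
# in the report above (example path from the snippet).
encoded = 'file:/Users/user/cwd%20with%20space/spark-warehouse'

# Strip the 'file:' scheme, then percent-decode the path component.
decoded_path = unquote(urlparse(encoded).path)
print(decoded_path)  # /Users/user/cwd with space/spark-warehouse
```

Decoding this way (or passing the path explicitly via `.config(...)`, as in the second snippet) yields the intended warehouse location with the literal space restored.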