[jira] [Updated] (SPARK-51117) Which catalog to enable SPJ/Storage Partition Join in spark 4?

Steven Enns (Jira) Thu, 06 Feb 2025 12:40:20 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-51117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steven Enns updated SPARK-51117:
--------------------------------
    Description: I would like to use SPJ (specifically 
spark.sql.sources.v2.bucketing.sorting.enabled, available since 4.0), but ["Storage 
Partition Joins are currently supported for compatible V2 
DataSources"|https://spark.apache.org/docs/4.0.0-preview2/sql-performance-tuning.html#storage-partition-join].
  I tried using Iceberg, but it is not compatible with Spark 4.0.  The only way I 
can get it to work is with 
org.apache.spark.sql.connector.catalog.InMemoryCatalog, which seems unfit for 
production.  What is the simplest and most direct way to turn on V2 data sources 
in Spark 4.x?   (was: I would like to use SPJ (specifically 
spark.sql.sources.v2.bucketing.sorting.enabled since 4.0) but ["Storage 
Partition Joins are currently supported for compatible V2 
DataSources"|https://spark.apache.org/docs/4.0.0-preview2/sql-performance-tuning.html#storage-partition-join].
  I tried using iceberg but it's not compatible with spark 4.0.  The only way I 
can get it to work is with 
{{org.apache.spark.sql.connector.catalog.InMemoryCatalog which seems unfit for 
production.  What is the most simple and direct way to turn on v2 data sources 
in spark 4.x? }})

> Which catalog to enable SPJ/Storage Partition Join in spark 4?
> --------------------------------------------------------------
>
>                 Key: SPARK-51117
>                 URL: https://issues.apache.org/jira/browse/SPARK-51117
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, SQL
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Steven Enns
>            Priority: Major
>
> I would like to use SPJ (specifically 
> spark.sql.sources.v2.bucketing.sorting.enabled, available since 4.0), but ["Storage 
> Partition Joins are currently supported for compatible V2 
> DataSources"|https://spark.apache.org/docs/4.0.0-preview2/sql-performance-tuning.html#storage-partition-join].
>   I tried using Iceberg, but it is not compatible with Spark 4.0.  The only way 
> I can get it to work is with 
> org.apache.spark.sql.connector.catalog.InMemoryCatalog, which seems unfit for 
> production.  What is the simplest and most direct way to turn on V2 data 
> sources in Spark 4.x? 
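
For reference, the setup described in the question can be sketched as a set of session settings. This is only an illustration: the helper name spj_conf and the catalog name "demo" are hypothetical, the catalog class is the test-only InMemoryCatalog the reporter mentions, and spark.sql.sources.v2.bucketing.enabled is assumed to be the general SPJ switch alongside the sorting flag named above. It does not answer which production catalog works with Spark 4.

```python
# Hypothetical helper: gathers the settings discussed in the question into one dict.
def spj_conf(catalog_name: str, catalog_class: str) -> dict[str, str]:
    return {
        # A V2 catalog is registered under spark.sql.catalog.<name>.
        f"spark.sql.catalog.{catalog_name}": catalog_class,
        # Assumed general switch for storage-partitioned joins.
        "spark.sql.sources.v2.bucketing.enabled": "true",
        # The flag named in the question (since 4.0).
        "spark.sql.sources.v2.bucketing.sorting.enabled": "true",
    }


conf = spj_conf(
    "demo",  # hypothetical catalog name
    "org.apache.spark.sql.connector.catalog.InMemoryCatalog",
)

# Print the settings as --conf arguments for spark-submit or spark-sql.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The same key/value pairs could instead be passed to SparkSession.builder.config(key, value) one at a time; whether a join is then planned without a shuffle still depends on the data source reporting its partitioning.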



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

