[
https://issues.apache.org/jira/browse/SPARK-51117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steven Enns updated SPARK-51117:
--------------------------------
Description: I would like to use SPJ (specifically
spark.sql.sources.v2.bucketing.sorting.enabled, available since 4.0), but ["Storage
Partition Joins are currently supported for compatible V2
DataSources"|https://spark.apache.org/docs/4.0.0-preview2/sql-performance-tuning.html#storage-partition-join].
I tried Iceberg, but it is not compatible with Spark 4.0. The only way I
can get it to work is with
org.apache.spark.sql.connector.catalog.InMemoryCatalog, which seems unfit for
production. What is the simplest and most direct way to turn on V2 data sources
in Spark 4.x?
> Which catalog to enable SPJ/Storage Partition Join in spark 4?
> --------------------------------------------------------------
>
> Key: SPARK-51117
> URL: https://issues.apache.org/jira/browse/SPARK-51117
> Project: Spark
> Issue Type: Question
> Components: Spark Core, SQL
> Affects Versions: 4.0.0, 4.1.0
> Reporter: Steven Enns
> Priority: Major
>
> I would like to use SPJ (specifically
> spark.sql.sources.v2.bucketing.sorting.enabled, available since 4.0), but ["Storage
> Partition Joins are currently supported for compatible V2
> DataSources"|https://spark.apache.org/docs/4.0.0-preview2/sql-performance-tuning.html#storage-partition-join].
> I tried Iceberg, but it is not compatible with Spark 4.0. The only way
> I can get it to work is with
> org.apache.spark.sql.connector.catalog.InMemoryCatalog, which seems unfit for
> production. What is the simplest and most direct way to turn on V2 data
> sources in Spark 4.x?
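
For illustration only, the setup described above could be written down as a set of Spark SQL properties. This is a minimal sketch, not a recommended production configuration: the catalog name spj_demo and both helper functions are hypothetical, the catalog class is the one named in the description, and it assumes that spark.sql.sources.v2.bucketing.enabled is also needed alongside the sorting flag. It only builds the settings and formats them as spark-submit flags; it does not start Spark.

```python
# Hypothetical catalog name; it becomes the suffix of the spark.sql.catalog.* key.
CATALOG_NAME = "spj_demo"

# Catalog class taken from the issue description (reported there as unfit for production).
CATALOG_CLASS = "org.apache.spark.sql.connector.catalog.InMemoryCatalog"


def spj_conf(catalog_name, catalog_class):
    """Return settings that register a V2 catalog and switch on the SPJ flags."""
    return {
        # Registers a V2 catalog plugin under the given name.
        f"spark.sql.catalog.{catalog_name}": catalog_class,
        # Assumed prerequisite for storage partition joins.
        "spark.sql.sources.v2.bucketing.enabled": "true",
        # The 4.0 setting named in the issue.
        "spark.sql.sources.v2.bucketing.sorting.enabled": "true",
    }


def to_submit_flags(conf):
    """Flatten a settings dict into spark-submit style '--conf key=value' arguments."""
    flags = []
    for key in sorted(conf):
        flags.extend(["--conf", f"{key}={conf[key]}"])
    return flags


conf = spj_conf(CATALOG_NAME, CATALOG_CLASS)
flags = to_submit_flags(conf)
print(" ".join(flags))
```

Swapping CATALOG_CLASS for another V2 catalog implementation would leave the rest of the sketch unchanged, which is the part of the question that remains open.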