[I] Getting storage partitioned join to work [iceberg]

via GitHub Mon, 29 Apr 2024 15:25:29 -0700


mrbrahman opened a new issue, #10250:
URL: https://github.com/apache/iceberg/issues/10250


   ### Query engine
   
   Spark on AWS EMR 6.15 
   
   ### Question
   
   I'm trying to get storage-partitioned join (SPJ) to work in a simple test case, without success. I followed most of the settings mentioned in #7832, but I still can't get it to work.
   
   Here's what I did:
   
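   ~~~scala
   import org.apache.spark.sql.functions.spark_partition_id
   
   val df = spark.range(0, 1000000)
   
   // Tag each row with the ID of the Spark partition it lands in after the
   // repartition, and use that ID as the Iceberg partition column.
   val a = df.repartition(10).withColumn("part", spark_partition_id())
   a.write.partitionBy("part").format("iceberg").saveAsTable("ice1")
   a.write.partitionBy("part").format("iceberg").saveAsTable("ice2")
   
   ~~~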
   
   ~~~sql
   %sql
   set spark.sql.autoBroadcastJoinThreshold = -1;
   
   set spark.sql.adaptive.enabled = false;
   set spark.sql.sources.bucketing.enabled = true;
   set spark.sql.sources.v2.bucketing.enabled=true;
   -- set "spark.sql.iceberg.planning.preserve-data-grouping=true";  -- this didn't work for me
   set spark.sql.sources.v2.bucketing.pushPartValues.enabled=true;
   set spark.sql.requireAllClusterKeysForCoPartition=false;
   set spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled=true;
   
   
   create table ice_joined_9 using iceberg
   select a.id id1, b.id id2
   from ice1 a
     inner join ice2 b on a.id=b.id and a.part=b.part
   ~~~
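   
   The same session settings can also be applied from Scala. This is a minimal sketch that assumes a `SparkSession` named `spark`, as in the notebook; the object name `SpjSettings` is hypothetical. `spark.conf.set` takes the key as a plain string, so a hyphenated key such as `spark.sql.iceberg.planning.preserve-data-grouping` needs no SQL quoting, which may sidestep the trouble with the commented-out `set` line. It only applies the settings; it does not by itself guarantee that SPJ kicks in.
   
   ~~~scala
   import org.apache.spark.sql.SparkSession
   
   // Hypothetical helper: applies the SQL-cell settings above to a session,
   // including the Iceberg preserve-data-grouping flag that was commented out there.
   object SpjSettings {
     val settings: Seq[(String, String)] = Seq(
       "spark.sql.autoBroadcastJoinThreshold" -> "-1",
       "spark.sql.adaptive.enabled" -> "false",
       "spark.sql.sources.bucketing.enabled" -> "true",
       "spark.sql.sources.v2.bucketing.enabled" -> "true",
       "spark.sql.iceberg.planning.preserve-data-grouping" -> "true",
       "spark.sql.sources.v2.bucketing.pushPartValues.enabled" -> "true",
       "spark.sql.requireAllClusterKeysForCoPartition" -> "false",
       "spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled" -> "true"
     )
   
     // RuntimeConfig.set takes the key as a plain string, so hyphens need no quoting.
     def apply(spark: SparkSession): Unit =
       settings.foreach { case (key, value) => spark.conf.set(key, value) }
   }
   ~~~
   
   Calling `SpjSettings(spark)` once per session, before running the `create table ... select` statement through `spark.sql`, would put all of these in effect.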
   The DAG still shows an Exchange and a Sort:
   
   ![spj not working](https://github.com/apache/iceberg/assets/16898939/1cbad73d-7c5e-40e5-93df-6489eadd569c)
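   
   To check whether the shuffle is gone without reading the DAG in the UI, the physical plan can be inspected directly. This is a minimal sketch; `PlanCheck` and `shuffleAndSortCounts` are hypothetical names. It assumes AQE stays disabled, as in the settings above: with AQE on, the plan root is an `AdaptiveSparkPlanExec`, and a plain `collect` does not look inside it.
   
   ~~~scala
   import org.apache.spark.sql.Dataset
   import org.apache.spark.sql.execution.SortExec
   import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
   
   object PlanCheck {
     // Counts shuffle (Exchange) and Sort operators in the executed physical plan.
     // Assumes spark.sql.adaptive.enabled=false, so the plan tree is fully visible.
     def shuffleAndSortCounts(ds: Dataset[_]): (Int, Int) = {
       val plan = ds.queryExecution.executedPlan
       val shuffles = plan.collect { case s: ShuffleExchangeExec => s }.size
       val sorts = plan.collect { case s: SortExec => s }.size
       (shuffles, sorts)
     }
   }
   ~~~
   
   For example, `PlanCheck.shuffleAndSortCounts(spark.sql("select a.id id1, b.id id2 from ice1 a inner join ice2 b on a.id = b.id and a.part = b.part"))` would be expected to report zero shuffles once storage-partitioned join applies. A `Sort` can still remain, because the sort-merge join also needs each partition sorted on the join keys.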
   

