Anton Uvarov created SPARK-45198: ------------------------------------ Summary: problem using broadcast join with parquet/iceberg tables Key: SPARK-45198 URL: https://issues.apache.org/jira/browse/SPARK-45198 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.4.1 Reporter: Anton Uvarov
We have 2 Parquet tables: load_test_full_warehouse.gen_document_type and load_test_full_warehouse.generation_document_part. Trying to make a left join of these two tables onto each other gives a strange result. In the case where on the left side of the join we use a large table load_test_full_warehouse.generation_document_part, the optimizer uses a broadcast join. However, in the case when on the left in the join we use a small reference table, the optimizer chooses to execute the query using the merge sort. Although it would seem that the small table on the left in a left join should initiate a broadcast join. An attempt to use hints and collect statistics did not yield results. The following queries were used: spark.sql(f"""create table iceberg_warehouse.t1 using iceberg as SELECT /*+ BROADCAST(doc_tp) */ doc.DOCUMENT_DATE , doc_tp.NAME as DOCUMENT_TYPE , COUNT(*) as DOC_QTY FROM load_test_full_warehouse.generation_document_part doc LEFT JOIN load_test_full_warehouse.gen_document_type doc_tp ON doc.DOCUMENT_TYPE_ID_INT = doc_tp.DOCUMENT_TYPE_ID_INT GROUP BY doc.DOCUMENT_DATE, doc_tp.NAME""") spark.sql(f"""create table iceberg_warehouse.t2 using iceberg as SELECT /*+ BROADCAST(doc_tp) */ doc.DOCUMENT_DATE , doc_tp.NAME as DOCUMENT_TYPE , COUNT(*) as DOC_QTY FROM load_test_full_warehouse.gen_document_type doc_tp LEFT JOIN load_test_full_warehouse.generation_document_part doc ON doc.DOCUMENT_TYPE_ID_INT = doc_tp.DOCUMENT_TYPE_ID_INT GROUP BY doc.DOCUMENT_DATE, doc_tp.NAME""") -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org