milenkovicm opened a new issue, #1826:
URL: https://github.com/apache/datafusion-ballista/issues/1826
**Describe the bug**
```
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Job HUBVTW4/3 physical plan:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SortShuffleWriterExec: partitioning=Hash([i_item_desc@0, w_warehouse_name@1,
d_week_seq@2], 32)
AggregateExec: mode=Partial, gby=[i_item_desc@1 as i_item_desc,
w_warehouse_name@0 as w_warehouse_name, d_week_seq@2 as d_week_seq],
aggr=[sum(CASE WHEN promotion.p_promo_sk IS NULL THEN Int64(1) ELSE Int64(0)
END), sum(CASE WHEN promotion.p_promo_sk IS NOT NULL THEN Int64(1) ELSE
Int64(0) END), count(Int64(1))]
HashJoinExec: mode=CollectLeft, join_type=Left, on=[(cs_item_sk@0,
cr_item_sk@0), (cs_order_number@1, cr_order_number@1)],
projection=[w_warehouse_name@2, i_item_desc@3, d_week_seq@4, p_promo_sk@5]
HashJoinExec: mode=CollectLeft, join_type=Left, on=[(cs_promo_sk@1,
p_promo_sk@0)], projection=[cs_item_sk@0, cs_order_number@2,
w_warehouse_name@3, i_item_desc@4, d_week_seq@5, p_promo_sk@6]
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(cs_ship_date_sk@0, d_date_sk@0)], filter=d_date@1 > d_date@0 +
IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 },
projection=[cs_item_sk@1, cs_promo_sk@2, cs_order_number@3, w_warehouse_name@4,
i_item_desc@5, d_week_seq@7]
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(inv_date_sk@4, d_date_sk@0), (d_week_seq@8, d_week_seq@1)],
projection=[cs_ship_date_sk@0, cs_item_sk@1, cs_promo_sk@2, cs_order_number@3,
w_warehouse_name@5, i_item_desc@6, d_date@7, d_week_seq@8]
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(cs_sold_date_sk@0, d_date_sk@0)], projection=[cs_ship_date_sk@1,
cs_item_sk@2, cs_promo_sk@3, cs_order_number@4, inv_date_sk@5,
w_warehouse_name@6, i_item_desc@7, d_date@9, d_week_seq@10]
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(cs_bill_hdemo_sk@2, hd_demo_sk@0)], projection=[cs_sold_date_sk@0,
cs_ship_date_sk@1, cs_item_sk@3, cs_promo_sk@4, cs_order_number@5,
inv_date_sk@6, w_warehouse_name@7, i_item_desc@8]
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(cs_bill_cdemo_sk@2, cd_demo_sk@0)], projection=[cs_sold_date_sk@0,
cs_ship_date_sk@1, cs_bill_hdemo_sk@3, cs_item_sk@4, cs_promo_sk@5,
cs_order_number@6, inv_date_sk@7, w_warehouse_name@8, i_item_desc@9]
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(i_item_sk@0, cs_item_sk@4)], projection=[cs_sold_date_sk@2,
cs_ship_date_sk@3, cs_bill_cdemo_sk@4, cs_bill_hdemo_sk@5, cs_item_sk@6,
cs_promo_sk@7, cs_order_number@8, inv_date_sk@9, w_warehouse_name@10,
i_item_desc@1]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/item.parquet]]}, projection=[i_item_sk,
i_item_desc], file_type=parquet, predicate=DynamicFilter [ empty ]
ShuffleReaderExec: upstream_stage: 2, broadcast: true,
upstream_partition_count: 32
FilterExec: cd_marital_status@1 = S,
projection=[cd_demo_sk@0]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/customer_demographics.parquet]]},
projection=[cd_demo_sk, cd_marital_status], file_type=parquet,
predicate=cd_marital_status@2 = S AND cd_marital_status@2 = S AND
cd_marital_status@2 = S AND cd_marital_status@2 = S AND DynamicFilter [ empty
], pruning_predicate=cd_marital_status_null_count@2 != row_count@3 AND
cd_marital_status_min@0 <= S AND S <= cd_marital_status_max@1 AND
cd_marital_status_null_count@2 != row_count@3 AND cd_marital_status_min@0 <= S
AND S <= cd_marital_status_max@1 AND cd_marital_status_null_count@2 !=
row_count@3 AND cd_marital_status_min@0 <= S AND S <= cd_marital_status_max@1
AND cd_marital_status_null_count@2 != row_count@3 AND cd_marital_status_min@0
<= S AND S <= cd_marital_status_max@1, required_guarantees=[cd_marital_status
in (S)]
FilterExec: hd_buy_potential@1 = 501-1000,
projection=[hd_demo_sk@0]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/household_demographics.parquet]]},
projection=[hd_demo_sk, hd_buy_potential], file_type=parquet,
predicate=hd_buy_potential@2 = 501-1000 AND hd_buy_potential@2 = 501-1000 AND
hd_buy_potential@2 = 501-1000 AND hd_buy_potential@2 = 501-1000 AND
DynamicFilter [ empty ], pruning_predicate=hd_buy_potential_null_count@2 !=
row_count@3 AND hd_buy_potential_min@0 <= 501-1000 AND 501-1000 <=
hd_buy_potential_max@1 AND hd_buy_potential_null_count@2 != row_count@3 AND
hd_buy_potential_min@0 <= 501-1000 AND 501-1000 <= hd_buy_potential_max@1 AND
hd_buy_potential_null_count@2 != row_count@3 AND hd_buy_potential_min@0 <=
501-1000 AND 501-1000 <= hd_buy_potential_max@1 AND
hd_buy_potential_null_count@2 != row_count@3 AND hd_buy_potential_min@0 <=
501-1000 AND 501-1000 <= hd_buy_potential_max@1,
required_guarantees=[hd_buy_potential in (501-1000)]
FilterExec: d_year@3 = 1999, projection=[d_date_sk@0,
d_date@1, d_week_seq@2]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/date_dim.parquet]]},
projection=[d_date_sk, d_date, d_week_seq, d_year], file_type=parquet,
predicate=d_year@6 = 1999 AND d_year@6 = 1999 AND d_year@6 = 1999 AND d_year@6
= 1999 AND DynamicFilter [ empty ], pruning_predicate=d_year_null_count@2 !=
row_count@3 AND d_year_min@0 <= 1999 AND 1999 <= d_year_max@1 AND
d_year_null_count@2 != row_count@3 AND d_year_min@0 <= 1999 AND 1999 <=
d_year_max@1 AND d_year_null_count@2 != row_count@3 AND d_year_min@0 <= 1999
AND 1999 <= d_year_max@1 AND d_year_null_count@2 != row_count@3 AND
d_year_min@0 <= 1999 AND 1999 <= d_year_max@1, required_guarantees=[d_year in
(1999)]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/date_dim.parquet]]},
projection=[d_date_sk, d_week_seq], file_type=parquet, predicate=DynamicFilter
[ empty ]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/date_dim.parquet]]},
projection=[d_date_sk, d_date], file_type=parquet, predicate=DynamicFilter [
empty ]
DataSourceExec: file_groups={1 group:
[[Users/marko/TMP/tpcds_data/1g_parquet/promotion.parquet]]},
projection=[p_promo_sk], file_type=parquet
DataSourceExec: file_groups={32 groups:
[[Users/marko/TMP/tpcds_data/1g_parquet/catalog_returns.parquet:0..352852],
[Users/marko/TMP/tpcds_data/1g_parquet/catalog_returns.parquet:352852..705704],
[Users/marko/TMP/tpcds_data/1g_parquet/catalog_returns.parquet:705704..1058556],
[Users/marko/TMP/tpcds_data/1g_parquet/catalog_returns.parquet:1058556..1411408],
[Users/marko/TMP/tpcds_data/1g_parquet/catalog_returns.parquet:1411408..1764260],
...]}, projection=[cr_item_sk, cr_order_number], file_type=parquet
```
Executing this plan fails with:
```
Task failed due to runtime execution error:
DataFusionError(Shared(Shared(Shared(Shared(Shared(Shared(Shared(ArrowError(OffsetOverflowError(2147731589),
Some(\"\"))))))))))\n")), Some(""))
```
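The number carried by `OffsetOverflowError` is slightly larger than
`i32::MAX`. One possible reading, not confirmed in this report, is that a
variable-length Arrow array with 32-bit offsets (such as `Utf8`) grew past
2 GiB of data while batches were being combined. The sketch below only does
the arithmetic on the reported value; `excess_over_i32` is a hypothetical
helper written for illustration and is not part of Arrow, DataFusion, or
Ballista.

```rust
/// Returns by how much `offset` exceeds `i32::MAX`, or `None` if it fits.
/// Hypothetical helper for illustration only.
fn excess_over_i32(offset: u64) -> Option<u64> {
    let max = i32::MAX as u64; // 2_147_483_647
    // `checked_sub` yields None when offset < max; drop the zero case too,
    // because an offset equal to i32::MAX still fits.
    offset.checked_sub(max).filter(|excess| *excess > 0)
}

fn main() {
    // Value copied verbatim from the error message above.
    let reported: u64 = 2_147_731_589;
    match excess_over_i32(reported) {
        Some(excess) => println!("{reported} exceeds i32::MAX by {excess}"),
        None => println!("{reported} fits in i32"),
    }
}
```

If that reading is right, the overflow is only about 242 KiB past the
32-bit limit, which would fit a single column accumulating just over 2 GiB
of string data.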
**To Reproduce**
Steps to reproduce the behavior:
- Enable AQE
- Run TPC-DS Q72 at scale factor 1 (SF1)
**Expected behavior**
The query completes and returns results.
**Additional context**
- With some changes to the current dynamic join functionality, it can be
reproduced with SF10.
- The same error occurs with both sort and hash shuffle, so it's probably
not related to Ballista code.