I mean you can push your logic into CarbonDataSourceScan as a dynamic runtime filter.
Actually, CarbonDataSourceScan already used min/max zoom maps as an index filter to prune blocklist (in the CarbonScanRDD.getPartition method). We can do more things on the join query. Here I assume the source table is much smaller than the target table. 1. when the join broadcast the source table 1.1 when the join columns contain the partition keys of the target table, it can reuse the result of the broadcast to prune the partitions of the target table. 1.2 when the join query has some filters on the target table, use min/max zoom maps to prune the block list of the target table 1.3 when the join query has some filters on the source table, it can use min/max zoom maps of join columns to match the result of the broadcast 2. when the join doesn't broadcast the source table 2.1 when the join query has some filters on the target table, use min/max zoom maps to prune the block list of the target table 2.2 join source table with min/max zoom maps of the target table to get the new block list. In the future, it better to move all pruning logics of the driver side into one place and invoke them in CarbonDataSourceScan to get input partitions for ScanRDD. (include min/max index, si, partition pruning, and dynamic filters) ----- Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/