NGA-TRAN opened a new issue, #18251: URL: https://github.com/apache/datafusion/issues/18251
### Is your feature request related to a problem or challenge? This task is part of feature https://github.com/apache/datafusion/issues/18249, showing join order enumeration for Q5 using a Join Ranking, where higher ranks indicate earlier execution. It assumes ranks are precomputed and focuses solely on enumeration and pruning. Rank computation will be covered in a separate ticket. ## Enumeration Examples Figure 2 shows three joins ranked 4, one ranked 3, and two ranked 2. The three rank-4 joins are selected first in the join order, initiating three in-progress plans as illustrated in Figure 3. In these examples, we assume the join output remains sorted on the join key if the larger input is already sorted. This reflects the behavior of merge joins, which preserve sort order. Hash joins, on the other hand, do not generally guarantee output order, though some engines may preserve probe-side order in specific cases. <img width="326" height="340" alt="Image" src="https://github.com/user-attachments/assets/a0c5dc39-383f-46cb-8d34-8a9d17d7c1fa" /> Figure 2: Join Graph Annotated with Join Ranked After a join is selected, remaining join ranks are recalculated using updated properties and may change from earlier values. <img width="1064" height="373" alt="Image" src="https://github.com/user-attachments/assets/efc3c4c6-9882-4312-b269-49deb059c146" /> Figure 3: Level-1 Partial Plans In this enumeration scheme/examples, join type matters: if inputs are sorted, both merge and hash joins are considered; otherwise, only hash joins are used. As a result, the three plans above expand into five plans shown in Figure 4. <img width="1344" height="385" alt="Image" src="https://github.com/user-attachments/assets/0ab1a5bc-5140-4dfa-be07-b5aa1e6beac2" /> Figure 4: Level-1 Partial Plans with Join Type Selection To avoid plan space explosion during enumeration, we retain at most two plans after each step. A pruning strategy is applied to discard less promising options. We'll cover pruning later; for now, we assume it eliminates plans (1.2) and (2), as shown in Figure 5. <img width="1360" height="381" alt="Image" src="https://github.com/user-attachments/assets/002cee6a-a4de-4442-8f2a-b99da7e5f668" /> Figure 5: Level-1 Partial Plans After Join Type Selection and Pruning Similarly, the remaining level-1 partial plans (1.1) and (3) generate three level-2 plans, as shown in Figure 6. <img width="1064" height="362" alt="Image" src="https://github.com/user-attachments/assets/993e1888-aff8-4af2-aa47-1cfb42b52737" /> Figure 6: Level-2 Partial Plans Then plan (3.1) is pruned because it shares the same structure as plan (1.1.1). <img width="1062" height="376" alt="Image" src="https://github.com/user-attachments/assets/759b530c-3eff-4dfb-b100-483255d7bc64" /> Figure 7: Level-2 Partial Plans After Pruning Continuing, level-2 plans (1.1.1) and (3.2) each contain a single highest-ranked join (rank 3), resulting in two level-3 plans shown in Figure 8. <img width="538" height="212" alt="Image" src="https://github.com/user-attachments/assets/687dc456-05d9-4f33-a3e8-7d228a5300bd" /> Figure 8: Level-3 Partial Plans Continuing, Figures 9 and 10 show the level-4 plans before and after pruning, respectively. <img width="1184" height="208" alt="Image" src="https://github.com/user-attachments/assets/261093cf-e2a1-4b0d-a662-a69fdd9dfbe8" /> Figure 9: Level-4 Partial Plans <img width="1202" height="191" alt="Image" src="https://github.com/user-attachments/assets/19f50e00-12ab-4e38-97f5-4b079328a575" /> Figure10: Level-4 Partial Plans after Pruning Finally, Figure 11 shows the two final-level plans, with plan (3.2.1.1.1) selected as the best. <img width="460" height="204" alt="Image" src="https://github.com/user-attachments/assets/2243e007-2eaa-4e36-8ee8-da3c08850dfd" /> Figure11: Final-level Plans ## Enumeration Summary As shown, the enumeration process depends on three customizable components: - A join ranking strategy, which can vary and be tailored as needed - A pruning mechanism, which can also be adapted - An option to include or exclude join types (i.e., physical plans) during enumeration This suggests we can design a flexible enumeration framework that avoids relying on fixed join rankings or pruning strategies. ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
