> On Nov 20, 2023, at 17:52, Alexander Monakov <amona...@ispras.ru> wrote: > > > On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote: > >> This patch avoids sched-deps.cc:find_inc() creating exponential number >> of dependencies, which become memory and compilation time hogs. >> Consider example (simplified from PR96388) ... >> === >> sp=sp-4 // sp_insnA >> mem_insnA1[sp+A1] >> ... >> mem_insnAN[sp+AN] >> sp=sp-4 // sp_insnB >> mem_insnB1[sp+B1] >> ... >> mem_insnBM[sp+BM] >> === >> ... in this example find_modifiable_mems() will arrange for mem_insnA* >> to be able to pass sp_insnA, and, while doing this, will create >> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB >> is a consumer of sp_insnA. After this sp_insnB will have N new >> backward dependencies. >> Then find_modifiable_mems() gets to mem_insnB*s and starts to create >> N new dependencies for _every_ mem_insnB*. This gets us N*M new >> dependencies.
[For avoidance of doubt, below discussion is about the general implementation of find_modifiable_mems() and not about the patch.] > > It's a bit hard to read this without knowing which value of 'backwards' > is assumed. > > Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1. > This is a true dependency. We know we can break it by adjusting B1 by -4, but > we need to be careful not to move such modified mem_insnB1 above sp_insnA, so > we need to iterate over *incoming true dependencies* of sp_insnB and add them. > > But the code seems to be iterating over *all incoming dependencies*, so it > will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true > dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my > understanding is correct. Yeap, your understanding is correct. However, this is what find_modifiable_mems() has to do to avoid complicated analysis of second-level dependencies. I think, the optimization that find_modifiable_mems() does can be implemented as part of main dependency analysis, and I'm going to discuss that with Bernd. I might be missing something, but it seems that instruction transformations that find_modifiable_mems() is doing are simpler than what ia64 speculation is doing. So, I think, we should be able to leverage speculation infrastructure, rather than doing this optimization "on the side" of normal scheduling. -- Maxim Kuvyrkov https://www.linaro.org