https://github.com/NagyDonat commented:
Unfortunately I'm not convinced that this is the right direction for improving the analyzer runtime. On the "risks" side I think that adding the corner case that "this may also return `UnknownVal` in rare situations" into many functions complicates the logic, burdens the code with early return branches and I fear that it will act as a footgun. On the "benefits" side I fear that your statistics don't prove enough: 1. You found that _"Out of the worst 500 entry points, 45 were improved by at least 10%. Out of these 45, 5 were improved by more than 50%. Out of these 45, 2 were improved by more than 80%."_ but this only covers 9% of the worst 500 entry points. Eyeballing the graph suggests that there are some cases where the runtime actually got _worse_ -- so please check that the overall effect of the change is also positive (e.g. the total runtime is reduced meaningfully). 2. Moreover, if "worst 500 entry points" means "worst 500 in the first run", then it is _a biased sample_: if you pick the worst outliers (i.e. the entry points where the sum _expected runtime + luck factor_ is largest), then you are expected to get entry points with worse than average luck (because among two similar entry points, the one with bad luck ends up in the worst 500 while the one with good luck avoids it), so if you redo the measurement, then [regression toward the mean](https://en.wikipedia.org/wiki/Regression_toward_the_mean) will produce better results -- even if you do both measurements with the same setup! As a sanity check, please redo the statistics on the the entry points that produced the worst 500 runtimes _in the second run_ -- I fear that on that sample (which is biased in the opposite direction) you will see that the new revision is _worse_ than the baseline. 3. I'm also interested in comparing the statistical results with a second independent measurement -- is the set of "worst 500 entry points" stable between runs, or are these random unlucky functions that are hit with environmental issues? If you can share the raw data, I can help with statistical calculations. https://github.com/llvm/llvm-project/pull/144327 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits