Thank you for reviewing this patch, Bryan.  My comments below.

One question, though: what is the actual symptom of this bug? If I am understanding it correctly, the symptom is that the optimizer may pointlessly continue to investigate a possible query plan which it should already be able to reject as being
too expensive.

Yes, this is correct.

Does that mean that the only symptom here is that the optimizer
may waste some resources?

Directly, yes. But indirectly, this could affect the the query plan that optimizer chooses because of optimizer "timeout". The more time the optimizer wastes optimizing join orders that it should already be able to reject, the fewer join orders it's going to see before timeout occurs. The patch for DERBY-1357 avoids the wasted time, thereby allowing the optimizer to (potentially) see more join orders before timeout occurs, and thus potentially allowing the optimizer to find a better plan.

That said, you may be wondering why I didn't add any new tests with this patch. The answer is that the only time this patch will show a difference in query plan is if timeout is enabled (as explained above), but as I've learned from other tests already in the harness (esp. wisconsin, predicatePushdown, subquery), a test that enables timeout and then prints query plans is bound to fail (usually intermittently) on one system or another (because timeout is actually based on milliseconds and the amount of work that the optimizer can do in xxx milliseconds depends a lot on what machine is being used). So adding a test case with timeout disabled won't show the symptom, and adding it with timeout enabled will likely lead to intermittent failures on different machines. Hence no new tests.

The diff that occurs for predicatePushdown is an admittedly accidental way to see that the patch is in fact short-circuiting plans (see above comment on the master update), but other than that I'm not sure how else to measure the symptom.

I guess the thing I'm still struggling with is the bit of the change which adds "reloadBestPlan = true" at about line 530. Does that part of the patch have any actual user-visible impact on which query plan would be chosen,

Great question. Timeout effects aside, the plan that the optimizer *chooses* will be the same with or without this "reloadBestPlan" change. However, the plan that the optimizer *generates* could change with this patch. Let's say that the optimizer chooses some access plan for a subquery in some specific join order and calls it the "best so far", then it starts processing a new join order in which the plan for the subquery changes, and then short-circuit happens (either because of the DERBY-1357 changes or because of timeout). Without the "reloadBestPlan" change the optimizer would have *chosen* the "best so far" plan as the overall best one--but since it didn't reload the best plan for the subquery, it would actually end up *generating* the latest plan (i.e. the plan that was found as part of the short-circuited join order). By setting reloadBestPlan to true, we make sure that the optimizer will reload the "best so far" plan as its best, and thus that's the plan that will ultimately get generated.

such that we need to have a release note here?

Hmm, another good question. I guess I didn't think this warranted a release note because it's fixing an existing problem in the optimizer (or actually, two: incorrect short-circuit logic and failure to reload best plans when short-circuit occurs) and now the optimizer is generating the correct plan based on short-circuiting. But since, as you say, this could cause existing query plans to change, I guess this might merit a release note after all...?

If that's the case, then the same will be true of DERBY-781 (and was also true of DERBY-1007 and DERBY-805 changes...), and in all cases the release note would have to be something generic like "optimizer fixes/enhancements could cause the optimizer to choose different query plans, so performance of existing queries may change". Theoretically the performance change should always be a *good* thing, and it seems a bit odd to include a release note saying "queries might run more quickly now!" :) However, I guess the practical side of it is that some queries could potentially end up with worse query plans, so perhaps a release note is a good idea...

If that's the consensus, I'll try to post a suggested release note to this issue (and maybe to DERBY-781, as well).

Hopefully that answers your questions--but if not, please feel free to ask more.

And thanks again for the review; I appreciate it!
Army

Reply via email to