[Maria-developers] Updated (by Guest): Table elimination (17)
--- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.: Monty IMPLEMENTOR: Psergey COPIES TO..: CATEGORY...: Server-Sprint TASK ID: 17 (http://askmonty.org/worklog/?tid=17) VERSION: Server-5.1 STATUS.: Assigned PRIORITY...: 60 WORKED HOURS...: 0 ESTIMATE...: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=- Low Level Design modified. --- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.0 +0300 +++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.0 +0300 @@ -8,7 +8,7 @@ 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved - +7. Additional issues /contents It's not really about elimination of tables, it's about elimination of inner @@ -116,3 +116,9 @@ * We remove ON clauses within semi-join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? +* Aggregate functions report they depend on all tables, that is, + + item_agg_func-used_tables() == (1ULL join-tables) - 1 + + always. If we want table elimination to work in presence of grouping, need + to devise some other way of analyzing aggregate functions. -=-=(Guest - Fri, 29 May 2009, 00:45)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.0 +0300 +++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.0 +0300 @@ -111,3 +111,8 @@ referred to an inner table (requirement for OJ-IJ conversion) then table elimination would not be applicable anyway. +7. Additional issues + +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? + -=-=(Guest - Tue, 26 May 2009, 21:52)=-=- Low Level Design modified. --- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.0 +0300 +++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.0 +0300 @@ -1,11 +1,14 @@ contents 1. Conditions for removal +1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface -5. Todo, issues to resolve -5.1 To resolve -5.2 Resolved +5. Tests and benchmarks +6. Todo, issues to resolve +6.1 To resolve +6.2 Resolved + /contents It's not really about elimination of tables, it's about elimination of inner @@ -29,6 +32,18 @@ GROUP BY and HAVING do not refer to the inner tables of the outer join nest. +1.1 Quick check if there are candidates +~~~ +Before we start to enumerate join nests, here is a quick way to check if +there *can be* something to be removed: + + if ((tables used in select_list | + tables used in group/order by UNION | + tables used in where) != bitmap_of_all_tables) + { + attempt table elimination; + } + 2. Removal operation properties --- * There is always one way to remove (no choice to remove either this or that) @@ -56,22 +71,24 @@ - * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. + (Note ^^ utility of the above questioned ^, as table elimination can never + be worse than no elimination. We're leaning towards not adding the flag) -* With EXPLAIN, there are two options: - - Show removed tables in a way similar to const tables, with some -indication that they are removed. - - Do not show them altogether. -(the second one seems to be better? We're targeting a situation with VIEWs, -where the user would not care about what tables were added into his query -and then discarded from it?) +* EXPLAIN will not show the removed tables at all. This will allow to check + if tables were removed, and also will behave nicely with anchor model and + VIEWs: stuff that user doesn't care about just won't be there. + +5. Tests and benchmarks +--- +Should create a benchmark in sql-bench which checks if the dbms has table +elimination. +TODO elaborate -5. Todo, issues to resolve +6. Todo, issues to resolve -- -5.1 To resolve +6.1 To resolve ~~ -- See EXPLAIN question in section #4. - - Re-check how this works with equality propagation. - Relationship with prepared statements. @@ -87,7 +104,7 @@ that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. -5.2 Resolved +6.2 Resolved - outer-inner join conversion is not a problem for table elimination. We make outer-inner conversions based on predicates in WHERE. If the WHERE -=-=(Guest - Fri, 22 May 2009, 17:23)=-=- High-Level Specification modified. --- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.0
Re: [Maria-developers] Bzr merge order
Hi! Kristian == Kristian Nielsen kniel...@knielsen-hq.org writes: K Following up on this, I just learned about the bzr option K append_revisions_only that can be set on a branch. K This option can be used to enforce that the main history (sequence of K primary/left-hand parents from the tip) correctly reflects the series of K pushes into the public tree. K So assume this sequence of events: K 1. Joe branches lp:maria (revision 1000), and starts hacking on a patch. K 2. Other developers push revisions 1001, 1002, and 1003 to lp:maria. K 3. Joe finishes the patch, the review is good. He runs `bzr merge lp:maria`, Kfollowed by `bzr push lp:maria`. K As it is now, the resulting history of lp:maria will look like this: K 1002 Joe (merge) K 1000.1.3 | Other K 1000.1.2 | developer's K 1000.1.1 | commits K 1001 Joe's patch K 1000 Starting point K So it is not at all clear that the other commits were at some point pushed K individually to lp:maria. And if someone goes to look at revision 1002 K thinking to see one of the other developer's commits, it will be missing (or K worse refer to the wrong commit after further pushes). K But if we set the append_revisions_only option on lp:maria, instead of this K Joe will get this error: K bzr: ERROR: Server sent an unexpected error: ('error', 'Operation denied because it would change the main history, which is not permitted by the append_revisions_only setting on branch lp-139886317008592:///~knielsen/maria/tmp-buildbot-test.') K In this case, Joe will instead have to do this: K bzr branch lp:maria # or bzr pull lp:maria into an existing clean clone K bzr merge ../branch-with-patch K bzr commit -mmerged with trunk K bzr push lp:maria K Then the resulting history will be this: K 1004 Joe (merge) K 1000.1.1 Joe patch K 1003 | Other K 1002 | developer's K 1001 | commits K 1000 Starting point K Which is much nicer, IMHO. K So basically append_revisions_only enforces the merging style that I, Sanja, K and Monty already proposed as a good practise. K So I propose to add this option to the 5.1, 5.2, and 6.0 trees on K Launchpad. This will provide a clear record of the push history in the K repository, at the cost of enforcing the good practise merge style with one K extra bzr step. K Any opinions? Reasons not to do this? I like the end result of this approach, but I have got a couple of questions/concerns about this: - Doing the extra branch is a bit of a pain and slows down things if you pushes a lot. It would be nice to get this done internally in bzr as an part of the merge. Should we talk with canonical to get this option into bzr ? - What is the sequence to do if someone does a push of this code at between the step 3) and step 4): 1)bzr branch lp:maria # or bzr pull lp:maria into an existing clean clone 2)bzr merge ../branch-with-patch 3)bzr commit -mmerged with trunk Someone else pushes here. 4) bzr push lp:maria Doing the 'merge' again would be a real pain. I assume one should in this case start again from step 1) and do the merge with the new tree? K If there is general agreement I will add the option (the process is somewhat K inconvenient, but I tested a procedure using sftp and that works ok). K (Incidentally, this way also makes it possible to correlate the branch history K directly with build history from Buildbot/Pushbuild, something I really missed K in the BitKeeper days). Regards, Monty ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Bzr merge order
Hi! Kristian == Kristian Nielsen kniel...@knielsen-hq.org writes: Kristian Kristian Nielsen kniel...@knielsen-hq.org writes: Following up on this, I just learned about the bzr option append_revisions_only that can be set on a branch. Kristian Any opinions on this? Kristian Having thought more on this, I really think we should enable Kristian append_revisions_only. Kristian There are many tools (and people as well) that depend on revision numbers and Kristian have no knowledge about using revid: revision ids. And they often get confused Kristian when a push changes the revision numbers of lots of recent commits. Kristian Just two days ago on Tuesday, the merge of MySQL 5.1 changed the last 6 Kristian revision numbers or so, and this seems to have confused the Buildbot Bzr code Kristian to the point where it choose to not build the new merge at all. (I have written Kristian and installed a new method for Buildbot to detect pushes that hopefully solves Kristian this problem. But I fear there are more places in the Buildbot code which Kristian assumes revision numbers are simple numbers that can be compared meaningfully Kristian to determine their ordering). Kristian In any case, having simple revision numbers that correctly reflect the push Kristian history is just so much nicer to work with, and this is something I really Kristian missed with Bitkeeper and Pushbuild. Kristian So unless there are objections, I want to implement this change early next Kristian week. At most, developers will need to do an extra pull into a fresh clone of Kristian main trees before pushing, if they forget this when merging up. And if needed, Kristian we can probably ask Bzr developers to implement a simple --reverse or Kristian --use-other-as-base option to make even this extra step unnecessary. Kristian - Kristian. Agree with above. Kurt, can you talk with the bzr devlopers to implement a --reverse merge option to bzr? Regards, Monty ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Bzr merge order
Michael Widenius mo...@askmonty.org writes: K In this case, Joe will instead have to do this: K bzr branch lp:maria # or bzr pull lp:maria into an existing clean clone K bzr merge ../branch-with-patch K bzr commit -mmerged with trunk K bzr push lp:maria K Then the resulting history will be this: K 1004 Joe (merge) K 1000.1.1 Joe patch K 1003 | Other K 1002 | developer's K 1001 | commits K 1000 Starting point K Which is much nicer, IMHO. K So basically append_revisions_only enforces the merging style that I, Sanja, K and Monty already proposed as a good practise. K So I propose to add this option to the 5.1, 5.2, and 6.0 trees on K Launchpad. This will provide a clear record of the push history in the K repository, at the cost of enforcing the good practise merge style with one K extra bzr step. K Any opinions? Reasons not to do this? I like the end result of this approach, but I have got a couple of questions/concerns about this: - Doing the extra branch is a bit of a pain and slows down things if you pushes a lot. It would be nice to get this done internally in bzr as an part of the merge. Should we talk with canonical to get this option into bzr ? Yes, this is the main pain point. There are a couple of things that can help with this: 1. If bzr people were to implement `bzr pull --reverse`, then there would be no additional step, so problem would be solved (as long as one remembers to use the --reverse). The merge to be done is exactly the same one way or the other, the only difference is which tree becomes the primary parent and which becomes the secondary. 2. I suppose most people (like me) in any case keep a branch around of the main trees that we pull into regularly. This can be used to do the merge to avoid the need for an extra `bzr branch`. But then if someone else were to push to the main branch in-between, the clean local branch would have to be re-created of course. 3. I often use bzr with `bzr branch --no-tree` branches and seperate `bzr checkout --lightweight` working trees. This sometimes greatly speeds up operations when one can use `bzr switch` to avoid having to make bzr check out a complete working tree (which it does really slowly). But it has its own hassles also due to the need to keep track of more directories, so it is no silver bullet. With --lightweight checkouts, one can do something like this, all the steps of which run in a few seconds (on my machine at least): (cd ~/repo bzr branch --no-tree mariadb-5.1 mariadb-5.1-merge) bzr switch ~/repo/mariadb-5.1-merge bzr merge ~/repo/branch-with-patch # compile, test bzr push lp:maria - What is the sequence to do if someone does a push of this code at between the step 3) and step 4): 1)bzr branch lp:maria # or bzr pull lp:maria into an existing clean clone 2)bzr merge ../branch-with-patch 3)bzr commit -mmerged with trunk Someone else pushes here. 4) bzr push lp:maria Doing the 'merge' again would be a real pain. The issue is really the same no matter which of the two methods you use. If someone else pushes in-between, you need to merge again. I assume one should in this case start again from step 1) and do the merge with the new tree? Yes. One would need a new fresh branch of the now updated lp:maria tree, and merge into that. One could merge in either the original branch-with-patch or the branch with the first merge, depending on whatever works best (using the original will save a merge changeset, but the line of primary parent commits will be the same in the resulting push either way). K If there is general agreement I will add the option (the process is somewhat K inconvenient, but I tested a procedure using sftp and that works ok). I discussed this with Monty on IRC. I will implement it later this week in out Launchpad trees (and if there turns out to be unforseen problems with it, it will be easy enough to remove it again). Monty also will talk to Jani to get this method into autopush (in the autopush case this should be no extra hassle at all for the person pushing). - Kristian. ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
--- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...: index_merge: fair choice between index_merge union and range access CREATION DATE..: Tue, 26 May 2009, 12:10 SUPERVISOR.: Monty IMPLEMENTOR: Psergey COPIES TO..: Psergey CATEGORY...: Server-RawIdeaBin TASK ID: 24 (http://askmonty.org/worklog/?tid=24) VERSION: 9.x STATUS.: Un-Assigned PRIORITY...: 60 WORKED HOURS...: 0 ESTIMATE...: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=- High-Level Specification modified. --- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.0 +0300 +++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.0 +0300 @@ -64,6 +64,9 @@ * How strict is the limitation on the form of the WHERE? +* Which version should this be based on? 5.1? Which patches are should be in + (google's/percona's/maria/etc?) + * TODO: The optimizer didn't compare costs of index_merge and range before (ok it did but that was done for accesses to different tables). Will there be any possible gotchas here? -=-=(Guest - Wed, 27 May 2009, 14:41)=-=- Category updated. --- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.0 +0300 +++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.0 +0300 @@ -1 +1 @@ -Client-BackLog +Server-RawIdeaBin -=-=(Guest - Wed, 27 May 2009, 14:41)=-=- Version updated. --- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.0 +0300 +++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.0 +0300 @@ -1 +1 @@ -Server-9.x +9.x -=-=(Guest - Wed, 27 May 2009, 13:59)=-=- Title modified. --- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.0 +0300 +++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.0 +0300 @@ -1 +1 @@ -index_merge optimizer: dont discard index_merge union strategies when range is available +index_merge: fair choice between index_merge union and range access -=-=(Guest - Wed, 27 May 2009, 13:59)=-=- Version updated. --- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.0 +0300 +++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.0 +0300 @@ -1 +1 @@ -Benchmarks-3.0 +Server-9.x -=-=(Guest - Tue, 26 May 2009, 13:27)=-=- High-Level Specification modified. --- /tmp/wklog.24.old.305 2009-05-26 13:27:32.0 +0300 +++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.0 +0300 @@ -1 +1,70 @@ +(Not a ready HLS but draft) +contents +Solution overview +Limitations +TODO + +/contents + +Solution overview += +The idea is to delay discarding potential index_merge plans until the point +where it is really necessary. + +This way, we won't have to do much changes in the range analyzer, but will be +able to keep potential index_merge plan just enough so that it's possible to +take it into consideration together with range access plans. + +Since there are no changes in the optimizer, the ability to consider both +range and index_merge options will be limited to WHERE clauses of this form: + + WHERE := range_cond(key1_1) AND + range_cond(key2_1) AND + other_cond AND + index_merge_OR_cond1(key3_1, key3_2, ...) + index_merge_OR_cond2(key4_1, key4_2, ...) + +where + + index_merge_OR_cond{N} := (range_cond(keyN_1) OR +range_cond(keyN_2) OR ...) + + + range_cond(keyX) := condition that allows to construct range access of keyX + and doesn't allow to construct range/index_merge accesses + for any keys of the table in question. + + +For such WHERE clauses, the range analyzer will produce SEL_TREE of this form: + + SEL_TREE( +range(key1_1), +... +range(key2_1), +SEL_IMERGE( (1) + SEL_TREE(key3_1}) + SEL_TREE(key3_2}) + ... +) +... + ) + +which can be used to make a cost-based choice between range and index_merge. + +Limitations +--- +This will not be a full solution in a sense that the range analyzer will not +be able to produce sel_tree (1) if the WHERE clause is specified in other form +(e.g. brackets were opened). + +TODO + +* is it a problem if there are keys that are referred to both from + index_merge and from range access? + +* How strict is the limitation on the form of the WHERE? + +* TODO: The optimizer didn't compare costs of index_merge and range before (ok + it did but that was done for accesses to different tables). Will there be any + possible gotchas here? DESCRIPTION: Current range optimizer will discard possible index_merge/[sort]union strategies when there is a possible range plan. This action is a part of measures we take to avoid combinatorial explosion of possible range/ index_merge strategies. A bad side effect of this is that for WHERE clauses in form t.key1=