[Maria-developers] Updated (by Guest): Table elimination (17)

2009-06-01 Thread worklog-noreply
---
  WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.: Monty
IMPLEMENTOR: Psergey
COPIES TO..: 
CATEGORY...: Server-Sprint
TASK ID: 17 (http://askmonty.org/worklog/?tid=17)
VERSION: Server-5.1
STATUS.: Assigned
PRIORITY...: 60
WORKED HOURS...: 0
ESTIMATE...: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.0 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.0 +0300
@@ -8,7 +8,7 @@
 6. Todo, issues to resolve
 6.1 To resolve
 6.2 Resolved
-
+7. Additional issues
 /contents
 
 It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
 * We remove ON clauses within semi-join nests. If these clauses contain
   subqueries, they probably should be gone from EXPLAIN output also?
 
+* Aggregate functions report they depend on all tables, that is,
+   
+ item_agg_func-used_tables() == (1ULL  join-tables) - 1
+
+  always. If we want table elimination to work in presence of grouping, need 
+  to devise some other way of analyzing aggregate functions.

-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348  2009-05-29 00:45:21.0 +0300
+++ /tmp/wklog.17.new.1348  2009-05-29 00:45:21.0 +0300
@@ -111,3 +111,8 @@
   referred to an inner table (requirement for OJ-IJ conversion) then table
   elimination would not be applicable anyway.
 
+7. Additional issues
+
+* We remove ON clauses within semi-join nests. If these clauses contain
+  subqueries, they probably should be gone from EXPLAIN output also?
+

-=-=(Guest - Tue, 26 May 2009, 21:52)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.0 +0300
+++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.0 +0300
@@ -1,11 +1,14 @@
 contents
 1. Conditions for removal
+1.1 Quick check if there are candidates
 2. Removal operation properties
 3. Removal operation
 4. User interface
-5. Todo, issues to resolve
-5.1 To resolve
-5.2 Resolved
+5. Tests and benchmarks
+6. Todo, issues to resolve
+6.1 To resolve
+6.2 Resolved
+
 /contents
 
 It's not really about elimination of tables, it's about elimination of inner
@@ -29,6 +32,18 @@
   GROUP BY and HAVING do not refer to the inner tables of the outer join
   nest.
 
+1.1 Quick check if there are candidates
+~~~
+Before we start to enumerate join nests, here is a quick way to check if
+there *can be* something to be removed:
+
+  if ((tables used in select_list |  
+   tables used in group/order by UNION |
+   tables used in where) != bitmap_of_all_tables)
+  {
+ attempt table elimination;
+  }
+
 2. Removal operation properties
 ---
 * There is always one way to remove (no choice to remove either this or that)
@@ -56,22 +71,24 @@
 -
 * We'll add an @@optimizer switch flag for table elimination. Tentative
   name: 'table_elimination'.
+  (Note ^^ utility of the above questioned ^, as table elimination can never 
+   be worse than no elimination. We're leaning towards not adding the flag)
 
-* With EXPLAIN, there are two options: 
-  - Show removed tables in a way similar to const tables, with some
-indication that they are removed.
-  - Do not show them altogether.
-(the second one seems to be better? We're targeting a situation with VIEWs,
-where the user would not care about what tables were added into his query
-and then discarded from it?)
+* EXPLAIN will not show the removed tables at all. This will allow to check
+  if tables were removed, and also will behave nicely with anchor model and
+  VIEWs: stuff that user doesn't care about just won't be there.
+
+5. Tests and benchmarks
+---
+Should create a benchmark in sql-bench which checks if the dbms has table
+elimination.
+TODO elaborate
 
-5. Todo, issues to resolve
+6. Todo, issues to resolve
 --
 
-5.1 To resolve
+6.1 To resolve
 ~~
-- See EXPLAIN question in section #4.
-
 - Re-check how this works with equality propagation.
 
 - Relationship with prepared statements. 
@@ -87,7 +104,7 @@
   that we'll meet outer joins which have N inner tables of which some are 1-row
   MyISAM tables that do not have primary key.
 
-5.2 Resolved
+6.2 Resolved
 
 - outer-inner join conversion is not a problem for table elimination. 
   We make outer-inner conversions based on predicates in WHERE. If the WHERE

-=-=(Guest - Fri, 22 May 2009, 17:23)=-=-
High-Level Specification modified.
--- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.0 

Re: [Maria-developers] Bzr merge order

2009-06-01 Thread Michael Widenius

Hi!

 Kristian == Kristian Nielsen kniel...@knielsen-hq.org writes:

K Following up on this, I just learned about the bzr option
K append_revisions_only that can be set on a branch.

K This option can be used to enforce that the main history (sequence of
K primary/left-hand parents from the tip) correctly reflects the series of
K pushes into the public tree.

K So assume this sequence of events:

K 1. Joe branches lp:maria (revision 1000), and starts hacking on a patch.
K 2. Other developers push revisions 1001, 1002, and 1003 to lp:maria.
K 3. Joe finishes the patch, the review is good. He runs `bzr merge lp:maria`,
Kfollowed by `bzr push lp:maria`.

K As it is now, the resulting history of lp:maria will look like this:

K 1002 Joe (merge)
K   1000.1.3 | Other
K   1000.1.2 | developer's
K   1000.1.1 | commits
K 1001 Joe's patch
K 1000 Starting point

K So it is not at all clear that the other commits were at some point pushed
K individually to lp:maria. And if someone goes to look at revision 1002
K thinking to see one of the other developer's commits, it will be missing (or
K worse refer to the wrong commit after further pushes).

K But if we set the append_revisions_only option on lp:maria, instead of this
K Joe will get this error:

K bzr: ERROR: Server sent an unexpected error: ('error', 'Operation denied 
because it would change the main history, which is not permitted by the 
append_revisions_only setting on branch 
lp-139886317008592:///~knielsen/maria/tmp-buildbot-test.')

K In this case, Joe will instead have to do this:

K bzr branch lp:maria  # or bzr pull lp:maria into an existing clean clone
K bzr merge ../branch-with-patch
K bzr commit -mmerged with trunk
K bzr push lp:maria

K Then the resulting history will be this:

K 1004 Joe (merge)
K   1000.1.1 Joe patch
K 1003 | Other
K 1002 | developer's
K 1001 | commits
K 1000 Starting point

K Which is much nicer, IMHO.

K So basically append_revisions_only enforces the merging style that I, Sanja,
K and Monty already proposed as a good practise.

K So I propose to add this option to the 5.1, 5.2, and 6.0 trees on
K Launchpad. This will provide a clear record of the push history in the
K repository, at the cost of enforcing the good practise merge style with one
K extra bzr step.

K Any opinions? Reasons not to do this?

I like the end result of this approach, but I have got a couple of
questions/concerns about this:

- Doing the extra branch is a bit of a pain and slows down things if you
  pushes a lot.  It would be nice to get this done internally in bzr as an
  part of the merge.
  Should we talk with canonical to get this option into bzr ?

- What is the sequence to do if someone does a push of this code at
  between the step 3) and step 4):

1)bzr branch lp:maria  # or bzr pull lp:maria into an existing clean clone
2)bzr merge ../branch-with-patch
3)bzr commit -mmerged with trunk

Someone else pushes here.

4) bzr push lp:maria

Doing the 'merge' again would be a real pain.

I assume one should in this case start again from step 1) and do the
merge with the new tree?

K If there is general agreement I will add the option (the process is somewhat
K inconvenient, but I tested a procedure using sftp and that works ok).

K (Incidentally, this way also makes it possible to correlate the branch 
history
K directly with build history from Buildbot/Pushbuild, something I really 
missed
K in the BitKeeper days).

Regards,
Monty

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Bzr merge order

2009-06-01 Thread Michael Widenius

Hi!

 Kristian == Kristian Nielsen kniel...@knielsen-hq.org writes:

Kristian Kristian Nielsen kniel...@knielsen-hq.org writes:
 Following up on this, I just learned about the bzr option
 append_revisions_only that can be set on a branch.

Kristian Any opinions on this?

Kristian Having thought more on this, I really think we should enable
Kristian append_revisions_only.

Kristian There are many tools (and people as well) that depend on revision 
numbers and
Kristian have no knowledge about using revid: revision ids. And they often get 
confused
Kristian when a push changes the revision numbers of lots of recent commits.

Kristian Just two days ago on Tuesday, the merge of MySQL 5.1 changed the last 
6
Kristian revision numbers or so, and this seems to have confused the Buildbot 
Bzr code
Kristian to the point where it choose to not build the new merge at all. (I 
have written
Kristian and installed a new method for Buildbot to detect pushes that 
hopefully solves
Kristian this problem. But I fear there are more places in the Buildbot code 
which
Kristian assumes revision numbers are simple numbers that can be compared 
meaningfully
Kristian to determine their ordering).

Kristian In any case, having simple revision numbers that correctly reflect 
the push
Kristian history is just so much nicer to work with, and this is something I 
really
Kristian missed with Bitkeeper and Pushbuild.

Kristian So unless there are objections, I want to implement this change early 
next
Kristian week. At most, developers will need to do an extra pull into a fresh 
clone of
Kristian main trees before pushing, if they forget this when merging up. And 
if needed,
Kristian we can probably ask Bzr developers to implement a simple --reverse or
Kristian --use-other-as-base option to make even this extra step unnecessary.

Kristian  - Kristian.

Agree with above.

Kurt, can you talk with the bzr devlopers to implement a --reverse
merge option to bzr?

Regards,
Monty

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Bzr merge order

2009-06-01 Thread Kristian Nielsen
Michael Widenius mo...@askmonty.org writes:

 K In this case, Joe will instead have to do this:

 K bzr branch lp:maria  # or bzr pull lp:maria into an existing clean 
 clone
 K bzr merge ../branch-with-patch
 K bzr commit -mmerged with trunk
 K bzr push lp:maria

 K Then the resulting history will be this:

 K 1004 Joe (merge)
 K   1000.1.1 Joe patch
 K 1003 | Other
 K 1002 | developer's
 K 1001 | commits
 K 1000 Starting point

 K Which is much nicer, IMHO.

 K So basically append_revisions_only enforces the merging style that I, 
 Sanja,
 K and Monty already proposed as a good practise.

 K So I propose to add this option to the 5.1, 5.2, and 6.0 trees on
 K Launchpad. This will provide a clear record of the push history in the
 K repository, at the cost of enforcing the good practise merge style with 
 one
 K extra bzr step.

 K Any opinions? Reasons not to do this?

 I like the end result of this approach, but I have got a couple of
 questions/concerns about this:

 - Doing the extra branch is a bit of a pain and slows down things if you
   pushes a lot.  It would be nice to get this done internally in bzr as an
   part of the merge.
   Should we talk with canonical to get this option into bzr ?

Yes, this is the main pain point.

There are a couple of things that can help with this:

1. If bzr people were to implement `bzr pull --reverse`, then there would be
no additional step, so problem would be solved (as long as one remembers to
use the --reverse). The merge to be done is exactly the same one way or the
other, the only difference is which tree becomes the primary parent and which
becomes the secondary.

2. I suppose most people (like me) in any case keep a branch around of the
main trees that we pull into regularly. This can be used to do the merge to
avoid the need for an extra `bzr branch`. But then if someone else were to
push to the main branch in-between, the clean local branch would have to be
re-created of course.

3. I often use bzr with `bzr branch --no-tree` branches and seperate `bzr
checkout --lightweight` working trees. This sometimes greatly speeds up
operations when one can use `bzr switch` to avoid having to make bzr check out
a complete working tree (which it does really slowly). But it has its own
hassles also due to the need to keep track of more directories, so it is no
silver bullet.

With --lightweight checkouts, one can do something like this, all the steps of
which run in a few seconds (on my machine at least):

(cd ~/repo  bzr branch --no-tree mariadb-5.1 mariadb-5.1-merge)
bzr switch ~/repo/mariadb-5.1-merge
bzr merge ~/repo/branch-with-patch
# compile, test
bzr push lp:maria

 - What is the sequence to do if someone does a push of this code at
   between the step 3) and step 4):

 1)bzr branch lp:maria  # or bzr pull lp:maria into an existing clean clone
 2)bzr merge ../branch-with-patch
 3)bzr commit -mmerged with trunk

 Someone else pushes here.

 4) bzr push lp:maria

 Doing the 'merge' again would be a real pain.

The issue is really the same no matter which of the two methods you use. If
someone else pushes in-between, you need to merge again.

 I assume one should in this case start again from step 1) and do the
 merge with the new tree?

Yes. One would need a new fresh branch of the now updated lp:maria tree, and
merge into that. One could merge in either the original branch-with-patch or
the branch with the first merge, depending on whatever works best (using the
original will save a merge changeset, but the line of primary parent commits
will be the same in the resulting push either way).

 K If there is general agreement I will add the option (the process is 
 somewhat
 K inconvenient, but I tested a procedure using sftp and that works ok).

I discussed this with Monty on IRC. I will implement it later this week in out
Launchpad trees (and if there turns out to be unforseen problems with it, it
will be easy enough to remove it again).

Monty also will talk to Jani to get this method into autopush (in the autopush
case this should be no extra hassle at all for the person pushing).

 - Kristian.

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)

2009-06-01 Thread worklog-noreply
---
  WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...: index_merge: fair choice between index_merge union and range 
access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.: Monty
IMPLEMENTOR: Psergey
COPIES TO..: Psergey
CATEGORY...: Server-RawIdeaBin
TASK ID: 24 (http://askmonty.org/worklog/?tid=24)
VERSION: 9.x
STATUS.: Un-Assigned
PRIORITY...: 60
WORKED HOURS...: 0
ESTIMATE...: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.0 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.0 +0300
@@ -64,6 +64,9 @@
 
 * How strict is the limitation on the form of the WHERE?
 
+* Which version should this be based on? 5.1? Which patches are should be in 
+  (google's/percona's/maria/etc?) 
+
 * TODO: The optimizer didn't compare costs of index_merge and range before (ok 
   it did but that was done for accesses to different tables). Will there be any
   possible gotchas here?

-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414  2009-05-27 14:41:43.0 +0300
+++ /tmp/wklog.24.new.8414  2009-05-27 14:41:43.0 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin

-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414  2009-05-27 14:41:43.0 +0300
+++ /tmp/wklog.24.new.8414  2009-05-27 14:41:43.0 +0300
@@ -1 +1 @@
-Server-9.x
+9.x

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498  2009-05-27 13:59:23.0 +0300
+++ /tmp/wklog.24.new.9498  2009-05-27 13:59:23.0 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is 
available
+index_merge: fair choice between index_merge union and range access

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498  2009-05-27 13:59:23.0 +0300
+++ /tmp/wklog.24.new.9498  2009-05-27 13:59:23.0 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305   2009-05-26 13:27:32.0 +0300
+++ /tmp/wklog.24.new.305   2009-05-26 13:27:32.0 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+contents
+Solution overview
+Limitations
+TODO
+
+/contents
+
+Solution overview
+=
+The idea is to delay discarding potential index_merge plans until the point 
+where it is really necessary. 
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to 
+take it into consideration together with range access plans. 
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND 
+  range_cond(key2_1) AND
+  other_cond   AND 
+  index_merge_OR_cond1(key3_1, key3_2, ...)
+  index_merge_OR_cond2(key4_1, key4_2, ...)
+  
+where 
+  
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR 
+range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+ and doesn't allow to construct range/index_merge accesses
+ for any keys of the table in question. 
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+  SEL_TREE(
+range(key1_1),
+...
+range(key2_1),
+SEL_IMERGE( (1)
+  SEL_TREE(key3_1})
+  SEL_TREE(key3_2})
+  ...
+)
+...
+  )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+---
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+
+* is it a problem if there are keys that are referred to both from
+  index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok 
+  it did but that was done for accesses to different tables). Will there be any
+  possible gotchas here?
 



DESCRIPTION:

Current range optimizer will discard possible index_merge/[sort]union 
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.

A bad side effect of this is that for WHERE clauses  in form 

t.key1=