I have some queries, involving a largish number of JOIN, which are apparently very slow or even take forever (a mysqladmin processlist shows them remain in the "statistics" status for a long time, in most cases I have to kill them after several minutes).
When I first had the problem I googled around and found some reference (which I've lost) saying that the "statistics" status is actually what one does with an EXPLAIN SELECT, and that this is done preliminarily to the actual query. It also said it might occur with a large number of joins because this analysis, for n joins MIGHT try up to n! combinations, unless one somehow specified the priorities (but the author did not remember how). I thought to have overcome the problem using a feature of the CREATE VIEW command (see below), but apparently I simply moved it to an higher n. Now I tried to see how it scales with the number of joins, and the curious things is that e.g. for n=9 it works fast, for n=15 it works slowly, for n=18 works fast again and for n=20 takes an infinite time. I'll first explain my background : - I have a number of tables (let's call them t0, t1, t2 ...) - all of them have an auto_increment column called seq which is also an index - one table (t0) is "more important" (actually it is list of celestial X-ray sources while the other are celestial objects in other wavebands but this is irrelevant to you). - I have precomputed correlation tables among t0 and each of the other. These tables are called eg. t0ti, have two columns t0 and ti corresponding to the t0.seq and ti.seq of objects which are "associated". They are indexed on (t0,ti). Note that an object in t0 can be associated with 1 or more or zero (t0ti.ti null) objects in ti. - I originally (already under mysql 3) devised a way to identify counterparts in MORE tables (all these associations are based on spherical distance + other criteria). This involved creating a working table G This table has columns named t0 t1 ... tn (containing the pointers t0.seq t1.seq ... for counterparts associated and validated according to some criteria) plus other service columns The simultaneous access was achieved in our interface by a mechanism we called virtual tables, which essentially was SELECT some subset of columns in some of the t0...tn or some expression thereof FROM G left join t0 on G.t0=t0.seq left join t1 on G.t1=t1.seq ... left join tn on G.tn=tn.seq We refer to the t0...tn as "the member tables" of G. We have different versions of G corresponding to different sets of member tables and different association criteria. The largest of our cases has 26 different members. Our mechanism was such that we defined a subset of columns in each of the ti (or expressions thereof like distances etc.) as "interesting", with an associated alias. Our interface usually showed only such "virtual columns", but had a possibility to add (naming them manually as ti.colname) to the SELECT also all other "member columns" normally hidden. We also allow to correlate a virtual table with a single physical table tk (be it member or not) using the t0tk correlation table (t0 is the "First Member"). - the above worked and still works, but has some clumsiness. When we upgraded to mysql 5 and discovered the CREATE VIEW command we decided to replace our virtual tables with views. - for each G we define a view as create algoritm=temptable view V as SELECT some subset of columns in some of the t0...tn or in G or some expression thereof FROM G left join t0 on G.t0=t0.seq left join t1 on G.t1=t1.seq ... left join tn on G.tn=tn.seq - the "algorithm=temptable" was required because without it some of our queries (see below) entered in the "statistics" status forever already with 11 members - for the rest the VIEWs work nicely when used standalone and are easier for the user ... - ... but on the other hand they HIDE the member columns which are not explicitly named in CREATE VIEW (where one wants to keep a manageable number of columns). Hide means here that their names ti.colname cannot be used in SELECT ! - so we devised an option by which on ticking on "show members also" one can also include these ti.colname in the query de facto this doubles the joins, because the statement built is SELECT list of (V.colname and ti.colname with i chosen among 0 and n) FROM ( G left join t0 on G.t0=t0.seq left join t1 on G.t1=t1.seq ... left join tn on G.tn=tn.seq ) left join V on G.seq=V.seq This statement NOW works (it did not work with e.g. 11 member tables before we switched to ALGORITHM=TEMPTABLE in the CREATE VIEW). An "explain select" for a query on such views gives that a view with n members "with members also enabled" involves 29+n queries (our [working] maximum is n=26 with 55 queries) I have also noticed that such "explain select" are relatively fast (0.3 sec) and are somehow cached (if I repeat one, it takes zero time) - the last step would be to make a query on a VIEW with "show members also" ticked, correlated with another table Tk (be Tk either a member, one of t1...tn, or a non-member) This involves a query like this SELECT list of (V.colname and ti.colname and Tk.colname) FROM ( VTk left join ( (G left join t0 on G.t0=t0.seq left join t1 on G.t1=t1.seq ... left join tn on G.tn=tn.seq ) left join V on G.seq=V.seq ) on VTk.V = V.seq ) left join Tk on VTk.Tk=Tk/seq where the correlation table VTk is also a view CREATE ALGORITHM=TEMPTABLE VIEW VTk as select G.seq as V, G.Tk as Tk from G left join t0Tk on G.t0=t0Tk.t0 which exploits the precomputed correlation among the "important" table t0 (the "First Member") and Tk. Note that t0Tk is indexed on (t0,Tk) but the view apparently is not indexed. - now such a query on our largest table (26 members) hangs in statistics status forever. I experimented therefore trying to use a variable names of left joins (this to tell when it stops working ... in real life I *might* need all, or any of the member). The query for "view with members also correlated with another table" in an "explain select" requires 33+n queries for n members. Now the funny thing is that up to e.g. 9 members "explain select" is rather fast (0.3 sec). For 15 members (48 queries) slows down significantly (6.3 sec) and is not cached (even if repeated soon still takes 6.3 sec). For 20 members it takes forever. But for 18 members (51 queries) instead takes only 0.5 sec. When I say 9,15,18,20 members I mean the first 9,15,18,20 of the full member list. Such a list is in an arbitrary order (sort of historical the various tables entered the database). Also I note that the output of "explain select" presents the various tables in a varying order. For instance the slow 15-member case but also the fast 9-member case have "<derived2>" (which is the 13000-element un-indexed VTk correlation view) interspersed with the other table, the fast 18-member case has it first. Questions : - what does "explain select" actually do and why sometimes hangs ? - can this be overcome rearranging the order of the joins (note that the bulk of the members are all joined with G), or introducing parentheses or with other syntax changes ? - or has it to do with some configuration parameter, maybe related to what is cached, cache size or other ? Thanks in advance to whoever is able to give hints. -- ----------------------------------------------------------------------- Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy) For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html ----------------------------------------------------------------------- -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]