Re: excessive time spent in "statistics" status
Just a quick note to add to this conversation: Pete Harlan wrote: On Fri, Sep 14, 2007 at 01:33:51AM -0700, Jeremy Cole wrote: MySQL's optimizer cannot reorder the joins because it has the potential to change the result of the query. Not in all cases. If a LEFT JOIN query also has a WHERE clause that disallows NULL rows from the right-hand table, it will (may? I'm not sure if it always will, but at least some times it will) treat it as an INNER JOIN, and it's therefore a candidate for reordering. I've also seen cases where a LEFT JOIN becomes a CROSS JOIN (http://bugs.mysql.com/bug.php?id=30842). All RIGHT JOIN are rewritten to equivalent LEFT JOIN, so the same optimizations can apply. As a side note, I've really benefited from attending Timour's talk on the optimizer at the conference this spring. His slides are good reading too: http://conferences.oreillynet.com/presentations/mysql07/katchaounov_timour.pdf Baron -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
On Fri, Sep 14, 2007 at 01:33:51AM -0700, Jeremy Cole wrote: > Hi Shawn, Lucio, > > >SELECT STRAIGHT_JOIN > >FROM ... > >LEFT JOIN ... > >WHERE ... > >... > > Just to correct a point here... if a query uses only LEFT JOIN or RIGHT > JOIN, the join order is fixed by the query's order itself, so using > STRAIGHT_JOIN should have no effect whatsoever. True, because you say "only" left or right joins. > Equally important, since the join order is fixed when you use LEFT > JOIN, you *must* list the joins in the correct order when writing > the query, otherwise you will see very poor performance. "Only" is missing from here, which could be misleading. MySQL will reorder the t0, t1 and t2 joins in: select ... fromt0 join t1 on ... join t2 on ... left join t3 on ... where ... > MySQL's optimizer cannot reorder the joins because it has the potential > to change the result of the query. Do have an example in mind? Thanks, --Pete -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
On Fri, 14 Sep 2007, Jeremy Cole wrote: > Just to correct a point here... if a query uses only LEFT JOIN or RIGHT JOIN, > the join order is fixed by the query's order itself, so using STRAIGHT_JOIN > should have no effect whatsoever. Equally important, since the join order is but experimentally using SELECT STRAIGHT_JOIN *does make* a difference. If I use it there is no time spent in the "statistics" phase, otherwise there is. Shawn can comment more conclusively since he knows the inner working of mysql. I just report a finding by experiment. > should have no effect whatsoever. Equally important, since the join order is > fixed when you use LEFT JOIN, you *must* list the joins in the correct order > when writing the query, otherwise you will see very poor performance. However the "correct order" is not always obvious to be determined a priori (like in my cases where I have a "glorified correlation" table G which shall go first, but all member tables which go next with equal rank (except the "first member" which is more equal than the other "=) ). The user can then build a query which uses only SOME of the members. And I've noticed that the result of EXPLAIN SELECT (the order) changed according to the content of the query (e.g. the particular WHERE condition). So for me experimentally use of SELECT STRAIGHT_JOIN is an effective solution. -- --- Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy) For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html --- () ascii ribbon campaign - against html mail /\ http://arc.pasp.de/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
Hi Shawn, Lucio, SELECT STRAIGHT_JOIN FROM ... LEFT JOIN ... WHERE ... ... Just to correct a point here... if a query uses only LEFT JOIN or RIGHT JOIN, the join order is fixed by the query's order itself, so using STRAIGHT_JOIN should have no effect whatsoever. Equally important, since the join order is fixed when you use LEFT JOIN, you *must* list the joins in the correct order when writing the query, otherwise you will see very poor performance. MySQL's optimizer cannot reorder the joins because it has the potential to change the result of the query. Regards, Jeremy -- high performance mysql consulting www.provenscaling.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
On Tue, 4 Sep 2007, Shawn Green wrote: > > On Tue, 4 Sep 2007, Lucio Chiappetti wrote: > > For each I tested 3 cases (total 16*3=48) : > > > > a) the query on the "virtual" table correlated with the external > > (the virtual is my G left join t1 ... left join tn). [...] > > represents my REFERENCE, this case will also exploit full (or default) optimization > > b) the query on the VIEW V with "members also" (which implies a > > redundant join of V with G left join t1 ... left join tn) and > > correlated with an external table. [...] I used here SELECT > > STRAIGHT_JOIN. > > c) the same query of (b) but with a normal select, and preceded by > >setting optimizer_search_depth=0 (auto). > In your b) test, did you use the SELECT STRAIGHT_JOIN as your outer > SELECT statement or within the CREATE VIEW statement? No. I left the CREATE VIEW alone (standard LEFT JOINs) in all cases, since that will be fully optimized, and used by a majority of users. SELECT STRAIGHT_JOIN was used only in (b) ["no optimization"] and not in (c) ["residual optimization"]. Our inclination would now be to use SELECT STRAIGHT_JOIN because it's the one involving only "elegant" changes to the code. Consider again that the case "view + member also + other table" will be rare (very few users) compare to "view alone". Anyhow all this was rather instructive. Now I'll start experimenting with UNIONs too ... -- --- Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy) For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html --- () ascii ribbon campaign - against html mail /\ http://arc.pasp.de/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
Lucio Chiappetti wrote: On Tue, 4 Sep 2007, Lucio Chiappetti wrote: I'll do some experimenting and report back. In lack of better ways of doing a tie-break, I've done the following tests (with the linemode client), checking both the results of a query and the total time spent. I tested 16 different combinations of arbitrary WHERE conditions and correlation with an arbitrary external tables, using my "maximal" G (the one with 26 members). For each I tested 3 cases (total 16*3=48) : a) the query on the "virtual" table correlated with the external (the virtual is my G left join t1 ... left join tn). This involves the lowest number of joins (29) and is of course the fastest (and also the easiest to get access to "hidden" columns by name, but is not what I intend our users use because the way to select columns is clumsy (they should use a VIEW on the 26 members instead ... normally with no correlation on external that will be equally fast), but represents my REFERENCE, i.e. I checked the results of the other test to be the same as this. b) the query on the VIEW V with "members also" (which implies a redundant join of V with G left join t1 ... left join tn) and correlated with an external table. Doubles the joins (59). I used here SELECT STRAIGHT_JOIN. c) the same query of (b) but with a normal select, and preceded by setting optimizer_search_depth=0 (auto). In your b) test, did you use the SELECT STRAIGHT_JOIN as your outer SELECT statement or within the CREATE VIEW statement>? If you only tried it one way, you could try it the other, too. [*] actually a larger minority (I hope it remains such) of our "public" users accesses the DB once to "take away" all data without performing any selection :-( It saddens me to see people abuse your hard work in this way. My inclination therefore would be to prefer optimizer_search_depth=0 to SELECT STRAIGHT_JOIN *if* it can be easily arranged in our tomcat environment, because of the marginal increase in speed despite the occasional worsening. But the two look almost equivalent. Another option would be to build your view dynamically based on the criteria that a user selects through a web-based interface. One other way to approach this project would be to normalize your relationships and have one skinny but very tall table similar to CREATE TABLE map_table ( object1_id int , object1_type int , object2_id int , object2_type int , confidence tinyint ) where confidence would be a whole number from 0 to 100. You would need only one of these to replace each G table you are generating now. However, that would be a major difference in how you currently use your data and I would not suggest this for a near-term solution. -- Shawn Green, Support Engineer MySQL Inc., USA, www.mysql.com Office: Blountville, TN __ ___ ___ __ / |/ /_ __/ __/ __ \/ / / /|_/ / // /\ \/ /_/ / /__ /_/ /_/\_, /___/\___\_\___/ <___/ Join the Quality Contribution Program Today! http://dev.mysql.com/qualitycontribution.html -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
On Tue, 4 Sep 2007, Lucio Chiappetti wrote: > I'll do some experimenting and report back. In lack of better ways of doing a tie-break, I've done the following tests (with the linemode client), checking both the results of a query and the total time spent. I tested 16 different combinations of arbitrary WHERE conditions and correlation with an arbitrary external tables, using my "maximal" G (the one with 26 members). For each I tested 3 cases (total 16*3=48) : a) the query on the "virtual" table correlated with the external (the virtual is my G left join t1 ... left join tn). This involves the lowest number of joins (29) and is of course the fastest (and also the easiest to get access to "hidden" columns by name, but is not what I intend our users use because the way to select columns is clumsy (they should use a VIEW on the 26 members instead ... normally with no correlation on external that will be equally fast), but represents my REFERENCE, i.e. I checked the results of the other test to be the same as this. b) the query on the VIEW V with "members also" (which implies a redundant join of V with G left join t1 ... left join tn) and correlated with an external table. Doubles the joins (59). I used here SELECT STRAIGHT_JOIN. c) the same query of (b) but with a normal select, and preceded by setting optimizer_search_depth=0 (auto). I did each test in a fresh mysql session to prevent cached valued to affect the timings. The good news are that all query go to end without getting stuck in the statistics state, and that (a),(b),(c) for the same query return the same results. Considering the timing instead : (a) are obviously the fastest (from 0.0 to 0.63 seconds at worst) (b) the queries with SELECT STRAIGHT_JOIN ("not optimized" ?) are *in general* the slowest but not too slow, i.e. complete within from 0.8 to 1.9 sec. There are however a couple of cases involving one external table in which they are slower (though not unfeasible), from 3 to 12 sec in one case, and from 24 to 45 sec in another. (c) the queries with optimizer_search_depth=0 have a speed comparable with (b). *In general* they are marginally faster than (b) (0.7 to 1.6 sec) when (b) is reasonably fast. When (b) is slow, however (c) is TWICE AS SLOWER (5 to 23 and 48 to 89 sec) The explain select does not give obvious clues while those particular table combinations are slower, and, considered that the queries involving "view + member also + external" concern an absolute minority of users [*] it is not worth spending more time investigating. [*] actually a larger minority (I hope it remains such) of our "public" users accesses the DB once to "take away" all data without performing any selection :-( My inclination therefore would be to prefer optimizer_search_depth=0 to SELECT STRAIGHT_JOIN *if* it can be easily arranged in our tomcat environment, because of the marginal increase in speed despite the occasional worsening. But the two look almost equivalent. -- --- Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy) For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html --- () ascii ribbon campaign - against html mail /\ http://arc.pasp.de/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
On Mon, 3 Sep 2007, Shawn Green wrote: > What I hope to do is to help you to make you queries work better by > applying hints, modifiers, and limits to the optimizer. Many thanks, Shawn. > "STRAIGHT_JOIN is identical to JOIN, except ... > > You asked what would happen if you replaced all of your view's LEFT JOIN > clauses with STRAIGHT_JOIN clauses. [...] This is not what you are > trying to achieve. I realized that. I overlooked "is identical to JOIN" since I generally use only LEFT JOINs to benefit of null returns (to us it is equally important to know "source A has these 1/2/n counterparts" and "source A has no counterparts" in the other catalogue/s). I originally tested that STRAIGHT_JOIN made the "explain select" arrive to an end without looping forever, but as soon as I tried a real select I realized the different behaviour. > However, there is another place you can put the STRAIGHT_JOIN modifier: > in the SELECT clause I discovered that too, and it looks promising. I plan to test whether that makes any difference on the query result and execution time in some representative cases. > Why are some sets of tables optimized more quickly than the others? My > best guess would be that some mix of WHERE conditions and table indexes > make it much faster to eliminate combinatorial permutations of JOIN > sequences than others. I also realized that the WHERE part enters the optimization too ... this is of course unpredictable a priori, as it depends on the particular user needs. > This is the variable optimizer_search_depth: This also looks interesting thanks. I did some quick experimenting. For some reasons a SHOW VARIABLE tells me its normal value is 62. Apparently this does not depend on the database I select (I thought my real "data" database with the views and G's could have an higher value than e.g. my own "administrative" DB or the mysql DB, but it is always 62. I tried lowering it to 10/15/20 and even to 0 (which according to 5.2.3 in the manual should be a sort of "automatic" value). In all cases the explain select on our maximal statement (the one which loops forever in statistics status under default conditions) reaches an end rather fast, and returns the same result (irrespective of 0/10/15/20). The ORDER returned by playign with optimizer_search_depth is however different from the one returned with SELECT STRAIGHT_JOIN. The difference is just in the order (the type, key and "extra" looks the same, in general "ref" or "eq_ref" and "using index") so I suppose it should not make much difference. > You can set this variable on a per-session basis so it would be possible > to tune it for each call to your views. While I'm doing the tests under the mysql linemode client, our production environment is a Java jsp interface under tomcat. I have to check with my colleague dealing with java programming what is a "session" for him (probably a JDBC connection in connection pool). Actually what looks appealing in optimizer_search_depth vs SELECT STRAIGHT_JOIN is the possibility of fixing (no tuning per session) optimizer_search_depth to a decent default (0 ?) once forever, because this requires no changes to the code (however if SELECT STRAIGHT_JOIN will require changes concentrated in a few places). I'll do some experimenting and report back. -- --- Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy) For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html --- () ascii ribbon campaign - against html mail /\ http://arc.pasp.de/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
Hello Lucio, Thank you for the excellent description of you problem. I believe I completely understand both the data you are handling and the problems you are facing. I would not ask you to change your schema at this point. There is far too much work put into it at this phase to suggest a redesign. What I hope to do is to help you to make you queries work better by applying hints, modifiers, and limits to the optimizer. http://dev.mysql.com/doc/refman/5.0/en/join.html states: "STRAIGHT_JOIN is identical to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order." You asked what would happen if you replaced all of your view's LEFT JOIN clauses with STRAIGHT_JOIN clauses. When used in the FROM clause (as part of the definitions of where the data comes from) STRAIGHT_JOIN would be equivalent to a JOIN which is equivalent to an INNER JOIN. This is not what you are trying to achieve. However, there is another place you can put the STRAIGHT_JOIN modifier: in the SELECT clause (http://dev.mysql.com/doc/refman/5.0/en/select.html) . Quoting again: " STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in non-optimal order. See Section 6.2.1, “Optimizing Queries with EXPLAIN”. STRAIGHT_JOIN also can be used in the table_references list. See Section 12.2.7.1, “JOIN Syntax”. SELECT STRAIGHT_JOIN FROM ... LEFT JOIN ... WHERE ... ... This will allow you to keep your LEFT JOINs in the portion of your query (everything between FROM and WHERE) but avoid all of the permutations the optimizer performse related to trying to analyze which table to join first to which other table. Why are some sets of tables optimized more quickly than the others? My best guess would be that some mix of WHERE conditions and table indexes make it much faster to eliminate combinatorial permutations of JOIN sequences than others. There is another variable you could use to minimize how many table permutations the optimizer will examine. This is the variable optimizer_search_depth: (http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#option_mysqld_optimizer_search_depth) (My apologies if that link wraps incorrectly). You can set this variable on a per-session basis so it would be possible to tune it for each call to your views. I know that would be quite the administrative hassle to implement setting this variable for each call to your views but I provide it here for completeness. Please try out "SELECT STRAIGHT_JOIN" and let me know if your situation improves. BTW - we encourage everyone to reply to the full list on all responses (unless they are confidential) so that all members can gain from the knowledge transfer. -- Shawn Green, Support Engineer MySQL Inc., USA, www.mysql.com Office: Blountville, TN __ ___ ___ __ / |/ /_ __/ __/ __ \/ / / /|_/ / // /\ \/ /_/ / /__ /_/ /_/\_, /___/\___\_\___/ <___/ Join the Quality Contribution Program Today! http://dev.mysql.com/qualitycontribution.html -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: excessive time spent in "statistics" status
As I'm new on this list and can't find a "policy statement", I assume it is OK to reply to the list in "discussion list" fashion. If instead it is preferred to reply to each sender privately and later I'd post a summary, please let me know and I'll comply. On Sat, 1 Sep 2007, Shawn Green wrote: > Lucio Chiappetti wrote: > > I have some queries, involving a largish number of JOIN, which are > > [...] very slow or [...] remain in the "statistics" status [forever] > > This involved creating a working table G > > [...] > > The simultaneous access was achieved in our interface by a mechanism > > we called virtual tables, which essentially was > > > > SELECT > > some subset of columns in some of the t0...tn > > or some expression thereof > > FROM > > G left join t0 on G.t0=t0.seq > > left join t1 on G.t1=t1.seq > > ... > > left join tn on G.tn=tn.seq > > > > We refer to the t0...tn as "the member tables" of G. > I normally do not get lost in symbolic descriptions such as yours. > However your description of G and how you build it leaves me in the > dark. > > Can you show me a few sample rows of G (symbolically, if you like) and > describe what is in each column G.t0 to G.tn? What I am curious to know > is what do each of these n object have in common that allows them to > represented as a single tuple on the G table. The explanation was already (too tersely ?) contained in my sentence > > This table has columns named t0 t1 ... tn (containing the pointers > > t0.seq t1.seq ... for counterparts associated and validated according > > to some criteria) plus other service columns Let me make an example. t0 is a table of X-ray sources, it contains sky coordinates (ra,dec), count rates and fluxes in 5 energy bands, detection probabilities etc. In addition it contains a column named "seq" which is just an auto_increment sequential number, and is the unique way to identify a given source (there is a further complication due to the fact some records are redundant detections of the same source, but since they are not used in building the G's you'd better ignore this). t1 might be e.g. a table of optical sources, with sky coordinates, magnitudes, and, of course, its own "seq". There may be different tables of optical sources (t1, t2, t3). Or t4 can be a table of radio sources, with sky coordinates, fluxes, and its "seq". Or t5 can be a table of URLs into external astronomical sites like SIMBAD or NED (if you know them), again with sky coordinates and a "seq". Some of all these tables have their own identifiers, but sometimes these aren't unique, or aren't numeric. So in general these tables MAY have an UNIQUE PRIMARY index which can be either the original identifier, or some combination of original columns (for instance a source identifier and a field identifier) and an unique auxiliary key which is my auto_increment seq (numeric, built at data ingestion). This again is mostly irrelevant to you. The point is that each table has an unique auto_increment seq. A "G" table will simply contain the "seq's" in the "member tables". For ease of use the column names in G will be the table names of the member tables. I indicated them as t0 t1 t2 ... that their actual names are e.g. "nov06", "d1t3", "ukidss", "radio", "simbad" is irrelevant. So a record in G may contain for instance : - its own seq 253719 (do not be worried by the fact the number is large there are lots of gaps for records removed during construction) - the seq in the X-ray table (t0) : 1521 - the seq in an optical table (t1) : 1229 - the seq in another optical table (t2) : 42168 - the seq in the radio table (t3) : null - the seq in an IR table (t4) : 9 Another record with seq 260429 can have the same t0=1521, but e.g. the seq in the IR table t4=11, and all other t1 t2 t3 null. Etc. etc. Essentially G says that X-ray source 1521 can have up to 2 (or 1 or 7 or whatever) potential counterparts, one is optical t2=1229 which is the same as optical t3=42168 and the same as IR t4=9 ; the other is only IR t4=11, etc. etc. All associations are pre-computed via some sort of other (proximity) analysis. > You also mention other "service columns". What kinds of information are > you keeping in those? Information which is irrelevant to the present discussion, except for a marginal point (see below). Like for instance a numeric rank which says that the association 1521/1229/42168/null/9 is preferred, and the association 1521/null/null/null/11 is unlikely, or to be rejected. Or flags produced during the identification. Or the chance probabilities that the association of a source in t0 and t1 or t2 is real considered the distance and the density of objects having a given magnitude. > > We have different versions of G corresponding to different sets > > of member tables and different association criteria. > "Each G" ? Again, that makes the concept of what a G really is more > confusing to me. I under
Re: excessive time spent in "statistics" status
Hello Lucio, (reply below) Lucio Chiappetti wrote: I have some queries, involving a largish number of JOIN, which are apparently very slow or even take forever (a mysqladmin processlist shows them remain in the "statistics" status for a long time, in most cases I have to kill them after several minutes). When I first had the problem I googled around and found some reference (which I've lost) saying that the "statistics" status is actually what one does with an EXPLAIN SELECT, and that this is done preliminarily to the actual query. It also said it might occur with a large number of joins because this analysis, for n joins MIGHT try up to n! combinations, unless one somehow specified the priorities (but the author did not remember how). You can find those explanations in our manual at http://dev.mysql.com/doc/refman/5.0/en/thread-information.html I thought to have overcome the problem using a feature of the CREATE VIEW command (see below), but apparently I simply moved it to an higher n. Now I tried to see how it scales with the number of joins, and the curious things is that e.g. for n=9 it works fast, for n=15 it works slowly, for n=18 works fast again and for n=20 takes an infinite time. I'll first explain my background : - I have a number of tables (let's call them t0, t1, t2 ...) - all of them have an auto_increment column called seq which is also an index - one table (t0) is "more important" (actually it is list of celestial X-ray sources while the other are celestial objects in other wavebands but this is irrelevant to you). - I have precomputed correlation tables among t0 and each of the other. These tables are called eg. t0ti, have two columns t0 and ti corresponding to the t0.seq and ti.seq of objects which are "associated". They are indexed on (t0,ti). Note that an object in t0 can be associated with 1 or more or zero (t0ti.ti null) objects in ti. - I originally (already under mysql 3) devised a way to identify counterparts in MORE tables (all these associations are based on spherical distance + other criteria). This involved creating a working table G This table has columns named t0 t1 ... tn (containing the pointers t0.seq t1.seq ... for counterparts associated and validated according to some criteria) plus other service columns The simultaneous access was achieved in our interface by a mechanism we called virtual tables, which essentially was SELECT some subset of columns in some of the t0...tn or some expression thereof FROM G left join t0 on G.t0=t0.seq left join t1 on G.t1=t1.seq ... left join tn on G.tn=tn.seq We refer to the t0...tn as "the member tables" of G. I normally do not get lost in symbolic descriptions such as yours. However your description of G and how you build it leaves me in the dark. Can you show me a few sample rows of G (symbolically, if you like) and describe what is in each column G.t0 to G.tn? What I am curious to know is what do each of these n object have in common that allows them to represented as a single tuple on the G table. You also mention other "service columns". What kinds of information are you keeping in those? We have different versions of G corresponding to different sets of member tables and different association criteria. The largest of our cases has 26 different members. - for each G we define a view as "Each G" ? Again, that makes the concept of what a G really is more confusing to me. I understand databases and I know more than a little about stellar cartography, cosmology, and physics. Please don't hold back. Questions : - what does "explain select" actually do and why sometimes hangs ? - can this be overcome rearranging the order of the joins (note that the bulk of the members are all joined with G), or introducing parentheses or with other syntax changes ? - or has it to do with some configuration parameter, maybe related to what is cached, cache size or other ? Thanks in advance to whoever is able to give hints. EXPLAIN SELECT simply stops a normal SELECT statement from actually performing the data retrieval steps and shows us (the users) a description of the techniques the query engine was about to use to get at the data. That means that the steps of parsing the query, tokenizing the symbols, and optimizing the execution plan still take place. It is during this optimization phase that most of your CPU time is being used as the engine will work many permutations of joining one table to another until it reaches a decision about which plan is "less expensive" that all of the others. http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html http://dev.mysql.com/doc/refman/5.0/en/left-join-optimization.html and the rest of the optimization chapter describe this process in considerable details http://dev.mysql.com/doc/