self-joins in hierarchical queries: optimization problem

Olga Lyashevska Thu, 29 Oct 2009 08:30:35 -0700

Dear all,

I have a table which contains taxonomic data (species, genera, family,order, class) and it is organized as adjacency list model.


mysql> select* from taxonomic_units1 limit 5;
+-----+-------------------------------------------+---------+
 | tsn | name                         | parent_tsn | rank_id |
+-----+--------------------------+--------------+---------+
|  50 | Bacteria                         |            0 |           10 |
|  51 | Schizomycetes              202421 |           60 |
|  52 | Archangiaceae             |         51 |        140 |
|  53 | Pseudomonadale         |         51 |       100 |
|  54 | Rhodobacteriineae      |         53 |       110 |
+-----+-----------------------------+------------+---------+

I am trying to flatten it, so that it can be used in further analysis(e.g. in R)I have been trying to run the following query, and it does what I wantit to do, but it takes really long time to get it done.As a matter of fact I was not patient enough to get the whole outputand instead set LIMIT 10.

SELECT O1.name AS tclass, O2.name AS torder, O4.name AS tfamily,O5.name AS tgenus, O6.name AS tspecies

FROM taxonomic_units1 AS O1
LEFT OUTER JOIN
taxonomic_units1 AS O2
ON O1.tsn = O2.parent_tsn
LEFT OUTER JOIN
taxonomic_units1 AS O3
ON O2.tsn = O3.parent_tsn
LEFT OUTER JOIN
taxonomic_units1 AS O4
ON O3.tsn = O4.parent_tsn
LEFT OUTER JOIN
taxonomic_units1 AS O5
ON O4.tsn = O5.parent_tsn
LEFT OUTER JOIN
taxonomic_units1 AS O6
ON O5.tsn = O6.parent_tsn
LIMIT 10;

+---------------+-----------------+------------------+--------------+-----------------------------------------------------------+| tclass | torder |tfamily | tgenus |tspecies |+---------------+-----------------+------------------+--------------+-----------------------------------------------------------+| Bacteria | NULL |NULL | NULL |NULL || Schizomycetes | Archangiaceae | NULL |NULL | NULL || Schizomycetes | Pseudomonadales | NULL |NULL | NULL || Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter| Nitrobacteragilis || Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter| Nitrobacterflavus || Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter| Nitrobacteroligotrophis || Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter| Nitrobacterpolytrophus || Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter| Nitrobacterpunctata |

I have checked this query with EXPLAIN, and it is not using anyindices, even though column tsn is set as index in original table.

+----+-------------+-------+------+---------------+------+---------+------+--------+-------+| id | select_type | table | type | possible_keys | key | key_len |ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+-------+| 1 | SIMPLE | O1 | ALL | NULL | NULL | NULL |NULL | 483305 | || 1 | SIMPLE | O2 | ALL | NULL | NULL | NULL |NULL | 483305 | || 1 | SIMPLE | O3 | ALL | NULL | NULL | NULL |NULL | 483305 | || 1 | SIMPLE | O4 | ALL | NULL | NULL | NULL |NULL | 483305 | || 1 | SIMPLE | O5 | ALL | NULL | NULL | NULL |NULL | 483305 | || 1 | SIMPLE | O6 | ALL | NULL | NULL | NULL |NULL | 483305 | |+----+-------------+-------+------+---------------+------+---------+------+--------+-------+

6 rows in set (0.00 sec)

What is wrong with this query? Or is it a problem of all adjacencylist models?

Is there a way to get columns indexed using self-joins?

Thanks,
Olga

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[email protected]

self-joins in hierarchical queries: optimization problem

Reply via email to