Hi Olga,
On Thu, Oct 29, 2009 at 03:29:58PM +0000, Olga Lyashevska wrote:
> I have a table which contains taxonomic data (species, genera, family,
> order, class) and it is organized as adjacency list model.
>
> mysql> select* from taxonomic_units1 limit 5;
> +-----+-------------------------------------------+---------+
> | tsn | name | parent_tsn | rank_id |
> +-----+--------------------------+--------------+---------+
> | 50 | Bacteria | 0 | 10 |
> | 51 | Schizomycetes 202421 | 60 |
> | 52 | Archangiaceae | 51 | 140 |
> | 53 | Pseudomonadale | 51 | 100 |
> | 54 | Rhodobacteriineae | 53 | 110 |
> +-----+-----------------------------+------------+---------+
>
> I am trying to flatten it, so that it can be used in further analysis
> (e.g. in R)
> I have been trying to run the following query, and it does what I want
> it to do, but it takes really long time to get it done.
> As a matter of fact I was not patient enough to get the whole output and
> instead set LIMIT 10.
>
> SELECT O1.name AS tclass, O2.name AS torder, O4.name AS tfamily, O5.name
> AS tgenus, O6.name AS tspecies
> FROM taxonomic_units1 AS O1
> LEFT OUTER JOIN
> taxonomic_units1 AS O2
> ON O1.tsn = O2.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O3
> ON O2.tsn = O3.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O4
> ON O3.tsn = O4.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O5
> ON O4.tsn = O5.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O6
> ON O5.tsn = O6.parent_tsn
> LIMIT 10;
>
> +---------------+-----------------+------------------+--------------
> +-----------------------------------------------------------+
> | tclass | torder | tfamily
> | tgenus | tspecies
> |
> +---------------+-----------------+------------------+--------------
> +-----------------------------------------------------------+
> | Bacteria | NULL | NULL
> | NULL | NULL |
> | Schizomycetes | Archangiaceae | NULL |
> NULL | NULL |
> | Schizomycetes | Pseudomonadales | NULL | NULL
> | NULL |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter |
> Nitrobacteragilis |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter |
> Nitrobacterflavus |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter |
> Nitrobacteroligotrophis |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter |
> Nitrobacterpolytrophus |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter |
> Nitrobacterpunctata |
>
>
> I have checked this query with EXPLAIN, and it is not using any indices,
> even though column tsn is set as index in original table.
>
> +----+-------------+-------+------+---------------+------+---------
> +------+--------+-------+
> | id | select_type | table | type | possible_keys | key | key_len | ref
> | rows | Extra |
> +----+-------------+-------+------+---------------+------+---------
> +------+--------+-------+
> | 1 | SIMPLE | O1 | ALL | NULL | NULL | NULL |
> NULL | 483305 | |
> | 1 | SIMPLE | O2 | ALL | NULL | NULL | NULL |
> NULL | 483305 | |
> | 1 | SIMPLE | O3 | ALL | NULL | NULL | NULL |
> NULL | 483305 | |
> | 1 | SIMPLE | O4 | ALL | NULL | NULL | NULL |
> NULL | 483305 | |
> | 1 | SIMPLE | O5 | ALL | NULL | NULL | NULL |
> NULL | 483305 | |
> | 1 | SIMPLE | O6 | ALL | NULL | NULL | NULL |
> NULL | 483305 | |
> +----+-------------+-------+------+---------------+------+---------
> +------+--------+-------+
> 6 rows in set (0.00 sec)
>
>
> What is wrong with this query? Or is it a problem of all adjacency list
> models?
> Is there a way to get columns indexed using self-joins?
For an outer join like
... taxonomic_units1 AS O1
LEFT OUTER JOIN taxonomic_units1 AS O2
ON O1.tsn = O2.parent_tsn
current optimizer has only one option(*): use Nested-Loops Join algorthm, with
the outer table being the first one. That is, it will execute these loops:
for each record from O1
for each record in O2 such that O1.tsn = O2.parent_tsn
...
this makes it clear that index on O1.tsn will not be useful. You need indexes
on parent_tsn column.
(*) certain kinds of WHERE clause may open other opportinites.
BR
Sergey
--
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[email protected]