Hi Olga,

On Thu, Oct 29, 2009 at 03:29:58PM +0000, Olga Lyashevska wrote:
> I have a table which contains taxonomic data (species, genera, family,  
> order, class) and it is organized as adjacency list model.
>
> mysql> select* from taxonomic_units1 limit 5;
> +-----+-------------------------------------------+---------+
>  | tsn | name                         | parent_tsn | rank_id |
> +-----+--------------------------+--------------+---------+
> |  50 | Bacteria                         |            0 |           10 |
> |  51 | Schizomycetes              202421 |           60 |
> |  52 | Archangiaceae             |         51 |        140 |
> |  53 | Pseudomonadale         |         51 |       100 |
> |  54 | Rhodobacteriineae      |         53 |       110 |
> +-----+-----------------------------+------------+---------+
>
> I am trying to flatten it, so that it can be used in further analysis  
> (e.g. in R)
> I have been trying to run the following query, and it does what I want  
> it to do, but it takes really long time to get it done.
> As a matter of fact I was not patient enough to get the whole output and 
> instead set LIMIT 10.
>
> SELECT O1.name AS tclass, O2.name AS torder, O4.name AS tfamily, O5.name 
> AS tgenus, O6.name AS tspecies
> FROM taxonomic_units1 AS O1
> LEFT OUTER JOIN
> taxonomic_units1 AS O2
> ON O1.tsn = O2.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O3
> ON O2.tsn = O3.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O4
> ON O3.tsn = O4.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O5
> ON O4.tsn = O5.parent_tsn
> LEFT OUTER JOIN
> taxonomic_units1 AS O6
> ON O5.tsn = O6.parent_tsn
> LIMIT 10;
>
> +---------------+-----------------+------------------+-------------- 
> +-----------------------------------------------------------+
> | tclass                    | torder                       | tfamily      
>                | tgenus          | tspecies                               
>  |
> +---------------+-----------------+------------------+-------------- 
> +-----------------------------------------------------------+
> | Bacteria              |  NULL                       | NULL              
>         | NULL            | NULL                                     |
> | Schizomycetes | Archangiaceae        | NULL                      |  
> NULL            | NULL                                     |
> | Schizomycetes | Pseudomonadales | NULL                      | NULL      
>        | NULL                                    |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter    | 
> Nitrobacteragilis                 |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter    | 
> Nitrobacterflavus                |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter    | 
> Nitrobacteroligotrophis     |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter    | 
> Nitrobacterpolytrophus     |
> | Schizomycetes | Pseudomonadales | Nitrobacteraceae | Nitrobacter    | 
> Nitrobacterpunctata           |
>
>
> I have checked this query with EXPLAIN, and it is not using any indices, 
> even though  column tsn is set as index in original table.
>
> +----+-------------+-------+------+---------------+------+--------- 
> +------+--------+-------+
> | id | select_type | table | type | possible_keys | key  | key_len | ref  
> | rows   | Extra |
> +----+-------------+-------+------+---------------+------+--------- 
> +------+--------+-------+
> |  1 | SIMPLE      | O1    | ALL  | NULL          | NULL | NULL    |  
> NULL | 483305 |       |
> |  1 | SIMPLE      | O2    | ALL  | NULL          | NULL | NULL    |  
> NULL | 483305 |       |
> |  1 | SIMPLE      | O3    | ALL  | NULL          | NULL | NULL    |  
> NULL | 483305 |       |
> |  1 | SIMPLE      | O4    | ALL  | NULL          | NULL | NULL    |  
> NULL | 483305 |       |
> |  1 | SIMPLE      | O5    | ALL  | NULL          | NULL | NULL    |  
> NULL | 483305 |       |
> |  1 | SIMPLE      | O6    | ALL  | NULL          | NULL | NULL    |  
> NULL | 483305 |       |
> +----+-------------+-------+------+---------------+------+--------- 
> +------+--------+-------+
> 6 rows in set (0.00 sec)
>
>
> What is wrong with this query? Or is it a problem of all adjacency list 
> models?
> Is there a way to get columns indexed using self-joins?

For an outer join like

 ... taxonomic_units1 AS O1 
 LEFT OUTER JOIN taxonomic_units1 AS O2
 ON O1.tsn = O2.parent_tsn

current optimizer has only one option(*): use Nested-Loops Join algorthm, with
the outer table being the first one. That is, it will execute these loops:

  for each record from O1 
    for each record in O2 such that O1.tsn = O2.parent_tsn
      ...

this makes it clear that index on O1.tsn will not be useful. You need indexes
on parent_tsn column.


(*) certain kinds of WHERE clause may open other opportinites. 
BR
 Sergey
-- 
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to