On May 4, 2006, at 4:47 AM, Matthias Samwald wrote:
One of them is mussels. Q: what should this resolve to?
It seems that this is a VERY common problem when one has to map
existing taxonomies and classification schemes. Many of them do not
have a concise class-subclass structure. Two examples I had to deal
with already are Entrez Taxonomy and the IUPHAR naming scheme of
receptors. The problem is that in such complex taxonomies there are
many different levels of granularity (phylum, class, genus, species
etc.), but most taxonomies/naming schemes do not cover all of these
different levels of granularity. For instance, from the IUPHAR
naming scheme we can derive a code for a serotonin receptor
('2.1:5HT:') and codes for serotonin receptors 1A, 2A, 2B etc.
('2.1:5HT:1:5HT1A:' etc.), but there is no code for serotonin
receptor 2, or serotonin receptor 4 (including all of its
subtypes). This can be very annoying in practice.
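To make the gap concrete, here is a minimal sketch that treats the codes as plain strings and tests hierarchy by prefix. It uses only the two codes quoted above; the function name is_under is made up for illustration, and reading the code as a prefix hierarchy is an assumption, not something taken from the IUPHAR documentation.

```python
# The two codes quoted in the message; no other codes are assumed.
SEROTONIN_FAMILY = "2.1:5HT:"
SEROTONIN_1A = "2.1:5HT:1:5HT1A:"


def is_under(code: str, ancestor: str) -> bool:
    """Return True if `code` sits strictly below `ancestor`.

    Assumption: a child code starts with its parent's code.
    """
    return code != ancestor and code.startswith(ancestor)


# The family code can be used to select the 1A subtype...
print(is_under(SEROTONIN_1A, SEROTONIN_FAMILY))
# ...but an intermediate group such as "serotonin receptor 2" has no
# code of its own, so there is no prefix to select it by.
```

With this reading, the only groupings that can be queried are the ones that happen to have a code, which is exactly the limitation described above.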
Another problem is that many databases use more general names (e.g.
'frog') when they are in practice referring to a more specific
thing (e.g. 'Xenopus sp.').
kind regards,
Matthias Samwald
PS: I would guess that 'Bivalvia' would be the common taxonomic
term that is semantically closest in meaning to 'mussels'.
We thought about this quite a bit when we developed the controlled
vocabulary for the Antibody database. The curator was dependent on
the information supplied by the manufacturer, so if the manufacturer
indicated "mussel" without specifying the species, that's what we
used. The only way to find out the exact species would be to contact
the manufacturer and try to track down the original source. This
seems impractical, to say the least.
Our scheme is based on what is available in the data. If we had
picked a common term (e.g. Bivalvia), we would have lost more specific
information. On the other hand, if the manufacturer provides a
nonspecific term (e.g. frog), we can't assume that it is Xenopus
without confirmation. In the end, we figured that the individual
users would choose the taxonomic level that is important in their
specific application.
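As a rough sketch of that idea (not how the Antibody database is actually implemented): keep the term exactly as the manufacturer supplied it, and let the user ask for whatever rank they need. The LINEAGE table and the resolve function are hypothetical and hand-written for this example, and mapping 'mussel' to the class Bivalvia follows Matthias's guess above.

```python
from typing import Optional

# Hypothetical lineage table, for illustration only. Each source term
# lists only the ranks that can be stated without guessing.
LINEAGE = {
    "mussel": {"phylum": "Mollusca", "class": "Bivalvia"},
    "frog": {"phylum": "Chordata", "class": "Amphibia", "order": "Anura"},
    "Xenopus sp.": {
        "phylum": "Chordata",
        "class": "Amphibia",
        "order": "Anura",
        "genus": "Xenopus",
    },
}


def resolve(source_term: str, rank: str) -> Optional[str]:
    """Return the taxon at `rank` for a term as supplied in the data.

    Returns None when the term is unknown or is less specific than the
    requested rank, so that 'frog' is never silently read as Xenopus.
    """
    return LINEAGE.get(source_term, {}).get(rank)


print(resolve("Xenopus sp.", "order"))  # a specific term rolls up to a coarser rank
print(resolve("frog", "genus"))         # a nonspecific term cannot be narrowed down
```

Rolling up from a specific term loses nothing in the stored data, while asking a nonspecific term for a finer rank simply yields no answer.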
June