This message came from the CF Trac system. Do not reply. Instead, enter your
comments in the CF Trac system at http://kitt.llnl.gov/trac/.
#99: Taxon Names and Identifiers
-----------------------------+------------------------------
Reporter: lowry | Owner: cf-conventions@…
Type: enhancement | Status: new
Priority: high | Milestone:
Component: cf-conventions | Version:
Resolution: | Keywords:
-----------------------------+------------------------------
Comment (by graybeal):
I see this ticket, on Taxon Names and Identifiers, has not been addressed
since the original discussion over a year ago.
I think it is most important that the ticket move forward. Though Roy's
team may have moved on, this problem will need to be addressed in CF
sooner or later. While only Roy, Jonathan, and I have discussed it, I
suspect many CF lurkers have need for this capability.
The following issues seem acceptably resolved:
- promoting 6.1.1 on "Geographic regions" to 6.3 (i.e. remove it from
6.1), and adding Roy's as 6.4. Then 6.1 and 6.2 will describe mechanisms
in CF, and 6.3 and 6.4 applications of these mechanisms.
- Initial text rewording by Jonathan: "A taxon is a named level within a
biological classification, such as a class, genus and species. Quantities
dependent on taxa have generic standard_names containing the word taxon,
and the taxa are identified by auxiliary coordinate variables."
- Requiring name and identifier is reasonable (to make the description
self-contained).
The following questions are open:
- How many identifiers per taxon if several are available? Roy suggested 1,
Jonathan recommends 2, John suggests user's choice.
- How many sources? Roy suggested 2 (extensible), John says CF should not
limit (and if it does, the 2 suggested are not the best 2).
- What kind of identifier? Roy suggested namespace + ':' + local text ID;
Jonathan proposed (agreeable to Roy) separate int variables for WORMS
aphia ID vs ITIS taxonomic serial number (TSN); and John prefers globally unique
identifiers, LSIDs being the common practice (not offered directly by
ITIS, only indirectly through Catalog of Life). In Jonathan's scheme each
ID type would have a separate int variable, dimensioned to the number of
taxa being defined.
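To make Roy's option concrete, here is a minimal Python sketch of splitting an identifier written as namespace + ':' + local text ID. The function name and the sample values are hypothetical, and splitting on the first colon only is my assumption about how such identifiers would be written, not something agreed in this ticket.

```python
def split_namespaced_id(identifier):
    """Split 'namespace:local_id' on the first colon (assumed convention)."""
    namespace, sep, local_id = identifier.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a namespace:local identifier: {identifier!r}")
    return namespace, local_id


# Hypothetical values, for illustration only.
print(split_namespaced_id("WORMS:32768"))  # ('WORMS', '32768')
print(split_namespaced_id("ITIS:7776"))    # ('ITIS', '7776')
```

Note that the local part stays text here, whereas Jonathan's scheme would store it as an int in a variable dedicated to that source.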
(Incidentally, http://www.jbiomedsem.com/content/2/1/7 provides a detailed
analysis of the Catalog of Life identifier approach, which integrates the
data from ITIS, WORMS, and Species 2000, among many others, and includes
thoughts on why the CoL approach wasn't more widely adopted (at that time
anyway). Another extended discussion is at
http://soyouthinkyoucandigitize.wordpress.com/2013/01/28/what-gets-linked-to-global-unique-identifiers-guids-in-natural-history-collection-digitization/.
The point is that while going round and round is definitely
possible, I want to cleanly account for more than what a specific part of
the CF community does today, if we can.)
Looking for a common path, the following seems pretty close:
- Support multiple identifier sources; specifying those to be provided _if
available_
- if it isn't available in ITIS or WORMS, it should still be citable
- if the user always uses WORMS, we should not force them to translate
to ITIS, and vice versa
- While I happen to think Catalog of Life is more suitable than ITIS,
I'll forgo the argument as long as we aren't exclusive
- Use Jonathan's proposed approach for WORMS and ITIS, but allow
extension to other sources (e.g., CoL) through globally unique identifiers.
Any globally unique identifier is given the standard name
taxon_global_identifier, and can be text (which most will be) or int (for
UUIDs, for example)
- The comparability of identifiers A to B to C etc. will inevitably be
done at a domain-specific application level, well beyond the concern of CF
(but readily achievable by domain experts)
- It won't be necessary to define unique identifier types for each
source, since globally unique identifiers are by their nature
distinguishable and uniquely relatable to their source
- If we accept this adjustment, we don't have to argue on the merits
whether Catalog of Life is better than ITIS (not so much because of LSIDs,
but because it includes many more sources than just ITIS).
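As a rough illustration of why a globally unique identifier does not need a per-source type, the Python sketch below pulls the issuing authority out of an LSID. It assumes the usual LSID layout urn:lsid:authority:namespace:object[:revision]; the function and tuple names are hypothetical, and the sample value is one of the Catalog of Life LSIDs from the example in this comment.

```python
from collections import namedtuple

# Field names follow the usual LSID layout; the tuple name is hypothetical.
LSID = namedtuple("LSID", "authority namespace object_id revision")


def parse_lsid(text):
    """Parse 'urn:lsid:authority:namespace:object[:revision]'."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid(
    "urn:lsid:catalogueoflife.org:taxon:"
    "f33e0fe1-ac8e-11e3-805d-020044200006:col20140401"
)
print(lsid.authority)  # catalogueoflife.org
```

A domain application could dispatch on the authority field to decide how to resolve or compare identifiers, which is the kind of work I'd leave outside CF.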
So this might give us the following example:
{{{
variables:
  int aphiaID(taxa);
    aphiaID:_FillValue=-1;
    aphiaID:standard_name="taxon_identifier";
  int tsn(taxa);
    tsn:_FillValue=0;
    tsn:standard_name="taxonomic_serial_number";
  string col(taxa);
    col:_FillValue="null";
    col:standard_name="taxon_global_identifier";
    col:comment="LSID from Catalog of Life";
data:
  taxon_name="Homo sapiens", "Fraxinus excelsior", "Struthio camelus";
  aphiaID=1,32768,-1;
  tsn=42,0,7776;
  col="urn:lsid:catalogueoflife.org:taxon:f33e0fe1-ac8e-11e3-805d-020044200006:col20140401",
      "urn:lsid:catalogueoflife.org:taxon:0ad7462a-ac8f-11e3-805d-020044200006:col20140401",
      "urn:lsid:catalogueoflife.org:taxon:ebff2886-ac8e-11e3-805d-020044200006:col20140401";
}}}
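To show how a reader of such a file might apply the "if available" rule, here is a small Python sketch that treats each variable's _FillValue as "no identifier from this source". The arrays are copied from the example above as plain lists (no netCDF library is used), and the dictionary layout and function name are hypothetical.

```python
# Values copied from the example; _FillValue marks a missing identifier.
FILL = {"aphiaID": -1, "tsn": 0, "col": "null"}

taxon_name = ["Homo sapiens", "Fraxinus excelsior", "Struthio camelus"]
data = {
    "aphiaID": [1, 32768, -1],
    "tsn": [42, 0, 7776],
    "col": [
        "urn:lsid:catalogueoflife.org:taxon:f33e0fe1-ac8e-11e3-805d-020044200006:col20140401",
        "urn:lsid:catalogueoflife.org:taxon:0ad7462a-ac8f-11e3-805d-020044200006:col20140401",
        "urn:lsid:catalogueoflife.org:taxon:ebff2886-ac8e-11e3-805d-020044200006:col20140401",
    ],
}


def available_ids(index):
    """Return only the identifiers actually provided for the taxon at index."""
    return {
        name: values[index]
        for name, values in data.items()
        if values[index] != FILL[name]
    }


for i, name in enumerate(taxon_name):
    print(name, sorted(available_ids(i)))
```

In this example the second taxon has no ITIS entry and the third has no WORMS entry, yet both remain citable through the other identifiers.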
--
Ticket URL: <http://kitt.llnl.gov/trac/ticket/99#comment:9>
CF Metadata <http://cf-convention.github.io/>
This message came from the CF Trac system. To unsubscribe, without
unsubscribing to the regular cf-metadata list, send a message to
"[email protected]" with "unsubscribe cf-metadata" in the body of your
message.