As far as I know there is no standard URI scheme for resources at NCBI.
I would like to propose one, since we will all need such URIs when we
refer to these resources in our RDF (and I need one *now*).
Following other styles I've seen, I propose the following:
1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>
or
2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>
The list of valid databases can be viewed, e.g. in the popup at
http://www.ncbi.nih.gov/Database/datamodel/index.html
I propose that the database names all be made lower case for use in
these URIs, e.g.
1: http://www.ncbi.nlm.nih.gov/2006/entrez/gene/596
2: http://www.ncbi.nlm.nih.gov/2006/entrez/gene#596
1: http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624
2: http://www.ncbi.nlm.nih.gov/2006/entrez/protein#NP_000624
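To make the two styles concrete, here is a minimal Python sketch of how such URIs could be built from a database name and an identifier. The function name entrez_uri and the style argument are made up for illustration; nothing like this exists at NCBI, and the only rules it encodes are the ones proposed above (fixed base, lower-cased database name, "/" or "#" before the identifier).

```python
# Base proposed in this message; not an address NCBI serves today.
BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/"


def entrez_uri(database, identifier, style=1):
    """Build a proposed URI. Style 1 joins with '/', style 2 with '#'.

    Hypothetical helper: the name and signature are illustrative only.
    """
    db = database.strip().lower()  # database names are lower-cased
    ident = str(identifier).strip()
    if not db or not ident:
        raise ValueError("database and identifier are both required")
    if style == 1:
        return BASE + db + "/" + ident
    if style == 2:
        return BASE + db + "#" + ident
    raise ValueError("style must be 1 or 2")


print(entrez_uri("Gene", 596))
print(entrez_uri("Protein", "NP_000624", style=2))
```

The two print calls reproduce the gene example in style 1 and the protein example in style 2.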
I am partial to #1 because, from a document service point of view, it
doesn't imply that there is one big document full of, e.g., genes, in
which you should find the one you are looking for at a specific
place.
Some suggestions, meant to avoid various excuses why we might not
make a decision about this promptly:
1. Initial proposal is that we don't have to choose among the
different kinds of identifiers, i.e. we can use both
http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624
and
http://www.ncbi.nlm.nih.gov/2006/entrez/protein/72198189
Rationale: we can use owl:sameAs to make them the same if we need to. We
can suggest a best practice if we want to preferentially use one
numbering system versus another. (I like the alphanumeric ones, myself)
2. Initial proposal is that we don't include version information in
these identifiers.
Rationale: We can later decide to also have those, and then add
relations to connect the versions to the abstract, unversioned URIs.
I will claim that for most of the work we are doing in this WG, the
versions don't matter.
3. This proposal is not meant to oppose using LSIDs. However, I will
note that there doesn't seem to be a working combination of a)
specification of what these look like for NCBI, and b) a working
resolver for the few examples I've seen[*]. Thus implementing LSIDs
will require work = delay. However, there is no reason that an LSID
solution, when it comes online, can't be compatible with the choice
we make now, by including a mention of it in the metadata, and
vice versa when documents start to be served from these addresses.
4. Just because no web page is currently available at these URLs
doesn't mean we shouldn't use them. There is a pressing need for
stable identifiers, and I would argue that while having a document at
the URL is polite, not having one shouldn't block us from having an
identifier solution. However, an easy thing to do would be to put up a
simple CGI that accepts all URLs below
http://www.ncbi.nlm.nih.gov/2006/entrez/, parses out the db and id,
and says something polite.
5. If we screw up we can always bump to
http://www.ncbi.nlm.nih.gov/2007/
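Suggestions 1 and 4 are small enough to sketch in a few lines of Python. This is only an illustration under assumptions: it handles style 1 URIs only, it is plain functions rather than an actual CGI, and the names parse_entrez_uri, polite_reply and same_as_triple, as well as the reply wording, are invented here. The owl:sameAs statement is written out as a single N-Triples line.

```python
from urllib.parse import urlparse

HOST = "www.ncbi.nlm.nih.gov"
PREFIX = "/2006/entrez/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def parse_entrez_uri(uri):
    """Return (db, id) for a style 1 URI, or None if it does not match."""
    parts = urlparse(uri)
    if parts.netloc != HOST or not parts.path.startswith(PREFIX):
        return None
    pieces = parts.path[len(PREFIX):].split("/")
    # Expect exactly two non-empty segments: database, then identifier.
    if len(pieces) != 2 or not all(pieces):
        return None
    return pieces[0], pieces[1]


def polite_reply(uri):
    """What a placeholder service might say (wording is invented)."""
    parsed = parse_entrez_uri(uri)
    if parsed is None:
        return "Sorry, this does not look like an Entrez identifier URI."
    db, ident = parsed
    return "This URI identifies record %s in the Entrez %s database." % (ident, db)


def same_as_triple(uri_a, uri_b):
    """One N-Triples line stating that two URIs name the same resource."""
    return "<%s> <%s> <%s> ." % (uri_a, OWL_SAME_AS, uri_b)


base = "http://www.ncbi.nlm.nih.gov/2006/entrez/protein/"
print(polite_reply(base + "NP_000624"))
print(same_as_triple(base + "NP_000624", base + "72198189"))
```

Style 2 would need different handling, since the part after "#" is a fragment and is normally not sent to the server at all.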
-Alan
[*]
plug
urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:54306556
suggested at
http://lsid.biopathways.org/authorities.shtml
into
http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/