As far as I know, there is no standard URI for a resource at NCBI. I would like to propose that there be one, since we will all need such URIs when we refer to these resources in our RDF (and I need one *now*).

Following other styles I've seen, I propose the following:

1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>

or

2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>

The list of valid databases can be viewed, e.g. in the popup at
http://www.ncbi.nih.gov/Database/datamodel/index.html

I propose that the database names all be made lower case for use in these URIs

e.g.

1: http://www.ncbi.nlm.nih.gov/2006/entrez/gene/596
2: http://www.ncbi.nlm.nih.gov/2006/entrez/gene#596
1: http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624
2: http://www.ncbi.nlm.nih.gov/2006/entrez/protein#NP_000624
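A minimal Python sketch of the two proposed forms, assuming only the base http://www.ncbi.nlm.nih.gov/2006/entrez/ and the lower-casing rule above. The function name `entrez_uri` and its `style` parameter are hypothetical and not part of any NCBI service:

```python
BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/"


def entrez_uri(database: str, identifier: str, style: int = 1) -> str:
    """Build a URI for an Entrez record: style 1 uses '/', style 2 uses '#'."""
    db = database.lower()  # database names are lower-cased in the URI
    if style == 1:
        return f"{BASE}{db}/{identifier}"
    if style == 2:
        return f"{BASE}{db}#{identifier}"
    raise ValueError("style must be 1 or 2")


print(entrez_uri("Gene", "596"))                    # style 1
print(entrez_uri("Protein", "NP_000624", style=2))  # style 2
```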

I am partial to #1 because, from a document-service point of view, it doesn't carry the implication that there is one big document full of (e.g.) genes, and that you should find the one you are looking for at a specific place in that document.
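One concrete way to see the difference: an HTTP client never sends the fragment, so every style #2 URI in a given database dereferences to the same document. A short sketch using Python's standard `urllib.parse.urldefrag` (the helper name `document_fetched` is illustrative only):

```python
from urllib.parse import urldefrag


def document_fetched(uri: str) -> str:
    """Return the URL a client actually requests: the URI minus any #fragment."""
    url, _fragment = urldefrag(uri)
    return url


for uri in (
    "http://www.ncbi.nlm.nih.gov/2006/entrez/gene#596",  # style 2
    "http://www.ncbi.nlm.nih.gov/2006/entrez/gene/596",  # style 1
):
    print(uri, "->", document_fetched(uri))
```

With style #2 the request goes to .../entrez/gene, the "big document full of genes"; with style #1 it goes to .../entrez/gene/596 itself.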

Some suggestions, meant to head off the various excuses for not making a decision about this promptly:

1. Initial proposal is that we don't have to choose among the different identifiers, i.e. we can use both

http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624
and
http://www.ncbi.nlm.nih.gov/2006/entrez/protein/72198189

Rationale: we can use owl:sameAs to declare them the same if we need to. We can suggest a best practice if we want to prefer one numbering system over another. (I like the alphanumeric ones, myself.)

2. Initial proposal is that we don't include version information in these identifiers

Rationale: We can later decide to also have versioned identifiers, and then add relations connecting the versions to the abstract, unversioned URIs. I will claim that for most of the work we are doing in this WG, the versions don't matter.

3. This proposal is not meant to oppose using LSIDs. However, I note that there doesn't seem to be a working combination of a) a specification of what these look like for NCBI, and b) a working resolver for the few examples I've seen[*]. Thus implementing LSIDs will require work, which means delay. However, there is no reason that an LSID solution, when it comes online, can't be compatible with the choice we make now, by including a mention of the LSID in the metadata, and vice versa when documents start to be served from these addresses.

4. Just because no web page is currently available at these URLs doesn't mean we shouldn't use them. There is a pressing need for stable identifiers, and I would argue that while having a document at the URL is polite, not having one shouldn't block us from having an identifier solution. However, an easy thing to do would be to put up a simple CGI that accepts all URLs below http://www.ncbi.nlm.nih.gov/2006/entrez/, parses out the db and id, and says something polite.

5. If we screw up we can always bump to

http://www.ncbi.nlm.nih.gov/2007/
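Items 1 to 3 above each come down to publishing a few RDF statements that link identifiers. Below is a minimal Python sketch that emits them as N-Triples, assuming style #1 URIs. `owl:sameAs` and `rdfs:seeAlso` are the real OWL and RDF Schema properties, but using `rdfs:seeAlso` for the LSID mention is an assumption; the `versionOf` predicate is a hypothetical placeholder, since the relation for versions is left undecided; placing versioned accessions under the same path and the `NP_000624.2` version number are made up for illustration; and all function names are hypothetical:

```python
import re
from typing import Optional

BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Hypothetical predicate: the proposal deliberately leaves this relation undecided.
VERSION_OF = "http://example.org/hypothetical#versionOf"

# An accession followed by a ".<digits>" version suffix, e.g. "NP_000624.2".
ACCESSION_VERSION = re.compile(r"^(?P<acc>[A-Za-z0-9_]+)\.(?P<ver>\d+)$")


def uri(database: str, identifier: str) -> str:
    """Style #1 URI, with the database name lower-cased."""
    return f"{BASE}{database.lower()}/{identifier}"


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def same_as(database: str, id_a: str, id_b: str) -> str:
    """Item 1: declare two identifiers for the same record equivalent."""
    return triple(uri(database, id_a), OWL_SAME_AS, uri(database, id_b))


def version_link(database: str, versioned: str) -> Optional[str]:
    """Item 2: link a versioned accession to its unversioned URI (None if unversioned)."""
    m = ACCESSION_VERSION.match(versioned)
    if m is None:
        return None
    return triple(uri(database, versioned), VERSION_OF, uri(database, m.group("acc")))


def parse_lsid(lsid: str) -> dict:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>] into parts."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def lsid_mention(subject_uri: str, lsid: str) -> str:
    """Item 3: mention an LSID in the metadata about one of the proposed URIs."""
    parse_lsid(lsid)  # reject malformed LSIDs before emitting anything
    return triple(subject_uri, RDFS_SEE_ALSO, lsid)


print(same_as("Protein", "NP_000624", "72198189"))
print(version_link("protein", "NP_000624.2"))
print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:54306556"))
```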

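For item 4, a minimal sketch of the "simple CGI" as a Python WSGI app using only the standard library's `wsgiref`. It assumes the app is mounted at the site root, so `PATH_INFO` carries the full `/2006/entrez/<db>/<id>` path; the names `parse_entrez_path`, `app` and `serve`, the port, and the wording of the reply are all hypothetical:

```python
from typing import Optional, Tuple
from wsgiref.simple_server import make_server

PREFIX = "/2006/entrez/"


def parse_entrez_path(path: str) -> Optional[Tuple[str, str]]:
    """Split '/2006/entrez/<db>/<id>' into (db, id); None if the path doesn't fit."""
    if not path.startswith(PREFIX):
        return None
    db, sep, ident = path[len(PREFIX):].partition("/")
    if not sep or not db or not ident:
        return None
    return db.lower(), ident


def app(environ, start_response):
    """Answer every URI below the prefix with a short, polite plain-text note."""
    parsed = parse_entrez_path(environ.get("PATH_INFO", ""))
    if parsed is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not an Entrez identifier URI.\n"]
    db, ident = parsed
    body = (
        f"This URI identifies record {ident} in the Entrez '{db}' database.\n"
        "No document is served here yet.\n"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port: int = 8000) -> None:
    """Run a development server; blocks until interrupted."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Note that this only helps style #1: with style #2 the `#<id>` part never reaches the server, which would see only `/2006/entrez/gene`.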
-Alan

[*]

plug

urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:54306556

suggested at

http://lsid.biopathways.org/authorities.shtml

into

http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/