proposal for standard NCBI database URI

Alan Ruttenberg Mon, 08 May 2006 21:02:47 -0700

As far as I know there is no standard URI for a resource at NCBI. I would like to propose that there be one, since we will all need them to use when we refer to these resources in our RDF (and I need one *now*).


Following other styles I've seen, I propose the following:

1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>

or

2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>


The list of valid databases can be viewed, e.g. in the popup at
http://www.ncbi.nih.gov/Database/datamodel/index.html

I propose that they all be made lower case for use in the following URIs

e.g.

1: http://www.ncbi.nlm.nih.gov/2006/entrez/gene/596
2: http://www.ncbi.nlm.nih.gov/2006/entrez/gene#596
1: http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624
2: http://www.ncbi.nlm.nih.gov/2006/entrez/protein#NP_000624

I am partial to #1, because from a document service point of view it doesn't have the implication that there is a big document full of, e.g., genes, and that you should find the one you are looking for at a specific place in that document.
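
A minimal sketch of how the two styles could be generated, in Python. The function name entrez_uri and its style argument are made up for illustration; only the base address, the lower-casing of database names, and the "/" versus "#" separators come from the proposal.

```python
# Base address taken from the proposal above.
BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/"


def entrez_uri(database, identifier, style=1):
    """Build a proposed URI (hypothetical helper).

    style 1 joins database and identifier with '/', style 2 with '#'.
    """
    if style not in (1, 2):
        raise ValueError("style must be 1 or 2")
    separator = "/" if style == 1 else "#"
    # Database names are lower-cased, per the proposal; identifiers are left as-is.
    return BASE + database.lower() + separator + str(identifier)


print(entrez_uri("Gene", 596))
print(entrez_uri("Protein", "NP_000624", style=2))
```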

Some suggestions, meant to avoid various excuses why we might not make a decision about this promptly:

1. Initial proposal is that we don't have to choose from the different identifiers, i.e.


http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624
and
http://www.ncbi.nlm.nih.gov/2006/entrez/protein/72198189

Rationale: we can use owl:sameAs to make them the same if we need to. We can suggest a best practice if we want to preferentially use one numbering system versus another. (I like the alphanumeric ones, myself.)

2. Initial proposal is that we don't include version information in these identifiers.

Rationale: We can later decide to also have those, and then add relations to connect the versions to the abstract, unversioned URIs. I will claim that for most of the work we are doing in this WG, the versions don't matter.

3. This proposal is not meant to oppose using LSIDs. However, I will note that there doesn't seem to be a working combination of a) a specification of what these look like for NCBI, and b) a working resolver for the few examples I've seen[*]. Thus implementing LSIDs will require work = delay. However, there is no reason that, when an LSID solution comes on line, it can't be compatible with the choice we use now, by including a mention of it in the metadata, and vice versa when documents start to be served from these addresses.

4. Just because no web page is available at these URLs currently doesn't mean we shouldn't use them. There is a pressing need for stable identifiers, and I would argue that while having a document at the URL is polite, not having one shouldn't block us from having an identifier solution. However, an easy thing to do would be to put up a simple CGI that accepts all URLs below http://www.ncbi.nlm.nih.gov/2006/entrez/, parses out the db and id, and says something polite.


5. If we screw up we can always bump to

http://www.ncbi.nlm.nih.gov/2007/
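
Suggestions 1 and 4 can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not an existing NCBI service: the helper names (same_as_triple, parse_entrez_uri, polite_message) and the wording of the messages are invented here, and only style #1 URIs are handled. The first helper writes the owl:sameAs statement as one N-Triples line; the other two show the "parse out the db and id, and say something polite" step that a CGI could wrap.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
HOST = "www.ncbi.nlm.nih.gov"
PREFIX = "/2006/entrez/"


def same_as_triple(uri_a, uri_b):
    """Return one N-Triples line stating that two URIs name the same thing."""
    return "<%s> <%s> <%s> ." % (uri_a, OWL_SAME_AS, uri_b)


def parse_entrez_uri(uri):
    """Return (db, id) for a style #1 URI, or None if it does not match."""
    parts = urlparse(uri)
    if parts.netloc != HOST or not parts.path.startswith(PREFIX):
        return None
    pieces = parts.path[len(PREFIX):].split("/")
    # Expect exactly two non-empty path segments: database, then identifier.
    if len(pieces) != 2 or not all(pieces):
        return None
    return pieces[0], pieces[1]


def polite_message(uri):
    """Text a placeholder page might show (wording is made up)."""
    parsed = parse_entrez_uri(uri)
    if parsed is None:
        return "Sorry, this does not look like an Entrez identifier URI."
    db, identifier = parsed
    return "This URI identifies record %s in the NCBI %s database." % (identifier, db)


print(same_as_triple(
    "http://www.ncbi.nlm.nih.gov/2006/entrez/protein/NP_000624",
    "http://www.ncbi.nlm.nih.gov/2006/entrez/protein/72198189",
))
print(polite_message("http://www.ncbi.nlm.nih.gov/2006/entrez/gene/596"))
```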

-Alan

[*]

plug

urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:54306556

suggested at

http://lsid.biopathways.org/authorities.shtml

into

http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/
