Hi Daniel,

I hope you'll postpone your implementation decisions until the HCLS URI recommendations are published, and that the NCBO will follow those recommendations when the time comes to implement your system. As you can see, there is still interesting debate, and the possibility of new insights. If there are things that the NCBO feels strongly about, or requirements that you have that Jonathan has not incorporated, then I'd urge you and other interested parties to join the call today, and to participate in the document drafting that Jonathan is leading.

I'll note a minor concern with your statement: a number of the ontologies you host are not the product of NCBO work. Are you suggesting that you will be creating new URIs for all the entities in those ontologies, even if they already have URIs? If so, it would seem that this could exacerbate the problems we are having rather than help - we've noted that the proliferation of different URIs that identify the same thing is problematic from a SW point of view.

Regards,
Alan


On Jul 16, 2007, at 11:54 AM, [EMAIL PROTECTED] wrote:


Just to remind everyone--NCBO is planning on providing URIs for entities in the breadth of biomedical ontologies it hosts at http://bioportal.bioontology.org. This group previously gave us a good set of functional requirements, and over the next few months we will be implementing this.

Daniel

___

Daniel Rubin, MD, MS
Clinical Asst. Professor, Radiology
Research Scientist, Stanford Medical Informatics
Scientific Director, National Center of Biomedical Ontology
MSOB X-215
Stanford, CA 94305
650-725-5693


Quoting "Balaji S. Srinivasan" <[EMAIL PROTECTED]>:


Hi,

WSDL is an established W3C spec that is becoming increasingly
accepted worldwide (and is, generally, automatically generated based on
your interface, so requires little or no manual construction), and
which solves a problem that we *know without any doubt* URLs cannot
solve.

I may be mistaken, but isn't WSDL just an XML format? I don't see how
it solves a problem that URLs "cannot solve"...wouldn't the location of
"foo.wsdl" be best specified as a URL?

in fact, they [WSDL] are currently MORE POPULAR than RDF itself,
according to Google Trends

But the appropriate comparison is to URLs, not RDF...and the advantage
of a URL is that there's tons of widely deployed, lightweight
technology for requesting data from a given URL (e.g. w/ a browser as
well as Perl/Python/etc. libraries) and for setting up web servers
(e.g. Apache).
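
To make the point concrete, here's a minimal sketch using nothing but the Python standard library (the UniProt URL is just an illustration; any HTTP URI resolves the same way):

    # Fetch RDF for an entity given only its URL - no extra tooling required.
    import urllib.request

    with urllib.request.urlopen("http://beta.uniprot.org/uniprot/P12345.rdf") as resp:
        print(resp.status, resp.headers.get("Content-Type"))
        data = resp.read()  # the RDF document itself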

I don't understand why it should be necessary to develop a parallel set
of technologies (e.g. the Firefox LSID plugin, or HTTP proxies) for
resolving LSIDs, particularly when most (all?) of these tools seem to
be built on top of tools (such as Firefox) which can already do URL
resolution without downloading anything.

It would seem to me that the best way to get a reliable set of
canonical URIs is to get NCBI involved. As soon as NCBI publishes a set of canonical URIs (e.g. for genes in Entrez Gene, compounds in PubChem,
etc.), everyone could use them with confidence. Reasons:

1) NCBI identifiers (even more so than EBI's) are the de facto standard
and can be mapped to anything.
2) NCBI is well funded, has serious bandwidth, etc.
3) NCBI can be trusted to stick around for a long time and to
maintain/redirect old URLs, unlike a research lab or most companies.
4) In terms of registering new URIs, NCBI is already a standard
location for data submissions (w/ NCBI GEO, GAIN, etc.).
5) People already use NCBI to get other kinds of data, so getting RDF
data from them is not a serious paradigm shift.

Perhaps there's someone from NCBI on the list; if not, it would be
worthwhile to contact them. If NCBI adopted the standard that
beta.uniprot.org is using, with different suffixes for different
formats (as per Eric Jain's email):

http://beta.uniprot.org/uniprot/P12345
http://beta.uniprot.org/uniprot/P12345.xml
http://beta.uniprot.org/uniprot/P12345.rdf
http://beta.uniprot.org/uniprot/P12345.fasta

...then I think people would adopt it immediately, especially if they
kept it on their front page for a month (like they do with other new
services). Regarding the way UniProt is doing things, I think it was a particularly good design decision to make HTML the default format (the suffix-less URI), so that you can get a sense of what the URI represents by looking at it
in a browser.
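
To illustrate the suffix scheme concretely (a sketch; it assumes the beta.uniprot.org URLs above stay live):

    import urllib.request

    BASE = "http://beta.uniprot.org/uniprot/P12345"

    # One entity, several serializations, selected purely by suffix;
    # the bare URI (no suffix) is the human-readable HTML view.
    for suffix in ("", ".xml", ".rdf", ".fasta"):
        with urllib.request.urlopen(BASE + suffix) as resp:
            print(suffix or "(html)", "->", resp.headers.get("Content-Type"))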

Also, from Matthias' recent email:

You should not try to pack ANY information about the 'resolution' of
a Semantic Web resource into its URI, quite to the contrary. Make it as meaningless and generic as possible; in the best case it should just be
a large random alphanumeric string, e.g. tag:uri:a938fjhsdcHSDu39. If
all URIs look like this, nobody will be deterred from re-using a URI
just because of how it looks.

I don't know if this is such a good idea -- when debugging, you want to
have some information about what the URIs represent (e.g. the
"http://beta.uniprot.org/uniprot/"; prefix tells you that you're looking
at a UniProt protein with the given ID number). If URIs are just
alphanumeric strings, you need to constantly be doing lookups to remind
yourself of what a particular object means.

--B

--
Balaji S. Srinivasan, Ph.D.
Stanford University
Lecturer, Depts. of Statistics and Computer Science
318 Campus Drive, Clark Center S251
(650) 380-0695
[EMAIL PROTECTED]
http://jinome.stanford.edu


On Jul 14, 2007, at 10:30 PM, Mark Wilkinson wrote:


Well... I apologize in advance, but I'm going to be *insultingly* blunt because I'm quite honestly losing interest in this seemingly pre-destined discussion...

"blinkers, are a piece of equipment used on a horse's face that restrict the horse's vision. They usually compose of leather or plastic cups that are places on either side of the eye, so that the horse can not see to his sides. Many racehorse trainers believe this keeps the horse focused on what is in front of him, encouraging him to pay attention to the race rather than other distractions, such as crowds" (http://en.wikipedia.org/ wiki/Blinders)

WSDL is an established W3C spec that is becoming increasingly accepted worldwide (and is, generally, automatically generated based on your interface, so requires little or no manual construction), and which solves a problem that we *know without any doubt* URLs cannot solve. I really don't see an advantage in trying to ignore them, circumvent them, or otherwise relegate them to a secondary lookup, in the base spec for the Semantic Web, when we know that we are going to have to deal with them at some point (and in fact, they are currently MORE POPULAR than RDF itself, according to Google Trends: http://www.google.com/trends?q=WSDL%2C+RDF&ctab=0&geo=all&date=all&sort=0).

I really don't see the point in trying to build the Semantic Web by specifically avoiding acknowledgement of one of the most popular trends on the Web, when we already know that the vast majority of information we need to access as bioinformaticians is available through web forms or web services!

I'm sorry for being rude and disrespectful - I'm honestly quite embarrassed to be saying these things so harshly - but I think this discussion has started to become a singularity around a pre-contrived end-point, rather than a discussion of what the Web (and the Semantic Web) really is/can be!

WSDL -1 if you wish, but that puts you in opposition to the majority of the world, where WSDL (thanks to Ajax) is finally starting to make its mark!

Again, I apologize for being disrespectful and rude... it really isn't personal and I feel truly awful about writing this so harshly! I'm just losing patience with a discussion that doesn't seem to be a discussion, but rather a shoe-horn into a pre-destined end point.

You are all free to crucify me the next time one of my grants comes to you for review ;-)

M




On Fri, 13 Jul 2007 20:19:41 -0700, Alan Ruttenberg <[EMAIL PROTECTED]> wrote:



On Jul 13, 2007, at 12:20 AM, Mark Wilkinson wrote:


What worries me about the 303 solution (other than that we are not using it for its primary purpose [1]) is that the redirection can only be to a *single* resource, specified in the Location header.
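
(For concreteness: the constraint is visible on the wire - a 303 response carries exactly one Location header. A sketch that inspects the 303 without following it, Python standard library, hypothetical URI:

    import urllib.request, urllib.error

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # decline to follow, so the 303 surfaces as an error

    opener = urllib.request.build_opener(NoRedirect)
    try:
        opener.open("http://example.org/ontology/term123")  # hypothetical URI
    except urllib.error.HTTPError as e:
        # A 303 names exactly ONE redirect target:
        print(e.code, e.headers.get("Location"))

)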

On Thu, 12 Jul 2007 03:57:34 -0700, Jonathan Rees <[EMAIL PROTECTED]> wrote:

If this is an important functionality then it can be provided in a
variety of ways - a mere matter of programming. The LSID resolver happens
to be the only way that comes ready-made. But the functionality
doesn't need to be tied to the use of LSIDs.

If there is an alternative solution that provides the same functionality, and that can be applied universally to all existing URIs (URLs), then I'm all for it! To be honest, this is my *primary* objection to moving to a URL solution vs an LSID solution... if you can solve that problem, then I am *almost* in the URL camp.

Here is an alternative:

Problem statement:

Enable third parties to register the fact that they have additional statements to provide about something that a URI denotes, in such a way as to make it easy for anyone to discover this fact. Do this in a way which requires minimal coordination (ideally none) between the minter of the original URI, the provider of the additional statements, and the consumer of all the statements.

Solution:

For a given URI http://a.b/c/d/e, construct a new URI http://purl.org/about/a.b/c/d/e

Configure the purl server so that http://purl.org/provide-about/a.b/c/d/e redirects to something akin to a structured wiki page or a REST service (let us assume for the moment that whoever currently provides the LSID WSDL containing this information is the provider of this service).

This page may be edited (manually or programmatically) to include a description (suitable for a machine to understand) of how to access the resource and what sort of resource it is, and perhaps some additional useful information (e.g. which predicates the resource provides). This information is then rendered as RDF using a standard vocabulary and saved.

Configure the purl server so that http://purl.org/about/a.b/c/d/e retrieves the RDF that was constructed (or returns a 404 if there is none). Semantic web agents then interpret this RDF and go fetch what they want or need.
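
A sketch of what a consuming agent would do under this scheme (Python; rdflib is my assumption for the RDF parsing, and the URIs are the hypothetical ones above):

    from rdflib import Graph  # third-party library; an assumption on my part

    def about_uri(uri):
        # http://a.b/c/d/e  ->  http://purl.org/about/a.b/c/d/e
        return "http://purl.org/about/" + uri.split("://", 1)[1]

    g = Graph()
    try:
        g.parse(about_uri("http://a.b/c/d/e"))  # fetch the registered RDF
    except Exception:
        pass  # a 404 just means nobody has registered additional statements
    for s, p, o in g:
        print(s, p, o)  # ...then go fetch whatever these statements advertise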

We all agree that 303s redirect to a human-readable HTML document, that this document uses a REL link to an RDF document that says what the provider wishes to say, and that the RDF also states that http://purl.org/about/a.b/c/d/e may have more information. (Suitable shortcuts are provided to make bulk retrievals more efficient - we've already discussed such mechanisms.)

This can be done now, with effort analogous to what is being done with LSIDs. Let me point out some obvious advantages:

1) No requirement to use web services (though web services *could* be described as ways of accessing further statements using this scheme).
2) Requires *less* manual intervention than is currently required to maintain the WSDL.
3) Re-uses purl, which is based on HTTP, which everyone knows how to use already.
4) Makes clear that the descriptions of these additional sources of statements are to be in RDF, and requires advertising what to expect if you go to the resource (will you get an RDF document, a SPARQL endpoint, or a set of web service methods?).

---

With a bit more effort expended on extending the purl server code we can get some more leverage - we enhance it so that retrieving http://purl.org/about/a.b/c/d/e actually merges the RDF results of retrieving each of:

http://purl.org/about*/a.b/
http://purl.org/about*/a.b/c
http://purl.org/about*/a.b/c/d
http://purl.org/about/a.b/c/d/e

Here the about* prefix indicates that the information covers all URIs that start with the indicated path.

In this way different providers can note that they have additional statements about URIs located in varying amounts of namespace.
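
A sketch of the merge the enhanced server (or a client) would perform (Python again; the prefix enumeration mirrors the list above, and rdflib remains an assumption):

    from rdflib import Graph  # third-party library; an assumption on my part

    def about_uris(uri):
        # http://a.b/c/d/e -> about* URIs for each prefix, plus the exact about URI
        path = uri.split("://", 1)[1]              # "a.b/c/d/e"
        parts = path.split("/")                    # ["a.b", "c", "d", "e"]
        prefixes = ["http://purl.org/about*/" + "/".join(parts[:i])
                    for i in range(1, len(parts))]
        return prefixes + ["http://purl.org/about/" + path]

    merged = Graph()
    for u in about_uris("http://a.b/c/d/e"):
        try:
            merged.parse(u)   # parsing into one Graph *is* the RDF merge
        except Exception:
            pass              # skip prefixes with nothing registered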

With some coordination among us, we could even decide to dedicate a server to hosting the whole mess of this information (I don't expect that it needs too large a resource) so as to make the service more efficient in answering queries, and to make it easy to provide, to whoever wishes, a snapshot that they can host themselves.

---

May I now count you among those *almost* in the URL camp? ;-)

-Alan











