Re: Distributed ontology development (was Ontology entity IDs)

William Bug Sun, 16 Jul 2006 21:38:59 -0700

Many thanks for the info, Mark. That sounds very promising.

Are you referring to the JDBC database backend for Protégé (http://protege.cim3.net/cgi-bin/wiki.pl?JdbcDatabaseBackend), or are there other ways of connecting Protégé to an RDBMS backend?

It certainly is a Herculean task to work out an object-relational (O-R) mapping flexible enough to accommodate every graph someone might construct in either Protégé-Frames or Protégé-OWL, so if this is already implemented and working, it behooves all of us who need to support this sort of community ontology curation to reuse what SMI and/or NCI are building.

I'd actually been poking around the Protégé Wiki for some time (http://protege.cim3.net/) and was aware this system existed. On the BIRN Ontology Task Force teleconferences, we've been discussing, with input from Daniel, how we might construct such a shared system.

I'm a strong proponent of this approach. On the BIRN Ontology Task Force, we'll need to label classes and attributes with various levels of curation status (e.g., "fully vetted", "good graph location; poor definition", "temporary graph location; needs to move eventually", etc.). We need to be able to release "versions" of the ontology based on these status tags and other attributes. Ultimately, we'll also want to have node- and edge-level unique IDs for the published version of the ontology (which will likely be used to create node-level URIs). All of this will be easier to manage from an RDBMS than by issuing versions in CVS or SVN, as is typically done. In fact, I think managing elements at that level of granularity would be nearly impossible within CVS/SVN.
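To make the RDBMS argument concrete, here is a minimal sketch, assuming a simple relational layout, of how curation-status tags could drive a release and how one shared ID space could yield node- and edge-level URIs. The table layout, the choice of which statuses count as releasable, and the base URI are all hypothetical illustrations; sqlite3 stands in for whatever shared database a real deployment would use.

```python
import sqlite3

# Hypothetical namespace for minted URIs; not an actual BIRN base URI.
BASE_URI = "http://example.org/birn/ontology/"

# Status tags from the message; which ones count as releasable is an assumption.
RELEASABLE = ("fully vetted", "good graph location; poor definition")

SCHEMA = """
CREATE TABLE element (                     -- one row per node OR edge
    id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- never reused, even after deletes
    kind   TEXT NOT NULL CHECK (kind IN ('node', 'edge')),
    status TEXT NOT NULL                   -- curation status tag
);
CREATE TABLE node (
    id    INTEGER PRIMARY KEY REFERENCES element(id),
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE edge (
    id        INTEGER PRIMARY KEY REFERENCES element(id),
    subject   INTEGER NOT NULL REFERENCES node(id),
    predicate TEXT NOT NULL,
    object    INTEGER NOT NULL REFERENCES node(id)
);
"""

def connect():
    """Open an in-memory store (a stand-in for the shared RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def _new_element(conn, kind, status):
    # The element table hands out IDs, so nodes and edges share one ID space.
    cur = conn.execute(
        "INSERT INTO element (kind, status) VALUES (?, ?)", (kind, status))
    return cur.lastrowid

def add_node(conn, label, status):
    eid = _new_element(conn, "node", status)
    conn.execute("INSERT INTO node (id, label) VALUES (?, ?)", (eid, label))
    return eid

def add_edge(conn, subject_id, predicate, object_id, status):
    eid = _new_element(conn, "edge", status)
    conn.execute(
        "INSERT INTO edge (id, subject, predicate, object) VALUES (?, ?, ?, ?)",
        (eid, subject_id, predicate, object_id))
    return eid

def release(conn, statuses=RELEASABLE):
    """Select the nodes and edges whose status is in `statuses`.

    An edge is released only if both of its endpoints are released too,
    so the published version never points at an unpublished node.
    """
    marks = ",".join("?" * len(statuses))
    nodes = conn.execute(
        "SELECT n.id, n.label FROM node n JOIN element e ON e.id = n.id "
        f"WHERE e.status IN ({marks}) ORDER BY n.id", statuses).fetchall()
    kept = {node_id for node_id, _ in nodes}
    candidates = conn.execute(
        "SELECT d.id, d.subject, d.predicate, d.object "
        "FROM edge d JOIN element e ON e.id = d.id "
        f"WHERE e.status IN ({marks}) ORDER BY d.id", statuses).fetchall()
    edges = [row for row in candidates if row[1] in kept and row[3] in kept]
    return nodes, edges

def uri_for(element_id):
    """Mint a node- or edge-level URI from a stable numeric ID."""
    return f"{BASE_URI}{element_id:07d}"
```

With this layout, a class tagged "temporary graph location; needs to move eventually" is simply left out of a release (along with any edge touching it), while everything already published keeps the same ID, and therefore the same URI, from one release to the next.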

The only problem is that a direct JDBC connection isn't going to work well as an efficient means of supporting this sort of community curation (and the sharing of ontologies from other sources). There will be firewall issues, which I believe will add far too much overhead for each individual bringing this capability online. When the group is supported by a single IT staff and works within the same LAN environment (including those who connect via VPN), this can be a viable approach, but outside of that, it will probably be too much trouble for all the folks who need access.

This is why we've been talking with Daniel about expanding the web version of Protégé developed in your group, so as to "open" it and free it from the JDBC port requirements by combining a service-oriented architecture (web services) with the Java Portlet framework. In our lab, we've implemented very simple WSDL web service request/response pairs that provide a generic SQL interface via web services. It works extremely well, even for fairly complicated queries, and can even return binary objects (in our case, histological images) via SOAP with attachments. All of this runs over relatively firewall-friendly ports, such as those used by HTTP and the Tomcat Java Servlet framework.
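For illustration only, here is a minimal sketch of what a generic SQL request/response pair might look like as plain XML messages. This is not our lab's actual WSDL: the element names (sqlRequest, query, param, sqlResponse, row, col) are hypothetical, sqlite3 stands in for the real RDBMS, and binary columns are inlined as base64 rather than sent as SOAP attachments. In a deployment, handle_request would sit behind an HTTP endpoint (e.g. a servlet on Tomcat), so only a web port needs to cross the firewall.

```python
import base64
import sqlite3
import xml.etree.ElementTree as ET

def build_request(sql, params=()):
    """Client side: wrap an SQL string and its positional parameters as XML."""
    root = ET.Element("sqlRequest")
    ET.SubElement(root, "query").text = sql
    for p in params:
        ET.SubElement(root, "param").text = str(p)  # parameters travel as text
    return ET.tostring(root, encoding="unicode")

def handle_request(conn, request_xml):
    """Server side: run the query and serialize the result set as XML."""
    req = ET.fromstring(request_xml)
    sql = req.findtext("query")
    params = [p.text or "" for p in req.findall("param")]
    cur = conn.execute(sql, params)
    resp = ET.Element("sqlResponse")
    if cur.description is None:  # INSERT/UPDATE/DELETE: report rows affected
        conn.commit()
        resp.set("rowcount", str(cur.rowcount))
        return ET.tostring(resp, encoding="unicode")
    names = [d[0] for d in cur.description]
    for row in cur.fetchall():
        row_el = ET.SubElement(resp, "row")
        for name, value in zip(names, row):
            col = ET.SubElement(row_el, "col", name=name)
            if value is None:
                col.set("null", "true")
            elif isinstance(value, bytes):  # e.g. image data
                col.set("encoding", "base64")
                col.text = base64.b64encode(value).decode("ascii")
            else:
                col.text = str(value)
    return ET.tostring(resp, encoding="unicode")

def parse_response(response_xml):
    """Client side: turn the XML result set back into a list of dicts."""
    rows = []
    for row_el in ET.fromstring(response_xml).findall("row"):
        row = {}
        for col in row_el.findall("col"):
            text = col.text or ""
            if col.get("null") == "true":
                row[col.get("name")] = None
            elif col.get("encoding") == "base64":
                row[col.get("name")] = base64.b64decode(text)
            else:
                row[col.get("name")] = text
        rows.append(row)
    return rows
```

In this sketch every non-binary value comes back as a string, so a client would need column type information to restore numbers; a real service would also need authentication and some restriction on which statements a remote caller may run.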

I assume that when you mention NCIT community curation, you mean a project being developed, hosted, and supported by the NCI Bioinformatics group as part of the caBIG project? I know they are very committed to using Apache implementations of various Java specs and web-based architectural tools. By any chance, is the work they are doing on the shared Protégé-RDBMS ontology environment (CODS, the Collaborative Ontology Development Server or Collaborative Ontology Development Service project) taking this approach to make the system less reliant on running JDBC over the net and through firewalls? I saw on one of the Protégé CODS server configuration pages that ports 4020-4039 are used. Since these ports are publicly assigned to proprietary applications (http://www.iana.org/assignments/port-numbers), they can again be difficult to use unless all contributors are supported by the same IT staff and/or are on the same LAN (even if it's a VLAN).

Are there pages on the Protégé Wiki where more complete documentation discusses some of these details for the NCI CODS project?

Many thanks again for the info, Mark.

Cheers,

Bill

On Jul 16, 2006, at 12:53 AM, Mark Musen wrote:

On Jul 10, 2006, at 11:40 PM, William Bug wrote:

However, there doesn't appear to be a means within the OBO/NCBO community for doing this sort of distributed ontology design right now. Two of the tools in widespread use, Protégé and OBO-Edit, are really not designed to support distributed, shared development such as you'd find in a typical distributed architecture: whether a standard client-server RDBMS-based approach, one using some "active pages" technology such as PHP, Zope, Ruby on Rails, or Java Servlet/Portlet frameworks, or a more asynchronous approach using messaging and/or web services to assemble the required components from the various authoritative sources.

Bill,

I hate to sound like a salesperson, but Protégé in its multi-user mode (using the relational database backend) would seem to be just what you are looking for. Protégé (both the frames and the OWL facility) allows distributed users to work simultaneously on an ontology stored on a remote server. As the ontology is updated, all the Protégé clients refresh automatically to display the changes.

NCI currently is experimenting with this architecture for the development of the NCI Thesaurus in OWL, and they have developers stationed all across the country. I'm told that Perot Systems, using the frame-based representation, has nearly 100 Protégé users working on the same ontology simultaneously.

Mark

P.S. While I'm plugging Protégé, don't forget that the Ninth Annual Protégé Conference takes place at Stanford next week (see http://protege.stanford.edu/conference/2006/).

Bill Bug

Senior Analyst/Ontological Engineer

Laboratory for Bioimaging & Anatomical Informatics

www.neuroterrain.org

Department of Neurobiology & Anatomy

Drexel University College of Medicine

2900 Queen Lane

Philadelphia, PA 19129

215 991 8430 (ph)

610 457 0443 (mobile)

215 843 9367 (fax)

Please Note: I now have a new email - [EMAIL PROTECTED]


