Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard; it's all URIs. Correct me if I'm wrong.

I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment. Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

Hiya,

Been meaning to jump into this discussion for a while, but I've been off to an alternative universe and I can't even say it's good to be back. :) Anyhoo ...

On Fri, Apr 3, 2009 at 03:48, Ray Denenberg, Library of Congress r...@loc.gov wrote:

> You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it.

I'm rather confused by this statement. The web: URI scheme? The Web *is* the URI scheme; they are all identifiers of resources (ftp:, http:, gopher:, https:, etc.), and together they make up the, um, web of things. What am I missing?

> Back in the old days, URIs (or URLs) were protocol based.

No, which one do you mean, URIs or URLs?

> The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on.

Again, have I missed something? This has changed, as opposed to the good old days?

> A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol.

I've been browsing back and forth through this discussion, and couldn't find much to back this up. What do you mean by this?

> Instead, they should have bit the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in.

I'm sorry, but mess? Did you know the messiness of the web is probably what made it successful? Not to mention that having URIs be identifiers *and* having the ability to resolve them is a bonus; they're identifiers of things (as they've always been; as I'm sure you know, URI stands for Uniform Resource Identifier, right? :), as in they consist of a string of characters used to identify or name a resource on the Internet. And then, if you so choose, you can use the protocol level to *resolve* them. Not sure how anyone can consider this to be bad, though. Or is this just a misunderstanding of the difference between URIs and URLs?

Kind regards,

Alexander
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Jonathan Rochkind rochk...@jhu.edu

> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs.

The URL is alive and well. The W3C definition, http://www.w3.org/TR/uri-clarification/ :

> a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have.

Thus, as we noted, http: is a URI scheme. An http URI is a URL. SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.

--Ray
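[Ray's distinction can be made concrete with Python's standard `urllib.parse`: an http URI carries its access mechanism (scheme and network location) in the identifier itself, while a urn: URI only names. A sketch; the URIs are just illustrative examples.]

```python
from urllib.parse import urlparse

# An http URI is a URL in Ray's sense: its scheme and authority
# describe the primary access mechanism (HTTP, a network host).
uri = "http://www.w3.org/TR/uri-clarification/"
parts = urlparse(uri)

print(parts.scheme)   # 'http'  -- the access mechanism
print(parts.netloc)   # 'www.w3.org'  -- the network location
print(parts.path)     # '/TR/uri-clarification/'

# A urn: URI, by contrast, names a resource without saying how to fetch it.
urn = urlparse("urn:isbn:0451450523")
print(urn.scheme)     # 'urn'
```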
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:

> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. Correct me if I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as opposed to just identifiers (which is an important distinction, much used in SemWeb lingo), where URLs are closer to the protocol-like things Ray describes (or so I think).

> I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion :) ? Is it that we have IDs that happen to *also* resolve? And why is that confusing?

> Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Well, let me just say that there's more to SemWeb than what comes out of the W3C. :)

Kind regards,

Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Can you show me where this definition of a URL vs. a URI is made in any RFC or standard-like document? Sure, we have a _sense_ of how the connotation is different, but I don't think that sense is actually formalized anywhere. And that's part of what makes it confusing, yeah.

I think the sem web crowd actually embraces this confusion; they want to have it both ways: oh, a URI doesn't need to resolve, it's just an opaque identifier; but you really should use http URIs for all URIs; why? Because it's important that they resolve.

In general, combining two functions in one mechanism is a dangerous and confusing thing to do in data design, in my opinion. By analogy, it's what gets a lot of MARC/AACR2 into trouble. It's also often a very convenient thing to do, and convenience matters. Although ironically, my problem with some of those TAG documents is actually that they privilege pure theory over practical convenience.

Over in http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html they suggest:

> URI opacity: 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.'

I understand why that makes sense in theory, but it's entirely impractical for me, as I discovered with the SuDoc experiment (which turned out to be a useful experiment, at least in understanding my own requirements). If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their opacity requirement, but it's entirely infeasible to require me to make an individual HTTP request for every URI I find, to figure out what it IS. Infeasible for performance and cost reasons, and infeasible because it requires a lot more development effort at BOTH ends -- it means that every single URI _would_ have to de-reference to an RDF representation capable of telling me it identifies a SuDoc and what the actual bare SuDoc is.

Contrary to the protestations that a URI is different than a URL and does not need to resolve, following the opacity recommendation/requirement would mean that resolution would be absolutely required in order for me to use it. Meaning that someone minting the URI would have to provide that infrastructure, and I as a client would have to write code to use it. But I just want a darn SuDoc in a URI -- and there are advantages to putting a SuDoc in a URI _precisely_ so it can be used in URI-using infrastructures like RDF, and these advantages hold _even if_ it's not resolvable and we ignore the 'opacity' recommendation. There are trade-offs.

I think a lot of that TAG stuff privileges the theoretically pure over the on-the-ground practicalities. They've got a great fantasy in their heads of what the semantic web _could_ be, and I agree it's theoretically sound and _could_ be; but you've got to make it convenient and cheap if you actually want it to happen for real, sometimes sacrificing theoretical purity. And THAT'S one important lesson of the success of the WWW.

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:48 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:

> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. Correct me if I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as opposed to just identifiers (which is an important distinction, much used in SemWeb lingo), where URLs are closer to the protocol-like things Ray describes (or so I think).

> I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion :) ? Is it that we have IDs that happen to *also* resolve? And why is that confusing?

> Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Well, let me just say that there's more to SemWeb than what comes out of the W3C. :)

Kind regards,

Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/
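[Jonathan's requirement -- recognize from the URI alone that it is a SuDoc, and pull out the bare identifier, with no HTTP round trip -- can be sketched in a few lines. The URI template below is purely hypothetical; no registered SuDoc URI scheme is implied, and example.org stands in for whatever host a minter might use.]

```python
import re

# Hypothetical minting policy, for illustration only: the minter puts
# the (percent-encoded) SuDoc after a fixed prefix. A client that knows
# this policy can recognize and extract the identifier locally.
SUDOC_PATTERN = re.compile(r"^http://example\.org/sudoc/(?P<sudoc>.+)$")

def extract_sudoc(uri):
    """Return the bare SuDoc identifier, or None if this URI isn't one."""
    m = SUDOC_PATTERN.match(uri)
    return m.group("sudoc") if m else None

print(extract_sudoc("http://example.org/sudoc/Y%204.F%2076/2"))  # the bare SuDoc
print(extract_sudoc("http://lccn.loc.gov/2002022641"))           # None: not ours
```

[This is exactly the "policy" reading of opacity discussed later in the thread: the extraction only works for agents that know the minter's convention.]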
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Thanks, Ray. By that definition ALL http URIs are URLs, a priori. I read Alexander as trying to make a different distinction.

Ray Denenberg, Library of Congress wrote:

>> From: Jonathan Rochkind rochk...@jhu.edu
>> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs.
>
> The URL is alive and well. The W3C definition, http://www.w3.org/TR/uri-clarification/ : "a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have." Thus, as we noted, http: is a URI scheme. An http URI is a URL. SRU, for example, considers its request to be a URL. I do think this conversation has played itself out.
>
> --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind
Sent: Tuesday, April 14, 2009 10:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

> Over in http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html they suggest: URI opacity: 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.' I understand why that makes sense in theory, but it's entirely impractical for me, as I discovered with the SuDoc experiment (which turned out to be a useful experiment at least in understanding my own requirements). If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their opacity requirement, but it's entirely infeasible to require me to make an individual HTTP request for every URI I find, to figure out what it IS.

Jonathan, you need to take URI opacity in context. The document is correct in suggesting that user agents should not attempt to infer properties of the referenced resource. The Architecture of the Web is also clear on this point and includes an example: just because a resource's URI ends in .html does not mean that HTML will be the representation returned. A user agent that looks at the end of the URI to see whether it ends in .html is inferring a property, e.g., that the Web document will return HTML. If you really want to know for sure, you need to dereference it with a HEAD request.

Now, having said that, URI opacity applies to user agents dealing with *any* URIs that they come across in the wild. They should not try to infer any semantics from the URI itself. However, this doesn't mean that the minter of a URI cannot make a policy decision, for a group of URIs under their control, that they contain semantics. In your example, you made a policy decision about the URIs you were minting for SuDocs such that the actual SuDoc identifier would appear someplace in the URI. This is perfectly fine, and is the basis for REST URIs, but understand that you created a specific policy statement for those URIs, and if a user agent is aware of your policy statements about the URIs you mint, then it can infer semantics from the URIs you minted.

Does that break URI opacity from a user agent's perspective? No. It just means that those user agents who know about your policy can infer semantics from your URIs, and those that don't should not infer any semantics, because they don't know what the policies are; e.g., you could be returning PDF representations when the URI ends in .html, if that was your policy, and the only way for a user agent that doesn't know the policies to find out is to dereference the URI with either HEAD or GET.

Andy.
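[Andy's "dereference with a HEAD request" fallback, for agents that don't know the minting policy, can be sketched with Python's standard library. The URI is illustrative, and the live call is left commented out since it needs a reachable server.]

```python
from urllib.request import Request, urlopen

def representation_type(uri):
    """Ask the server what it will actually return, rather than
    guessing from the URI's spelling (e.g. assuming '.html' means HTML)."""
    req = Request(uri, method="HEAD")
    with urlopen(req) as resp:
        # The Content-Type header, not the URI suffix, is authoritative.
        return resp.headers.get("Content-Type")

# A URI ending in .html could be served as PDF if that were the minter's
# policy; only the response headers settle it.
# print(representation_type("http://example.org/record.html"))
```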
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Am I not an agent making use of a URI who is attempting to infer properties from it? Like that it represents a SuDoc, and in particular what that SuDoc is?

If this kind of talmudic parsing of the TAG recommendations, to figure out what they _really_ mean, is necessary, I stand by my statement that the environment those TAG documents are encouraging is a confusing one.

Jonathan

Houghton,Andrew wrote:

> Jonathan, you need to take URI opacity in context. The document is correct in suggesting that user agents should not attempt to infer properties of the referenced resource. The Architecture of the Web is also clear on this point and includes an example. Just because a resource URI ends in .html does not mean that HTML will be the representation being returned. The user agent is inferring a property by looking at the end of the URI to see if it ends in .html, e.g., that the Web document will be returning HTML. If you really want to know for sure you need to dereference it with a HEAD request.
>
> Now having said that, URI opacity applies to user agents dealing with *any* URIs that they come across in the wild. They should not try to infer any semantics from the URI itself. However, this doesn't mean that the minter of a URI cannot make a policy decision for a group of URIs under their control that they contain semantics. In your example, you made a policy decision about the URIs you were minting for SuDocs such that the actual SuDoc identifier would appear someplace in the URI. This is perfectly fine and is the basis for REST URIs, but understand you created a specific policy statement for those URIs, and if a user agent is aware of your policy statements about the URIs you mint, then they can infer semantics from the URIs you minted. Does that break URI opacity from a user agent's perspective? No. It just means that those user agents who know about your policy can infer semantics from your URIs, and those that don't should not infer any semantics because they don't know what the policies are, e.g., you could be returning PDF representations when the URI ends in .html, if that was your policy, and the only way for a user agent to know that is to dereference the URI with either HEAD or GET when they don't know what the policies are.
>
> Andy.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
The User Agent is understood to be a typical browser, or other piece of software like wget, curl, etc. It's the thing implementing the client side of the specs. I don't think you are operating as a user agent here so much as a server application. That is, assuming I have any idea what you're actually doing.

--Joe

On Tue, Apr 14, 2009 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

> Am I not an agent making use of a URI who is attempting to infer properties from it? Like that it represents a SuDoc, and in particular what that SuDoc is? If this kind of talmudic parsing of the TAG recommendations to figure out what they _really_ mean is necessary, I stand by my statement that the environment those TAG documents are encouraging is a confusing one.
>
> Jonathan
[CODE4LIB] code 4 museums
Hi all,

I've been a software developer in a research library for several years, and I have worked to a large degree with objects typically viewed as museum collections (particularly ancient coins and eighteenth-century European sheet music). Since I'm from a library and am familiar with library technological standards as far as metadata practices and software applications go, I tend to apply library standards to the museum collections I have been in contact with -- which means Encoded Archival Description for metadata, and open-source applications like Tomcat, Cocoon, and Lucene/Solr.

My knowledge of museum practices is fairly limited, but I have noticed that many museums have tended to adopt proprietary databases to describe their collections. I feel museums tend to lag behind their library counterparts with respect to the adoption of open-source frameworks and open standards; but if you think about it, museums are scarcely different from many archives/special-collections libraries in content and organization. I'm thinking of PastPerfect in particular. It's quite common in the museum world and costs almost $1000 per license.

I'm wondering if anyone else on code4lib actually works for a museum, or has first-hand experience in providing access to museum collections, and has noticed the same general differences between libraries and museums that I have.

Ethan Gruber
University of Virginia Library
Re: [CODE4LIB] code 4 museums
Ethan,

Mellon funded a project, CollectionSpace, that addresses the needs of museums specifically. The Rutgers bibliographic utility, OpenMIC, which I hope will finally go open source in May, also supports the needs of museums in terms of rights and provenance information. We designed the utility to support a statewide consortium of libraries, museums, historical societies and archives. The museums were the most specific about their needs for source, technical and rights metadata, and we tried to address their needs in our METS implementation.

Grace Agnew
Rutgers University Libraries

> Hi all, I've been a software developer in a research library for several years [...]
>
> Ethan Gruber
> University of Virginia Library
Re: [CODE4LIB] Something completely different
Alexander Johannesen wrote:

>> We currently use topic maps, a lot, in our infrastructure. If we were starting again tomorrow, I'd advocate using RDF instead, mainly because of the much better tool support and take-up.
>
> Hmm, not a good thing at all. Could you elaborate, though, as I use it too as part of our infrastructure, and wouldn't touch RDF / SemWeb without a long stick? I'm into application semantics and shared knowledge-bases. What are you guys doing where you feel the support and tools are lacking? And what are the RDF alternatives?

RDF, unlike topic maps, is being used by substantial numbers of people who we interact with in the real world and would like to interoperate with. If we used RDF rather than topic maps internally, that interoperability would be much, much cheaper. It's tempting to say it would be free, but it's not quite, because it does impose some constraints.

In my eyes, the core thing that RDF supports that topic maps don't seem to is seamless reuse by people you don't care about. For example, the people at http://lcsubjects.org have never heard of us (that I know of), but we can use their URLs, like http://lcsubjects.org/subjects/sh90005545#concept, to represent our roles.

cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/ New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository
Re: [CODE4LIB] Anyone else watching rev=canonical?
Wait, is this the same as or different than link rel=canonical, as in: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html ?

link rel=canonical seemed like a good idea to me. But when I start reading some of those URLs, it's not clear to me if they're talking about the same thing or not.

Jonathan

Brett Bonfield wrote:

> Summary: URL shortening services, such as TinyURL, are a problem. The folks who have proposed rev=canonical have written some useful software around it, but rev=canonical has some potentially insurmountable issues. I suggest the following posts if you find this at all interesting:
>
> The post that drew attention to URL shorteners (by the creator of del.icio.us): http://joshua.schachter.org/2009/04/on-url-shorteners.html
>
> A summary of the work on rev=canonical, with good links and also a new bookmarklet: http://simonwillison.net/2009/Apr/11/revcanonical/
>
> An interesting post that makes the case for rev=canonical: http://adactio.com/journal/1568
>
> An interesting post that makes the case against rev=canonical: http://www.mnot.net/blog/2009/04/14/rev_canonical_bad
>
> "I (used to) like rev=canonical": http://decafbad.com/blog/2009/04/13/i-like-revcanonical
>
> An interesting assessment of the issues involved: http://intertwingly.net/blog/2009/04/14/Canonical-Reverse-Or-Wisdom-Defying-Shorturl
>
> I'm not sure what happens now, but I hope the conversation results quickly in as much software as is needed.
>
> Brett
>
> Brett Bonfield
> Director, Collingswood Public Library
> bonfi...@collingswoodlib.org
> 856.858.0649
Re: [CODE4LIB] Anyone else watching rev=canonical?
On Tue, Apr 14, 2009 at 5:30 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

> Wait, is this the same as or different than link rel=canonical, as in: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html ? link rel=canonical seemed like a good idea to me. But when I start reading some of those URLs, it's not clear to me if they're talking about the same thing or not.

Different. Which is one of the problems with rev=canonical.

Brett
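[The two attributes point in opposite directions: rel=canonical on a duplicate page names the preferred URL, while the proposed rev=canonical on the canonical page names a shorter URL for it. A small sketch with Python's standard html.parser makes the distinction concrete; the HTML and URLs are invented for illustration.]

```python
from html.parser import HTMLParser

class LinkScanner(HTMLParser):
    """Collect <link> elements so rel=canonical and the proposed
    rev=canonical can be told apart."""
    def __init__(self):
        super().__init__()
        self.canonical = None  # rel: "the preferred URL for this page"
        self.short_url = None  # rev: "a short URL whose canonical is this page"

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel") == "canonical":
            self.canonical = a.get("href")
        if a.get("rev") == "canonical":
            self.short_url = a.get("href")

html = '''<head>
<link rel="canonical" href="http://example.org/long/stable/url">
<link rev="canonical" href="http://example.org/x7">
</head>'''
scanner = LinkScanner()
scanner.feed(html)
print(scanner.canonical)  # the page's preferred URL
print(scanner.short_url)  # the page's own short form
```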
Re: [CODE4LIB] Anyone else watching rev=canonical?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Brett Bonfield
Sent: Tuesday, April 14, 2009 6:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Anyone else watching rev=canonical?

> On Tue, Apr 14, 2009 at 5:53 PM, Houghton,Andrew hough...@oclc.org wrote:
>> Another issue is that Google, Microsoft, et al. couldn't see that their proposal was already taken care of by HTTP with its Content-Location header, and that if they wanted people to embed the canonical URI into their HTML they could have easily done: <meta http-equiv="Content-Location" content="canonical-URI" /> rather than creating a new link rel=canonical. And BTW, their strategy only works in HTML; it doesn't work in RDF, JSON, XML, etc. But using HTTP as it was intended, e.g., the Content-Location header, it works for all media types. Similar issues are arising with the proposed rev=canonical. That is, there are different ways to provide the info that rev=canonical is providing.
>
> However, just to be clear, rev=canonical != rel=canonical. They are discrete responses to distinct issues.

Agreed. Another issue with rev=canonical is that I don't believe that rev= is going to be supported in HTML 5.

Andy.
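[Andy's point is that an HTTP-level Content-Location header travels with any representation, while a link element only works in HTML. A sketch; the (status, headers, body) tuples are just illustrative scaffolding, not any particular framework's API, and the URIs are invented.]

```python
# The canonical URI travels in the HTTP headers, so it works equally
# well for JSON, XML, RDF, or any other media type -- no HTML needed.

def json_response(body, canonical_uri):
    headers = [
        ("Content-Type", "application/json"),
        # HTTP-level statement of the canonical URI: media-type neutral.
        ("Content-Location", canonical_uri),
    ]
    return ("200 OK", headers, body)

status, headers, body = json_response(
    '{"title": "example"}', "http://example.org/records/42")
print(dict(headers)["Content-Location"])  # http://example.org/records/42
```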
Re: [CODE4LIB] Anyone else watching rev=canonical?
On Tue, Apr 14, 2009 at 7:10 PM, Houghton,Andrew hough...@oclc.org wrote:

>> However, just to be clear, rev=canonical != rel=canonical. They are discrete responses to distinct issues.
>
> Agreed. Another issue with rev=canonical is that I don't believe that rev= is going to be supported in HTML 5.

That's correct. As a couple of the posts I pointed to mention, the current plan isn't simply to deprecate rev (and then explain why rev has been deprecated) but to omit it completely.

Brett
Re: [CODE4LIB] Something completely different
On Wed, Apr 15, 2009 at 07:10, stuart yeates stuart.yea...@vuw.ac.nz wrote:

> RDF, unlike topic maps, is being used by substantial numbers of people who we interact with in the real world and would like to interoperate with. If we used RDF rather than topic maps internally, that interoperability would be much, much cheaper. It's tempting to say it would be free, but it's not quite, because it does impose some constraints.

But it's not that hard to create a bridge from RDF to Topic Maps and back, no? Or is your interop story different?

> In my eyes, the core thing that RDF supports that topic maps don't seem to is seamless reuse by people you don't care about.

Yes, this has been brought up on several occasions, including by me at TMRA 2008. But then, it's not so much that RDF does something that Topic Maps don't *support*; it's that it's packaged differently. So, where RDF has got five standard ontology levels (RDF, RDFS, OWL DL/Lite/Full), Topic Maps have got one simpler one (the TMDM), yet neither can express anything better or differently than the other. My theory here is that people *like* the five layers of RDF, because they give the false sensation of choice. But it's all ontological definitions. However, the five levels of RDF do indeed create a defined platform for sharing (if not cast in iron), which in the TM world you need to include / create yourself. Oh, and of course the academics seem to have embraced the W3C and anything by the authority of TBL, and its effect is trickling down.

> For example, the people at http://lcsubjects.org have never heard of us (that I know of), but we can use their URLs, like http://lcsubjects.org/subjects/sh90005545#concept, to represent our roles.

Not sure I understand your example. Here's my Topic Map identifier in a Topic Map: http://psi.ontopedia.net/Alexander_Johannesen -- identifier and locator, and resolvable, and it can be used by anyone.

Regards,

Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/
Re: [CODE4LIB] code 4 museums
There is the Specify software for natural history collections: http://specifysoftware.org/ -- the source code has apparently just recently been deposited on SourceForge.

-hilmar

On Apr 14, 2009, at 3:12 PM, Ethan Gruber wrote:

> Hi all, I've been a software developer in a research library for several years [...]
>
> Ethan Gruber
> University of Virginia Library

--
Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu
Re: [CODE4LIB] Something completely different
Alexander Johannesen wrote:
> On Wed, Apr 15, 2009 at 07:10, stuart yeates stuart.yea...@vuw.ac.nz wrote:
>> For example the people at http://lcsubjects.org have never heard of us (that I know of), but we can use their URLs like http://lcsubjects.org/subjects/sh90005545#concept to represent our roles.
> Not sure I understand your example. Here's my Topic Map identifier in a Topic Map: http://psi.ontopedia.net/Alexander_Johannesen Identifier and locator, and resolvable, and can be used by anyone.

Yes, we mint something very similar (see http://authority.nzetc.org/52969/ for mine), but none of our interoperability partners do. None of our local libraries, none of our local archives, and only one of our local museums (by virtue of some work we did with them). All of them publish, and most consume, some form of RDF. Additionally, many of the taxonomies we're interested in are available in RDF but not as topic maps.

cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/ New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository
Re: [CODE4LIB] Something completely different
On Wed, Apr 15, 2009 at 10:32, stuart yeates stuart.yea...@vuw.ac.nz wrote:
> Yes, we mint something very similar (see http://authority.nzetc.org/52969/ for mine), but none of our interoperability partners do. None of our local libraries, none of our local archives and only one of our local museums (by virtue of some work we did with them). All of them publish and most consume some form of RDF.

Hmm, RDF resources are just URIs, so I'm still a bit unsure about what you mean. Are you talking about the fact that the RDF definitions (and not the RDF vocabs themselves) aren't encoded in your TM engine?

> Additionally many of the taxonomies we're interested in are available in RDF but not topic maps.

Converting them to a Topic Map isn't that hard to do, but I guess there is *a* cost there.

Regards,
Alex
--
--- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/
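For what it's worth, here is a minimal sketch of what such an RDF-to-Topic-Maps conversion can look like. This is not any standard mapping (real converters, such as per-predicate mapping vocabularies, let you declare how each RDF predicate should be treated); it just hard-codes the simplest default: a triple with a literal object becomes an occurrence on a topic, and a triple with a URI object becomes an association between two topics. The FOAF predicate URIs and sample triples are purely illustrative.

```python
# Toy RDF-to-Topic-Maps conversion sketch. Assumptions: triples are
# given as (subject, predicate, object) strings, and any object that
# looks like an http(s) URI denotes a resource rather than a literal.

triples = [
    ("http://psi.ontopedia.net/Alexander_Johannesen",
     "http://xmlns.com/foaf/0.1/name",
     "Alexander Johannesen"),                       # literal object
    ("http://psi.ontopedia.net/Alexander_Johannesen",
     "http://xmlns.com/foaf/0.1/interest",
     "http://psi.ontopedia.net/Topic_Maps"),        # URI object
]

topics = {}        # subject identifier (PSI) -> topic record
associations = []  # (association type, role player 1, role player 2)

def topic(psi):
    """Get or create the topic identified by this PSI."""
    return topics.setdefault(psi, {"occurrences": []})

for s, p, o in triples:
    t = topic(s)
    if o.startswith(("http://", "https://")):
        topic(o)  # the object is a resource: make it a topic too
        associations.append((p, s, o))
    else:
        t["occurrences"].append((p, o))  # literal: occurrence on the subject

print(len(topics), len(associations))
```

Running this yields two topics and one association, with the FOAF name stored as an occurrence; the "*a* cost" mentioned above is exactly the per-predicate decisions this sketch glosses over.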
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind rochk...@jhu.edu wrote:
> Can you show me where this definition of a URL vs. a URI is made in any RFC or standard-like document?

From http://www.faqs.org/rfcs/rfc3986.html :

> 1.1.3. URI, URL, and URN
>
> A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network location). The term "Uniform Resource Name" (URN) has been used historically to refer to both URIs under the "urn" scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name.
>
> An individual scheme does not have to be classified as being just one of "name" or "locator". Instances of URIs from any given scheme may have the characteristics of names or locators or both, often depending on the persistence and care in the assignment of identifiers by the naming authority, rather than on any quality of the scheme. Future specifications and related documentation should use the general term "URI" rather than the more restrictive terms "URL" and "URN" [RFC3305].

As you can see, a URI is an identifier and a URL is a locator (a mechanism for retrieval), and since URLs are a subset of URIs, you _can_ resolve URIs as well.

> Sure, we have a _sense_ of how the connotation is different, but I don't think that sense is actually formalized anywhere.
It is, and the same stuff is documented in Wikipedia as well:
http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
http://en.wikipedia.org/wiki/Uniform_Resource_Locator

> I think the sem web crowd actually embraces this confusingness,

No, I think they take it at face value; they (the URIs) are identifiers for things, and can be used for just that purpose, but they are also URLs, which means they resolve to something. What I think you're getting at is the *thing* it resolves to, as *that* has no definition. But then, if you go from RDF to Topic Maps PSIs (PSIs are URIs with an extended meaning), *that* thing it resolves to indeed has a definition: it's the prose explaining what the identifier identifies, and this is the most important difference between RDF and Topic Maps (a very subtle but important difference, too).

> they want to have it both ways: Oh, a URI doesn't need to resolve, it's just an opaque identifier; but you really should use http URIs for all URIs; why? because it's important that they resolve.

I smell straw-man. :) But yes, they do want both, as both is in fact a friggin' smart thing to have. We all deal with identifiers all the time, in internal as well as external applications, so why not use an identifier scheme that has the added bonus of a resolver mechanism? If you want to be stupid and lock yourself into your limited world, then using them as just identifiers is fine, but perhaps a bit, well, stupid. But if you want to be smart about it, realizing that without ontological work there will *never* be proper interop, you use those identifiers and let them resolve to something. And if you're really smart, you let them resolve to either more RDF statements or, if you're seriously Einsteinly smart, to PSIs (as in Topic Maps). :)

> In general, combining two functions in one mechanism is a dangerous and confusing thing to do in data design, in my opinion.

Because ... ?

> By analogy, it's what gets a lot of MARC/AACR2 into trouble.

Hmm, and I thought it was crap design that did that, coupled with poor metadata constraints and validation channels, untyped fields, poor tooling, the lack of machine understandability, and the general library idiom of "not invented here". But correct me if I'm wrong. :)

> Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

Umm, I'd be wary of taking as canon a draft with editorial notes going back four to five years that still aren't resolved. In other words, this document isn't relevant to the real world. Yet.

> They suggest: URI opacity: 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.'

Well, as a RESTafarian I understand this argument quite well. It's about not assuming too much from the internal structure of the URI. Again, it's an identifier, not a scheme such as a URL where structure is defined. Again, for URIs, don't assume structure, because at this point it isn't a URL.

> If I get a URI representing (e.g.) a Sudoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a Sudoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their Opacity requirement

I think you are quite mistaken on this, but before we leap into whether the web is suitable for SuDoc I'd
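Both halves of this exchange can be made concrete in a few lines. The sketch below first applies the RFC 3986 locator/name distinction with Python's generic URI parser: all three strings are URIs, but only the http one carries a network location to dereference, i.e. acts as a URL. It then shows the flip side of the TAG's opacity principle: an info: URI is designed to be peeked into, so a consumer can tell from the URI alone that it holds an LCCN and extract the bare identifier, which is exactly the behaviour the opacity principle discourages for http URIs. The `split_info_uri` helper is hypothetical and handles only the simple `info:namespace/identifier` shape; the example LCCN value is illustrative.

```python
from urllib.parse import urlparse

# Per RFC 3986, every one of these is a URI; only the first also
# describes an access mechanism (a network location), so only it is
# usefully a "locator" (URL) as well as a name.
for uri in ["http://lcsubjects.org/subjects/sh90005545#concept",
            "urn:isbn:0451450523",
            "info:lccn/2003556443"]:
    parts = urlparse(uri)
    print(parts.scheme, repr(parts.netloc), parts.path)

def split_info_uri(uri):
    """Split a simple 'info:namespace/identifier' URI into its parts.

    Hypothetical helper for illustration: ignores percent-encoding
    and any further structure within the identifier.
    """
    scheme, _, rest = uri.partition(":")
    if scheme != "info":
        raise ValueError("not an info: URI")
    namespace, _, identifier = rest.partition("/")
    return namespace, identifier

print(split_info_uri("info:lccn/2003556443"))  # ('lccn', '2003556443')
```

The point of contention in the thread is precisely whether identifiers like the last one should be transparent like this, or minted as opaque http URIs that merely resolve to a description.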
Re: [CODE4LIB] Anyone else watching rev=canonical?
Google's Matt Cutts tweeted a few days ago that he didn't understand why Twitter and similar services don't simply resolve short URLs to their long form and store/display them that way. Things like that have been on my mind for a while, but I've only just put some of those thoughts to words:

http://maisonbisson.com/blog/post/13719/not-sure-that-rev-canonical-is-really-the-solution/

And from the perspective of linked data, making our applications query the URLs that users submit to them just makes sense. It might seem like science fiction to suggest that Twitter resolve a URL to identify its canonical version and RDF that enriches the tweet, but Facebook's link sharing actually does that (though it looks for meta tags rather than RDF).

--Casey

...rather than creating a new link rel=canonical. And BTW, their strategy only works in HTML; it doesn't work in RDF, JSON, XML, etc...
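The consuming half of what Matt Cutts describes is straightforward to sketch: after following a short URL's HTTP redirects to the final page (not shown here; urllib.request would do it), scan that page for a `<link rel="canonical">` element and store its href as the long form. A minimal parser using only the standard library, with an inline HTML snippet standing in for a fetched page (the example URL is illustrative):

```python
from html.parser import HTMLParser

class CanonicalLinkParser(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            a = dict(attrs)
            if a.get("rel") == "canonical":
                self.canonical = a.get("href")

# Stand-in for the HTML you'd get after resolving a shortened URL.
page = """<html><head>
<title>A blog post</title>
<link rel="canonical" href="http://example.com/blog/post/13719/" />
</head><body>...</body></html>"""

parser = CanonicalLinkParser()
parser.feed(page)
print(parser.canonical)  # http://example.com/blog/post/13719/
```

As the thread notes, this only works for HTML responses; a representation in RDF, JSON, or plain XML would need its own convention for declaring a canonical form.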