Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
The difference between URIs and URLs?  I don't believe that URL is something 
that exists any more in any standard, it's all URIs. Correct me if I'm wrong. 

I don't entirely agree with either dogmatic side here, but I do think that 
we've arrived at an awfully confusing (for developers) environment. Re-reading 
the various semantic web TAG position papers people keep referencing, I 
actually don't entirely agree with all of their principles in practice. 

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander 
Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)

Hiya,

Been meaning to jump into this discussion for a while, but I've been
off to an alternative universe and I can't even say it's good to be
back. :) Anyhoo ...

On Fri, Apr 3, 2009 at 03:48, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
 You're right, if there were a web:  URI scheme, the world would be a
 better place.   But it's not, and the world is worse off for it.

I'm rather confused by this statement. The web: URI scheme? The Web
*is* the URI scheme; they are all identifiers to resources (ftp: http:
gopher: https: etc.), and together they make up, the, um, web of
things. What am I missing?

 Back in the old days, URIs (or URLs)  were protocol based.

No, which one do you mean, URIs or URLs?

 The ftp scheme
 was for retrieving documents via ftp. The telnet scheme was for telnet. And
 so on.

Again, have I missed something? This has changed, as opposed to the
good old days?

 A few years later the semantic web was conceived and a lot of SW people began
 coining all manner of http URIs that had nothing to do with the http
 protocol.

I've been browsing back and forth this discussion, and couldn't find
much to back this up. What do you mean by this?

 Instead, they should have bitten the bullet and coined a new scheme.  They
 didn't, and that's why we're in the mess we're in.

I'm sorry, but mess? Did you know the messiness of the web is
probably what made it successful? Not to mention that having URIs be
identifiers *and* have the ability to resolve them is a bonus; they're
identifiers of things (as they've always been; as I'm sure you know,
URI stands for Uniform Resource Identifier, right? :), as in they
consist of a string of characters used to identify or name a resource
on the Internet. And then, if you so choose, you can use the protocol
level to *resolve* them. Not sure how anyone can consider this to be
bad, though.

Or is this just a misunderstanding of the difference between URIs and URLs?


Kind regards,

Alexander
--
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu


The difference between URIs and URLs?  I don't believe that URL is 
something that exists any more in any standard, it's all URIs.


The URL is alive and well.

The W3C definition, http://www.w3.org/TR/uri-clarification/
a URL is a type of URI that identifies a resource via a representation of 
its primary access mechanism (e.g., its network location), rather than by 
some other attributes it may have. Thus as we noted, http: is a URI 
scheme. An http URI is a URL.
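
As a rough illustration of the definition quoted above, the scheme is the only part of a URI that says anything about an access mechanism, and it can be read off by syntax alone. The sketch below uses Python's standard urllib.parse; the LOCATOR_SCHEMES set is an informal assumption made for this example, not a list taken from any standard, and the example URIs other than the W3C one are illustrative.

```python
from urllib.parse import urlsplit

# Assumption: an informal set of schemes whose URIs name an access mechanism.
LOCATOR_SCHEMES = {"http", "https", "ftp", "gopher", "telnet"}


def describe(uri):
    """Return (scheme, looks_like_locator) for a URI, judged by syntax alone."""
    scheme = urlsplit(uri).scheme.lower()
    return scheme, scheme in LOCATOR_SCHEMES


if __name__ == "__main__":
    for uri in (
        "http://www.w3.org/TR/uri-clarification/",
        "ftp://ftp.example.org/pub/file.txt",
        "info:lccn/2002022641",
        "urn:isbn:0451450523",
    ):
        print(uri, describe(uri))
```

Under that assumed list, every http URI counts as a locator regardless of whether anything answers at the other end, which is the a priori reading of the definition.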


SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.   --Ray


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Alexander Johannesen
On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:
 The difference between URIs and URLs?  I don't believe that URL is 
 something that exists any more in any standard, it's all URIs. Correct me if 
 I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as
opposed to just identifiers (which is an important distinction, much
used in SemWeb lingo), where URLs are closer to the protocol-like
things Ray describes (or so I think).

 I don't entirely agree with either dogmatic side here, but I do think that 
 we've arrived at an
 awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion
:) ? Is it that we have IDs that happen to *also* resolve? And why is
that confusing?

 Re-reading the various semantic web TAG position papers people keep
 referencing, I actually don't entirely agree with all of their principles in 
 practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)


Kind regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Can you show me where this definition of a URL vs. a URI is made in any RFC 
or standard-like document?

Sure, we have a _sense_ of how the connotation is different, but I don't think 
that sense is actually formalized anywhere. And that's part of what makes it 
confusing, yeah.  I think the sem web crowd actually embraces this 
confusingness, they want to have it both ways: Oh, a URI doesn't need to 
resolve, it's just an opaque identifier; but you really should use http URIs 
for all URIs; why? because it's important that they resolve. 

In general, combining two functions in one mechanism is a dangerous and 
confusing thing to do in data design, in my opinion. By analogy, it's what gets 
a lot of MARC/AACR2 into trouble.  It's also often a very convenient thing to 
do, and convenience matters. Although ironically, my problem with some of those 
TAG documents is actually that they privilege pure theory over practical 
convenience. 

Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

They suggest, under "URI opacity": 'Agents making use of URIs SHOULD NOT attempt to 
infer properties of the referenced resource.'

I understand why that makes sense in theory, but it's entirely impractical for 
me, as I discovered with the SuDoc experiment (which turned out to be a useful 
experiment at least in understanding my own requirements).  If I get a URI 
representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to be able to tell 
from the URI alone that it IS a Sudoc, AND I need to be able to extract the 
actual SuDoc identifier from it.  That completely violates their Opacity 
requirement, but it's entirely infeasible to require me to make an individual 
HTTP request for every URI I find, to figure out what it IS.  Infeasible for 
performance and cost reasons, and infeasible because it requires a lot more 
development effort at BOTH ends -- it means that every single URI _would_ have 
to de-reference to an RDF representation capable of telling me it identifies a 
SuDoc and what the actual bare SuDoc is. Contrary to the protestations that a 
URI is different than a URL and does not need to resolve, following the opacity 
recommendation/requirement would mean that resolution 
would be absolutely required in order for me to use it.   Meaning that someone 
minting the URI would have to provide that infrastructure, and I as a client 
would have to write code to use it.  
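
The requirement described above (recognise a SuDoc URI and recover the bare SuDoc without any HTTP request) can be sketched as a pure string operation. This is a minimal sketch, not the scheme used in the SuDoc experiment: the SUDOC_PREFIX value and the sample SuDoc are hypothetical, and percent-encoding the whole identifier into one path segment is just one possible choice.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix standing in for whatever the URI minter actually publishes.
SUDOC_PREFIX = "http://example.org/sudoc/"


def mint_sudoc_uri(sudoc):
    """Build a URI that carries the SuDoc in a single path segment."""
    # safe="" percent-encodes "/", ":" and spaces too, so the identifier
    # can be recovered exactly by reversing the encoding.
    return SUDOC_PREFIX + quote(sudoc, safe="")


def extract_sudoc(uri):
    """Return the bare SuDoc if the URI follows the assumed pattern, else None."""
    if not uri.startswith(SUDOC_PREFIX):
        return None
    return unquote(uri[len(SUDOC_PREFIX):])


if __name__ == "__main__":
    uri = mint_sudoc_uri("Y 4.F 76/1:AF 8/12")  # illustrative SuDoc
    print(uri)
    print(extract_sudoc(uri))
```

Nothing here touches the network, which is the point of the argument: the cost per URI is a string comparison rather than a round trip.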

But I just want a darn SuDoc in a URI -- and there are advantages to putting a 
SuDoc in a URI _precisely_ so it can be used in URI-using infrastructures like 
RDF, and these advantages hold _even if_ it's not resolvable and we ignore the 
'opacity' recommendation. There are trade-offs.  I think a lot of that TAG 
stuff privileges the theoretically pure over the on the ground practicalities. 
They've got a great fantasy in their heads of what the semantic web _could_ be, 
and I agree it's theoretically sound and _could_ be; but you've got to make it 
convenient and cheap if you actually want it to happen for real, sometimes 
sacrificing theoretical purity.   And THAT'S one important lesson of the 
success of the WWW. 

Jonathan



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Thanks Ray. By that definition ALL http URIs are URLs, a priori.  I read 
Alexander as trying to make a different distinction.



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 14, 2009 10:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html
 
 They suggest, under "URI opacity": 'Agents making use of URIs SHOULD NOT
 attempt to infer properties of the referenced resource.'
 
 I understand why that makes sense in theory, but it's entirely
 impractical for me, as I discovered with the SuDoc experiment (which
 turned out to be a useful experiment at least in understanding my own
 requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
 or an LCCN), I need to be able to tell from the URI alone that it IS a
 Sudoc, AND I need to be able to extract the actual SuDoc identifier
 from it.  That completely violates their Opacity requirement, but it's
 entirely infeasible to require me to make an individual HTTP request
 for every URI I find, to figure out what it IS.

Jonathan, you need to take URI opacity in context.  The document is correct
in suggesting that user agents should not attempt to infer properties of
the referenced resource.  The Architecture of the Web is also clear on this
point and includes an example.  Just because a resource URI ends in .html
does not mean that HTML will be the representation being returned.  The
user agent is inferring a property by looking at the end of the URI to see
if it ends in .html, e.g., that the Web Document will be returning HTML.  If 
you really want to know for sure you need to dereference it with a HEAD 
request.
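
The difference between inferring a media type from the URI text and asking the server can be sketched with the Python standard library. This is only an illustration under stated assumptions: the function names are invented for this example, and the HEAD call needs a reachable server, so it is defined here but not run.

```python
import mimetypes
import urllib.request


def guess_from_suffix(uri):
    """What a client would *infer* from the URI text alone (may be wrong)."""
    media_type, _encoding = mimetypes.guess_type(uri)
    return media_type


def media_type_from_head(uri, timeout=10):
    """What the server actually declares, found by dereferencing with HEAD."""
    request = urllib.request.Request(uri, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # May be None if the server sends no Content-Type header.
        return response.headers.get("Content-Type")


if __name__ == "__main__":
    # The guess says HTML; only the HEAD response could confirm or refute it.
    print(guess_from_suffix("http://example.org/report.html"))
```

A client that compares the two values for URIs it meets in the wild would see directly how often the suffix-based guess holds.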

Now having said that, URI opacity applies to user agents dealing with *any*
URIs that they come across in the wild.  They should not try to infer any
semantics from the URI itself.  However, this doesn't mean that the minter
of a URI cannot make a policy decision for a group of URIs under their
control that contain semantics.  In your example, you made a policy 
decision about the URIs you were minting for SUDOCs such that the actual
SUDOC identifier would appear someplace in the URI.  This is perfectly
fine and is the basis for REST URIs, but understand you created a specific
policy statement for those URIs, and if a user agent is aware of your policy
statements about the URIs you mint, then they can infer semantics from
the URIs you minted.

Does that break URI opacity from a user agent's perspective?  No.  It just
means that those user agents who know about your policy can infer semantics
from your URIs and those that don't should not infer any semantics because
they don't know what the policies are, e.g., you could be returning PDF
representations when the URI ends in .html, if that was your policy, and
the only way for a user agent to know that is to dereference the URI with 
either HEAD or GET when they don't know what the policies are.
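
One way to picture the distinction drawn above is a client that interprets only URIs covered by a policy it knows about and treats everything else as opaque. The sketch below is an assumption-laden illustration: the KNOWN_POLICIES table, its prefixes, and the function name are all hypothetical.

```python
from urllib.parse import unquote

# Hypothetical published policies: URI prefix -> kind of identifier that follows it.
KNOWN_POLICIES = {
    "http://example.org/sudoc/": "sudoc",
    "http://example.org/issn/": "issn",
}


def interpret(uri):
    """Return (kind, identifier) when a known policy covers the URI, else None."""
    for prefix, kind in KNOWN_POLICIES.items():
        if uri.startswith(prefix):
            return kind, unquote(uri[len(prefix):])
    # No known policy: the URI stays opaque and must be dereferenced to learn more.
    return None


if __name__ == "__main__":
    print(interpret("http://example.org/issn/0028-0836"))
    print(interpret("http://example.com/some/page.html"))
```

The second call returns None even though the URI ends in .html, because no policy known to this client says what that suffix means.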


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Am I not an agent making use of a URI who is attempting to infer 
properties from it? Like that it represents a SuDoc, and in particular 
what that SuDoc is?


If this kind of talmudic parsing of the TAG recommendations to figure 
out what they _really_ mean is necessary, I stand by my statement that 
the environment those TAG documents are encouraging is a confusing one.


Jonathan


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Joe Atzberger
The User Agent is understood to be a typical browser, or other piece of
software, like wget, curl, etc.  It's the thing implementing the client side
of the specs.  I don't think you are operating as a user agent here as
much as you are a server application.  That is, assuming I have any idea
what you're actually doing.

--Joe


[CODE4LIB] code 4 museums

2009-04-14 Thread Ethan Gruber
Hi all,

I've been a software developer in a research library for several years, and
I have worked with objects typically viewed as museum collections to a large
degree (particularly ancient coins and eighteenth century European sheet
music).  Since I'm from a library and am familiar with library technological
standards as far as metadata practices and software applications go, I tend
to apply library standards toward the museum collections I have been in
contact with--which involves Encoded Archival Description for metadata and
open source applications like Tomcat, Cocoon, and Lucene/Solr.  My knowledge
of museum practices is fairly limited, but I have noticed that many museums
have tended to adopt proprietary databases to describe their collections.  I
feel museums tend to lag behind their library counterparts with respect to
the adoption of opensource frameworks and open standards, but if you think
about it, museums are scarcely different than many archives/special
collections libraries in content and organization.  I'm thinking of
PastPerfect in particular.  It's quite common in the museum world and costs
almost $1000 per license.

I'm wondering if anyone else on code4lib actually works for a museum or has
first-hand experience in providing access to museum collections and has
noticed the same general differences between libraries and museums that I
have.

Ethan Gruber
University of Virginia Library


Re: [CODE4LIB] code 4 museums

2009-04-14 Thread Grace Agnew
Ethan,

Mellon funded a project, CollectionSpace, that addresses the needs of
museums specifically.  The Rutgers bibliographic utility, OpenMIC, which I
hope will finally go open source in May, also supports the needs of
museums in terms of rights and provenance information.  We designed the
utility to support a statewide consortium of libraries, museums,
historical societies and archives.  The museums were the most specific
about their needs for source, technical and rights metadata, and we tried
to address their needs in our METS implementation.

Grace Agnew
Rutgers University Libraries


Re: [CODE4LIB] Something completely different

2009-04-14 Thread stuart yeates

Alexander Johannesen wrote:

We currently use topic maps, a lot, in our infrastructure. If we were
starting again tomorrow, I'd advocate using RDF instead, mainly because of
the much better tool support and take-up.


Hmm, not a good thing at all. Could you elaborate, though, as I use it
as part of infrastructure too, and wouldn't touch RDF / SemWeb
without a long stick? I'm into application semantics and shared
knowledge-bases. What are you guys doing where you feel the support
and tools are lacking? And what are the RDF alternatives?


RDF, unlike topic maps, is being used by substantial numbers of people 
who we interact with in the real world and would like to interoperate 
with. If we used RDF rather than topic maps internally, that 
interoperability would be much, much cheaper. It's tempting to say it's 
free, but it's not quite, because it does impose some constraints.


In my eyes, the core thing that RDF supports that topic maps don't seem 
to is seamless reuse by people you don't care about.


For example the people at http://lcsubjects.org have never heard of us 
(that I know of), but we can use their URLs like 
http://lcsubjects.org/subjects/sh90005545#concept to represent our roles.
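
The kind of reuse described above amounts to dropping someone else's URI into your own statements without any coordination. A minimal sketch, serialising one triple as N-Triples by hand: the local subject URI is hypothetical, and the choice of the Dublin Core "subject" predicate is an assumption made for illustration, not something stated in the message.

```python
def ntriple(subject, predicate, obj):
    """Serialize one triple of absolute URIs as a single N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


line = ntriple(
    "http://example.org/people/42",      # hypothetical local resource
    "http://purl.org/dc/terms/subject",  # assumed predicate (Dublin Core terms)
    "http://lcsubjects.org/subjects/sh90005545#concept",  # URI from the message
)

if __name__ == "__main__":
    print(line)
```

Anyone else who uses the same lcsubjects.org URI can merge their statements with these by plain string equality on the URI.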


cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/   New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] Anyone else watching rev=canonical?

2009-04-14 Thread Jonathan Rochkind

Wait, is this the same or different than link rel=canonical, as in:

http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

link rel=canonical seemed like a good idea to me.  But when I start 
reading some of those URLs, it's not clear to me if they're talking 
about the same thing or not.


Jonathan

Brett Bonfield wrote:

Summary: URL shortening services, such as TinyURL, are a problem. The
folks who have proposed rev=canonical have written some useful
software around it, but rev=canonical has some potentially
insurmountable issues.

I suggest the following posts if you find this at all interesting:

The post that drew attention to URL shorteners (by the creator of del.icio.us)
http://joshua.schachter.org/2009/04/on-url-shorteners.html

A summary of the work on rev=canonical, with good links and also a new
bookmarklet
http://simonwillison.net/2009/Apr/11/revcanonical/

An interesting post that makes the case for rev=canonical
http://adactio.com/journal/1568

An interesting post that makes the case against rev=canonical
http://www.mnot.net/blog/2009/04/14/rev_canonical_bad

"I (used to) like rev=canonical"
http://decafbad.com/blog/2009/04/13/i-like-revcanonical

An interesting assessment of the issues involved
http://intertwingly.net/blog/2009/04/14/Canonical-Reverse-Or-Wisdom-Defying-Shorturl

I'm not sure what happens now, but I hope the conversation results
quickly in as much software as is needed.

Brett

Brett Bonfield
Director
Collingswood Public Library
bonfi...@collingswoodlib.org
856.858.0649
  


Re: [CODE4LIB] Anyone else watching rev=canonical?

2009-04-14 Thread Brett Bonfield
On Tue, Apr 14, 2009 at 5:30 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Wait, is this the same or different than link rel=canonical, as in:

 http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

 link rel=canonical seemed like a good idea to me.  But when I start
 reading some of those URLs, it's not clear to me if they're talking about
 the same thing or not.

Different. Which is one of the problems with rev=canonical.

Brett


Re: [CODE4LIB] Anyone else watching rev=canonical?

2009-04-14 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Brett Bonfield
 Sent: Tuesday, April 14, 2009 6:48 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Anyone else watching rev=canonical?
 
 On Tue, Apr 14, 2009 at 5:53 PM, Houghton,Andrew hough...@oclc.org wrote:
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
  Brett Bonfield
 
  Different. Which is one of the problems with rev=canonical.
 
  Another issue is that Google, Microsoft, et al. couldn't see that their
  proposal was already taken care of by HTTP with its Content-Location
  header and that if they wanted people to embed the canonical URI into
  their HTML that they could have easily done:
 
  <meta http-equiv="Content-Location" content="canonical-URI" />
 
  rather than creating a new <link rel="canonical"> and BTW their strategy
  only works in HTML, it doesn't work in RDF, JSON, XML, etc., but using
  HTTP as it was intended, e.g., Content-Location header, it works for
  all media types.
 
 Similar issues are arising with the proposed rev=canonical. That is,
 there are different ways to provide the info that rev=canonical is
 providing.
 
 However, just to be clear, rev=canonical != rel=canonical.
 
 They are discrete responses to distinct issues.

Agreed.  Another issue with rev=canonical is that I don't believe that
rev= is going to be supported in HTML 5.
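
Since rel=canonical and rev=canonical differ by a single letter, it may help to see them pulled apart mechanically. The sketch below uses Python's standard html.parser; the LinkCollector class, the collect_links helper, and the example.org URLs are invented for illustration. With rel, the href names the canonical address of the current page; with rev, the current page is the canonical one and the href is an alternative (for example, shortened) address for it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <link> elements, keyed by (attribute, token)."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        for name in ("rel", "rev"):
            # rel/rev hold space-separated tokens; record each one separately.
            for token in (attributes.get(name) or "").lower().split():
                self.links.setdefault((name, token), []).append(href)


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


page = """<html><head>
<link rel="canonical" href="http://example.org/articles/42">
<link rev="canonical" href="http://example.org/s/9x" />
</head><body></body></html>"""

links = collect_links(page)

if __name__ == "__main__":
    print(links[("rel", "canonical")])
    print(links[("rev", "canonical")])
```

Both hints live only inside HTML markup, which is the limitation raised above; a Content-Location header would be read from the HTTP response instead and so is not tied to one media type.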


Andy.


Re: [CODE4LIB] Anyone else watching rev=canonical?

2009-04-14 Thread Brett Bonfield
On Tue, Apr 14, 2009 at 7:10 PM, Houghton,Andrew hough...@oclc.org wrote:
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Brett Bonfield

 However, just to be clear, rev=canonical != rel=canonical.

 They are discrete responses to distinct issues.

 Agreed.  Another issue with rev=canonical is that I don't believe that
 rev= is going to be supported in HTML 5.

That's correct. As a couple of the posts I pointed to mention, the
current plan isn't simply to deprecate rev (and then explain why rev
has been deprecated) but to omit it completely.

Brett


Re: [CODE4LIB] Something completely different

2009-04-14 Thread Alexander Johannesen
On Wed, Apr 15, 2009 at 07:10, stuart yeates stuart.yea...@vuw.ac.nz wrote:
 RDF, unlike topic maps, is being used by substantial numbers of people who
 we interact with in the real world and would like to interoperate with. If
 we used RDF rather than topic maps internally, that interoperability would
 be much, much cheaper. It's tempting to say it's free, but it's not quite,
 because it does impose some constraints.

But it's not that hard to create a bridge from RDF to Topic Maps and
back, no? Or is your interop story different?

 In my eyes, the core thing that RDF supports that topic maps don't seem to
 is seamless reuse by people you don't care about.

Yes, this has been brought up on several occasions, including by me at
the TMRA 2008. But then, it's not so much that RDF does something that
Topic Maps doesn't *support*, it's that it's packaged differently. So,
where RDF has got five standard ontology levels (RDF, RDFS, OWL
DL/Lite/Full), Topic Maps has one simpler one (TMDM), yet neither can
express anything better or differently than the other.

My theory here is that people *like* 5 layers of RDF, because it gives
the false sensation of choice. But it's all ontological definitions.
However, the 5 levels of RDF do indeed create a defined platform for
sharing (if not cast in iron), whereas in the TM world you need to
include it / create it.

Oh, and of course the academics seem to have embraced W3C and anything
by the authority of TBL, and its effect is trickling down.

 For example the people at http://lcsubjects.org have never heard of us (that
 I know of), but we can use their URLs like
 http://lcsubjects.org/subjects/sh90005545#concept to represent our roles.

Not sure I understand your example. Here's my Topic Map identifier in
a Topic Map ;

   http://psi.ontopedia.net/Alexander_Johannesen

Identifier and locator, and resolvable, and can be used by anyone.


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] code 4 museums

2009-04-14 Thread Hilmar Lapp

There is the Specify software for natural history collections:

http://specifysoftware.org/

The source code has apparently just recently been deposited on SourceForge.


-hilmar

On Apr 14, 2009, at 3:12 PM, Ethan Gruber wrote:


Hi all,

I've been a software developer in a research library for several years, and
I have worked with objects typically viewed as museum collections to a large
degree (particularly ancient coins and eighteenth century European sheet
music).  Since I'm from a library and am familiar with library technological
standards as far as metadata practices and software applications go, I tend
to apply library standards toward the museum collections I have been in
contact with--which involves Encoded Archival Description for metadata,
opensource applications like tomcat, cocoon, and lucene/solr.  My knowledge
of museum practices is fairly limited, but I have noticed that many museums
have tended to adopt proprietary databases to describe their collections.  I
feel museums tend to lag behind their library counterparts with respect to
the adoption of opensource frameworks and open standards, but if you think
about it, museums are scarcely different than many archives/special
collections libraries in content and organization.  I'm thinking of
PastPerfect in particular.  It's quite common in the museum world and costs
almost $1000 per license.

I'm wondering if anyone else on code4lib actually works for a museum or has
first-hand experience in providing access to museum collections and has
noticed the same general differences between libraries and museums that I
have.

Ethan Gruber
University of Virginia Library


--
===
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===


Re: [CODE4LIB] Something completely different

2009-04-14 Thread stuart yeates

Alexander Johannesen wrote:

On Wed, Apr 15, 2009 at 07:10, stuart yeates stuart.yea...@vuw.ac.nz wrote:

For example the people at http://lcsubjects.org have never heard of us (that
I know of), but we can use their URLs like
http://lcsubjects.org/subjects/sh90005545#concept to represent our roles.


Not sure I understand your example. Here's my Topic Map identifier in
a Topic Map ;

   http://psi.ontopedia.net/Alexander_Johannesen

Identifier and locator, and resolvable, and can be used by anyone.


Yes, we mint something very similar (see 
http://authority.nzetc.org/52969/ for mine), but none of our 
interoperability partners do. None of our local libraries, none of our 
local archives and only one of our local museums (by virtue of some work 
we did with them).


All of them publish and most consume some form of RDF.

Additionally many of the taxonomies we're interested in are available in 
RDF but not topic maps.


cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/   New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] Something completely different

2009-04-14 Thread Alexander Johannesen
On Wed, Apr 15, 2009 at 10:32, stuart yeates stuart.yea...@vuw.ac.nz wrote:
 Yes, we mint something very similar (see http://authority.nzetc.org/52969/
 for mine), but none of our interoperability partners do. None of our local
 libraries, none of our local archives and only one of our local museums (by
 virtue of some work we did with them).
 All of them publish and most consume some form RDF.

Hmm, RDF resources are just URIs, so I'm still a bit unsure about what
you mean. Are you talking about the fact that the RDF definitions (and
not the RDF vocabs themselves) aren't encoded in your TM engine?

 Additionally many of the taxonomies we're interested in are available in RDF
 but not topic maps.

Converting them to a Topic Map isn't that hard to do, but I guess
there is *a* cost there.


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Alexander Johannesen
On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind rochk...@jhu.edu wrote:
 Can you show me where this definition of a URL vs. a URI is made in any 
 RFC or standard-like document?

From http://www.faqs.org/rfcs/rfc3986.html ;

1.1.3.  URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term Uniform Resource Locator (URL) refers to the subset of URIs
   that, in addition to identifying a resource, provide a means of
   locating the resource by describing its primary access mechanism
   (e.g., its network location).  The term Uniform Resource Name
   (URN) has been used historically to refer to both URIs under the
   urn scheme [RFC2141], which are required to remain globally unique
   and persistent even when the resource ceases to exist or becomes
   unavailable, and to any other URI with the properties of a name.

   An individual scheme does not have to be classified as being just one
   of name or locator.  Instances of URIs from any given scheme may
   have the characteristics of names or locators or both, often
   depending on the persistence and care in the assignment of
   identifiers by the naming authority, rather than on any quality of
   the scheme.  Future specifications and related documentation should
   use the general term URI rather than the more restrictive terms
   URL and URN [RFC3305].

As you can see, a URI is an identifier, and a URL is a locator
(mechanism for retrieval), and since URLs are a subset of URIs, you
_can_ resolve URIs as well.
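
The classification in the RFC passage quoted above can be illustrated with a
rough Python sketch. The set of "locator" schemes is an assumption made for
this example only: RFC 3986 defines no such list, and it says that URIs from
one scheme may act as names, locators, or both. The sample URIs are
illustrative as well.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: a hand-picked set of schemes that describe an
# access mechanism. RFC 3986 itself does not enumerate "locator" schemes.
LOCATOR_SCHEMES = {"http", "https", "ftp", "gopher"}


def describe(uri: str) -> str:
    """Label a URI as 'locator', 'name', or 'identifier' (illustrative only)."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in LOCATOR_SCHEMES:
        return "locator"      # also an identifier; it just adds a way to fetch
    if scheme == "urn":
        return "name"         # the urn scheme of RFC 2141
    return "identifier"       # any other URI: still identifies something


for example in (
    "http://psi.ontopedia.net/Alexander_Johannesen",
    "urn:isbn:0451450523",
    "info:lccn/2002022641",
):
    print(example, "->", describe(example))
```

Every input here is a URI. The label only says whether the scheme also
suggests a retrieval mechanism, which is the "subset" relationship the RFC
text describes.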

 Sure, we have a _sense_ of how the connotation is different, but
 I don't think that sense is actually formalized anywhere.

It is, and the same stuff is documented in Wikipedia as well ;

   http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
   http://en.wikipedia.org/wiki/Uniform_Resource_Locator

 I think the sem web crowd actually embraces this confusingness,

No, I think they take it at face value; they (the URIs) are
identifiers for things, and can be used for just that purpose, but
they are also URLs, which means they resolve to something. What I think
you're getting at is the thing it resolves to, as *that*
has no definition. But then, if you go from RDF to Topic Maps PSIs
(PSIs are URIs with an extended meaning), *that* thing it resolves to
indeed has a definition; it's the prose explaining what the identifier
identifies, and this is the most important difference between RDF and
Topic Maps (and a very subtle but important difference, too).

 they want to have it both ways: Oh, a URI doesn't need to resolve,
 it's just an opaque identifier; but you really should use http URIs
 for all URIs; why? because it's important that they resolve.

I smell straw-man. :) But yes, they do want both, as both is in fact a
friggin' smart thing to have. We all deal with identifiers all the
time, in internal as well as external applications, so why not use an
identifier scheme that has the added bonus of adding a resolver
mechanism? If you want to be stupid and lock yourself in your limited
world, then using them as just identifiers is fine but perhaps a bit,
well, stupid. But if you want to be smart about it, realizing that
without ontological work there will *never* be proper interop, you use
those identifiers and let them resolve to something. And if you're
really smart, you let them resolve to either more RDF statements, or,
if you're seriously Einsteinly smart, use PSIs (as in Topic Maps) :).

 In general, combining two functions in one mechanism is a
 dangerous and confusing thing to do in data design, in my opinion.

Because ... ?

 By analogy, it's what gets a lot of MARC/AACR2 into trouble.

Hmm, and I thought it was crap design that did that, coupled with poor
metadata constraints and validation channels, untyped fields, poor
tooling, the lack of machine understandability, and the general
library idiom of not invented here. But correct me if I'm wrong. :)

 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

Umm, I'd be wary to take as canon a draft with editorial notes going
back 4 to 5 years that still aren't resolved. In other words, this
document isn't relevant to the real world. Yet.

 They suggest: URI opacity    'Agents making use of URIs SHOULD NOT attempt 
 to infer properties of the referenced resource.'

Well, as a RESTafarian I understand this argument quite well. It's
about not assuming too much from the internal structure of the URI.
Again, it's an identifier, not a scheme such as a URL where structure
is defined. Again, for URIs, don't assume structure because at this
point it isn't a URL.
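
The opacity principle under discussion can be sketched in Python as the
difference between comparing a whole URI string and reading meaning out of
its parts. The lookup table, its "kind" value and both function names are
hypothetical and exist only for this illustration.

```python
# Hypothetical local table: what this application has been told about a URI.
KNOWN = {
    "http://lcsubjects.org/subjects/sh90005545#concept": {"kind": "subject heading"},
}


def lookup(uri):
    """Opaque use: match the full string, never inspect its structure."""
    return KNOWN.get(uri)


def guess_kind_from_path(uri):
    """Non-opaque use, shown for contrast: infers meaning from the path."""
    return "subject heading" if "/subjects/" in uri else None


# A URI nobody has described to us: the opaque lookup admits it knows nothing,
# while the path-based guess asserts a property it has no grounds for.
stranger = "http://example.org/subjects/anything"
print(lookup(stranger))                 # None
print(guess_kind_from_path(stranger))   # subject heading
```

The second function is the kind of inference the quoted TAG draft advises
against. Whether that advice suits identifiers such as SuDocs is the question
being argued in this thread.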

 If I get a URI representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to
 be able to tell from the URI alone that it IS a Sudoc, AND I need to be able
 to extract the actual SuDoc identifier from it.  That completely violates 
 their
 Opacity requirement

I think you are quite mistaken on this, but before we leap into whether
the web is suitable for SuDoc I'd 

Re: [CODE4LIB] Anyone else watching rev=canonical?

2009-04-14 Thread Casey Bisson
Google's Matt Cutts tweeted a few days ago that he didn't understand  
why Twitter and similar services don't simply resolve short URLs to  
their long form and store/display them that way.


Things like that have been on my mind for a while, but I've only just  
put some of those thoughts to words:


http://maisonbisson.com/blog/post/13719/not-sure-that-rev-canonical-is-really-the-solution/

And from the perspective of linked data, making our applications query  
the URLs that users submit to them just makes sense. It might seem  
like science fiction to suggest that Twitter resolve a URL to identify  
its canonical version and RDF that enriches the tweet, but Facebook's  
link sharing actually does that (though it looks for meta tags rather  
than RDF).
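
A minimal Python sketch of the lookup described above, assuming the page has
already been fetched and is available as a string. The class and function
names are made up for this example, and it is not a description of how
Twitter or Facebook implement anything. It collects both rel=canonical and
rev=canonical links so the two stay distinct.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect href values of <link> elements marked rel or rev 'canonical'."""

    def __init__(self):
        super().__init__()
        self.rel = []   # hrefs from <link rel="canonical" ...>
        self.rev = []   # hrefs from <link rev="canonical" ...>

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        # rel and rev hold space-separated tokens; a bare attribute is None.
        if "canonical" in (a.get("rel") or "").lower().split():
            self.rel.append(href)
        if "canonical" in (a.get("rev") or "").lower().split():
            self.rev.append(href)


def find_canonical(html):
    """Return (rel_links, rev_links) found in an HTML document string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.rel, finder.rev


page = (
    '<html><head>'
    '<link rel="canonical" href="http://example.org/a/long/page">'
    '<link rev="canonical" href="http://example.org/s">'
    '</head><body></body></html>'
)
rel_links, rev_links = find_canonical(page)
print(rel_links, rev_links)
```

Because this reads HTML link elements only, it shares the limitation raised
elsewhere in the thread: it does nothing for RDF, JSON or XML responses.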


--Casey


...rather than creating a new link rel=canonical and BTW their strategy
only works in HTML, it doesn't work in RDF, JSON, XML, etc...