Brian Suda wrote:

But this isn't unique to microformats; other semantic technologies
would have this issue as well.

FOAF (and RDF more generally) has a set of well-established conventions for merging data. Certain properties are defined as "inverse functional properties" (IFPs). What that means in English is that if P is an IFP, and two people have the same value for property P, then they're really the same person.

foaf:mbox, for example, is defined as an IFP: each mailbox marked up with foaf:mbox belongs to exactly one person. If two people share a foaf:mbox, then they are the same person, so their data can be merged. (I know what you're thinking... there are people who share a mailbox, so doesn't this break? In theory, no, it doesn't: the specification says that it's for "personal mailboxes" only, "ie. an Internet mailbox associated with exactly one owner". In practice, people occasionally ignore the spec, but for the most part it works well.) There are other IFPs too, such as foaf:jabberID and foaf:openid.
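To make the merging rule concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not how any particular FOAF tool does it: each person is a plain dict mapping property names to sets of values, the function name merge_by_ifp is made up, and the list of IFPs is limited to the three named above.

```python
from collections import defaultdict

# Assumption: only these properties are treated as inverse functional.
IFPS = {"foaf:mbox", "foaf:jabberID", "foaf:openid"}


def merge_by_ifp(records, ifps=IFPS):
    """Merge records (dicts of property -> set of values) sharing any IFP value."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}  # (property, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for prop in ifps:
            for value in rec.get(prop, ()):
                key = (prop, value)
                if key in seen:
                    # Same IFP value seen before: both records are one person.
                    parent[find(i)] = find(seen[key])
                else:
                    seen[key] = i

    # Pool every property of every record into its group's merged record.
    merged = defaultdict(lambda: defaultdict(set))
    for i, rec in enumerate(records):
        root = find(i)
        for prop, values in rec.items():
            merged[root][prop] |= set(values)
    return [dict(props) for props in merged.values()]


# Made-up example data.
people = [
    {"foaf:name": {"Alice"}, "foaf:mbox": {"mailto:alice@example.org"}},
    {"foaf:nick": {"al"}, "foaf:mbox": {"mailto:alice@example.org"},
     "foaf:openid": {"http://alice.example.org/"}},
    {"foaf:name": {"Bob"}, "foaf:mbox": {"mailto:bob@example.org"}},
    {"foaf:name": {"A. Example"}, "foaf:openid": {"http://alice.example.org/"}},
]
merged_people = merge_by_ifp(people)
```

Note that the merge is transitive: the fourth record shares no mailbox with the first, but both are linked through the second record, so all three end up as one person. That is also why a single wrongly shared mailbox can glue unrelated people together.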

So, for hCard/vCard, what are candidates for IFPs? We've discussed "uid" before, and the general agreement is that it should be fairly safe. "Photo" looks like a good candidate at first, and would probably work in practice, but in theory the vCard spec defines it far too loosely: two people could legitimately have the same photo. "Key" is in pretty much the same bucket as "photo", but is probably less useful since few people use it anyway. So really, "uid" is just about it. Shame that not many people use that either.

Wouldn't you just keep a list of the pages you have already
crawled? So if you find a tag cloud on page /item1.html and it links to
/tags/tag1, and then on page /item2.html you re-find the tag cloud, which
links to /tags/tag1, you don't follow it again?


I don't think that that's quite André's point. A lot of blogs have tag clouds: long lists of perhaps a hundred tags, in variously sized fonts, which act as jumping-off points to other parts of the site. They are not tags in the rel=tag sense of the word, in that they describe the content not of the current page but of the site as a whole. People should not be marking them up with rel=tag, but nonetheless some people do. And it means that essentially every single page on their site has the same massive set of tags, so rel=tag becomes useless across the whole site.
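As a rough illustration of the problem from a consumer's side, here is a Python sketch of one possible heuristic: treat any tag that turns up on nearly every crawled page as part of a site-wide cloud and set it aside. Nobody in this thread has proposed this; the function name split_sitewide_tags, the input shape (page URL mapped to the set of rel=tag values found there), and the 0.9 threshold are all assumptions made for the example.

```python
from collections import Counter


def split_sitewide_tags(page_tags, threshold=0.9):
    """Separate tags found on (nearly) every page from page-specific ones.

    page_tags maps a page URL to the set of rel=tag values found on it.
    Returns (sitewide, per_page): the tags present on at least `threshold`
    of the pages, and each page's remaining tags.
    """
    if not page_tags:
        return set(), {}
    # Count on how many pages each tag appears (once per page at most).
    counts = Counter(tag for tags in page_tags.values() for tag in set(tags))
    cutoff = threshold * len(page_tags)
    sitewide = {tag for tag, n in counts.items() if n >= cutoff}
    per_page = {url: set(tags) - sitewide for url, tags in page_tags.items()}
    return sitewide, per_page


# Made-up example: a four-tag cloud repeated on every page.
cloud = {"css", "html", "php", "travel"}
pages = {
    "/item1.html": cloud | {"microformats"},
    "/item2.html": cloud | {"foaf"},
    "/item3.html": set(cloud),
}
sitewide, per_page = split_sitewide_tags(pages)
```

The obvious weakness is that a small or single-topic site may legitimately use the same tag on every page, so a heuristic like this can throw away real tags. It is a workaround for bad markup, not a substitute for authors using rel=tag correctly.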

--
Toby A Inkster
<mailto:m...@tobyinkster.co.uk>
<http://tobyinkster.co.uk>




_______________________________________________
microformats-new mailing list
microformats-new@microformats.org
http://microformats.org/mailman/listinfo/microformats-new
