On 1/14/09, André Luís <andreluis...@gmail.com> wrote:
> I coded a script that looks at a given page and grabs the rel-tags in
> that page. It then counts the occurrences and orders them in
> descending order.
>
> the script is at http://workshop.andr3.net/tageater/
>
> this was meant to infer the user's attention profile from the rel-tags...
>
> the problem starts if I follow the rel-* links. For example the
> website macacos.com marks-up the tagcloud with rel-tags on every page,
> So, how to detect repetition in these cases?

--- wouldn't you just keep a list of the pages you have already
crawled? So if you find a tag cloud on page /item1.html that links to
/tags/tag1, and then on page /item2.html you find the same tag cloud
linking to /tags/tag1 again, you don't follow it a second time.

> So what you're saying is that this falls out of the spec's scope,
> right? It should be the parsers adapting their behaviour depending on
> their goal?

--- probably outside the scope of the spec, but certainly a
best-practices document should cover these sorts of issues.

> You're right. Do you have a link where I can read more about that
> discussion? Thanks.

There was discussion about canonical hCards 2 years ago:
http://microformats.org/discuss/mail/microformats-discuss/2007-January/008265.html

I am not sure how helpful any of that discussion was/is to this problem.

-brian

-- 
brian suda
http://suda.co.uk
_______________________________________________
microformats-new mailing list
microformats-new@microformats.org
http://microformats.org/mailman/listinfo/microformats-new
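
A minimal sketch of the "keep a list of the pages you have already crawled" idea from the reply above, in Python. This is not the tageater script. The names crawl, get_links and site are hypothetical, and the in-memory site dict stands in for real page fetching and rel-tag parsing.

```python
from collections import deque


def crawl(start_pages, get_links):
    """Visit pages breadth-first, following each rel-tag target at most once.

    get_links(url) is a hypothetical callback that returns the rel-tag
    hrefs found on that page; a real crawler would fetch and parse here.
    """
    seen = set()      # every URL that has already been queued
    queue = deque()
    for url in start_pages:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    fetched = []      # pages in the order they were actually visited
    while queue:
        url = queue.popleft()
        fetched.append(url)
        for href in get_links(url):
            if href not in seen:   # tag cloud already followed: skip it
                seen.add(href)
                queue.append(href)
    return fetched


# Made-up site: every page repeats the same two-tag cloud.
site = {
    "/item1.html": ["/tags/tag1", "/tags/tag2"],
    "/item2.html": ["/tags/tag1", "/tags/tag2"],
    "/tags/tag1": ["/tags/tag1", "/tags/tag2"],
    "/tags/tag2": ["/tags/tag1", "/tags/tag2"],
}

fetched = crawl(["/item1.html", "/item2.html"], lambda url: site.get(url, []))
print(fetched)
```

Each tag page is visited once even though the cloud appears on all four pages. This only stops the same link being followed again; whether a cloud repeated on every page should count toward the tag totals is a separate decision that depends on the parser's goal.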