On 1/14/09, André Luís <andreluis...@gmail.com> wrote:
 >  I coded a script that looks at a given page and grabs the rel-tags in
 >  that page. It then counts the occurrences and orders them in
 >  descending order.
 >  the script is at http://workshop.andr3.net/tageater/
 >  this was meant to infer the user's attention profile from the rel-tags...
 >  the problem starts if I follow the rel-* links. For example the
 >  website macacos.com marks-up the tagcloud with rel-tags on every page,

>  So, how to detect repetition in these cases?

--- wouldn't you just keep a list of the pages you have already
 crawled? If you find a tag cloud on page /item1.html that links to
 /tags/tag1, and then on page item2.htm you find the same tag cloud
 linking to /tags/tag1 again, you don't follow it a second time.

 > So what you're saying is that this falls out of the spec's scope,
 >  right? It should be the parsers adapting their behaviour depending on
 >  their goal?

--- probably outside the scope of the spec, but a best-practices
 document should certainly cover these sorts of issues.

 > You're right. Do you have a link where I can read more about that
 >  discussion? Thanks.

There was a discussion about canonical hCards 2 years ago.

 I am not sure how helpful any of that discussion was/is to this problem.


brian suda

microformats-new mailing list
