Re: Autodiscovery in <a> as well as <link>

Nikolas 'Atrus' Coukouma Fri, 06 May 2005 21:41:09 -0700

Phil Ringnalda wrote:

>
> Nikolas 'Atrus' Coukouma wrote:
>
>> Using @rel with any linking element is perfectly valid and has been for
>> years.
>> @rel not being supported for anything other than the link element itself
>> has also been an outstanding bug for just as long. There's a lot of debate
>> attached to at least one Mozilla bug (#57399 [1] - filed on 2000-10-20).
>>
>> Can we agree that this should be supported, but currently isn't? Unless
>> there's a compelling reason not to, I think we might as well allow
>> autodiscovery via either element. Any implementation guide should
>> recommend duplicating the information in the interest of autodiscovery
>> actually working.
>>
>> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=57399
>
>
> -1 to saying in the spec that you can use either element, and in the
> guide saying to use both if you want it to work, not just look pretty.


You're absolutely right. I was thinking in more immediate terms before, but
if we're going to make this part of the Atom working group, land of
well-defined and reasonable specs, everything should work.

>
> As I remember it, when RSS autodiscovery started this cowpath,
> aggregator developers generally didn't have an SGML parser handy, and
> weren't especially happy about the idea of having to write their own
> HTML parser. Finding one (or a few) of relatively few <link>s in the
> first bit of the document feels a lot easier than having to look at
> every <a> in the whole document.
>
> Now? I'd say most don't have an SGML parser handy, and won't be
> especially happy about writing their own HTML parser. It's fairly rare
> for someone to comment out bits of their <head>, and quite common for
> them to comment out huge swaths of their <body>, including things a
> template came with, like <a href="../xml/index.atom" rel="feed">Atom
> feed</a>, with no thought that something will be seeing and using that
> invisible link with an incorrect path. I added Atom autodiscovery to
> my current aggregator, Feed on Feeds, with a ten second
> copy/paste/change mime-type of the results of it using a regular
> expression on the HTML. If instead I had to correctly parse the entire
> HTML document, I'd... switch to something in Python, I guess.

Is there something wrong with the HTML parsers?
Perl has HTML::Parser
Python has htmllib.py
Ruby has ymHTML and a port of the Python library called html-parser
PHP has PHP-HTML
Common Lisp has phtml
The W3C provides a simple parser written in C

I'm sure I can find more, but I think the above is a sufficiently long
list to illustrate my point.
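As a rough illustration of how little code this takes, here is a minimal sketch using html.parser from Python 3's standard library (htmllib, listed above, was removed in Python 3). It collects feed links from both <link> and <a>. The FeedLinkFinder and find_feeds names, the MIME-type set, and treating rel="feed" as a feed marker (taken from the template example quoted above) are assumptions for illustration, not part of any spec. Because the parser reports comments separately from tags, a commented-out <a> in the body is skipped, which a naive regex would not do.

```python
from html.parser import HTMLParser

# Feed MIME types accepted with rel="alternate"; an illustrative assumption.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Hypothetical helper: collect feed hrefs from <link> and <a> elements.

    Text inside <!-- ... --> goes to handle_comment(), never to
    handle_starttag(), so commented-out links are ignored.
    """

    def __init__(self):
        super().__init__()
        self.feeds = []  # not "feed": HTMLParser.feed() is the input method

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        # Self-closing <link ... /> also lands here via the default
        # handle_startendtag().
        if tag not in ("link", "a"):
            return
        attrs = dict(attrs)
        # rel is a space-separated token list; valueless attributes map to None.
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if href and ("feed" in rels or ("alternate" in rels and mime in FEED_TYPES)):
            self.feeds.append(href)


def find_feeds(html):
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


page = """<html><head>
<link rel="alternate" type="application/atom+xml" href="/index.atom">
</head><body>
<!-- <a href="../xml/index.atom" rel="feed">Atom feed</a> -->
<a href="/comments.atom" rel="feed">Comments feed</a>
</body></html>"""

print(find_feeds(page))  # ['/index.atom', '/comments.atom']
```

The commented-out template link with the stale ../xml/ path never shows up, because the parser, not a pattern match, decides what is markup.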

> Then, since I foolishly took the Firefox bug for better autodiscovery,
> I'll also need to do it where I do have an excellent HTML parser, but
> I have to do it on every single page that every single Firefox user
> loads, whether or not they have any interest in feeds, or subscribed
> to the feed ten thousand loads of that particular page ago. <link> is
> easy, we've got a DOMLinkAdded event and most pages have very few of
> them. <a>? Well, the performance hit probably won't be noticeable on
> most pages.

This is a single XPath query.  Gecko has native support for it. I'm not
sure about the others, but Sarissa is a fine library for DOM
manipulation (including XSLT and XPath) from Javascript and it works
with IE, KHTML, Opera ...
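In Gecko the whole lookup could be one document.evaluate call with a union expression along the lines of //link[@rel] | //a[@rel]. For comparison outside the browser, here is a sketch in Python; the standard library has no full XPath 1.0 engine, so it uses the limited path subset in xml.etree.ElementTree, which lacks the | union and contains(), hence one path per element and a token check on rel afterwards. It assumes the page is well-formed XHTML in the XHTML namespace, and the xpath_feeds name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}


def xpath_feeds(xhtml_source):
    """Return hrefs of <link>/<a> elements whose rel contains the token "feed"."""
    # Raises ET.ParseError if the markup is not well-formed XML.
    # Comments are dropped by the default tree builder.
    root = ET.fromstring(xhtml_source)
    hrefs = []
    # ElementTree paths support chained [@attr] predicates but no "|" union,
    # so run one path per element type.
    for path in (".//h:link[@rel][@href]", ".//h:a[@rel][@href]"):
        for el in root.iterfind(path, NS):
            if "feed" in el.get("rel").lower().split():
                hrefs.append(el.get("href"))
    return hrefs


doc = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<link rel="alternate feed" type="application/atom+xml" href="/index.atom"/>
</head><body>
<!-- <a rel="feed" href="../xml/index.atom">old</a> -->
<p><a rel="feed" href="/comments.atom">Comments</a></p>
</body></html>"""

print(xpath_feeds(doc))  # ['/index.atom', '/comments.atom']
```

Handing it ordinary tag soup such as an unclosed <br> raises ET.ParseError instead of returning results, which is why an XML library only helps here if it copes with the errors found in normal HTML.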

>
> Phil Ringnalda
>
Of course, if your XML library copes with all the errors present in
normal HTML, it's probably nicer to use than any HTML parser.

The point here is that most developers have access to an HTML parser. I
admit that they might need patching, but at least 90% of the work is done.

I'll try to find time to examine each of these libraries and make any
changes needed. Hopefully they're already in good shape or the author is
open to this sort of thing. If all else fails, there's forking.

If the problem is ignorance, I'll happily maintain a list. I'm also
willing to write some sample implementations in all of the languages I
listed before and more.

I don't think this is terribly difficult. In fact, I just took a shot at
altering Feed on Feeds to support this and found it incredibly easy.

patch: http://zaphod.student.umd.edu/~atrus/FoF_mod/a-support.patch
There's other stuff in the same directory there if you want to poke at
it. The changes just use PHP-HTML, which I mentioned earlier.

Cheers,
-Nikolas 'Atrus' Coukouma
