Mark Nottingham wrote:
> Also, if a client doesn't visit for a long time, it will see
> http://journals.aol.com/panzerjohn/abstractioneer/atom.xml?page=2&count=10
> and assume it already has all of the entries in it, because it's fetched
> that URI before.
Yeah. That's what I was worried about too. The couple of test feeds that
I've subscribed to haven't had any new entries yet, so I can't be sure, but
with URLs like that I don't see how it can possibly work.
> Did you find that algorithm wrong, too hard to understand/implement, or
> did you just do a different take on it? Does the approach that you took
> end up having the same result?
The problem I had with the algorithm was that it required two passes. The
first pass gathers all the links, starting with the current feed document
and moving back in time through the archives; the second pass actually
processes the documents, starting with the oldest and moving forwards in
time. This meant either retrieving everything twice or caching every
document retrieved. Neither of those options sounded particularly appealing
to me.
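For illustration, the two-pass approach might be sketched like this. The
`fetch`, `process`, and `archive_link` callables are hypothetical stand-ins
(not part of any spec or real library) for HTTP retrieval, entry handling,
and extracting the history link from a document:

```python
def two_pass(start_url, fetch, process, archive_link):
    """Sketch of the two-pass algorithm: collect, then process oldest-first."""
    # Pass 1: walk the archive chain, newest to oldest, collecting documents.
    docs, url, seen = [], start_url, set()
    while url is not None and url not in seen:
        seen.add(url)
        doc = fetch(url)
        docs.append(doc)
        url = archive_link(doc)  # None when there is no further history link
    # Pass 2: process oldest first. Note this is why every document must be
    # cached (as here, in the `docs` list) or fetched a second time.
    for doc in reversed(docs):
        process(doc)
```

The `docs` list is exactly the caching cost the text objects to: every
archive document is held in memory until the chain has been fully walked.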
My implementation does everything in one pass. I start by processing the
current feed document. If it contains a history link which I haven't seen
before, I'll retrieve and process that document next. Repeat until there are
no more links or I encounter a link that I've seen before. There are subtle
differences in the results that you would get from my algorithm, and
technically what you're suggesting is more accurate, but I don't think the
differences are significant enough to care about.
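The one-pass variant described above might look like the following sketch.
Again, `fetch`, `process`, and `archive_link` are hypothetical stand-ins,
and the `seen` set is what guarantees termination when a link has been
encountered before:

```python
def walk_history(start_url, fetch, process, archive_link):
    """Sketch of the one-pass algorithm: process each document as it is fetched."""
    seen = set()
    url = start_url
    # Stop when there are no more links, or on a link already seen.
    while url is not None and url not in seen:
        seen.add(url)
        doc = fetch(url)
        process(doc)              # processed immediately -- nothing is cached
        url = archive_link(doc)   # None when there is no further history link
```

Unlike the two-pass version, entries are processed newest-first as the chain
is walked, which is the source of the subtle differences in results noted
above.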
Other than that, I skip steps 1 and 2, and I default to using the "next"
link relation (with a fallback to "previous" and "prev"). I may consider
adding support for fh:complete at some point, but for now I'm sticking with
Microsoft's cf:treatAs.
Regards
James