Thanks for this, guys. It's been plugging some of the gaps I hadn't really 
considered.  The site is very much in development still, even if it doesn't 
appear to be changing on the surface.

I will definitely consider using the Last-Modified headers.  I didn't realise 
you could retrieve them without pulling down the whole file.  That in itself 
would alleviate things massively.
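
For reference, a minimal sketch of that header-only check.  It assumes the 
feed server answers HEAD requests and sends a Last-Modified header (not all 
do); headResult and lastModified are placeholder names of my own.

```cfml
<!--- HEAD asks for the response headers only, so no feed body is downloaded --->
<cfhttp url="#variables.feedURL#"
        method="HEAD"
        useragent="feedsquirrel.com (or whatever)"
        result="headResult" />

<!--- Assumption: the server includes Last-Modified; fall back to an empty string --->
<cfif StructKeyExists(headResult.responseHeader, "Last-Modified")>
    <cfset lastModified = headResult.responseHeader["Last-Modified"]>
<cfelse>
    <cfset lastModified = "">
</cfif>
```

If the value matches the one stored from the previous poll, the full GET 
could be skipped for that feed.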

One thing I have noticed with the CF community is that the RSS feeds that are 
published seem to be all over the place, some doing things one way, some doing 
it another.  This doesn't help when trying to write a spider ;-)

Does anyone have any insight into why the async gateways might be 
unreliable?

>Neil -
>
>To prevent over-polling and, as Roger pointed out, potentially getting
>your IP blocked, consider Etag/If-None-Match headers as well as the
>Last-Modified/If-Modified-Since headers:
>
>1.  When you retrieve a feed, store the ETag and Last-Modified response headers
>2.  When you next poll the feed, send those stored values back as request
>headers so that you only retrieve those feeds that have been updated
>
><!--- Conditional GET: send back the stored validators from the last poll --->
><cfhttp url="#variables.feedURL#"
>        method="GET"
>        useragent="feedsquirrel.com (or whatever)"
>        throwonerror="yes">
>    <cfhttpparam type="header"
>                 name="If-None-Match"
>                 value="#variables.storedEtagValue#" />
>    <cfhttpparam type="header"
>                 name="If-Modified-Since"
>                 value="#variables.storedLastModifiedValue#" />
></cfhttp>
>
>A nice way to reduce bandwidth consumption and be respectful of the
>host server/feed author.  A couple of additional suggestions:
>
>1.  Provide a user agent that allows a host server to know where the
>request is coming from and, if they feel it necessary, block that
>request.
>2.  Respect the feed author's TTL value (in the case of an RSS 2.0
>feed).  Don't update the feed any more often than requested in this
>value (if there is one).
>3.  Again, in the case of RSS 2.0 feeds, respect any skipDays and
>skipHours values.  Don't poll on Sundays if the author has told you
>that the feed won't be updated on Sundays.
>
>I know there is a TTL equivalent in Atom 1.0/RSS 1.0, but honestly
>can't remember what it is.  If you look at the specs, it should jump
>out.  It's been a while since I wrote the feed aggregator that is
>embedded in the product I build.  I don't recall there being a decent
>equivalent for RSS 1.0 or Atom 1.0 for skipDays and skipHours.
>
>On 4/19/06, Roger Benningfield <[EMAIL PROTECTED]> wrote:
>>
