Re: RSS Aggregation?

2006-04-20 Thread Rob Wilkerson
That could be. I vaguely remember that the spec wasn't final, but was very, very close. This was probably 6 months ago or so. Maybe that timeline will tell you something. And I did use the wiki several times so who knows what insanity I mixed in. :-) I'll go back and take a look at my code

Re: RSS Aggregation?

2006-04-20 Thread Thomas Chiverton
On Wednesday 19 April 2006 15:43, Neil Middleton wrote: Currently the site is aggregating ~500 RSS feeds, but checking these feeds is growing to be a pain in the butt. Having to get CF to check each of these feeds regularly (ideally every 15 minutes) is more difficult than it sounds. Why not:

Re: RSS Aggregation?

2006-04-20 Thread Neil Middleton
Well, what I have gone for as an interim is something similar. Every feed has a lastCheck time, and every minute, the app checks the oldest 10 feeds. Therefore each feed should get checked roughly hourly. Seems to be working well at the moment, I'll consider dropping the frequency once I know
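
The oldest-first scheme described above can be sketched roughly as follows. This is a minimal illustration in Python rather than CF, and it is not Neil's actual code: the feed dictionaries, the `last_check` key, and the `check` callback are assumptions made for the example. With ~500 feeds and 10 checks per minute, a full pass takes about 50 minutes, which matches the "roughly hourly" estimate.

```python
from datetime import datetime, timedelta
import heapq


def pick_stalest(feeds, batch_size=10):
    """Return the batch_size feeds with the oldest last_check timestamps."""
    # 'last_check' is a hypothetical field name standing in for lastCheck.
    return heapq.nsmallest(batch_size, feeds, key=lambda f: f["last_check"])


def run_tick(feeds, now, check, batch_size=10):
    """One scheduler tick: check the stalest feeds, then stamp them as checked."""
    batch = pick_stalest(feeds, batch_size)
    for feed in batch:
        check(feed)                # hypothetical callback: fetch and parse one feed
        feed["last_check"] = now   # moves the feed to the back of the queue
    return batch


def cycle_minutes(feed_count, batch_size=10, tick_minutes=1):
    """Minutes needed before every feed has been visited once."""
    ticks = -(-feed_count // batch_size)  # ceiling division
    return ticks * tick_minutes
```

Calling `run_tick` once a minute from whatever scheduler is available gives each feed the same polling interval; shrinking `batch_size` lowers the per-minute load at the cost of a longer cycle.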

Re: RSS Aggregation?

2006-04-20 Thread Rob Wilkerson
So, based on Roger's comments, I checked my code and, in fact, there is no Atom 1.0 element analogous to TTL. And I didn't make one up for inclusion in my code. :-) Thanks for the clarification. On 4/20/06, Roger Benningfield [EMAIL PROTECTED] wrote: I surely thought I remembered one from

Re: RSS Aggregation?

2006-04-19 Thread Roger Benningfield
Currently the site is aggregating ~500 RSS feeds, but checking these feeds is growing to be a pain in the butt. Having to get CF to check each of these feeds regularly (ideally every 15 minutes) is more difficult than it sounds. Neil: Polling every fifteen minutes is an enormous waste of CPU and

Re: RSS Aggregation?

2006-04-19 Thread Rob Wilkerson
Neil - To prevent over-polling and, as Roger pointed out, potentially getting your IP blocked, consider ETag/If-None-Match headers as well as the Last-Modified/If-Modified-Since headers: 1. When you retrieve a feed, store the ETag and Last-Modified response headers 2. When you next poll the
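
The preview above is cut off partway through step 2, so the following is only a sketch of the standard HTTP conditional GET the message refers to, not Rob's own code. It is written in Python rather than CF, and the function names and the returned dictionary shape are assumptions for illustration. The stored ETag goes back as If-None-Match, the stored Last-Modified goes back as If-Modified-Since, and a 304 Not Modified response means the feed is unchanged and no body was sent.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the previous poll."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Fetch a feed only if it changed; hypothetical helper, not a library API."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return {
                "changed": True,
                "body": response.read(),
                # Save these two values for the next poll of this feed.
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the stored validators
            return {
                "changed": False,
                "body": None,
                "etag": etag,
                "last_modified": last_modified,
            }
        raise
```

When the feed has changed the server still sends the whole document, so this saves bandwidth only on unchanged feeds, and only for servers that actually emit ETag or Last-Modified headers.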

Re: RSS Aggregation?

2006-04-19 Thread Neil Middleton
Thanks guys for this...it's been plugging some of the gaps I hadn't really considered. The site is very much in development still, even if it doesn't appear to be changing on the surface. I will definitely consider using the last modified headers. I didn't realise you could retrieve them

Re: RSS Aggregation?

2006-04-19 Thread Rob Wilkerson
I still can't help with the unreliable gateways, but a couple of things to note. First, the last modified header will not help you retrieve a partial feed. It will still retrieve the entire feed, but only if the feed has changed (better than nothing). You're right on your second point. The

RE: RSS Aggregation?

2006-04-19 Thread Munson, Jacob
Neil: Polling every fifteen minutes is an enormous waste of CPU and bandwidth... for both you and the source sites. For example, if you're aggregating individual blogs, once every 24 hours will cover the vast majority just fine. I disagree. RSS was originally built as a solution to

Re: RSS Aggregation?

2006-04-19 Thread Rob Wilkerson
I'm going to chime in somewhere between Jacob and Roger. With the bandwidth saving features I mentioned earlier, I'd say 30 mins to an hour should be sufficient in almost every case. Of course, now we're just stating opinions, but that's mine. For what it's worth... On 4/19/06, Munson, Jacob

Re: RSS Aggregation?

2006-04-19 Thread Neil Middleton
This is what I was thinking. Sure there may well be some optimisations that can be done for the process, but I agree that RSS is something that should be checked often. Which brings me back to my original problem... How do I go about getting this data checked, parsed and dumped into the db

Re: RSS Aggregation?

2006-04-19 Thread Roger Benningfield
One thing I have noticed with the CF community is that the RSS feeds that are published seem to be all over the place, some doing things one way, some doing it another. Neil: Pete Freitag and I have both published tips for getting CF-based feeds to provide the correct headers and HTTP

Re: RSS Aggregation?

2006-04-19 Thread Roger Benningfield
RSS was originally built as a solution to provide near real-time updates on a site. Jacob: The blogosphere has mechanisms to handle real-time updates, and syndication feeds ain't one of 'em. Never has been. Fullasagoog polls every 15 minutes, not sure how often MXNA does it. If all they're

Re: RSS Aggregation?

2006-04-19 Thread Roger Benningfield
I know there is a TTL equivalent in Atom 1.0/RSS 1.0... Rob: Nope, there's no ttl equivalent in Atom 1.0. Someone brought up ttl/skipHours/skipDays in the IETF WG (or the pre-IETF group) at one point, and the consensus was that the elements are seldom used, not well understood when they *are*

Re: RSS Aggregation?

2006-04-19 Thread Rob Wilkerson
Thanks for the clarification. I surely thought I remembered one from when I built my reader. Hardly the first time my memory has failed me. On 4/19/06, Roger Benningfield [EMAIL PROTECTED] wrote: I know there is a TTL equivalent in Atom 1.0/RSS 1.0... Rob: Nope, there's no ttl equivalent in

Re: RSS Aggregation?

2006-04-19 Thread Roger Benningfield
I surely thought I remembered one from when I built my reader. Rob: Could it have been one of the interim specs you were looking at? 'Cause there was all kinds of odd stuff in there at certain points... particularly in the pre-IETF drafts. In addition, there was (and is) a lotta stuff on the