>Currently the site is aggregating ~500 RSS feeds, but checking these feeds
>is growing to be a pain in the butt.  Having to get CF to check each of
>these feeds regularly (ideally every 15 minutes) is more difficult than it
>sounds.

Neil: Polling every fifteen minutes is an enormous waste of CPU and 
bandwidth... for both you and the source sites. For example, if you're 
aggregating individual blogs, once every 24 hours will cover the vast majority 
just fine. Ideally, you'd either opt for some middle ground (once an hour or 
so), or come up with adaptive code that spaces out polling based upon observed 
update periods.
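
To make the "adaptive" idea concrete, here's a rough sketch in Python (not CF, just for illustration). It assumes you keep a list of the timestamps at which you saw each feed change; the function name, the 15-minute/24-hour bounds, and the "poll at half the average gap" rule are all my own placeholders, not anything standard.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Assumed bounds: never poll faster than 15 minutes or slower than daily.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=24)
DEFAULT_INTERVAL = timedelta(hours=1)  # used until we have enough history


def polling_interval(update_times: Sequence[datetime]) -> timedelta:
    """Pick a polling interval for one feed from its observed update times."""
    if len(update_times) < 2:
        return DEFAULT_INTERVAL
    ordered = sorted(update_times)
    # Gaps between consecutive observed updates.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    # Poll at half the average gap, clamped to the bounds above.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, mean_gap / 2))
```

A blog that posts once a day ends up polled every 12 hours; a feed that changes every ten minutes is still held to the 15-minute floor.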

But even if you're gonna stick with over-polling (a good way to get your IP 
blocked), there are places to optimize:

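* Use Conditional GET (If-None-Match / If-Modified-Since). An unchanged feed
answers with a 304 and no body, so if roughly 90% of feeds haven't seen an
update in the last fifteen minutes, you skip the download and the parse for
nearly 90% of your polls.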

* Make your spider compatible with RFC 3229 (delta encoding in HTTP). It won't
help in most cases, but some high-flow publishers (Microsoft, etc.) will send
you deltas of their sliding-window feeds. That'll cut down on parsing time.

* Try CFX_HTTP5 in async mode.
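
For the Conditional GET suggestion above, the flow looks roughly like this. Again it's a Python sketch using only the standard library rather than CF; `build_request` and `fetch_if_changed` are hypothetical helper names, and it assumes you store the ETag and Last-Modified values from each feed's previous response.

```python
import urllib.error
import urllib.request


def build_request(url, etag=None, last_modified=None):
    """Build a GET that carries the validators saved from the last poll."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return the feed body only when the server says it has changed."""
    request = build_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return {
                "changed": True,
                "body": response.read(),
                # Save these and send them back on the next poll.
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: no body was sent, so there is nothing to parse.
            return {"changed": False, "body": None,
                    "etag": etag, "last_modified": last_modified}
        raise
```

On a 304 you skip the parse entirely and keep the validators you already had.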

--
Roger Benningfield
http://admin.mxblogspace.journurl.com/
