http://cflib.org/udf/removeHTML
http://cflib.org/udf/stripHTML
http://cflib.org/udf/tagStripper
On Wed, Apr 29, 2009 at 4:50 AM, Robert Rawlins <robert.rawl...@thinkbluemedia.co.uk> wrote:
Hey Chaps,
I've been doing a little work with some RSS feeds of late and, for the most
part, all is well. The one problem I'm running into is people who publish
RSS feeds containing lots of junk HTML (urgh!), like inline links, images,
divs and whatnot in the description content of the feed.
I only want the plain-text version of these feeds, not all the other junk.
This means stripping out HTML tags such as <div> and <a>, some of which are
being published entity-encoded as &lt; and &gt;. I also want to convert HTML
entities into their plain-text equivalents, for instance turning &amp; into
a standard &.
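A minimal sketch of that cleanup, written in Python purely to illustrate the patterns (the thread itself is about ColdFusion; the tag pattern is plain regex and could likely be passed to ColdFusion's REReplace with the "ALL" scope). It assumes tags are encoded at most once, so it decodes entities first with Python's html.unescape, which turns &lt;div&gt; into <div> and &amp; into &, and then removes anything shaped like a tag. The function name strip_feed_html is hypothetical.

```python
import html
import re

# Anything from "<" to the next ">", e.g. <div class="x">, </a>, <br/>.
TAG_RE = re.compile(r"<[^>]*>")
# Runs of whitespace (including the non-breaking space produced by &nbsp;).
SPACE_RE = re.compile(r"\s+")


def strip_feed_html(text: str) -> str:
    """Return a plain-text version of an RSS description (illustrative sketch)."""
    # Step 1: decode entities, so encoded tags become real tags and &amp; becomes &.
    decoded = html.unescape(text)
    # Step 2: replace every tag (original or freshly decoded) with a space,
    # so adjacent blocks like <p>one</p><p>two</p> do not run together.
    no_tags = TAG_RE.sub(" ", decoded)
    # Step 3: collapse the leftover whitespace and trim the ends.
    return SPACE_RE.sub(" ", no_tags).strip()


if __name__ == "__main__":
    sample = '<p>Fish &amp; Chips</p>&lt;a href="x"&gt;link&lt;/a&gt;'
    print(strip_feed_html(sample))
```

Decoding before stripping is what catches the entity-encoded tags, but it has a cost: plain text that genuinely contains "<" and ">" (say "a &lt; b and c &gt; d") will lose the part between them, which is the kind of formatting issue worth warning the user about.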
Presumably this can all be done with regex (I couldn't find any suitable
built-in CF functions), but my skills in this area are pretty much
non-existent, and I know some of you are fairly experienced with this kind
of thing.
I'm also hoping to run the same regex rules as a 'find' first, so I can say
to the user 'this feed appears to contain lots of redundant crap, would you
like it cleaned for you? This may cause formatting issues.' or something to
that effect. I can then apply the replace rules if they choose to do so.
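The "find first" idea could be sketched like this, again in Python for illustration only: a single regex counts literal tags, entity-encoded tags and character references without double-counting, and a threshold decides whether to show the clean-up prompt. The names markup_count and looks_cluttered, and the default threshold of 3, are hypothetical choices, not anything from the thread.

```python
import re

# One pattern, alternatives tried left to right at each position:
#   1. a literal tag such as <div ...> or </a>
#   2. an entity-encoded tag such as &lt;div&gt; (matched whole, so its
#      &lt; and &gt; are not counted again as separate entities)
#   3. a character reference such as &amp;, &#39; or &#x27;
MARKUP_RE = re.compile(
    r"<\s*/?[a-zA-Z][^>]*>"
    r"|&lt;\s*/?[a-zA-Z].*?&gt;"
    r"|&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);"
)


def markup_count(text: str) -> int:
    """Count non-overlapping pieces of markup in a feed description."""
    return len(MARKUP_RE.findall(text))


def looks_cluttered(text: str, threshold: int = 3) -> bool:
    """Hypothetical heuristic: offer cleaning once `threshold` pieces are found."""
    return markup_count(text) >= threshold


if __name__ == "__main__":
    description = '&lt;div class="x"&gt;Fish &amp; Chips&lt;/div&gt;'
    if looks_cluttered(description):
        print(f"This feed appears to contain {markup_count(description)} bits of markup.")
```

The replace pass would then run only when looks_cluttered returns True and the user accepts; a fixed count is crude, so the threshold would need tuning against real feeds.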
I'd appreciate any advice.
Rob
Archive:
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:322055