>>>Can anyone tell me how to remove the XML code that text pasted from 
>Word or Works adds? For example: <P class=MsoNormal style="MARGIN: 0in 0in 
>0pt">
>
>This is not XML but HTML.
>Anyway, to remove it, all you need are regular expressions.
>
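
As a rough sketch of that regular-expression approach: ColdFusion's built-in REReplaceNoCase can drop anything that looks like a tag. The function name stripTags and the sample string are made up for illustration, and note that this removes every tag, not just the ones Word adds.

```cfml
<!--- Hypothetical helper (not from cflib): removes anything between < and > --->
<cffunction name="stripTags" returntype="string" output="false">
    <cfargument name="text" type="string" required="true">
    <!--- "<[^>]*>" matches an opening <, any run of non-> characters, then > --->
    <cfreturn REReplaceNoCase(arguments.text, "<[^>]*>", "", "all")>
</cffunction>

<!--- Sample input modelled on the tag in the question --->
<cfset pasted = '<P class=MsoNormal style="MARGIN: 0in 0in 0pt">Hello from Word</P>'>
<cfoutput>#stripTags(pasted)#</cfoutput>
```

If you want to keep ordinary tags such as <br> and only lose the Word markup, the pattern would need to be narrowed; the one above is the bluntest version.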

To build on Claude's comments, there's a very good UDF at cflib.org called 
DeMoronize:
http://www.cflib.org/udf.cfm?id=725

From cflib:
--
 Description
Fixes text using Microsoft Latin-1 "Extensions", namely ASCII characters 
128-160. Supplies semicolons where missing in HTML numeric and common 
non-numeric entities.

This is a rough port of John Walker's demoroniser, written in Perl. 
http://www.fourmilab.ch/webtools/demoroniser/

Parameters
Name    Description     Required
text    Text to be modified.    Yes

Return Values
Returns a string.

Example
<cfset MSText = "My name is #Chr(147)#Foo#Chr(148)##Chr(133)#<br>">
<cfoutput>With MS Latin-1 Extensions:<br>#MSText#</cfoutput>

<cfset ValidText = DeMoronize(MSText)>
<cfoutput>Valid ASCII:<br>#ValidText#</cfoutput> 

--
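
For a feel of what that kind of fix looks like, here is a hand-rolled sketch covering only the three characters used in the example above. It is not the UDF's code: the replacement strings are my own choice, and it assumes the curly quotes and the ellipsis really do show up as Chr(147), Chr(148) and Chr(133) in your text, as they do in the cflib example.

```cfml
<!--- Same sample text as the cflib example, minus the trailing <br> --->
<cfset MSText = "My name is #Chr(147)#Foo#Chr(148)##Chr(133)#">

<!--- Swap each Microsoft character for a plain ASCII stand-in (assumed mapping) --->
<cfset fixed = Replace(MSText, Chr(147), '"', "all")>   <!--- opening curly quote --->
<cfset fixed = Replace(fixed, Chr(148), '"', "all")>    <!--- closing curly quote --->
<cfset fixed = Replace(fixed, Chr(133), "...", "all")>  <!--- ellipsis --->

<cfoutput>#fixed#</cfoutput>
```

The UDF handles the whole 128-160 range plus the entity fixes, so in practice you would call DeMoronize rather than maintain a list like this yourself.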

hth,

larry

--
Larry C. Lyons
Web Analyst
BEI Resources
American Type Culture Collection
http://www.beiresources.org
email: llyons(at)atcc(dot)org
tel: 703.365.2700.2678
--


Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:290973