>>What do you folks have experience with that might be worth looking at?

It was quite a while ago, but I found there were too many flaws in the way 
FCKeditor was designed.
1. FCKeditor only offered Word cleanup as an option. If the user doesn't use it, 
you get all the garbage anyway.
2. All other ways to import text (e.g. cut'n'paste, drag'n'drop, 
right-click and paste...) bypass the cleaner.
So I designed my own editor in which MS Word cleanup is mandatory.
Like FCKeditor, I use regular expressions in the process, but with many more 
expressions and more cleaning.
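
To illustrate what "mandatory" could look like in a browser, here is a minimal sketch, not the code from my editor. It assumes the editing area exposes addEventListener and that keyboard paste and right-click paste both fire a "paste" event while drag'n'drop fires "drop". The names forceCleanup, clean and insert are hypothetical.

```javascript
// Sketch only: forceCleanup, clean and insert are hypothetical names.
// editor: any object with addEventListener (e.g. a contenteditable element)
// clean:  the cleanup function (e.g. cleanWord)
// insert: a callback that places the cleaned HTML into the document
function forceCleanup(editor, clean, insert) {
    function handle(event, transfer) {
        var html = transfer ? transfer.getData("text/html") : "";
        if (!html) { return; }        // no HTML payload: leave plain text to the browser
        event.preventDefault();       // stop the raw markup from being inserted
        insert(clean(html));          // only cleaned HTML reaches the document
    }
    // Keyboard paste and right-click paste both arrive as "paste".
    editor.addEventListener("paste", function (e) { handle(e, e.clipboardData); });
    // Drag'n'drop arrives as "drop".
    editor.addEventListener("drop", function (e) { handle(e, e.dataTransfer); });
}
```

Because the cleaner sits on the events rather than behind a toolbar button, the user cannot skip it.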

Here is the first part of the cleaning function (the whole thing would be 
cut off by the CF_talk system):
function cleanWord (html)
    // Cleans up HTML pasted from Word.
    {
    // Remove Word's <o:p> elements, including any content.
    html = html.replace(/<o:p>\s*<\/o:p>/g, "") ;
    html = html.replace(/<o:p>.*?<\/o:p>/g, "") ;
   
    // Remove mso-xxx styles.
    html = html.replace( /\s*mso-[^:]+:[^;"]+;?/gi, "" ) ;

    // Remove margin styles.
    html = html.replace( /\s*MARGIN: 0cm 0cm 0pt\s*;/gi, "" ) ;
    html = html.replace( /\s*MARGIN: 0cm 0cm 0pt\s*"/gi, "\"" ) ;

    html = html.replace( /\s*TEXT-INDENT: 0cm\s*;/gi, "" ) ;
    html = html.replace( /\s*TEXT-INDENT: 0cm\s*"/gi, "\"" ) ;

    html = html.replace( /\s*TEXT-ALIGN: [^\s;]+;?"/gi, "\"" ) ;

    html = html.replace( /\s*PAGE-BREAK-BEFORE: [^\s;]+;?"/gi, "\"" ) ;

    html = html.replace( /\s*FONT-VARIANT: [^\s;]+;?"/gi, "\"" ) ;

    html = html.replace( /\s*tab-stops:[^;"]*;?/gi, "" ) ;
    html = html.replace( /\s*tab-stops:[^"]*/gi, "" ) ;

    html = html.replace( /\s*FONT-FAMILY:[^;"]*;?/gi, "" ) ;
   
    // Remove Class attributes
    html = html.replace(/<(\w[^>]*)\s*class=([^ |>]*)([^>]*)/gi, "<$1$3") ;

    // Remove styles.
    html = html.replace( /<(\w[^>]*)style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;

    // Remove empty styles.
    html =  html.replace( /\s*style="\s*"/gi, '' ) ;
   
    html = html.replace( /<SPAN[^>]*>\s*&nbsp;\s*<\/SPAN>/gi, '&nbsp;' ) ;
   
    html = html.replace( /<SPAN[^>]*>\s*<\/SPAN>/gi, '' ) ;
   
    // Remove Lang attributes
    html = html.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ;
   
    html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;
    html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;
    html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;
   
    // Remove all FONT tags (one pass with the g flag covers every tag), then all DIV tags.
    html = html.replace( /<\/?FONT[^>]*>/gi, '' ) ;
    html = html.replace( /<\/?DIV([^>]*)>/gi, '' ) ;

    // Remove XML elements and declarations
    html = html.replace(/<\\?\?xml[^>]*>/gi, "") ;
   
    // Remove tags with an XML namespace prefix, e.g. <o:p></o:p>
    html = html.replace(/<\/?\w+:[^>]*>/gi, "") ;
   
    html = html.replace( /<H\d>\s*<\/H\d>/gi, '' ) ;
....
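
A note on the three identical SPAN passes above: each pass only peels off one level of nesting, so three passes handle spans nested up to three deep. A minimal sketch of an alternative that repeats until nothing changes (replaceUntilStable is a hypothetical helper, not part of the function above):

```javascript
// Hypothetical helper: apply the same replacement until the string stops changing.
function replaceUntilStable(html, pattern, replacement) {
    var previous;
    do {
        previous = html;
        html = html.replace(pattern, replacement);
    } while (html !== previous);
    return html;
}

// Four levels of attribute-less SPANs, one more than three fixed passes can unwrap.
var nested = "<SPAN><SPAN><SPAN><SPAN>text</SPAN></SPAN></SPAN></SPAN>";
var stripped = replaceUntilStable(nested, /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, "$1");
// stripped is "text"
```

The loop always ends for this pattern because every successful replacement makes the string shorter.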




Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317093