I used to do something similar with the following code, which may well be a little faster (it certainly was in the Tcl 7.6 days when it was in use, but Tcl 8.4 with its lovely Tcl_Objs and bytecode compilation might have reduced the difference).
 
Define gBadChars as you will.
 
 
# <comment> persistent cache of bad chars so escapeChars will not regenerate them each time it is called</comment>
 
namespace eval ::chars {
 set gBadChars ""
 for {set i 127} {$i < 256} {incr i} {
  append gBadChars [format %c $i]
 }
}
 
# <proc name="escapeChars">
#  <comment>
#   Escape the explicitly named chars into html escape sequences
#           - will not escape square brackets, always leaves them alone.
#   do not include them in gBadChars or else it will go wrong.
#  </comment>
# </proc>
 
proc escapeChars {pText} {
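 # Put a backslash in front of all square brackets and backslashes before continuing,
 # so the later subst cannot run commands or escapes that were in the input
 regsub -all -- {\]|\[|\\} $pText {\\&} pText
 # Build the bracket expression [<bad chars>] (91 and 93 are "[" and "]") and wrap each
 # match in a small script that scans its character code and emits the &#NNN; form
 regsub -all -- "[format %c 91]$::chars::gBadChars[format %c 93]" $pText {[scan {&} %c c;set c "\&#$c;"]} vString
 # subst runs those scripts; -novariables leaves any "$" in the original text alone
 return [subst -novariables $vString]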
}
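
For comparison, here is a minimal sketch of the same cache-once idea using string map, which is available in Tcl 8.x. The names ::chars::gEscapeMap and escapeCharsMap are made up for this illustration and are not part of the code above. It covers the same 127..255 range and, like escapeChars, leaves square brackets alone.

```tcl
namespace eval ::chars {
    # Build the mapping list once: each char 127..255 -> its numeric entity.
    variable gEscapeMap {}
    for {set i 127} {$i < 256} {incr i} {
        lappend gEscapeMap [format %c $i] "&#$i;"
    }
}

# Hypothetical alternative to escapeChars: no regsub/subst round trip needed.
proc escapeCharsMap {pText} {
    return [string map $::chars::gEscapeMap $pText]
}

puts [escapeCharsMap "caf\u00e9 \[x\]"]
```

Because string map never evaluates its input, there is no need to backslash-protect brackets first.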
 
 
Of course this won't translate silly MS characters into anything more sensible, and still relies on the browser being able to correctly display the output.
 
You might need to play with the encoding command to do that,
 
e.g.
encoding convertto iso8859-1 $pText
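
A related sketch, going in the other direction: if the pasted text arrives as raw bytes in Windows code page 1252 (an assumption, the thread does not say which encoding the editors' browsers send), encoding convertfrom turns the smart-quote bytes into real Unicode characters that can then be matched or escaped. The variable names are illustrative only.

```tcl
# Bytes as they might arrive from a Windows client (assumed cp1252):
# 0x93 and 0x94 are the left and right double "smart" quotes there.
set rawBytes "\x93blah\x94"

# Reinterpret the bytes as cp1252 to obtain real Unicode characters.
set text [encoding convertfrom cp1252 $rawBytes]

# Look up the character codes of the first and last character.
scan [string index $text 0] %c first
scan [string index $text end] %c last
puts "U+[format %04X $first] ... U+[format %04X $last]"
```

In cp1252 those two bytes correspond to U+201C and U+201D.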
 
or alternatively deal with each of them as a special case,
e.g.
 
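proc remove_MS_cruft { string2clean } {
 
 # Note: inside braces, {\\u2019} matches the literal six-character text \u2019.
 # To match the actual character U+2019 instead, write the pattern as {\u2019}.
 regsub -all {\\u2019} $string2clean {'} newString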
 regsub -all {\\u201c} $newString {"} newString
 regsub -all {\\u201d} $newString {"} newString
 regsub -all {\\u2013} $newString {-} newString
 regsub -all {\\u201e} $newString {"} newString
 regsub -all {\\u2018} $newString {'} newString
 regsub -all {&#211;} $newString {-} newString
 return [escapeChars $newString]
 
}
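
If the string holds the actual Unicode characters rather than literal backslash-u text, the same special-case table can be sketched with a single string map call. The proc name asciiQuotes is hypothetical; the character list is the one handled by remove_MS_cruft above.

```tcl
# Hypothetical helper: map Word-style punctuation to plain ASCII in one pass.
proc asciiQuotes {text} {
    set map [list \
        \u2018 '  \u2019 ' \
        \u201c \" \u201d \" \u201e \" \
        \u2013 -]
    return [string map $map $text]
}

puts [asciiQuotes "\u201cblah\u201d it\u2019s 1\u20132"]
```

The result could then be passed to escapeChars as in remove_MS_cruft.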
 
-----Original Message-----
From: AOLserver Discussion [mailto:[EMAIL PROTECTED]On Behalf Of Michael Richman
Sent: Friday, October 17, 2003 9:15 PM
To: [EMAIL PROTECTED]
Subject: Re: [AOLSERVER] more on special characters: smart quotes

 
This is one particular implementation (a little different than what you're asking for, but could be modified to catch things you need to catch). This method replaces chars in question with the HTML equivalent, as Carson's mail mentions, which may or may not be what you're wishing to do:
 
 
proc replaceSpecialChars {oldStr newStrVar} {
    upvar $newStrVar newStr
    set newStr ""
    set nrepl  0
    set strlen [string length $oldStr]

    for {set i 0} {$i < $strlen} {incr i} {
        set char [string index $oldStr $i]
        scan $char %c c

        if {$c > 127} {
            append newStr "&#$c;"
            incr nrepl
        } else {
            append newStr $char
        }
    }

    return $nrepl
}
 
This could get slow if you're dealing with large strings, but it could be written in C for optimization if need be.
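
Before reaching for C, a pure-Tcl variant may be worth trying. This is only a sketch under the assumption that the per-character loop is the bottleneck; replaceSpecialCharsFast is a made-up name with the same calling convention as replaceSpecialChars. It collects the non-ASCII characters with one regexp call and rewrites the string with one string map call.

```tcl
# Hypothetical faster variant: same arguments and return value as
# replaceSpecialChars, but without a Tcl-level loop over every character.
proc replaceSpecialCharsFast {oldStr newStrVar} {
    upvar 1 $newStrVar newStr

    # Every character outside the range 0..127, in order of appearance.
    set hits [regexp -all -inline {[^\u0000-\u007f]} $oldStr]

    # One map entry per distinct character: char -> &#code;
    set map {}
    foreach ch [lsort -unique $hits] {
        scan $ch %c code
        lappend map $ch "&#$code;"
    }

    set newStr [string map $map $oldStr]
    return [llength $hits]
}

set n [replaceSpecialCharsFast "na\u00efve \u201cx\u201d" out]
puts "$n replacements: $out"
```

Whether this is actually faster depends on the Tcl version and the input, so it is worth timing both with the time command.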
 
-- michael
 
In a message dated 10/17/2003 3:57:34 PM Eastern Daylight Time, [EMAIL PROTECTED] writes:
I have solved this problem on quite a few CMS implementations in both
Vignette and AOLServer using regsub's. I suggest replacing them with the
HTML equivalent rather than with replacement ascii, but that's more a
matter of opinion. The translation method is the best route, but sometimes
a lot of processing. Regsub is fine if you're willing to accept the fact
that occasionally a special char you haven't identified yet (like vertical
TAB!) might slip in.

> So, I was a little unclear on the problem my coworker was facing-
> apologies for sort of repeating myself.
>
> We've built a CMS allowing editors to paste in stories, and we're having
> problems with them pasting in smart quotes- primarily, that apparently smart
> quotes aren't within the unicode set and are proprietary to Msoft.
>
> We're looking for an efficient way to substitute smart quotes for regular
> quotes.  Has anyone dealt with this (pesky/annoying) problem before?
>
> Thanks a ton,
> Scott
>
> P.S. If you're unclear what a smart quote is, open up Msoft Word and type
> {blah "blah"} and note that one is a "left quote" and the other is a "right
> quote"...
>
>
> --
> AOLserver - http://www.aolserver.com/
>
> To Remove yourself from this list, simply send an email to
> <[EMAIL PROTECTED]> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the
> Subject: field of your email blank.
>
 
______________________
michael richman

princ software engineer
aol web svcs & publishing
