I have actually written one that maps the characters I know have failed
to ASCII characters, but there are probably thousands of characters in
other sets that would make my page puke.

??

Jon

-----Original Message-----
From: Brad Wood [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 15, 2006 11:49 AM
To: CF-Talk
Subject: RE: Does anybody "really" understand character encodings?

Well, the trick is he would want not only to remove "bad" characters but
to replace them with the correct ASCII equivalent.  For instance, an MS Word
smart left quote would become a regular double quote, and an MS Word
ellipsis would become three periods.
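
A minimal sketch of the replacement idea described above, written in Python
rather than CFML because the thread itself contains no code. The table
entries and the name replace_with_ascii are assumptions chosen for
illustration; the table is not exhaustive.

```python
# Hypothetical mapping table: each key is one "smart" character,
# each value is the plain ASCII text that stands in for it.
ASCII_EQUIVALENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def replace_with_ascii(text: str) -> str:
    """Replace every mapped character with its ASCII stand-in."""
    return text.translate(_TABLE)
```

Characters that are not in the table pass through unchanged, so a table like
this only covers the cases someone has already seen fail.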

I'm not sure if you can find special characters with Chr().  Do they
use an ASCII code?
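
On the code question: these characters sit outside the 7-bit ASCII range, so
they have no ASCII code, but they do have Unicode code points and single-byte
values in Windows-1252. The short Python sketch below only inspects three
sample characters; the names samples and is_ascii are made up for this
example.

```python
# Three characters that Word commonly substitutes while typing.
samples = {
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "ellipsis": "\u2026",
}


def is_ascii(ch: str) -> bool:
    """True when the character's code point fits in 7-bit ASCII."""
    return ord(ch) < 128


for name, ch in samples.items():
    # ord() gives the Unicode code point; cp1252 is Python's name
    # for the Windows-1252 code page.
    print(name, ord(ch), ch.encode("cp1252"), is_ascii(ch))
```

The code points are 8220, 8221 and 8230, and the Windows-1252 bytes are 0x93,
0x94 and 0x85, which is one reason the same character can show up differently
depending on how the text was encoded.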

I swear, as much as people ask questions like this, someone has to have
made a replacebadwithgood() function which would fix strings for you to
use only standard ASCII characters.

~Brad

-----Original Message-----
From: Brent Nicholas [mailto:[EMAIL PROTECTED]
Sent: Friday, September 15, 2006 10:25 AM
To: CF-Talk
Subject: Re: Does anybody "really" understand character encodings?

I'm no expert here, but I'd try to use a regular expression that keeps
all the 'known good' characters and removes the unknown ones. Though you'd
really have to look into what counts as 'known good' if you think it may be
more than A-Z, a-z, 0-9, and punctuation.
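
A rough sketch of that whitelist approach, in Python for illustration. Treating
"known good" as printable ASCII plus tab, carriage return and newline is an
assumption, and strip_unknown is a hypothetical name.

```python
import re

# Matches any character that is NOT printable ASCII (space through tilde)
# and not a tab, carriage return or newline. This definition of
# "known good" is an assumption for the example.
NOT_KNOWN_GOOD = re.compile(r"[^\x20-\x7E\t\r\n]")


def strip_unknown(text: str, replacement: str = "") -> str:
    """Remove (or replace) every character outside the whitelist."""
    return NOT_KNOWN_GOOD.sub(replacement, text)
```

Note that this throws information away: an accented letter or a smart quote
simply disappears unless it has been replaced with an equivalent first.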

2 cents.

BN

>I have some users who enter data into my web application in one of
>three ways:
> 
>- copy/paste from microsoft word
>- XML export from InDesign UTF-16
>- XML export from Quark
> 
>In all 3 of the cases I've described above, the originating software is 
>putting through characters that do not display correctly on the web.
> 
>The problem I'm having is with some of the characters, such as an 
>ellipsis mark or hyphen. When I run into these characters, they display
>as the wrong character... sometimes a question mark. Other times a square
>box... yet other times sequences of characters that are just totally 
>crazy.
> 
>My basic understanding of character encoding tells me that I want to 
>reduce all of the characters down to ASCII. I do not know a good way to
>do this.
> 
>How can I accept text from each of the above mentioned sources, perhaps
>others, and somehow *normalize* all of the character data into a set of
>characters that will display properly on my page every time?
> 
>Thank you,
>Jon
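
One way the normalization asked about above could be sketched, in Python for
illustration: decode the incoming bytes using the encoding the source actually
used, replace punctuation that has no Unicode decomposition, then apply
Unicode NFKD normalization and drop whatever is still outside ASCII. The
PUNCTUATION table and the name normalize_to_ascii are assumptions, and the
caller is assumed to know the source encoding (for example "utf-16" for the
InDesign export, or "cp1252" for text pasted from Word on Windows).

```python
import unicodedata

# Hypothetical table for punctuation that NFKD leaves untouched.
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
})


def normalize_to_ascii(raw: bytes, encoding: str) -> str:
    """Decode bytes from a known encoding and reduce the text to ASCII."""
    text = raw.decode(encoding)                 # bytes -> Unicode text
    text = text.translate(PUNCTUATION)          # explicit replacements first
    text = unicodedata.normalize("NFKD", text)  # split accents, expand the ellipsis
    # Drop anything still outside ASCII, such as combining accent marks.
    return text.encode("ascii", "ignore").decode("ascii")
```

With this sketch, UTF-16 bytes for a smart-quoted "naïve…" come back as
"naive..." in straight quotes. Characters with no ASCII-like decomposition are
silently dropped, so this is lossy by design.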






Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:253269