Rob, you had a good idea of my situation.  The data is basically name, 
address, etc. that shoppers enter when they buy our product.  Once the data 
is in the db, I output it as XML for a third-party application.  Somehow, 
control characters sneak into that form data from time to time.

I do understand that it's best practice to clean the data before it goes 
into the db.  However in my particular case, it doesn't make a whole bunch 
of difference whether I clean it before or after.

Maybe I can summarize:
1) CDATA is not helpful when encountering control characters: a character 
that is illegal in XML is just as illegal inside a CDATA section.
2) Thus, I have to use REReplace() with all the known control characters 
that have broken the XML in the past (CF reports the offending character as 
a Unicode code point)
3) If I did the REReplace() on the way into the db, it still might not catch 
all offending control characters.  There may be a new one that isn't in the 
regex yet.  Additionally, I don't want to disrupt the shopper's checkout 
process if at all possible.
4) Thus, the data with the new control character would still go into the db 
and break the xml on the way out of the db.
5) So, I may as well just do it on the way out of the db, where I don't have 
to worry about disrupting a shopper when they are about to buy something 
(can you imagine the error message: "Sorry, we have detected an invisible 
character in your address.  Please remove it and re-submit.")
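
For what it's worth, here is a rough sketch of the cleanup on the way out, 
written in Python rather than CF purely for illustration.  Instead of listing 
the known offenders, it removes anything outside the character ranges that 
XML 1.0 allows (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to 
U+FFFD, U+10000 to U+10FFFF), so a new control character should be caught 
without touching the regex.  The function names and the sample address are 
made up for the example; this is not what my actual code looks like.

```python
import re
import xml.etree.ElementTree as ET

# Matches every character that falls OUTSIDE the XML 1.0 "Char" production.
# Allowed: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Return text with every character that XML 1.0 forbids removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


def parses(xml_text):
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical form value with a stray vertical tab (U+000B) in it.
address = "123 Main St\x0b Apt 4"

# Wrapping the raw value in CDATA does not make the control character legal.
wrapped = "<address><![CDATA[" + address + "]]></address>"

# Stripping first, on the way out of the db, gives well-formed XML.
cleaned = (
    "<address><![CDATA[" + strip_illegal_xml_chars(address) + "]]></address>"
)

print(parses(wrapped))
print(parses(cleaned))
```

The same negated character class should carry over to REReplace() more or 
less as-is, though I have not checked how CF's regex engine wants the 
code points above U+FFFF written.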

-- Josh


----- Original Message ----- 
From: "Rob Wilkerson" <[EMAIL PROTECTED]>
To: "CF-Talk" <cf-talk@houseoffusion.com>
Sent: Tuesday, November 07, 2006 2:04 PM
Subject: Re: Cleaning XML - Unicode 0x0 SOLVED sorta


> On 11/7/06, Matt Quackenbush <[EMAIL PROTECTED]> wrote:
>> Josh,
>>
>> I think the point that Rob and others were making is that your data
>> should be validated and cleaned up BEFORE being inserted into the
>> database - whether it's inserted as XML or not is completely and
>> utterly irrelevant.
>
> That's not exactly what I was saying, but I do agree that it's a good
> practice when possible.  On the whole, though, I'm a proponent of less
> rather than more restriction on what can be entered.  Some data is
> restrictive by its very nature (e.g. price, quantity, etc.), but other
> data is very unstructured (e.g. name, title, description).  In the
> latter case, I prefer to try to keep it as it was entered and then
> handle it when it's used - preferably without modification.
>
>> If you didn't have invalid data in the database, then you wouldn't have
>> invalid data in your XML.  But, since the data obviously is NOT being
>> validated and cleaned up before db entry, the best, most scalable, and
>> most widely accepted "good practice" would be to use CDATA in your XML.
>
> Exactly.  Any number of characters can creep into that unstructured
> text I mentioned above.  A LOT of them if the text is copied and
> pasted from MSWord.  Those characters can either be stripped
> one-by-one using REReplace() or another similar method or you can
> simply allow them in your XML by enclosing them in a CDATA block.  The
> latter is much easier and retains the data exactly as it was entered.
>
>> Again though, what you're doing is just a bandaid that covers up the real
>> issue, which is invalid data being entered into the database.
>
> In Josh's case, I don't think I have a good sense of what kind of data
> he's got nor of the process in which he's using it so the best I could
> do was throw out generic options.  Hopefully they made enough sense
> that he'll be able to use them if he feels the need to do so.
>
> 

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:259538