(Sorry for posting this here, but [EMAIL PROTECTED] bounced this mail as 
"off topic" and suggested that I post it here.)

Hello, mysqlers... (first post from a long-time mysql user)

I recently learned that mysqldump has an --xml flag to dump a database as XML. 
COOL! Except that it mangles data by converting >, <, &, etc., into their XML 
entity equivalents (&gt;, &lt;, &amp;). This conversion is Evil because:
The data may or may not have been stored in that encoded form originally, and 
when converting the XML back into some other format, the user cannot reliably 
turn XML entities back into "normal characters" when both kinds of data 
(normal characters and XML entities) appear in the same field (an example is 
below). In short, it's an unannounced alteration of the user's data, and one 
that can cause problems later on when converting from XML to [data format X].
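To see the conversion concretely, Python's standard library has a comparable escaping helper, `xml.sax.saxutils.escape`, which replaces &, < and > with entities. This is only an approximation of what mysqldump --xml does; mysqldump's own escaping code is not reproduced here and may differ in details such as quote handling.

```python
from xml.sax.saxutils import escape

# A field value taken from the example further down in this mail.
field = '<?php echo "some code goes here, & some code goes there."; php?>'

# escape() replaces & with &amp;, < with &lt; and > with &gt;.
escaped = escape(field)
print(escaped)
# prints: &lt;?php echo "some code goes here, &amp; some code goes there."; php?&gt;
```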

For example, I store PHP code in a database, and the <?php php?> tags get 
mangled by the --xml flag, as do the & signs in the code. After converting 
the XML back into any other format, the data is useless: I would have to 
look through every &amp; entity, check whether it's inside a string (and is 
thus okay as '&amp;') or not (and is thus a programming operator), and edit 
accordingly.

Here's a real-world example of a piece of data mangled by --xml:

                <content>&lt;?php
echo '&lt;hr&gt;&lt;b&gt;Session table:&lt;/b&gt;&lt;br&gt;';
$db=classload('DebugUtil');
echo $db-&gt;dumpArray( r_session(), '&lt;br&gt;' );
php?&gt;</content>


That cannot be converted back into its original form 100% automatically 
and reliably.

Coincidentally, a couple of months ago I wrote a Perl script that dumps a 
MySQL database into XML, and the approach I took to this problem seems less 
intrusive and keeps the user's data exactly as it is in the database:

If a dumped field's data contains any non-word characters, simply wrap the 
output in a <![CDATA[...]]> block.

Using mysqldump --xml:
<myfield>&lt;?php echo "some code goes here, &amp; some code goes there."; 
php?&gt;</myfield>

Proposed method:
<myfield><![CDATA[<?php echo "some code goes here, & some code goes there."; 
php?>]]></myfield>


So, the mangled example from above becomes:
                <content><![CDATA[<?php
echo '<hr><b>Session table:</b><br>';
$db=classload('DebugUtil');
echo $db->dumpArray( r_session(), '<br>' );
php?>]]></content>
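One way to check that the CDATA form keeps the field verbatim is to parse it back with any XML parser. A minimal sketch using Python's `xml.etree.ElementTree` on the session-table example (the XML string is copied from this mail; the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# The CDATA-wrapped field from the example above; the PHP text is the user's data.
dumped = """<content><![CDATA[<?php
echo '<hr><b>Session table:</b><br>';
$db=classload('DebugUtil');
echo $db->dumpArray( r_session(), '<br>' );
php?>]]></content>"""

# The parser returns the CDATA contents as plain text, with no entities to undo.
original = ET.fromstring(dumped).text
print(original)
```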

Fields containing only word characters (or word characters plus any of 
",.-") are left intact.
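A minimal Python sketch of the rule described above. The original Perl script is not shown, so this is a re-implementation under assumptions: `wrap_field` and `dump_field` are hypothetical names, "word characters" is taken to mean Python's `\w`, and whitespace counts as non-word. One detail the rule needs that the mail doesn't mention: a CDATA section cannot contain the sequence `]]>`, so the sketch splits any such occurrence across two adjacent CDATA sections.

```python
import re

# Assumption: "plain" = only \w characters plus , . - (whitespace is NOT plain).
_PLAIN = re.compile(r"[\w,.\-]*")


def wrap_field(value: str) -> str:
    """Return value unchanged if it is plain, otherwise wrap it in CDATA."""
    if _PLAIN.fullmatch(value):
        return value
    # "]]>" would end the CDATA section early, so split it across two sections.
    safe = value.replace("]]>", "]]]]><![CDATA[>")
    return f"<![CDATA[{safe}]]>"


def dump_field(name: str, value: str) -> str:
    """Render one column as <name>...</name>; name is assumed to be a valid tag."""
    return f"<{name}>{wrap_field(value)}</{name}>"


print(dump_field("myfield", 'echo "a & b";'))
print(dump_field("id", "42"))
```

With these inputs the first line is wrapped in CDATA (it contains spaces, quotes and &) while `42` is left as-is. Note that CDATA only avoids escaping; characters that XML 1.0 forbids outright (most control characters) still cannot appear inside it.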

I strongly recommend a similar change in mysqldump's --xml behaviour, as the 
current behaviour seems downright evil.

Take care, :)

----- stephan
[EMAIL PROTECTED] - http://www.einsurance.de
Office: +49 (89)  552 92 862 Handy:  +49 (179) 211 97 67
This email is encrypted with ROT26 encoding. Decoding it
is in violation of the Digital Millennium Copyright Act.



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
