At 06.04.2002  08:36, you wrote:

>I read in the contents of an old mysql database, and made XML files out
>of the data contained in it. The total job came out to about 6800 files,
>or "Documents".
>
>I'm finding that, sporadically, when reading these XML files, and passing
>the xmldata to sablotron, I'm getting sablotron errors. These errors stem
>from characters I'm finding throughout these documents. This is the meat
>of my problem.
>
>I take a look at a file which is giving errors by using 'less'. When
>looking at the file, I'll see chars like this: <A1> or <91> or <92> and
>so on, and so on. The chars are highlighted in bold reverse text,
>indicating that they are 'binary'??. I'm not sure whether to call them
>binary or hex... perhaps someone can tell me how to appropriately address
>these chars...
>
>Anyhow... these chars sometimes correspond to valid characters. Such as
><A9>... this is a "copyright" char, or &copy;.
>
>I've been manually replacing these characters as errors are generated,
>but it's getting a little tiring.
>
>Is there any way I can force PHP to either strip out these 'binary'
>characters, or whatever they are, when I read the file?
>
>Is there any way to keep php from saving these chars to NEW documents
>when they are created?
>
>Does anyone even know what I'm talking about?? hehe.

It seems that you work on a Windows box. As a tip (I don't know if it
works, as I use LAMP), try the manual -> string functions:

get_html_translation_table — Returns the translation table used by 
htmlspecialchars() and htmlentities()
htmlentities — Convert all applicable characters to HTML entities
htmlspecialchars — Convert special characters to HTML entities

Maybe you'll find something there that makes you happy.
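
A minimal sketch of the htmlentities() route, assuming the stray bytes are
Windows-1252 (in that code page <91> and <92> are the curly single quotes and
<A9> is the copyright sign). The sample string and the encoding name are
assumptions, not something taken from the real files. One caveat: XML itself
predefines only &lt;, &gt;, &amp;, &apos; and &quot;, so a named entity such
as &copy; may still upset the parser unless a DTD declares it.

```php
<?php
// Hypothetical sample data: bytes 0x91, 0x92 and 0xA9 as 'less' would
// show them (<91>, <92>, <A9>) in a Windows-1252 encoded file.
$raw = "\x91quoted\x92 \xA9 2002";

// Passing the encoding explicitly is the important part; 'Windows-1252'
// is an assumption about how the old database content was stored.
$html = htmlentities($raw, ENT_QUOTES, 'Windows-1252');

echo $html, "\n";
```
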
Otherwise, as I learned over the last two weeks on the list, ereg_replace
or eregi_replace could make your life easier.
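
A rough sketch of the regex idea, written with preg_replace (the PCRE
counterpart of ereg_replace; the ereg functions were removed in PHP 7.0).
The function names are made up for illustration, and CP1252 as the source
encoding is an assumption. The first function simply drops every byte above
7-bit ASCII; the second keeps the characters by converting them to UTF-8,
which an XML parser can read if the document is declared as UTF-8.

```php
<?php
// Hypothetical helper: remove every byte in the range 0x80-0xFF.
// Lossy, but guarantees plain ASCII output.
function strip_high_bytes($data) {
    return preg_replace('/[\x80-\xFF]/', '', $data);
}

// Hypothetical helper: reinterpret the bytes as Windows-1252 (assumed)
// and re-encode them as UTF-8 instead of throwing them away.
function to_utf8($data) {
    return iconv('CP1252', 'UTF-8', $data);
}

// Sample input with a copyright sign stored as the single byte 0xA9.
$raw = "Copyright \xA9 2002";

$ascii_only = strip_high_bytes($raw); // the 0xA9 byte is gone
$utf8       = to_utf8($raw);          // 0xA9 becomes the two bytes 0xC2 0xA9
```

Applying one of these right after reading each file, and again before
writing a new document, would cover both of the questions above.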
HTH Oliver


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
