At 06.04.2002 08:36, you wrote:
>I read in the contents of an old mysql database, and made XML files out
>of the data contained in it. The total job came out to about 6800 files,
>or "Documents".
>
>I'm finding that, sporadically, when reading these XML files, and passing
>the xmldata to sablotron, I'm getting sablotron errors. These errors stem
>from characters I'm finding throughout these documents. This is the meat
>of my problem.
>
>I take a look at a file which is giving errors by using 'less'. When
>looking at the file, I'll see chars like this: <A1> or <91> or <92> and
>so on, and so on. The chars are highlighted in bold reverse text,
>indicating that they are 'binary'??. I'm not sure whether to call them
>binary or hex... perhaps someone can tell me how to appropriately address
>these chars...
>
>Anyhow... these chars sometimes correspond to valid characters, such as
><A9>... this is a "copyright" char, or &copy;.
>
>I've been manually replacing these characters as errors are generated,
>but it's getting a little tiring.
>
>Is there any way I can force PHP to either strip out these 'binary'
>characters, or whatever they are, when I read the file?
>
>Is there any way to keep php from saving these chars to NEW documents
>when they are created?
>
>Does anyone even know what I'm talking about?? hehe.

It seems you're working on a Win* box: <91> and <92> are the curly single quotes of the Windows-1252 character set, and <A9> is its copyright sign. As a tip (I don't know whether it works, since I use LAMP), look in the manual under String Functions:
  get_html_translation_table() - Returns the translation table used by htmlspecialchars() and htmlentities()
  htmlentities() - Convert all applicable characters to HTML entities
  htmlspecialchars() - Convert special characters to HTML entities

Maybe you'll find something there that helps. Otherwise, as I've learned on this list over the last two weeks, ereg_replace() or eregi() could make your life easier: the first can strip the characters out with a regular expression, the second can at least find them.

HTH
Oliver

-- 
PHP General Mailing List (http://www.php.net/)
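A minimal sketch of the regular-expression route, assuming the old MySQL data is Windows-1252 (cp1252) encoded. It uses preg_replace() and preg_replace_callback() rather than ereg_replace(), which was removed in PHP 7.0. The function names strip_high_bytes() and cp1252_to_xml_refs() are hypothetical. Option 1 simply drops every byte from 0x80 to 0xFF. Option 2 keeps the characters by rewriting each byte as a numeric character reference such as &#169;. htmlentities() would produce named entities like &copy; instead, but XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so an XSLT processor such as Sablotron would still reject &copy; unless a DTD declares it.

```php
<?php
// Sketch only. Assumption: the input text is Windows-1252 (cp1252) encoded.

// cp1252 bytes 0x80-0x9F and the Unicode code points they stand for.
// 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252 and get dropped.
const CP1252_HIGH = [
    0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
    0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
    0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
    0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
    0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
    0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
    0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
];

// Option 1 (hypothetical helper): remove every byte in the range 0x80-0xFF.
function strip_high_bytes(string $text): string
{
    return preg_replace('/[\x80-\xFF]/', '', $text);
}

// Option 2 (hypothetical helper): replace each byte in 0x80-0xFF with a
// decimal numeric character reference, which XML accepts without a DTD.
function cp1252_to_xml_refs(string $text): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        $byte = ord($m[0]);
        if ($byte >= 0xA0) {
            // 0xA0-0xFF: the byte value equals the Unicode code point.
            return '&#' . $byte . ';';
        }
        // 0x80-0x9F: use the cp1252 table; undefined bytes become ''.
        if (array_key_exists($byte, CP1252_HIGH)) {
            return '&#' . CP1252_HIGH[$byte] . ';';
        }
        return '';
    }, $text);
}

$raw = "Copyright \xA9 2002, it\x92s fine";
echo strip_high_bytes($raw), "\n";   // Copyright  2002, its fine
echo cp1252_to_xml_refs($raw), "\n"; // Copyright &#169; 2002, it&#8217;s fine
```

To clean files as you read them, pass the result of file_get_contents() through cp1252_to_xml_refs() before handing it to the XSLT processor; to keep the characters out of NEW documents, run each database field through the same function before writing the XML.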