[PHP] Re: ampersand in dom with utf-8
It seems that `DOMDocument->createTextNode()' accepts only UTF-8 strings, so try encoding the text as UTF-8 before passing it to that function.

On the browser side, you can check the document properties to see which encoding the browser detected. Browsers generally let you choose the encoding used to view a page, so try getting familiar with your browser first.

On 11/3/05, jonathan [EMAIL PROTECTED] wrote:
> Here is how I'm actually adding it via the DOMDocument class:
>
> $name->appendChild($dom->createTextNode(html_entity_decode($item_row['slot'])));
> $item->appendChild($name);
>
> Is there any way I could, on the client side, query the XML string for its encoding to be sure that it is in fact utf-8? The first line of the generated XML is:
>
> <?xml version="1.0" encoding="utf-8"?>

--
all born, to be dying

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
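A minimal sketch of the "encode before createTextNode()" advice, assuming a PHP version whose html_entity_decode() accepts 'UTF-8' as its third argument. The sample string and element name are taken from the thread; nothing here is the poster's actual code. Before PHP 5.4 the default target charset was ISO-8859-1, which would hand DOM a single 0xE9 byte that is not valid UTF-8.

```php
<?php
// Hypothetical stored value, modeled on the string quoted in this thread.
$stored = 'braised beef shortribs with saut&eacute;ed greens';

// Decode the entity straight to UTF-8 bytes rather than ISO-8859-1.
$text = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

$dom  = new DOMDocument('1.0', 'utf-8');
$name = $dom->createElement('name');
// createTextNode() expects UTF-8 and escapes any markup characters itself.
$name->appendChild($dom->createTextNode($text));
$dom->appendChild($name);

$output = $dom->saveXML();
echo $output;
```

If the data is known to be ISO-8859-1 already, converting it to UTF-8 first (for example with iconv()) would be the equivalent step.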
Re: [PHP] Re: ampersand in dom with utf-8
so I decided it would be best just to convert back to the original format. So for this string:

$string = "braised beef shortribs with saut&eacute;ed greens, pearl onions and horseradish cream";

I do an html_entity_decode($string); but this still gives me an error when I add it via the DOM functions, although it renders with the é when echoed from PHP. If I look at the XML in Firefox on a Mac, it looks like this:

<name>braised beef shortribs with saut?ed greens, pearl onions and horseradish cream</name>

I know that this is a basic question, but how can I get this to go through? Here is how I'm actually adding it via the DOMDocument class:

$name->appendChild($dom->createTextNode(html_entity_decode($item_row['slot'])));
$item->appendChild($name);

Is there any way I could, on the client side, query the XML string for its encoding to be sure that it is in fact utf-8? The first line of the generated XML is:

<?xml version="1.0" encoding="utf-8"?>

thanks for any help.

-jonathan

On Oct 16, 2005, at 1:36 AM, ac wrote:
> try this. if you need more entities to be included, just refer to `http://www.w3.org/2003/entities/iso8879/isolat1.ent' or look up the character code yourself.
>
> <?xml version="1.0"?>
> <!DOCTYPE html [
>   <!ENTITY egrave "&#x00e8;">
>   <!ENTITY icirc "&#x00ee;">
> ]>
> <item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>
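On the question of confirming the encoding: the following is only a server-side sketch (the question asks about the client side, which this does not cover). It reads the encoding named in the XML declaration through DOMDocument's xmlEncoding property and separately checks that the bytes are well-formed UTF-8. The sample document is hypothetical.

```php
<?php
// Hypothetical document, shaped like the output described above.
$xml = '<?xml version="1.0" encoding="utf-8"?>'
     . "<name>saut\xC3\xA9ed greens</name>";

$dom = new DOMDocument();
$dom->loadXML($xml);

// Encoding named in the XML declaration (null when none is declared).
$declared = $dom->xmlEncoding;

// Independent byte-level check: preg_match() with the /u modifier
// does not return 1 when the subject is not valid UTF-8.
$bytesAreUtf8 = preg_match('//u', $xml) === 1;

echo $declared, ' / ', $bytesAreUtf8 ? 'valid UTF-8' : 'not valid UTF-8', "\n";
```

The declaration only states what the document claims to be; the byte check is what catches a Latin-1 string that was written into a document labeled utf-8.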
[PHP] Re: ampersand in dom with utf-8
try this. if you need more entities to be included, just refer to `http://www.w3.org/2003/entities/iso8879/isolat1.ent' or look up the character code yourself.

<?xml version="1.0"?>
<!DOCTYPE html [
  <!ENTITY egrave "&#x00e8;">
  <!ENTITY icirc "&#x00ee;">
]>
<item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>

On 10/13/05, jonathan [EMAIL PROTECTED] wrote:
> I'm now getting this error:
>
> XML Parsing Error: undefined entity
>
> with the following entity at the first ampersand:
>
> <item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>
>
> Why is an ampersand considered an undefined entity? The xml version is:
>
> <?xml version="1.0"?>
>
> Any thoughts please?
>
> -jonathan

--
all born, to be dying
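A sketch of how the entity declarations above could be exercised from PHP. It assumes the LIBXML_NOENT option is acceptable for the input, since it asks libxml to substitute entity references while parsing; that is fine for a document you wrote yourself but is generally avoided for untrusted XML. The DOCTYPE name is changed to item_name here only so it matches the root element.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE item_name [
  <!ENTITY egrave "&#x00e8;">
  <!ENTITY icirc "&#x00ee;">
]>
<item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>
XML;

$dom = new DOMDocument();
// LIBXML_NOENT replaces &egrave; and &icirc; with the declared characters.
$dom->loadXML($xml, LIBXML_NOENT);

// DOM returns text as UTF-8 regardless of how the characters were written.
$text = $dom->documentElement->textContent;
echo $text, "\n";
```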
Re: [PHP] Re: ampersand in dom with utf-8
I've been setting the ... here's some output:

<?xml version="1.0" encoding="utf-8"?>
<menu>
<submenu>
<submenu_name>Starters</submenu_name>
<item>
<item_name>soupe au pistou with amaranth and grana breadcrumbs</item_name>
</item>
<item>
<item_name>farm lettuces with reed avocado, cr&amp;egrave;me fra&icirc;che, radish and cilantro</item_name>
</item>

On the second item, this cr&amp;egrave;me is ok but this fra&icirc;che is causing the error.

-jonathan

On Oct 14, 2005, at 4:22 PM, Jasper Bryant-Greene wrote:
> jonathan wrote:
>> the real characters (presumably è) won't render correctly.
>
> Are you outputting the correct character set information (UTF-8), and are you sure that UTF-8 is being used throughout the entire process?
>
> --
> Jasper Bryant-Greene
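One possible reading of the cr&amp;egrave;me in that output, sketched under the assumption that the text still contained the literal HTML entity when it was handed to createTextNode(). DOM escapes the ampersand on serialization, so the entity survives as plain characters rather than as a reference. The element name is borrowed from the output above.

```php
<?php
$dom  = new DOMDocument('1.0', 'utf-8');
$item = $dom->createElement('item_name');

// The string holds an HTML entity as ordinary characters: & e g r a v e ;
$item->appendChild($dom->createTextNode('cr&egrave;me'));
$dom->appendChild($item);

// Serialize just the element; the ampersand comes out escaped.
$serialized = $dom->saveXML($item);
echo $serialized, "\n";
```

This parses without error, but a reader of the XML gets the seven characters "&egrave;" instead of è, which is usually not what is wanted.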
Re: [PHP] Re: ampersand in dom with utf-8
So I'm reading up on character encoding in XML documents, as I think this is the problem (after the many helpful suggestions on this list).

With regard to your second question: no, I'm not sure if I'm using proper utf-8 throughout the entire process. When I take input from the form, I'm converting everything via htmlentities. This is why I'm getting &egrave; etc. (On a side note, is there a function or way to check whether a form is submitting the native characters, such as an è copied and pasted from a Word document, or the HTML entity &egrave;?)

I've changed the content-type from text/xml to application/xml but that doesn't seem to help. As only UTF-8 and UTF-16 have to be supported, I'm concerned whether the processor might think it is some other encoding. The HTTP headers are:

Date = Sat, 15 Oct 2005 17:49:02 GMT
Server = Apache/1.3.33 (Unix) mod_jk/1.2.8 PHP/5.0.4 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.7a
X-Powered-By = PHP/5.0.4
Cache-Control = no-cache
Connection = close
Content-Type = application/xml

I guess, pursuant to cc's suggestion, I should do an html_entity_decode when I make the XML document and then do another htmlentities on the HTML representation.

-jonathan

On Oct 14, 2005, at 4:22 PM, Jasper Bryant-Greene wrote:
> jonathan wrote:
>> the real characters (presumably è) won't render correctly.
>
> Are you outputting the correct character set information (UTF-8), and are you sure that UTF-8 is being used throughout the entire process?
>
> --
> Jasper Bryant-Greene
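For the side question about telling native characters from entities, here is one rough way it might be done. Both helper names are hypothetical, not built-in PHP functions, and the sketch assumes the submitted string is meant to be UTF-8.

```php
<?php
// Hypothetical helper: true when decoding changes the string,
// i.e. the input contained at least one HTML entity.
function contains_html_entities(string $s): bool
{
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML401, 'UTF-8') !== $s;
}

// Hypothetical helper: true when the input holds any byte outside ASCII,
// i.e. a natively typed or pasted character such as è.
function contains_non_ascii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 1;
}

$asEntity = 'cr&egrave;me';
$asNative = "cr\xC3\xA8me"; // è encoded as UTF-8

var_dump(contains_html_entities($asEntity), contains_non_ascii($asEntity));
var_dump(contains_html_entities($asNative), contains_non_ascii($asNative));
```

A form can of course submit both kinds in the same field, so the two checks are independent rather than mutually exclusive.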
Re: [PHP] Re: ampersand in dom with utf-8
jonathan wrote:
> So I'm reading up on character encoding in XML documents, as I think this is the problem (after the many helpful suggestions on this list). With regard to your second question: no, I'm not sure if I'm using proper utf-8 throughout the entire process. When I take input from the form, I'm converting everything via htmlentities. This is why I'm getting &egrave; etc. (On a side note, is there a function or way to check whether a form is submitting the native characters, such as an è copied and pasted from a Word document, or the HTML entity &egrave;?)

If you're using the correct character set all the way through, you only need htmlspecialchars() to escape characters like & and < (as &amp; and &lt;), since all the other characters should already be present in the character set you are using (UTF-8). htmlentities() is mostly used for converting characters outside of your character set into entities.

> I've changed the content-type from text/xml to application/xml but that doesn't seem to help. As only UTF-8 and UTF-16 have to be supported, I'm concerned whether the processor might think it is some other encoding. The HTTP headers are:
>
> Date = Sat, 15 Oct 2005 17:49:02 GMT
> Server = Apache/1.3.33 (Unix) mod_jk/1.2.8 PHP/5.0.4 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.7a
> X-Powered-By = PHP/5.0.4
> Cache-Control = no-cache
> Connection = close
> Content-Type = application/xml

header('Content-Type: text/xml; charset=UTF-8');

> I guess, pursuant to cc's suggestion, I should do an html_entity_decode when I make the XML document and then do another htmlentities on the HTML representation.

Shouldn't be any need. Characters like è don't have any special meaning in XML, and they can be represented in the UTF-8 character set, so there's no need to convert them to entities at any stage.

--
Jasper Bryant-Greene
General Manager
Album Limited

a: Freepost Album, PO Box 579, Christchurch 8015, New Zealand
p: 0800 4 ALBUM (0800 425 286) or +64 21 232 3303
e: [EMAIL PROTECTED]
w: http://www.album.co.nz/

Memberships:
* Institute of Electrical and Electronics Engineers (IEEE)
* Association for Computing Machinery (ACM)
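To illustrate the difference between the two functions mentioned above, a small sketch with a made-up sample string. It assumes the input is UTF-8 and passes the charset explicitly so the result does not depend on the PHP version's default.

```php
<?php
// Hypothetical text with accented letters and markup-special characters.
$text = "cr\xC3\xA8me fra\xC3\xAEche & radish <fresh>";

// Escapes only the characters that are special in markup.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Additionally replaces every character that has a named HTML entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n", $entities, "\n";
```

The first result is safe to drop into a UTF-8 XML document as is; the second reintroduces &egrave; and &icirc;, which an XML parser will not know without a DTD.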
Re: [PHP] Re: ampersand in dom with utf-8
Are there PHP functions to change between these different formats? &#e8; doesn't seem to render correctly in a browser. ugghhh.

-jonathan

On Oct 13, 2005, at 4:53 AM, cc wrote:
> è
Re: [PHP] Re: ampersand in dom with utf-8
the real characters (presumably è) won't render correctly. It seems like there should be a set of functions for encoding this to a different but understandable format, and then another function for decoding and displaying it within a browser. It makes me not want to use DOM for creating XML files.

-jonathan

On Oct 13, 2005, at 1:53 AM, Marcus Bointon wrote:
> On 13 Oct 2005, at 07:24, cc wrote:
>> both `&egrave;' and `&icirc;' are not entities in charset utf-8, use `&amp;egrave;' and `&amp;icirc;' instead.
>
> I would expect that to result in unconverted entities in the output. If you're intending to send that content as HTML, then I guess that would be OK. However, if you're using UTF-8 anyway, why not just use the real characters?
>
> Marcus
Re: [PHP] Re: ampersand in dom with utf-8
jonathan wrote:
> the real characters (presumably è) won't render correctly.

Are you outputting the correct character set information (UTF-8), and are you sure that UTF-8 is being used throughout the entire process?

--
Jasper Bryant-Greene
General Manager
Album Limited
[PHP] Re: ampersand in dom with utf-8
neither `&egrave;' nor `&icirc;' is an entity in charset utf-8; use `&amp;egrave;' and `&amp;icirc;' instead.

On 10/13/05, jonathan [EMAIL PROTECTED] wrote:
> I'm now getting this error:
>
> XML Parsing Error: undefined entity
>
> with the following entity at the first ampersand:
>
> <item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>
>
> Why is an ampersand considered an undefined entity? The xml version is:
>
> <?xml version="1.0"?>
>
> Any thoughts please?
>
> -jonathan
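A small sketch of what the parser does with each form, using a shortened, hypothetical version of the item text. XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so any other named entity has to be declared in a DTD or the parse fails.

```php
<?php
// Collect parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$undefined = '<item_name>cr&egrave;me</item_name>';
$escaped   = '<item_name>cr&amp;egrave;me</item_name>';

$dom1 = new DOMDocument();
$okUndefined = $dom1->loadXML($undefined); // egrave is not declared anywhere

$dom2 = new DOMDocument();
$okEscaped = $dom2->loadXML($escaped);     // &amp; is predefined, so this parses

// The parsed text is the literal characters "&egrave;", not the letter è.
$text = $dom2->documentElement->textContent;

libxml_clear_errors();
var_dump($okUndefined, $okEscaped, $text);
```

So escaping the ampersand does silence the error, at the cost of leaving the entity unconverted in the data.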
Re: [PHP] Re: ampersand in dom with utf-8
On 13 Oct 2005, at 07:24, cc wrote:
> both `&egrave;' and `&icirc;' are not entities in charset utf-8, use `&amp;egrave;' and `&amp;icirc;' instead.

I would expect that to result in unconverted entities in the output. If you're intending to send that content as HTML, then I guess that would be OK. However, if you're using UTF-8 anyway, why not just use the real characters?

Marcus
--
Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk
[PHP] Re: ampersand in dom with utf-8
maybe i should have said: &egrave; is not an _xml_ entity. i'm not very sure. sorry.

`&egrave;' is an html entity. it represents the letter `è', which has the code 0xe8 in the iso-8859-1 charset. to have it recognized by libxml, there are 3 ways to do this:

1, <?xml version="1.0"?><item_name>&#xe8;</item_name>
2, <?xml version="1.0" encoding="iso-8859-1"?><item_name>è</item_name>
3, <?xml version="1.0"?><item_name>è</item_name>

1 can be saved using either utf-8 encoding or iso-8859-1 encoding;
2 must be saved using iso-8859-1 encoding;
3 must be saved using utf-8 encoding (to have `è' be converted properly).

in php, we can do this:

// decode to iso-8859-1 bytes so they match the encoding declared below
$html = html_entity_decode('<item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>', ENT_COMPAT, 'ISO-8859-1');
$dom = new DOMDocument();
$dom->loadXML("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>$html");

On 10/13/05, Marcus Bointon [EMAIL PROTECTED] wrote:
> On 13 Oct 2005, at 07:24, cc wrote:
>> both `&egrave;' and `&icirc;' are not entities in charset utf-8, use `&amp;egrave;' and `&amp;icirc;' instead.
>
> I would expect that to result in unconverted entities in the output. If you're intending to send that content as HTML, then I guess that would be OK. However, if you're using UTF-8 anyway, why not just use the real characters?
>
> Marcus
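The three ways listed above can be compared side by side. This is a sketch with the bytes spelled out as escapes so that the encoding of the PHP source file does not matter; the variant labels are made up for the example.

```php
<?php
$variants = [
    // 1: numeric character reference, pure ASCII on disk
    'numeric reference' => '<?xml version="1.0"?><item_name>&#xe8;</item_name>',
    // 2: single byte 0xE8, declared as iso-8859-1
    'iso-8859-1 bytes'  => '<?xml version="1.0" encoding="iso-8859-1"?><item_name>' . "\xE8" . '</item_name>',
    // 3: two bytes 0xC3 0xA8, relying on the utf-8 default
    'utf-8 bytes'       => '<?xml version="1.0"?><item_name>' . "\xC3\xA8" . '</item_name>',
];

$results = [];
foreach ($variants as $label => $xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    // DOM hands text back as UTF-8 whatever the input encoding was.
    $results[$label] = bin2hex($dom->documentElement->textContent);
}

print_r($results);
```

All three should come back as the same UTF-8 bytes, which is why the choice between them is mostly about how the file is stored rather than what the parser ends up with.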