[PHP] Re: ampersand in dom with utf-8

2005-11-03 Thread n.g.
It seems that `DOMDocument->createTextNode()' accepts only UTF-8 strings;
try encoding the text as UTF-8 before passing it to that function.
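
A minimal sketch of that suggestion, assuming the stored text contains HTML entities such as &eacute;. The helper name append_text_element is hypothetical and not from this thread. Passing 'UTF-8' as the third argument makes html_entity_decode() emit UTF-8 bytes; PHP versions before 5.4 defaulted to ISO-8859-1, which would hand createTextNode() a lone 0xE9 byte.

```php
<?php
// Hypothetical helper: decode HTML entities to UTF-8, then add the text to the DOM.
function append_text_element(DOMDocument $dom, DOMNode $parent, string $tag, string $raw): DOMElement
{
    // Decode named entities (e.g. &eacute;) straight to UTF-8 bytes.
    $text = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

    $element = $dom->createElement($tag);
    // Text nodes are escaped (&, <, >) automatically when the document is serialized.
    $element->appendChild($dom->createTextNode($text));
    $parent->appendChild($element);

    return $element;
}

$dom  = new DOMDocument('1.0', 'UTF-8');
$item = $dom->appendChild($dom->createElement('item'));
append_text_element($dom, $item, 'name', 'braised beef shortribs with saut&eacute;ed greens');

$xml = $dom->saveXML();
```

Because the document is created with the 'UTF-8' encoding, saveXML() should write the accented letter as literal UTF-8 bytes rather than as a character reference.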

On the browser side, you can check the encoding in the document properties.
Browsers generally let you choose the encoding used to view a
page, so try getting familiar with your browser first.

--
all born, to be dying

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: ampersand in dom with utf-8

2005-11-02 Thread jonathan
So I decided it would be best just to convert back to the original
format. For this string:

$string = "braised beef shortribs with saut&eacute;ed greens, pearl onions and horseradish cream";

I do an html_entity_decode($string);

but this still gives me an error when I add it via the DOM functions,
although it renders with the è in PHP. If I look at the XML in Firefox on a
Mac, it looks like this:

<name>braised beef shortribs with saut?ed greens, pearl onions and
horseradish cream</name>

I know that this is a basic question, but how could I get this to go
through?

Here is how I'm actually adding it via the DOMDocument class:

$name->appendChild($dom->createTextNode(html_entity_decode($item_row["slot"])));
$item->appendChild($name);

Is there any way I could query the XML string on the client side for
the encoding, to be sure that it is in fact UTF-8? The first line of
the generated XML is:

<?xml version="1.0" encoding="utf-8"?>
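
One way to check this from PHP rather than in the browser, as a sketch only: the `encoding` property of a loaded DOMDocument reports what the XML declaration claims, and preg_match() with the `u` modifier fails on byte sequences that are not valid UTF-8. The function name describe_encoding is made up for illustration.

```php
<?php
// Hypothetical helper: report the declared encoding and whether the bytes are valid UTF-8.
function describe_encoding(string $xml): array
{
    // With the /u modifier, PCRE rejects subjects that are not well-formed UTF-8.
    $validUtf8 = preg_match('//u', $xml) === 1;

    $declared = null;
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $dom = new DOMDocument();
    if ($dom->loadXML($xml)) {
        $declared = $dom->encoding; // encoding named in the XML declaration, if any
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['declared' => $declared, 'valid_utf8' => $validUtf8];
}

// Declared as UTF-8 and really UTF-8 (0xC3 0xA9 is "é").
$good = describe_encoding("<?xml version=\"1.0\" encoding=\"utf-8\"?><name>saut\xC3\xA9ed</name>");

// Declared as UTF-8 but carrying a lone ISO-8859-1 byte (0xE9).
$bad = describe_encoding("<?xml version=\"1.0\" encoding=\"utf-8\"?><name>saut\xE9ed</name>");
```

A declaration that says utf-8 combined with `valid_utf8` being false would point at the text having been decoded to ISO-8859-1 somewhere before it reached the DOM.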

thanks for any help.

-jonathan




[PHP] Re: ampersand in dom with utf-8

2005-10-16 Thread ac
Try this.
If you need more entities to be included,
just refer to
`http://www.w3.org/2003/entities/iso8879/isolat1.ent' or look up the
character code yourself.

<?xml version="1.0"?>
<!DOCTYPE html [
 <!ENTITY egrave "&#x00e8;">
 <!ENTITY icirc "&#x00ee;">
]>
<item_name>farm lettuces with reed avocado, cr&egrave;me
 fra&icirc;che, radish and cilantro</item_name>
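
As a rough check of the approach above, the following sketch loads a document that declares its entities in the internal DTD subset. The LIBXML_NOENT flag is an addition here, not something the message mentions; it asks libxml to substitute entity references while parsing and is best reserved for input you trust.

```php
<?php
// A document that declares the two entities in its internal DTD subset.
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE html [
 <!ENTITY egrave "&#x00e8;">
 <!ENTITY icirc "&#x00ee;">
]>
<item_name>farm lettuces with reed avocado, cr&egrave;me
 fra&icirc;che, radish and cilantro</item_name>
XML;

$dom = new DOMDocument();
// LIBXML_NOENT replaces entity references with their declared values.
$dom->loadXML($xml, LIBXML_NOENT);

// Strings read from the DOM are UTF-8, whatever the source encoding was.
$text = $dom->documentElement->textContent;
```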


On 10/13/05, jonathan [EMAIL PROTECTED] wrote:
> I'm now getting this error:
>
> XML Parsing Error: undefined entity
>
> with the following entity at the first ampersand:
> <item_name>farm lettuces with reed avocado, cr&egrave;me
> fra&icirc;che, radish and cilantro</item_name>
>
> Why is an ampersand considered an undefined entity? The xml version
> is: <?xml version="1.0"?>
>
> Any thoughts please?
>
> -jonathan

--
all born, to be dying




Re: [PHP] Re: ampersand in dom with utf-8

2005-10-15 Thread jonathan

I've been setting the

here's some output:
<?xml version="1.0" encoding="utf-8"?>
<menu>
  <submenu>
   <submenu_name>Starters</submenu_name>
  <item>
   <item_name> soupe au pistou with amaranth and grana breadcrumbs</item_name>
  </item>
  <item>
   <item_name>farm lettuces with reed avocado, cr&amp;egrave;me fra&icirc;che, radish and cilantro</item_name>
  </item>

On the second item, this cr&amp;egrave;me is OK, but this
fra&icirc;che is causing the error.
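
The difference comes from XML predefining only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;). A small sketch that shows both cases; the helper name parses_as_xml is invented here.

```php
<?php
// Hypothetical helper: true when libxml accepts the string as well-formed XML.
function parses_as_xml(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // keep parse errors out of the output
    $dom = new DOMDocument();
    $ok  = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

// &amp; is predefined, so this parses; the text content is the literal string "cr&egrave;me".
$escaped = parses_as_xml('<item_name>cr&amp;egrave;me</item_name>');

// &icirc; is declared nowhere, so the parser reports an undefined entity.
$undefined = parses_as_xml('<item_name>fra&icirc;che</item_name>');
```

Note that the first form only avoids the parse error: a reader of the XML gets the seven characters "&egrave;" as text, not the letter è.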


-jonathan




Re: [PHP] Re: ampersand in dom with utf-8

2005-10-15 Thread jonathan


So I'm reading up on character encoding in XML documents, as I think
this is the problem (after the many helpful suggestions on this list).

With regard to your second question: no, I'm not sure if I'm using
proper UTF-8 throughout the entire process. When I take input from the form,
I'm converting everything via htmlentities(). This is why I'm getting
&egrave; etc. (On a side note, is there a function or way to check
whether a form is submitting the native characters, such as è from a copy and
paste out of a Word document, or the HTML entity &egrave;?)

I've changed the content-type from text/xml to application/xml, but
that doesn't seem to help.


As only UTF-8 and UTF-16 have to be supported, I'm concerned whether  
the processor might think it is some other encoding.


The HTTP headers are:

Date = Sat, 15 Oct 2005 17:49:02 GMT
Server = Apache/1.3.33 (Unix) mod_jk/1.2.8 PHP/5.0.4  
mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4  
FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.7a

X-Powered-By = PHP/5.0.4
Cache-Control = no-cache
Connection = close
Content-Type = application/xml

I guess, pursuant to cc's suggestion, I should do an
html_entity_decode() when I make the XML document and then do another
htmlentities() on the HTML representation.


-jonathan





Re: [PHP] Re: ampersand in dom with utf-8

2005-10-15 Thread Jasper Bryant-Greene

jonathan wrote:
> So I'm reading up on character encoding in XML documents as I think
> this is the problem (after the many helpful suggestions on this list).
>
> With regards to your second question; no, I'm not sure if I'm using
> proper utf-8 throughout the entire process. When I input from the form
> I'm converting everything via htmlentities. This is why I'm getting
> &egrave; etc... (On a side note, is there a function or way to check
> to see if a form is using the native characters (from a copy and
> paste of a word document like è) or the HTML entity &egrave;?)

If you're using the correct character set all the way through, you only
need htmlspecialchars() to escape characters like & and < as &amp; and &lt;,
since all the other characters are already present in the character set
you are using (UTF-8). htmlentities() is mostly useful for converting
characters outside of your character set into entities.
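
To make the difference concrete, here is a short sketch that runs both functions on the same UTF-8 string. The sample text is an assumption, adapted from the menu item discussed in this thread.

```php
<?php
// UTF-8 bytes for "crème fraîche & <radish>".
$text = "cr\xC3\xA8me fra\xC3\xAEche & <radish>";

// Only the markup-significant characters are escaped; è and î pass through untouched.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Every character that has a named HTML entity is replaced, including è and î.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
```

The htmlentities() result is what later trips up an XML parser, because names like &egrave; are not defined in XML.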


> I've changed the content-type from text/xml to application/xml but
> that doesn't seem to help.
>
> As only UTF-8 and UTF-16 have to be supported, I'm concerned whether
> the processor might think it is some other encoding.
>
> The HTTP headers are:
>
> Date = Sat, 15 Oct 2005 17:49:02 GMT
> Server = Apache/1.3.33 (Unix) mod_jk/1.2.8 PHP/5.0.4
> mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4
> FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.7a
>
> X-Powered-By = PHP/5.0.4
> Cache-Control = no-cache
> Connection = close
> Content-Type = application/xml

The Content-Type above carries no charset; send one explicitly:

header('Content-Type: text/xml; charset=UTF-8');

> I guess pursuant to cc's suggestion, I should do an html_entity_decode
> when I make the xml document and then do another htmlentities on the
> html representation.


Shouldn't be any need. Characters like è don't have any special meaning 
in XML, and they can be represented in the UTF-8 character set, so 
there's no need to convert them to entities at any stage.
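
A sketch of what that looks like end to end, assuming the text is already UTF-8 when it reaches the script. The element names follow the output posted earlier in the thread; the ampersand in the sample text is added here only to show that the DOM escapes it on its own.

```php
<?php
// Build a small menu document from literal UTF-8 text; no entity conversion anywhere.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

$menu = $dom->appendChild($dom->createElement('menu'));
$item = $menu->appendChild($dom->createElement('item'));
$name = $item->appendChild($dom->createElement('item_name'));

// "crème fraîche" written as UTF-8 byte escapes so the source file encoding does not matter.
$name->appendChild($dom->createTextNode(
    "farm lettuces with reed avocado, cr\xC3\xA8me fra\xC3\xAEche, radish & cilantro"
));

$xml = $dom->saveXML();

// When serving the document over HTTP, the header goes out before the body:
// header('Content-Type: text/xml; charset=UTF-8');
// echo $xml;
```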


--
Jasper Bryant-Greene
General Manager
Album Limited

a: Freepost Album, PO Box 579, Christchurch 8015, New Zealand
p: 0800 4 ALBUM (0800 425 286) or +64 21 232 3303
e: [EMAIL PROTECTED]
w: http://www.album.co.nz/

Memberships:
* Institute of Electrical and Electronics Engineers (IEEE)
* Association for Computing Machinery (ACM)




Re: [PHP] Re: ampersand in dom with utf-8

2005-10-14 Thread jonathan
Are there PHP functions to convert between these different formats?
&#e8; doesn't seem to render correctly in a browser. Ugghhh.

-jonathan

On Oct 13, 2005, at 4:53 AM, cc wrote:

> è





Re: [PHP] Re: ampersand in dom with utf-8

2005-10-14 Thread jonathan

the real characters (presumably è) won't render correctly.

it seems like there should be a set of functions for encoding this to  
a different but understandable format and then another function for  
decoding and display within a browser.


it makes me not want to use DOM for creating xml files.

-jonathan




Re: [PHP] Re: ampersand in dom with utf-8

2005-10-14 Thread Jasper Bryant-Greene

jonathan wrote:
> the real characters (presumably è) won't render correctly.


Are you outputting the correct character set information (UTF-8), and 
are you sure that UTF-8 is being used throughout the entire process?


--
Jasper Bryant-Greene
General Manager
Album Limited

a: Freepost Album, PO Box 579, Christchurch 8015, New Zealand
p: 0800 4 ALBUM (0800 425 286) or +64 21 232 3303
e: [EMAIL PROTECTED]
w: http://www.album.co.nz/




[PHP] Re: ampersand in dom with utf-8

2005-10-13 Thread cc
both `&egrave;' and `&icirc;' are not entities in charset utf-8, use
`&amp;egrave;' and `&amp;icirc;' instead.

On 10/13/05, jonathan [EMAIL PROTECTED] wrote:
> I'm now getting this error:
>
> XML Parsing Error: undefined entity
>
> with the following entity at the first ampersand:
> <item_name>farm lettuces with reed avocado, cr&egrave;me
> fra&icirc;che, radish and cilantro</item_name>
>
> Why is an ampersand considered an undefined entity? The xml version
> is: <?xml version="1.0"?>
>
> Any thoughts please?
>
> -jonathan




Re: [PHP] Re: ampersand in dom with utf-8

2005-10-13 Thread Marcus Bointon

On 13 Oct 2005, at 07:24, cc wrote:

> both `&egrave;' and `&icirc;' are not entities in charset utf-8, use
> `&amp;egrave;' and `&amp;icirc;' instead.


I would expect that to result in unconverted entities in the output.  
If you're intending to send that content as HTML, then I guess that  
would be OK. However, if you're using UTF-8 anyway, why not just use  
the real characters?


Marcus
--
Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk




[PHP] Re: ampersand in dom with utf-8

2005-10-13 Thread cc
Maybe I should have said: &egrave; is not an _XML_ entity.
I'm not very sure.
Sorry.

`&egrave;' is an HTML entity.
It represents the letter `è', which has the
byte value 0xE8 in the iso-8859-1 charset (it is outside ASCII).

To have it recognized by libxml, there are 3 ways to do this:
1, <?xml version="1.0"?><item_name>&#xe8;</item_name>
2, <?xml version="1.0" encoding="iso-8859-1"?><item_name>è</item_name>
3, <?xml version="1.0"?><item_name>è</item_name>

1 can be saved using either utf-8 encoding or iso-8859-1 encoding;
2 must be saved using iso-8859-1 encoding;
3 must be saved using utf-8 encoding (so that `è' is decoded properly).
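
The three variants can be compared with a short sketch like the one below. The helper root_text is hypothetical, and the byte escapes stand in for "saving the file" in a particular encoding.

```php
<?php
// Hypothetical helper: parse a document and return the root element's text.
// PHP's DOM hands strings back as UTF-8, regardless of the source encoding.
function root_text(string $xml): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    return $dom->documentElement->textContent;
}

// 1: a numeric character reference, plain ASCII on disk.
$viaReference = root_text('<?xml version="1.0"?><item_name>&#xe8;</item_name>');

// 2: the single byte 0xE8, with the document declared as ISO-8859-1.
$viaLatin1 = root_text("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><item_name>\xE8</item_name>");

// 3: the two-byte UTF-8 sequence 0xC3 0xA8, with no declaration (UTF-8 is the XML default).
$viaUtf8 = root_text("<?xml version=\"1.0\"?><item_name>\xC3\xA8</item_name>");
```

All three should yield the same text node; what differs is only how the letter is written in the file and which encoding the declaration has to name.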


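In PHP, we can do this:

   // Decode to ISO-8859-1 bytes so they match the encoding declared below.
   $html = html_entity_decode('<item_name>farm lettuces with reed avocado, cr&egrave;me fra&icirc;che, radish and cilantro</item_name>', ENT_QUOTES, 'ISO-8859-1');
   $dom = new DOMDocument();
   $dom->loadXML("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>$html");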


