-----Original Message----- From: Richard Quadling
Sent: Friday, September 30, 2011 2:53 PM
To: Ron Piggott
Cc: php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters

On 30 September 2011 18:22, Ron Piggott <ron....@actsministries.org> wrote:

-----Original Message----- From: Richard Quadling
Sent: Friday, September 30, 2011 12:31 PM
To: Ron Piggott
Cc: php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters

On 30 September 2011 17:26, Ron Piggott <ron....@actsministries.org> wrote:

I am trying to set up an RSS Feed in the Spanish language using a PHP cron
job.  I am unsure of how to deal with accented letters.

An example:

This syntax:

<?php

$rss_content .= "<description>" . htmlentities("El Versículo del Día") .
"</description>\r\n";

?>

Outputs:


<description>El Vers&iacute;culo del D&iacute;a</description>


When I use an RSS Feed validator I receive the error message

This feed does not validate.

- line 24, column 20: XML parsing error: <unknown>:24:20: undefined entity

I suspect the “;” is the issue, although it is needed for the accented
letters.  If I don’t use htmlentities() the accented characters can’t be
viewed; they become a “?”.  How should I proceed?

Ron

Make sure you have ...

<?xml version="1.0" encoding="UTF-8"?>

as the first line of the output. That tells the reader that the file
is a UTF-8 encoded file. Also, if you are sending HTTP headers, make sure
that they say the encoding is UTF-8 and not a codepage.

Go UTF-8 everywhere.
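
A minimal sketch of what "UTF-8 everywhere" might look like for the feed
output, assuming the text is already UTF-8. The helper name
rss_description() is hypothetical. It uses htmlspecialchars() instead of
htmlentities(), because named entities such as &iacute; are defined by
HTML and not by XML, which fits the validator's "undefined entity" error.

```php
<?php
// Hypothetical helper: wrap UTF-8 text in a <description> element.
// htmlspecialchars() escapes only the XML-significant characters
// (&, <, >, and quotes) and leaves accented letters as UTF-8 bytes.
function rss_description(string $utf8_text): string
{
    return '<description>'
        . htmlspecialchars($utf8_text, ENT_QUOTES | ENT_XML1, 'UTF-8')
        . "</description>\r\n";
}

// "\xC3\xAD" is the UTF-8 byte sequence for í, written as escapes so
// the example does not depend on how this file itself is saved.
$title = "El Vers\xC3\xADculo del D\xC3\xADa";

// When serving over HTTP, the header should agree with the XML declaration.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/rss+xml; charset=UTF-8');
}

$rss_content  = '<?xml version="1.0" encoding="UTF-8"?>' . "\r\n";
$rss_content .= rss_description($title);
$rss_content .= rss_description('A & B');
```

Note that htmlspecialchars() called this way returns an empty string when
its input is not valid UTF-8, so the data has to be UTF-8 before it
reaches this step.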


--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea




Hi Richard:

Having <?xml version="1.0" encoding="UTF-8"?> as the starting line
didn't correct the problem.

The RSS Feed is at
http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml

A feed validator reports a variety of errors related to accented characters:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml

Also, while viewing the feed in Firefox, once the first accented character
is displayed none of the rest of the feed is visible, except by right
clicking and choosing "view source".

The RSS Feed content will be populated by a database query.  The database
columns are set to utf8_unicode_ci

How should I proceed?
Ron


The byte sequence that is being received for the í is just 0xED.

php -r "file_put_contents('a.rss',
file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"

This is NOT UTF-8 encoded data, but is ISO-8859-1 Latin-1 (most likely).
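
One way to check this locally, as a sketch that assumes the mbstring
extension is available: mb_check_encoding() reports whether a byte string
is valid UTF-8, and bin2hex() shows the raw bytes. The sample strings are
illustrative and written with byte escapes; they are not taken from the
feed.

```php
<?php
// A lone 0xED byte, which is how ISO-8859-1 encodes í.
$latin1 = "D\xEDa";
// The two-byte UTF-8 encoding of the same word.
$utf8   = "D\xC3\xADa";

var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

echo bin2hex($latin1), "\n"; // 44ed61
echo bin2hex($utf8), "\n";   // 44c3ad61
```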

So as I see it you have 2 choices.

Either use <?xml version="1.0" encoding="ISO-8859-1"?> as the XML
declaration, or convert the encoded data to UTF-8.

It also means that the data in the SQL server is NOT UTF-8 and will
need to be converted as well.

I would recommend doing that first.

That will mean reading the data as ISO-8859-1 and converting it to
UTF-8 and then saving it again.
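
The read, convert, save pass could be sketched like this. It assumes the
stored bytes really are ISO-8859-1 and that mbstring is available;
to_utf8() and the $rows array are hypothetical stand-ins for the real
table. Strings that already validate as UTF-8 are left alone so that a
second run does not double-encode them.

```php
<?php
// Hypothetical helper: convert a string to UTF-8 only if it is not
// already valid UTF-8. Assumes any non-UTF-8 input is ISO-8859-1.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8 (plain ASCII also lands here)
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// Stand-in for rows read from the database.
$rows = [
    "El Vers\xEDculo del D\xEDa",          // ISO-8859-1 bytes
    "El Vers\xC3\xADculo del D\xC3\xADa",  // already UTF-8
    "Prayer Requests",                     // ASCII
];

$converted = array_map('to_utf8', $rows);
```

The validity check is a heuristic: a few ISO-8859-1 byte sequences also
happen to be valid UTF-8, so it is worth spot-checking the converted rows
before saving them back.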

I'd also be looking at the app that inputs the data into the DB initially.

To convert the text, here are 2 examples. I'm sure there are more ways.

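<?php
// Assumes this file is saved as ISO-8859-1, so each í is the single byte 0xED.
$iso_text = 'El Versículo del Día: Pray For Others: Incoming Prayer Requests';

// utf8_encode() converts ISO-8859-1 to UTF-8 (deprecated as of PHP 8.2).
$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);

// iconv() does the same conversion with both encodings named explicitly.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);
?>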

outputs ...

string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"

Notice that the correct strings are 2 bytes longer?

In UTF-8 the í (code point U+00ED) is encoded as the two bytes 0xC3 0xAD,
instead of the single byte 0xED used by ISO-8859-1.
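
The length difference can be reproduced with a short sketch. The input is
written with "\xED" escapes so it is ISO-8859-1 regardless of how the file
is saved; the mb_strlen() call assumes the mbstring extension.

```php
<?php
// ISO-8859-1 input: each í is the single byte 0xED.
$iso  = "El Vers\xEDculo del D\xEDa: Pray For Others: Incoming Prayer Requests";
$utf8 = iconv('ISO-8859-1', 'UTF-8', $iso);

// strlen() counts bytes, so each converted í adds one byte.
echo strlen($iso), "\n";              // 63
echo strlen($utf8), "\n";             // 65

// mb_strlen() counts characters, so the UTF-8 string is still 63 long.
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 63

// The single byte 0xED becomes the two bytes 0xC3 0xAD.
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', "\xED")), "\n"; // c3ad
```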

--
Richard Quadling


Richard, I was unaware of the utf8_encode function. Thank you very much;
this now works. Now I may continue with the translation into Spanish.

Ron

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
