On 30 September 2011 18:22, Ron Piggott <ron....@actsministries.org> wrote:
>
> -----Original Message----- From: Richard Quadling
> Sent: Friday, September 30, 2011 12:31 PM
> To: Ron Piggott
> Cc: php-general@lists.php.net
> Subject: Re: [PHP] RSS Feed Accented Characters
>
> On 30 September 2011 17:26, Ron Piggott <ron....@actsministries.org> wrote:
>>
>> I am trying to set up an RSS Feed in the Spanish language using a PHP cron
>> job.  I am unsure of how to deal with accented letters.
>>
>> An example:
>>
>> This syntax:
>>
>> <?php
>>
>> $rss_content .= "<description>" . htmlentities("El Versículo del Día") .
>> "</description>\r\n";
>>
>> ?>
>>
>> Outputs:
>>
>>
>> <description>El Vers&iacute;culo del D&iacute;a</description>
>>
>>
>> When I use an RSS Feed validator I receive the error message
>>
>> This feed does not validate.
>>
>>  a.. line 24, column 20: XML parsing error: <unknown>:24:20: undefined
>> entity
>>
>> I suspect the “;” is the issue, although it is needed for the accented
>> letters.  If I don’t use htmlentities() the accented characters can’t be
>> viewed, they become a “?”  How should I proceed?
>>
>> Ron
>
> Make sure you have ...
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> as the first line of the output. That tells the reader that the file
> is a UTF-8 encoded file. Also, if you are emitting HTTP headers, make sure
> that they say the encoding is UTF-8 and not a codepage.
>
> Go UTF-8 everywhere.
>
>
> --
> Richard Quadling
> Twitter : EE : Zend : PHPDoc
> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
>
>
>
>
> Hi Richard:
>
> Having "     <?xml version="1.0" encoding="UTF-8"?>     " as the starting
> line didn't correct the problem.
>
> The RSS Feed is @
> http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml
>
> There are a variety of errors related to accented characters while using a
> feed validator:
> http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml
>
> - Also, while viewing the feed in Firefox, once the first accented character
> is displayed, none of the rest of the feed is visible, except by right
> clicking and choosing "view source".
>
> The RSS Feed content will be populated by a database query.  The database
> columns are set to utf8_unicode_ci
>
> How should I proceed?
> Ron
>

The byte sequence that is being received for each í is just 0xED. I
checked by saving the feed locally:

php -r "file_put_contents('a.rss',
file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"

This is NOT UTF-8 encoded data, but is ISO-8859-1 Latin-1 (most likely).
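
A quick way to confirm this kind of diagnosis, as a sketch only: the
helper name looks_like_utf8() is made up for illustration, and it assumes
the mbstring extension is loaded. A lone 0xED byte followed by an ASCII
letter is not a well-formed UTF-8 sequence, so the check fails for the
bytes the feed is sending.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
// Assumes the mbstring extension is available.
function looks_like_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$as_received = "Vers\xEDculo";      // í as the single byte 0xED (ISO-8859-1)
$as_expected = "Vers\xC3\xADculo";  // í as the two bytes 0xC3 0xAD (UTF-8)

var_dump(looks_like_utf8($as_received)); // bool(false)
var_dump(looks_like_utf8($as_expected)); // bool(true)
```

The same function could be pointed at the contents of a.rss from the
command above.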

So as I see it you have 2 choices.

Either use <?xml version="1.0" encoding="ISO-8859-1"?> as the XML
declaration or convert the encoded data to UTF-8.
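
For the second choice, a minimal sketch of building one feed element,
assuming the incoming text really is ISO-8859-1. The function name
rss_description() is hypothetical. It also swaps htmlentities() for
htmlspecialchars(): XML only predefines &amp;, &lt;, &gt;, &quot; and
&apos;, so an HTML entity such as &iacute; is what a validator reports as
an "undefined entity".

```php
<?php
// Hypothetical helper: wraps ISO-8859-1 text in a <description> element
// suitable for a feed declared as encoding="UTF-8".
function rss_description(string $latin1): string
{
    // Convert the bytes first, so that í becomes 0xC3 0xAD.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

    // Escape only the XML special characters (&, <, >, quotes).
    // ENT_XML1 is available from PHP 5.4 onwards.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return '<description>' . $escaped . '</description>';
}

echo rss_description("El Vers\xEDculo del D\xEDa"), "\r\n";
```

If the data is already UTF-8, the iconv() step must be dropped, otherwise
the text gets encoded twice.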

It also means that the data in the sql server is NOT UTF-8 and will
need to be converted also.

I would recommend doing that first.

That will mean reading the data as ISO-8859-1 and converting it to
UTF-8 and then saving it again.
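
When doing that kind of one-off conversion, it helps to avoid converting
a value that is already UTF-8. The following is only a sketch of such a
guard: the name latin1_to_utf8_once() is made up, the mbstring extension
is assumed, and the database read and write around it are left out. The
check is a heuristic, since a few ISO-8859-1 byte pairs also happen to be
valid UTF-8.

```php
<?php
// Hypothetical guard: convert ISO-8859-1 bytes to UTF-8, but leave
// values that are already well-formed UTF-8 (including plain ASCII) alone.
// Assumes the mbstring extension is available.
function latin1_to_utf8_once(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8, do not encode twice
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Rows as they might come back from the database (illustrative data).
$rows = array("D\xEDa", "D\xC3\xADa", "Prayer Requests");

foreach ($rows as $row) {
    echo bin2hex(latin1_to_utf8_once($row)), "\n";
}
```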

I'd also be looking at the app that inputs the data into the DB initially.

To convert the text, here are 2 examples. I'm sure there are more ways.

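<?php
// Save this script as ISO-8859-1 so that each í below is the single byte 0xED.
$iso_text = 'El Versículo del Día: Pray For Others: Incoming Prayer Requests';

// Example 1: utf8_encode() converts ISO-8859-1 to UTF-8
// (note: deprecated as of PHP 8.2).
$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);

// Example 2: iconv() with explicit source and target encodings.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);
?>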

outputs ...

string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"

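Notice that the correct strings are 2 bytes longer? Each of the two í
characters takes two bytes in UTF-8 instead of one.

The í is the code point U+00ED. In ISO-8859-1 it is the single byte 0xED;
in UTF-8 it is encoded as the two bytes 0xC3 0xAD.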
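
The byte values can be inspected directly with bin2hex(). This is just an
illustration using the same iconv() call as above, with the í written as
an explicit byte escape so the result does not depend on how the script
file itself is saved.

```php
<?php
$latin1 = "\xED";                                // í in ISO-8859-1
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1); // í in UTF-8

echo bin2hex($latin1), "\n"; // ed
echo bin2hex($utf8), "\n";   // c3ad
echo strlen($utf8), "\n";    // 2
```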

-- 
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
