hi again,

As you might know from one of my earlier mails to this list, raggle
is sometimes unable to parse the RSS feeds of http://planet.debian.net.

It seems that some people, especially those from Spain, use Unicode
characters such as U+2019 (right single quotation mark) in their
descriptions. Unfortunately, this makes raggle break while reading the
feeds.

Attached you will find a simple RSS file that contains U+2019. With it,
raggle version 0.3.2 stops updating the feed with:

 Wed Jan 19 19:33:08 CET 2005: Updating feed "Untitled Feed" from "http://localhost/web/rss.xml";
 content: <?xml version='1.0' encoding='utf-8' ?>
 <channel>
     <title>unicode</title>
      <li
 modified: Wed, 19 Jan 2005 18:29:34 GMT
 Wed Jan 19 19:33:08 CET 2005: Error: "\342\200\231t \n  "
 Wed Jan 19 19:33:08 CET 2005: Done checking.  Sleeping for 60s.

After removing the character raggle works nicely.
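
For what it's worth, the three bytes in the error message look like
the UTF-8 encoding of U+2019 itself. A quick check in plain Ruby, as a
sketch (the variable names are mine, nothing here comes from raggle):

```ruby
# The three bytes from the error message, written as octal escapes.
bytes = "\342\200\231"

# unpack("U*") decodes the string as UTF-8 and returns the code points.
code_points = bytes.unpack("U*")

# Print each code point in the usual U+XXXX notation.
puts code_points.map { |cp| "U+%04X" % cp }.join(" ")
```

So the feed itself seems to be valid UTF-8, and the problem is more
likely in how the bytes are handled after they are read.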

So my question is how to get rid of this problem. Would using Ruby's
libiconv to convert the feed descriptions help? Any other ideas? What's
the status of Ruby's Unicode support nowadays?
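
One idea that needs no iconv at all: replace every non-ASCII character
with an XML numeric character reference before the text reaches the
parser. The following is only a rough sketch under the assumption that
the input is valid UTF-8; escape_non_ascii is a name I made up, it is
not part of raggle, and I have not tried it inside raggle itself.

```ruby
# Hypothetical helper, not part of raggle: replace every non-ASCII
# character in a UTF-8 string with an XML numeric character reference.
def escape_non_ascii(text)
  # unpack("U*") yields the Unicode code points of a UTF-8 string and
  # raises ArgumentError if the bytes are not valid UTF-8.
  text.unpack("U*").map { |cp|
    cp < 128 ? cp.chr : "&#x%X;" % cp
  }.join
end

puts escape_non_ascii("don\342\200\231t")
```

This keeps the character intact for anything that later decodes the
reference, but it would not be right inside CDATA sections, where
references are not expanded.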

bye,
    - michael

Attachment: example.xml
Description: application/xml
