Michael Brand wrote:
>Hi all,

>org-feed is becoming very useful for me, so far for managing
>podcast episodes. Now I have a patch and a request for help.

>1. patch for an issue with XML entities
>=======================================

>I found that some XML entities in my feeds are not substituted. The
>comments of two recent org-feed.el commits by David Maus
>http://repo.or.cz/w/org-mode.git/commitdiff/6875716e76acfbe1084a47e59d18a30a933d92b6
>and
>http://repo.or.cz/w/org-mode.git/commitdiff/6875716e76acfbe1084a47e59d18a30a933d92b6
>led me to the thread
>http://thread.gmane.org/gmane.emacs.orgmode/26352
>and prompted me to replace org-feed-unescape with xml-substitute-special,
>which converts more XML entities. The resulting patch below works for
>me, but of course I would like it to be reviewed by an experienced elisp
>programmer and org-feed user before being applied.

This patch is fine, and `xml-substitute-special' is the right thing to
use, since it converts numeric character references, too.
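
For illustration, a minimal sketch of what `xml-substitute-special'
does with a named entity and with decimal and hexadecimal character
references. The sample string is made up and does not come from any
of the feeds discussed here.

```elisp
(require 'xml)

;; One named entity (&amp;), one decimal reference (&#233;) and one
;; hexadecimal reference (&#xe9;) in a made-up sample string.
(xml-substitute-special "caf&#233; &amp; th&#xe9;")
;; => "café & thé"
```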

>2. request for help about an issue with multibyte character encoding
>====================================================================

>There is an issue with multibyte characters that appear in the input
>as unescaped, multibyte-encoded characters (not as XML entities;
>multibyte characters given as XML entities are substituted correctly).
>I looked for an example with a character encoding specified in the
>first line of the XML feed like
><?xml version="1.0" encoding="utf-8"?>
>and found one here:
>http://www.openscreencast.de/blog/rss.xml

The problem with this feed is that it contains raw, still undecoded
UTF-8 byte sequences that must be decoded into characters before they
can be properly inserted in the target buffer.

The attached patch does this by explicitly decoding new entries
according to their detected character encoding.
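
As a rough sketch of the idea, independent of org-feed: raw bytes
become characters once they are decoded with the right coding system.
The helper name `my/decode-feed-text' and the sample string are
hypothetical, and the fallback to `detect-coding-string' is only a
heuristic guess, so the result of detection is not guaranteed.

```elisp
(defun my/decode-feed-text (raw &optional coding)
  "Decode the unibyte string RAW into characters.
Use CODING when given, otherwise guess with `detect-coding-string'.
Hypothetical helper for illustration, not part of org-feed."
  (decode-coding-string raw (or coding (detect-coding-string raw t))))

;; Simulate undecoded feed text: the UTF-8 bytes of a 5-character
;; string with two non-ASCII characters (u-umlaut and sharp s).
(let* ((raw (encode-coding-string "Gr\u00fc\u00dfe" 'utf-8))
       (decoded (my/decode-feed-text raw 'utf-8)))
  (list (length raw)                   ; number of bytes
        (length decoded)               ; number of characters
        (multibyte-string-p decoded)))
;; => (7 5 t)
```

Passing the coding system explicitly is the safer choice whenever the
feed declares one; detection is only the fallback.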

Btw., a helpful introduction to the topic is

The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)

by Joel Spolsky

http://www.joelonsoftware.com/articles/Unicode.html

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmj...@jabber.org
Email..... dm...@ictsoc.de
From 9e4885c9f1b987fb04c934f17dceb1a5f2bb3544 Mon Sep 17 00:00:00 2001
From: David Maus <dm...@ictsoc.de>
Date: Fri, 13 Aug 2010 17:26:47 +0200
Subject: [PATCH] Decode entry according to its character encoding

* org-feed.el (org-feed-format-entry): Decode entry according to its
character encoding.
---
 lisp/org-feed.el |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/lisp/org-feed.el b/lisp/org-feed.el
index 073d344..984f896 100644
--- a/lisp/org-feed.el
+++ b/lisp/org-feed.el
@@ -553,7 +553,8 @@ If that property is already present, nothing changes."
                  (setq tmp (org-feed-make-indented-block
                             tmp (org-get-indentation))))))
            (replace-match tmp t t))))
-       (buffer-string)))))
+       (decode-coding-string
+        (buffer-string) (detect-coding-region (point-min) (point-max) t))))))
 
 (defun org-feed-make-indented-block (s n)
   "Add indentation of N spaces to a multiline string S."
-- 
1.7.1
