Michael Brand wrote:
>Hi all,

>org-feed is becoming very useful for me, so far to manage the
>episodes of podcasts. Now I have a patch and a request for help.

>1. patch for an issue with XML entities
>=======================================

>I found that some XML entities in my feeds are not substituted. The
>comments of two recent org-feed.el commits by David Maus
>http://repo.or.cz/w/org-mode.git/commitdiff/6875716e76acfbe1084a47e59d18a30a933d92b6
>and
>http://repo.or.cz/w/org-mode.git/commitdiff/6875716e76acfbe1084a47e59d18a30a933d92b6
>lead me to the thread
>http://thread.gmane.org/gmane.emacs.orgmode/26352
>and invited me to replace org-feed-unescape with xml-substitute-special
>which converts more XML entities. The resulting patch below helps for
>me but of course I would like it to be reviewed by an experienced elisp
>programmer and org-feed user before being applied.

This patch is fine, and `xml-substitute-special' is the right function
to use, because it also converts numeric character references.

>2. request for help about an issue with multibyte character encoding
>====================================================================

>There is an issue with multibyte characters that appear in the input
>as unescaped, multibyte encoded characters (not as XML entities, as XML
>entities multibyte characters are simply substituted correctly). I
>looked for an example with a character encoding specified in the first
>line of the XML feed like
><?xml version="1.0" encoding="utf-8"?>
>and found one here:
>http://www.openscreencast.de/blog/rss.xml

The problem with this feed is that it contains raw Unicode characters,
encoded as UTF-8, that must be decoded before they can be properly
inserted in the target buffer. The attached patch does this by
explicitly decoding new entries according to their detected character
encoding.

By the way, a helpful introduction to the topic is:

  The Absolute Minimum Every Software Developer Absolutely, Positively
  Must Know About Unicode and Character Sets (No Excuses!)
  by Joel Spolsky
  http://www.joelonsoftware.com/articles/Unicode.html

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmj...@jabber.org
Email..... dm...@ictsoc.de
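
To illustrate the point about `xml-substitute-special' made above, here is a minimal sketch, assuming a reasonably recent Emacs where the function lives in xml.el. The variable names and the sample string are made up for this example and do not come from org-feed.el.

```elisp
(require 'xml)

;; Hypothetical sample input: one named entity (&amp;), one decimal
;; character reference (&#228;) and one hexadecimal one (&#x20AC;).
(defvar my/escaped "Tom &amp; Jerry &#228; &#x20AC;")

;; `xml-substitute-special' replaces both the named entity and the
;; numeric character references with the characters they stand for.
(defvar my/unescaped (xml-substitute-special my/escaped))
```

The result should contain a literal ampersand, an a-umlaut and a euro sign, which is the behaviour a hand-written replacement of a few named entities would miss.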
From 9e4885c9f1b987fb04c934f17dceb1a5f2bb3544 Mon Sep 17 00:00:00 2001
From: David Maus <dm...@ictsoc.de>
Date: Fri, 13 Aug 2010 17:26:47 +0200
Subject: [PATCH] Decode entry according to its character encoding

* org-feed.el (org-feed-format-entry): Decode entry according to its
character encoding.
---
 lisp/org-feed.el |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/lisp/org-feed.el b/lisp/org-feed.el
index 073d344..984f896 100644
--- a/lisp/org-feed.el
+++ b/lisp/org-feed.el
@@ -553,7 +553,8 @@ If that property is already present, nothing changes."
                   (setq tmp (org-feed-make-indented-block
                              tmp (org-get-indentation))))))
               (replace-match tmp t t))))
-        (buffer-string)))))
+        (decode-coding-string
+         (buffer-string) (detect-coding-region (point-min) (point-max) t))))))
 
 (defun org-feed-make-indented-block (s n)
   "Add indentation of N spaces to a multiline string S."
--
1.7.1
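
The decoding step in the patch can be tried in isolation. The following is only a sketch under stated assumptions: the `my/' variables and the sample word are hypothetical, and `encode-coding-string' is used solely to fabricate raw bytes of the kind a feed might deliver.

```elisp
;; Fabricate raw UTF-8 bytes for a word with two non-ASCII characters
;; (u-umlaut and sharp s).  This stands in for undecoded feed content.
(defvar my/raw-bytes (encode-coding-string "Gr\u00fc\u00dfe" 'utf-8))

;; With a known coding system, decoding turns the bytes into characters.
(defvar my/decoded (decode-coding-string my/raw-bytes 'utf-8))

(length my/raw-bytes)   ; 7: the two non-ASCII characters take two bytes each
(length my/decoded)     ; 5: one element per character after decoding

;; When the encoding is not known in advance, it can be guessed first,
;; which is what the patch does with `detect-coding-region'.  A non-nil
;; second argument asks for the single most preferred coding system.
;; The guess depends on the coding-system priorities of the running Emacs.
(defvar my/guessed (detect-coding-string my/raw-bytes t))
(defvar my/decoded-by-guess (decode-coding-string my/raw-bytes my/guessed))
```

Detection is a heuristic, so the explicitly named coding system is the only variant here with a predictable result; the guessed variant may differ between setups.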
_______________________________________________
Emacs-orgmode mailing list
Please use `Reply All' to send replies to the list.
Emacs-orgmode@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode