Hello Renzo & Christian

Thanks for the test files and sharing your views on this issue. With
the attached patch I can export the test files successfully.

The attached patch ensures that component xml files created by the odt
exporter are always utf-8 encoded. This is irrespective of the coding
system used by the Org buffer.

Jambunathan K.
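P.S. For anyone curious about the mechanism, here is a minimal sketch of
the idea. It is not the code in org-odt.el. It assumes that the two new
properties end up being bound to the standard Emacs variable
`coding-system-for-write' around the call that writes each component
file. The helper name `my/write-string-as-utf-8' is hypothetical and
exists only for illustration.

```elisp
;; Hypothetical helper, not part of org-odt.el: write STRING to FILE as
;; UTF-8 regardless of the coding system of the current buffer.
(defun my/write-string-as-utf-8 (string file)
  "Write STRING to FILE encoded as UTF-8 and return FILE."
  ;; A non-nil `coding-system-for-write' overrides whatever coding
  ;; system Emacs would otherwise pick for this output operation,
  ;; including the buffer's own `buffer-file-coding-system'.
  (let ((coding-system-for-write 'utf-8))
    ;; When the first argument is a string, `write-region' writes that
    ;; string instead of a buffer region.  The non-nil, non-t VISIT
    ;; argument only suppresses the "Wrote file" message.
    (write-region string nil file nil 'silent))
  file)

;; Usage: call the helper from a buffer that is set to iso-8859-1.
(let ((file (make-temp-file "odt-content" nil ".xml")))
  (with-temp-buffer
    (setq buffer-file-coding-system 'iso-8859-1)
    (my/write-string-as-utf-8
     "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<p>caf\u00e9</p>\n"
     file))
  (delete-file file))
```

With the binding in place, the "é" in the sample string should reach the
disk as the two bytes C3 A9, which agrees with the UTF-8 claim in the
xml declaration. Without it, an iso-8859-1 buffer could produce the
single byte E9 instead, which is the mismatch Renzo ran into.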
From 1ec1e3c9248387ab2daabe7b9c7cc4a3c42b4998 Mon Sep 17 00:00:00 2001
From: Jambunathan K <kjambunat...@gmail.com>
Date: Mon, 18 Jul 2011 00:26:41 +0530
Subject: [PATCH] org-odt: Correctly export iso-8859-1 files with non-ascii chars

* contrib/lisp/org-odt.el (org-odt-get): Set CODING-SYSTEM-FOR-WRITE
and CODING-SYSTEM-FOR-SAVE to 'utf-8 irrespective of
buffer-file-coding-system.

Fixes issue reported by Renzo Been in the following post.
http://lists.gnu.org/archive/html/emacs-orgmode/2011-07/msg00795.html
---
 contrib/lisp/org-odt.el |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/contrib/lisp/org-odt.el b/contrib/lisp/org-odt.el
index f3a4067..bd2ea33 100644
--- a/contrib/lisp/org-odt.el
+++ b/contrib/lisp/org-odt.el
@@ -1380,6 +1380,8 @@ MAY-INLINE-P allows inlining it as an image."
     (PLAIN-TEXT-MAP '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;")))
     (TABLE-FIRST-COLUMN-AS-LABELS nil)
     (FOOTNOTE-SEPARATOR (org-lparse-format 'FONTIFY "," 'superscript))
+    (CODING-SYSTEM-FOR-WRITE 'utf-8)
+    (CODING-SYSTEM-FOR-SAVE 'utf-8)
     (t (error "Unknown property: %s" what))))
 
 (defun org-odt-parse-label (label)
-- 
1.7.2.3

> Hi Jambunathan,
>
> See comments below.
>
> Ciao,
> Renzo
> P.S. I'm on a camping-site right now, so I do not have good Internet
> access...
>
> On 16 July 2011 22:13, Jambunathan K <kjambunat...@gmail.com> wrote:
>>
>> Renzo
>>
>>> I just want to add one point that I did not find in the org-manual.
>>> I tested some of my org-files and exported them to the OpenOffice
>>> format. When I tried to open these documents in OpenOffice, they
>>> were corrupt and could not be opened.
>>>
>>> I soon found out why. If you want to export an org-mode file to
>>> .odt, you need to explicitly set the file encoding to UTF-8 (I
>>> usually use iso-8859-1 encoding for my files), like:
>>> #-*- mode: org; coding: utf-8; -*-
>>> After that OpenOffice could open the files without any problems.
>>
>> I use English for communication and I have to admit that I have zero
>> understanding of things like character sets, encodings etc.
>
> As for communicating: I'm from the border regions of The Netherlands,
> Belgium and Germany... And therefore I'm multilingual, and often need
> to type words with accents.
>
>> Thanks for the above note. I surely see it is a bug but my poor
>> understanding prevents me from quantifying it further.
>
> Well... I would not really see it as a bug... As long as it is
> mentioned in the documentation that org-file encodings other than
> utf-8 could result in corrupt output-files.
>
>> Could you please send me a minimal iso-8859-1 test.org file and the
>> associated corrupted test.odt file? I will look into this issue.
>
> See attachment. I can only send you the org file, because I do not
> have access to a working Emacs at the moment...
>
>> 1. Do you have any specific requirement on how the component xml
>>    files be encoded? A cursory look at the odt exporter suggests
>>    that it could actually be emitting xml files in iso-8859-1 format
>>    while wrongly claiming UTF-8 encoding as below
>>
>> --8<---------------cut here---------------start------------->8---
>> <?xml version="1.0" encoding="UTF-8"?>
>> --8<---------------cut here---------------end--------------->8---
>>
>> 2. Should the xml file always be emitted in UTF-8 irrespective of
>>    how the original Org file is encoded?
>
> Yes that would seem a good solution to me... If the odt-exporter
> checks the file's encoding, and then changes the encoding to utf-8
> (maybe using a temporary buffer?) before the actual exporting, then
> there would be no further problems...
>
> As for the idea that the OpenOffice xml can actually be in another
> encoding than utf-8: I do not know how much work that would be for
> you to implement in the odt-exporter. It might be too much effort...
> Also I don't know if such an OpenOffice document will open with no
> problems in all OpenOffice applications.
>
>> [Notes to Self]
>> [Notes from odbook]
>>
>> Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml
>> says
>>
>> --8<---------------cut here---------------start------------->8---
>> OpenDocument files are always encoded in UTF-8.
>> --8<---------------cut here---------------end--------------->8---
>>
>> Para 2 of
>> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
>> says
>>
>> --8<---------------cut here---------------start------------->8---
>> XML 1.0 allows a document to be encoded in any character set registered
>> with the Internet Assigned Numbers Authority (IANA). European documents
>> are commonly encoded in one of the ISO Latin character sets, such as
>> ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
>> documents use GB2312 and Big 5.
>> --8<---------------cut here---------------end--------------->8---
>>
>> Para 4 of
>> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
>> says
>>
>> --8<---------------cut here---------------start------------->8---
>> XML processors are not required by the XML 1.0 specification to support
>> any more than UTF-8 and UTF-16, but most commonly support other
>> encodings, such as US-ASCII and ISO-8859-1.
>> --8<---------------cut here---------------end--------------->8---
>>
>>
>> [Notes from XMLmind XSL-FO Converter]
>>
>>
>> XFC supports outputting of content.xml and styles.xml in UTF-8 as well
>> as ISO-8859-1.
>>
>> http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html
>>
>> says
>>
>> ,---- [see outputEncoding section]
>> | For OpenDocument output (.odt), this option specifies the encoding of
>> | XML content (files styles.xml and content.xml) in the output
>> | document. All encodings available in the current JVM are supported. The
>> | option value may be either the encoding name (e.g. ISO8859_1) or the
>> | charset name (e.g. ISO-8859-1). The default value is UTF8.
>> `----