Here is another patch. I would really like to have Thai input in UTF-8 instead
of TIS-620, so it happened :-). It comes with an example input file and a
ChangeLog entry.

This patch takes a somewhat unusual approach: it uses neither the C70 font
definition nor font re-encoding, but relies on Emacs's character encoding
support to transform Unicode Thai to TIS-620 Thai before word-breaking.
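
To make the mapping concrete, here is a minimal sketch in Python (not the
Emacs Lisp that cjk-enc.el uses; the names THAI_BLOCK, is_thai and
thai_to_tis620 are made up for this illustration). It assumes Python's
standard tis_620 codec. It also shows that Thai occupies only
U+0E00..U+0E7F, so a test on the 0x0E byte alone would match the Lao block
(U+0E80..U+0EFF) as well, which has no TIS-620 equivalent.

```python
# Illustration only: the patch does the equivalent inside Emacs.
# All names below are hypothetical and not part of cjk-enc.el.

THAI_BLOCK = range(0x0E00, 0x0E80)  # Unicode Thai block, U+0E00..U+0E7F


def is_thai(ch: str) -> bool:
    """Return True if the single character ch lies in the Unicode Thai block."""
    return ord(ch) in THAI_BLOCK


def thai_to_tis620(text: str) -> bytes:
    """Encode text as TIS-620 using Python's built-in codec.

    ASCII passes through unchanged; characters without a TIS-620
    equivalent raise UnicodeEncodeError.
    """
    return text.encode("tis_620")


if __name__ == "__main__":
    ko_kai = "\u0e01"   # THAI CHARACTER KO KAI
    lao_ko = "\u0e81"   # LAO LETTER KO, same 0x0E high byte as Thai

    print(thai_to_tis620(ko_kai))        # b'\xa1'
    print(is_thai(ko_kai), is_thai(lao_ko))

    # Both characters share the high byte 0x0E, so a high-byte test
    # alone cannot tell Thai from Lao.
    print(hex(ord(ko_kai) >> 8), hex(ord(lao_ko) >> 8))
```

A range check on the full code point, rather than on the high byte, is one
way to keep non-Thai characters out of the TIS-620 path.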

Do you think it is worth adding similar Unicode->regional hooks for the other
babel single-byte encodings? (I read up on emacs-mule, and it is really a family
of encodings rather than a single one like Unicode; that is possibly how Emacs
preserves charset info.) One definitely does not want to add the double-byte
ones. I suppose only Thai depends on an external word-breaking program.

While knowledge of Lisp isn't as common as that of C/C++, Emacs is (currently)
more portable/ported than swath... So what are the advantages of using
ThaiLaTeX? (Besides the obvious and vague one like 'written by a native';
there are a lot of ugly LaTeX things from the Chinese as well...)

--- On Mon, 26/12/11, Werner LEMBERG <w...@gnu.org> wrote:

> > Here it is - should work just with
> >     git am < ...patch
> 
> Thanks again!  I've applied it to the git repository
> (after massaging
> the ChangeLog entry and the source comments).
> 
> 
>     Werner
>
From 06017f15e498002e1667bd8ede3a9592d059520e Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung <ht...@users.sourceforge.net>
Date: Tue, 27 Dec 2011 12:44:12 +0000
Subject: [PATCH] [cjk-enc.el] Accept Thai inputs in utf-8 encoding.

---
 ChangeLog                   |    8 ++++++++
 examples/thai-utf8.tex      |   43 +++++++++++++++++++++++++++++++++++++++++++
 utils/lisp/emacs/cjk-enc.el |   10 ++++++++++
 3 files changed, 61 insertions(+), 0 deletions(-)
 create mode 100644 examples/thai-utf8.tex

diff --git a/ChangeLog b/ChangeLog
index 9db9591..43d2df0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2011-12-27 Hin-Tak Leung <ht...@users.sourceforge.net>
+
+	[cjk-enc.el] Accept Thai inputs in utf-8 encoding.
+
+	* utils/lisp/emacs/cjk-enc.el, examples/thai-utf8.tex:
+	Treat Unicode 0x0EXX inputs as Thai, and a new example
+	made from the tis620-based example.
+
 2011-12-16 Hin-Tak Leung <ht...@users.sourceforge.net>
 
 	[cjk-enc.el] Make it work with emacs 23 and newer.
diff --git a/examples/thai-utf8.tex b/examples/thai-utf8.tex
new file mode 100644
index 0000000..c5f9874
--- /dev/null
+++ b/examples/thai-utf8.tex
@@ -0,0 +1,43 @@
+% This is the file thai-utf8.tex of the CJK package
+%   for testing Thai (in utf-8 encoding).
+%
+% written by Werner Lemberg <w...@gnu.org>
+%
+% Version 4.8.2 (29-Dec-2008)
+
+% This file must be processed with cjk-enc.el to get
+%
+%   . proper word breaks
+%   . font switching between Thai and non-Thai
+%   . intercharacter glue
+%
+% Please read cjk-enc.txt for usage instructions.
+%
+% To process without cjk-enc.el, comment out the line containing
+% `\extrasthaicjk'.  Note, however, that you get overlong lines, and you
+% must manually insert proper Thai word breaks.
+
+
+\documentclass[12pt]{article}
+
+\usepackage[thaicjk]{babel}
+
+% \addto\extrasthaicjk{\fontencoding{C90}\selectfont}
+
+
+\begin{document}
+
+รายการ FAQ นี้สร้างขึ้นเพื่อสรุปคำถามที่ถามกันบ่อยครั้งและคำตอบคำถามในรูปแบบทีสะดวก.
+โครงสร้างของรายการ FAQ นี้เปลี่ยนไปมากตั้งแต่รุ่นที่แล้ว.
+\textbf{ดูรายละเอียดสำหรับโครงสร้างใหม่ได้จากช่วง ``โครงสร้างและวิธีการอ่าน
+  FAQ.''}
+
+\end{document}
+
+
+%%% Local Variables:
+%%% coding: utf-8-unix
+%%% mode: latex
+%%% TeX-master: t
+%%% TeX-command-default: "CJKLaTeX"
+%%% End:
diff --git a/utils/lisp/emacs/cjk-enc.el b/utils/lisp/emacs/cjk-enc.el
index 12ead58..7aa9615 100644
--- a/utils/lisp/emacs/cjk-enc.el
+++ b/utils/lisp/emacs/cjk-enc.el
@@ -653,6 +653,16 @@
         (if (eq charset 'tis620-2533)
             (setq charset (char-charset ch '(thai-tis620 ascii))))
 
+        ;; See if any Unicode-based inputs are recognisable
+        (if (eq charset 'unicode)
+            (let ((l (split-char ch)))
+              (progn
+                ;; Unicode 0x0EXX is Thai. Transform back to TIS620
+                (setq ch2 (nth 2 l))
+                (if (eq ch2 14)
+                    (setq charset 'thai-tis620
+                          ch (encode-char ch 'thai-tis620))))))
+
         ;; `split-char' in emacs 23+ is sensitive to charset priority.
         (cond ((> emacs-major-version 22)
                (if (not (eq charset 'ascii))
-- 
1.7.7.4

_______________________________________________
Cjk maillist  -  Cjk@ffii.org
https://lists.ffii.org/mailman/listinfo/cjk