Here is another patch. I would really like to be able to write Thai input in
UTF-8 instead of TIS-620, so I made it happen :-). An example input file and a
ChangeLog entry are included.
This patch takes a somewhat unusual approach: it neither uses a C70 font
definition nor re-encodes fonts. Instead, it uses Emacs's character-encoding
capability to transform Unicode Thai to TIS-620 Thai before word-breaking.
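To make the transformation concrete, here is a minimal sketch of the Unicode-to-TIS-620 mapping, written in Python rather than Emacs Lisp so it can be run stand-alone. It is not the code in the patch, which relies on Emacs's own `encode-char`; the function name `thai_to_tis620` and the two constants are hypothetical, chosen only for illustration. It relies on the fact that TIS-620 places the Thai characters at the same offsets as the Unicode Thai block, shifted to start at 0xA0.

```python
# Illustrative sketch only: the patch itself uses Emacs's `encode-char`.
# The names below are hypothetical, not part of cjk-enc.el.

THAI_BLOCK_START = 0x0E00   # start of the Unicode Thai block
TIS620_OFFSET = 0xA0        # TIS-620 places Thai at 0xA1..0xFB


def thai_to_tis620(text: str) -> bytes:
    """Map Unicode Thai characters to TIS-620 bytes.

    ASCII passes through unchanged; anything else raises ValueError.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is identical in both encodings.
            out.append(cp)
        elif 0x0E01 <= cp <= 0x0E3A or 0x0E3F <= cp <= 0x0E5B:
            # Assigned Thai code points: same offset within the block.
            out.append(cp - THAI_BLOCK_START + TIS620_OFFSET)
        else:
            raise ValueError(f"U+{cp:04X} has no TIS-620 equivalent")
    return bytes(out)


if __name__ == "__main__":
    sample = "FAQ ไทย"
    print(thai_to_tis620(sample).hex(" "))
```

The sketch checks explicit ranges rather than only the high byte 0x0E, because U+0E80 to U+0EFF is the Lao block, which has no TIS-620 equivalent.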
Do you think it is worth adding similar Unicode-to-regional hooks for the other
babel single-byte encodings? (I read up on emacs-mule, and it is really a family
of encodings rather than a single one like Unicode; that is possibly how Emacs
preserves charset information.) One definitely does not want to add the
double-byte ones. I suppose only Thai depends on an external word-breaking
program.
While knowledge of Lisp isn't as common as that of C/C++, Emacs is (currently)
more portable, and more widely ported, than swath. So what are the advantages
of using ThaiLaTeX? (Besides the obvious and vague one like 'written by a
native'; there are a lot of ugly LaTeX things from the Chinese as well...)
--- On Mon, 26/12/11, Werner LEMBERG <w...@gnu.org> wrote:
> > Here it is - should work just with
> > git am < ...patch
>
> Thanks again! I've applied it to the git repository
> (after massaging
> the ChangeLog entry and the source comments).
>
>
> Werner
>
From 06017f15e498002e1667bd8ede3a9592d059520e Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung <ht...@users.sourceforge.net>
Date: Tue, 27 Dec 2011 12:44:12 +0000
Subject: [PATCH] [cjk-enc.el] Accept Thai inputs in utf-8 encoding.
---
ChangeLog | 8 ++++++++
examples/thai-utf8.tex | 43 +++++++++++++++++++++++++++++++++++++++++++
utils/lisp/emacs/cjk-enc.el | 10 ++++++++++
3 files changed, 61 insertions(+), 0 deletions(-)
create mode 100644 examples/thai-utf8.tex
diff --git a/ChangeLog b/ChangeLog
index 9db9591..43d2df0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2011-12-27 Hin-Tak Leung <ht...@users.sourceforge.net>
+
+ [cjk-enc.el] Accept Thai inputs in utf-8 encoding.
+
+ * utils/lisp/emacs/cjk-enc.el, examples/thai-utf8.tex:
+ Treat Unicode 0x0EXX inputs as Thai, and a new example
+ made from the tis620-based example.
+
2011-12-16 Hin-Tak Leung <ht...@users.sourceforge.net>
[cjk-enc.el] Make it work with emacs 23 and newer.
diff --git a/examples/thai-utf8.tex b/examples/thai-utf8.tex
new file mode 100644
index 0000000..c5f9874
--- /dev/null
+++ b/examples/thai-utf8.tex
@@ -0,0 +1,43 @@
+% This is the file thai-utf8.tex of the CJK package
+% for testing Thai (in utf-8 encoding).
+%
+% written by Werner Lemberg <w...@gnu.org>
+%
+% Version 4.8.2 (29-Dec-2008)
+
+% This file must be processed with cjk-enc.el to get
+%
+% . proper word breaks
+% . font switching between Thai and non-Thai
+% . intercharacter glue
+%
+% Please read cjk-enc.txt for usage instructions.
+%
+% To process without cjk-enc.el, comment out the line containing
+% `\extrasthaicjk'. Note, however, that you get overlong lines, and you
+% must manually insert proper Thai word breaks.
+
+
+\documentclass[12pt]{article}
+
+\usepackage[thaicjk]{babel}
+
+% \addto\extrasthaicjk{\fontencoding{C90}\selectfont}
+
+
+\begin{document}
+
+รายการ FAQ นี้สร้างขึ้นเพื่อสรุปคำถามที่ถามกันบ่อยครั้งและคำตอบคำถามในรูปแบบทีสะดวก.
+โครงสร้างของรายการ FAQ นี้เปลี่ยนไปมากตั้งแต่รุ่นที่แล้ว.
+\textbf{ดูรายละเอียดสำหรับโครงสร้างใหม่ได้จากช่วง ``โครงสร้างและวิธีการอ้าง
+ FAQ.''}
+
+\end{document}
+
+
+%%% Local Variables:
+%%% coding: utf-8-unix
+%%% mode: latex
+%%% TeX-master: t
+%%% TeX-command-default: "CJKLaTeX"
+%%% End:
diff --git a/utils/lisp/emacs/cjk-enc.el b/utils/lisp/emacs/cjk-enc.el
index 12ead58..7aa9615 100644
--- a/utils/lisp/emacs/cjk-enc.el
+++ b/utils/lisp/emacs/cjk-enc.el
@@ -653,6 +653,16 @@
(if (eq charset 'tis620-2533)
(setq charset (char-charset ch '(thai-tis620 ascii))))
+ ;; See if any Unicode-based inputs are recognisable
+ (if (eq charset 'unicode)
+ (let ((l (split-char ch)))
+ (progn
+ ;; Unicode 0x0EXX is Thai. Transform back to TIS620
+ (setq ch2 (nth 2 l))
+ (if (eq ch2 14)
+ (setq charset 'thai-tis620
+ ch (encode-char ch 'thai-tis620))))))
+
;; `split-char' in emacs 23+ is sensitive to charset priority.
(cond ((> emacs-major-version 22)
(if (not (eq charset 'ascii))
--
1.7.7.4