Sorry, the last patch has a minor problem: it treats the whole of 0x0EXX as
Thai, but Thai is only 0x0E00-0x0E7F (0x0E80-0x0EFF is Lao). While it is
unlikely(?) that somebody would want to do both - is there a LaTeX package
for Lao? - we had better do it correctly.
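
To make the boundary concrete, here is a minimal sketch of the same range test written against the whole code point. The name `my-unicode-thai-p` is made up for illustration and is not in cjk-enc.el; the attached patch does its check on the components returned by `split-char` instead.

```elisp
;; Hypothetical helper, not part of cjk-enc.el: test a code point against
;; the Unicode Thai block (U+0E00..U+0E7F) directly.
(defun my-unicode-thai-p (ch)
  "Return non-nil if code point CH lies in the Unicode Thai block."
  (and (>= ch #x0E00) (<= ch #x0E7F)))

;; U+0E01 (THAI CHARACTER KO KAI) is in the Thai block.
(my-unicode-thai-p #x0E01)   ; non-nil
;; U+0E81 (LAO LETTER KO) is in the Lao block and must be left alone.
(my-unicode-thai-p #x0E81)   ; nil
```

Only characters that pass such a test should be handed to the TIS620 conversion; everything from 0x0E80 upwards stays as unicode.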

BTW, your TUGboat article from 11 years ago says the word-breaking algorithm
might be in the next version of emacs... Hasn't that happened yet? :-)

--- On Tue, 27/12/11, Hin-Tak Leung <hintak_le...@yahoo.co.uk> wrote:

> Here is another patch - I really
> would like to have Thai inputs in utf8 instead of TIS620, so
> it happened :-). Plus an example input file and change log.
> 
> This patch is a somewhat unusual approach - it isn't using
> C70 font definition, nor doing font re-encoding, but uses
> emacs's character encoding capability to transform
> unicode-Thai to tis620-Thai before doing word-breaking.
> 
> Do you think it is worth adding similar
> unicode->regional hooks for the other babel
> single-byte-encodings? (I read up on emacs-mule and it is
> really a family of encodings rather than a single one like
> unicode...that's possibly how emacs preserves charset info)
> - one definitely does not want to add the double-byte ones.
> I suppose only Thai is dependent on an external
> word-breaking program.
> 
> While knowledge of lisp isn't as common as that of C/C++,
> emacs is (currently) more portable/ported than swath... So
> what are the advantages of using ThaiLaTeX? (besides the
> obvious and vague one like 'written by a native' - there are
> a lot of ugly latex things from the Chinese as well...)
From 8bcb26d2739e8e2cf4c59edcfca0f9d80d174ad7 Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung <ht...@users.sourceforge.net>
Date: Fri, 30 Dec 2011 14:30:17 +0000
Subject: [PATCH] [cjk-enc.el] Correct minor issue with last commit.

Thai is 0x0E00-0x0E7F only. The previous commit mistakenly
treats all of 0x0EXX (the upper range being Laos) as Thai.
---
 ChangeLog                   |    6 ++++++
 utils/lisp/emacs/cjk-enc.el |    7 ++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 43d2df0..6edace8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2011-12-30 Hin-Tak Leung <ht...@users.sourceforge.net>
+	[cjk-enc.el] Correct minor issue with last commit.
+
+	Thai is 0x0E00-0x0E7F only. The previous commit mistakenly
+	treats all of 0x0EXX (the upper range being Laos) as Thai.
+
 2011-12-27 Hin-Tak Leung <ht...@users.sourceforge.net>
 
 	[cjk-enc.el] Accept Thai inputs in utf-8 encoding.
diff --git a/utils/lisp/emacs/cjk-enc.el b/utils/lisp/emacs/cjk-enc.el
index 7aa9615..11d26e1 100644
--- a/utils/lisp/emacs/cjk-enc.el
+++ b/utils/lisp/emacs/cjk-enc.el
@@ -657,9 +657,10 @@
         (if (eq charset 'unicode)
             (let ((l (split-char ch)))
               (progn
-                ;; Unicode 0x0EXX is Thai. Transform back to TIS620
-                (setq ch2 (nth 2 l))
-                (if (eq ch2 14)
+                ;; Unicode 0x0E00-0x0E7F is Thai. Transform back to TIS620
+                (setq ch2 (nth 2 l)
+                      ch3 (nth 3 l))
+                (if (and (eq ch2 14) (< ch3 128))
                     (setq charset 'thai-tis620
                           ch (encode-char ch 'thai-tis620))))))
 
-- 
1.7.7.4

_______________________________________________
Cjk maillist  -  Cjk@ffii.org
https://lists.ffii.org/mailman/listinfo/cjk
