Finally!

Here is a patch against your git head (same as v4.8.2) of cjk-enc.el. It
also includes your define-coding-system patch. It tested okay on both Emacs 22
and 23, and I would expect the same on 24, subject to the Thai issue below.

So, in the end, it is almost all Unicode-related. The changes are:

- Use define-coding-system (make-coding-system is deprecated).
- char-charset returns unicode in Emacs 23 (and is also sensitive to charset
priority). So I switched over to using the 'charset' text property as the
charset, which seems more reliable.
- There is a new charset / text-property value called 'tis620-2533', which is
a superset of ascii and thai-tis620. It has the tendency of swallowing up
every ASCII character to the end of the file, making the code go into an
infinite loop... This is seen with thai.tex, which is just Thai and ASCII. So
for that value I back out and go back to char-charset with a restriction list.

- split-char also returns unicode plus code point, instead of charset plus
code point, and is also sensitive to priority. So the priority is set to the
charset obtained from the text property before split-char is called.
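
For anyone wanting to check these three observations on their own Emacs, a small diagnostic along the following lines may help. It is only a sketch: the function name my-cjk-charset-report is made up for illustration and is not part of cjk-enc.el or of the patch.

```elisp
;; Hypothetical diagnostic helper, not part of cjk-enc.el.
(defun my-cjk-charset-report (pos)
  "Return a plist with the charset information Emacs reports at POS.
Returns nil if there is no character at POS."
  (let ((ch (char-after pos)))
    (when ch
      (list
       ;; The `charset' text property; nil when the property is absent.
       :text-property (get-text-property pos 'charset)
       ;; What `char-charset' reports under the current charset priority.
       :char-charset (char-charset ch)
       ;; (CHARSET CODE...) as seen by `split-char'.
       :split-char (split-char ch)))))
```

Evaluating (my-cjk-charset-report (point)) with M-: on a character in each language section shows where the three sources disagree.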

Now that I have it working, it probably explains why I had an almost correct
version earlier, then lost it. Back then I set the priority high for the known
charsets, restricted the search to the known ones, and then made the priority
choice sticky. That differs in that the sticky choice could overflow into the
next language change, whereas in this "correct" solution the priority is set
so that split-char works, and it is reset to the value from the text property
in the next round.

I suspect an alternative 'correct' solution would be to set the priority to a
fixed, known list before each char-charset call, pushing unicode to the end;
then maybe even split-char would work. (I had set it once before the whole
loop, which is why it spilled over into the next language section.)
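
For illustration only, that alternative might be sketched roughly as follows. This is an assumption-laden sketch and not what the patch does: the names my-cjk-priority-charsets and my-cjk-split-char are hypothetical, and the charset list is just a short illustrative subset of the one in the patch.

```elisp
;; Hypothetical sketch of the alternative; not part of the patch.
(defvar my-cjk-priority-charsets
  '(ascii thai-tis620 japanese-jisx0208 korean-ksc5601 chinese-gb2312)
  "Illustrative subset of the legacy charsets to prefer over unicode.")

(defun my-cjk-split-char (ch)
  "Return (CHARSET CODE...) for CH with the known charsets given priority.
The priority is re-applied on every call so that an earlier choice
cannot carry over into the next language section."
  ;; `set-charset-priority' exists only in Emacs 23 and later.
  (when (fboundp 'set-charset-priority)
    (apply 'set-charset-priority my-cjk-priority-charsets))
  (split-char ch))
```

Note that set-charset-priority changes global state, so real code would probably want to restore the previous priority afterwards.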

Could you give it a go with Emacs 24/bzr and see if it works?

diff --git a/emacs/cjk-enc.el b/emacs/cjk-enc.el
index 4d1bae5..65e41b0 100644
--- a/emacs/cjk-enc.el
+++ b/emacs/cjk-enc.el
@@ -549,12 +549,44 @@
      "Coding-system for LaTeX2e CJK Package"
      '(mnemonic "CJK"
        pre-write-conversion cjk-encode))
-  (make-coding-system
-   'cjk-coding 0 ?c
-   "Coding-system for LaTeX2e CJK Package"
-   nil
-   '((pre-write-conversion . cjk-encode))))
-
+  (if (< emacs-major-version 23)
+      (make-coding-system
+       'cjk-coding 0 ?c
+       "Coding-system for LaTeX2e CJK Package"
+       nil
+       '((pre-write-conversion . cjk-encode)))
+    (define-coding-system
+      'cjk-coding
+      "Coding-system for LaTeX2e CJK Package"
+      :mnemonic ?c
+      :coding-type 'emacs-mule
+      :default-char ?
+      :charset-list '(ascii
+                      latin-iso8859-1
+                      latin-iso8859-2
+                      latin-iso8859-3
+                      latin-iso8859-4
+                      cyrillic-iso8859-5
+                      greek-iso8859-7
+                      thai-tis620
+                      vietnamese-viscii-lower
+                      vietnamese-viscii-upper
+                      latin-jisx0201
+                      katakana-jisx0201
+                      japanese-jisx0208
+                      japanese-jisx0212
+                      korean-ksc5601
+                      chinese-gb2312
+                      chinese-big5-1
+                      chinese-big5-2
+                      chinese-cns11643-1
+                      chinese-cns11643-2
+                      chinese-cns11643-3
+                      chinese-cns11643-4
+                      chinese-cns11643-5
+                      chinese-cns11643-6
+                      chinese-cns11643-7)
+      :pre-write-conversion 'cjk-encode)))
 
 ;; XEmacs doesn't have set-buffer-multibyte.
 ;;
@@ -602,11 +634,25 @@
       (setq prev-charset 'ascii)
 
       (while (not (eobp))
+        (setq tpch (get-text-property (point) 'charset)) ;; new in Emacs 23; harmlessly returns nil in 22
         (setq ch (following-char))
         (set-buffer work-buf)
 
         ;; Set CHARSET to the character set of the current character.
-        (setq charset (char-charset ch))
+        ;; Prefer the `charset' text property (Emacs 23) over `char-charset'.
+        (if tpch
+            (setq charset tpch)
+          ;; Emacs 23's `char-charset' accepts an optional list of charsets
+          ;; to search; that could be used here as well.
+          (setq charset (char-charset ch)))
+        ;; tis620-2533 also covers ASCII, so it would swallow all ASCII text.
+        (if (eq charset 'tis620-2533)
+            ;; Emacs 23's `char-charset' takes an optional restriction list.
+            (setq charset (char-charset ch '(thai-tis620 ascii))))
+        ;; `split-char' below is sensitive to charset priority in Emacs 23+.
+        (cond ((> emacs-major-version 22)
+               (if (not (eq charset 'ascii))
+                   (set-charset-priority charset))))
         (if (eq charset 'ascii)
             ;; Not a multibyte character.
             (progn