Re: [Cjk] finally! Re: fixed your emacs problem!... Re: Thai example on CJK

Hin-Tak Leung Fri, 16 Dec 2011 05:59:48 -0800

Tested it with EMACS_PRETEST_24_0_92-142-g559675b (roughly today's emacs git
master head): for the 4 files CJKbabel.tex, muletest.tex, rubytest.tex and
thai.tex, it produces *.cjk output identical to that of emacs 22 with the
unpatched version. So I think the problem is fixed.


I checked the thai issue. Emacs 24 behaves the same as emacs 23: for thai.tex,
it shows tis620-2533 everywhere for the whole document (i.e. the ascii portions
are tagged as thai as well), whereas in CJKbabel.tex the thai parts are
thai-tis620 while the ascii parts are nil. I think this difference probably
comes from thai.tex actually declaring tis620 for the whole document.

According to emacs's source code, tis620-2533 is a superset of thai-tis620 and
ascii, which is why get-text-property behaves this way; char-charset also
behaves the same way for thai.tex unless restricted. For CJKbabel.tex, 'FAQ'
and 'textbf' are treated as ascii and kept distinct.
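
A minimal Emacs Lisp sketch of what "restricted" means here, assuming
Emacs 23 or later, where char-charset takes an optional RESTRICTION list:
only the listed charsets are considered, so a superset charset such as
tis620-2533 (or unicode) cannot be reported. The helper name and the
charset list are hypothetical, not taken from cjk-enc.el.

```elisp
;; Hypothetical helper, not part of cjk-enc.el.  Assumes Emacs 23+,
;; where `char-charset' accepts an optional RESTRICTION argument.
(defun my-classify-char (ch)
  "Return the charset of CH, considering only `ascii' and `thai-tis620'.
Superset charsets such as `tis620-2533' or `unicode' are not in the
list, so they are never returned for plain ASCII or Thai characters."
  (char-charset ch '(ascii thai-tis620)))
```

With this, an ASCII letter is classified as ascii and a Thai letter such
as U+0E01 as thai-tis620, regardless of what the text property says.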

--- On Thu, 15/12/11, Hin-Tak Leung <hintak_le...@yahoo.co.uk> wrote:

> Finally!
> 
> Here is a patch against your git-head (same as v4.8.2) of
> your cjk-enc.el. It also includes your define-coding-system
> patch. Tested okay with both emacs 22 and 23, and I would
> expect the same for 24, caveat the thai issue below.
> 
> So, in the end, it is almost all unicode-related. The
> changes are:
> 
> - define-coding-system (make-coding-system is deprecated)
> - char-charset returns unicode (and is also sensitive to
> charset priority) in emacs 23. So I switched over to using
> the text property 'charset' as the charset, which seems
> more reliable.
> - there is a new charset/text-property value called
> 'tis620-2533', which is a superset of ascii and thai-tis620.
> It tends to swallow up every ascii character to the end of
> the file and sends the code into an infinite loop... This
> is seen with thai.tex, which is just thai and ascii. So I
> backed out of that and went back to char-charset with a
> restriction.
> 
> - split-char also returns unicode plus a code point,
> instead of charset plus code point, and is also sensitive
> to priority. So I set the priority from the text property
> for it.
> 
> Now that I have it working, it probably explains why I had
> an almost-correct version earlier, then lost it. Back then
> I set the priority high for the known charsets, restricted
> the search to the known ones, then made the priority choice
> sticky. That differs in that the sticky choice could
> overflow into the next language change, whereas in this
> "correct" solution, although the priority is set so that
> split-char works, it is reset from the text property in the
> next round.
> 
> I suspect an alternative 'correct' solution would be to set
> the priority to a fixed known list before each char-charset
> call, pushing unicode to the end; then maybe even split-char
> would work (I had it set once before the whole loop, so it
> spilled over into the next language section).
> 
> Give it a go with emacs 24/bzr and see if it works?
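
The point about split-char and non-sticky priority in the quoted notes can
be illustrated with a short sketch, assuming Emacs 23 or later, where
split-char picks the charset from the current charset priority order. The
helper name is hypothetical and this is not the code in the patch: it
raises one charset for a single call and then restores the previous order,
so the choice cannot spill over into the next language section.

```elisp
;; Hypothetical helper, not part of cjk-enc.el.  Assumes Emacs 23+,
;; where `split-char' decides the charset from the current priority list.
(defun my-split-char-as (ch charset)
  "Return (CHARSET CODE...) for CH, preferring CHARSET for this call only.
The previous charset priority order is restored afterwards, so the
preference does not stick to later characters."
  (let ((saved (charset-priority-list)))
    (unwind-protect
        (progn
          ;; Put CHARSET at the front of the priority list.
          (set-charset-priority charset)
          (split-char ch))
      ;; Restore the original order, even if `split-char' signals.
      (apply #'set-charset-priority saved))))
```

For example, (my-split-char-as #x0E01 'thai-tis620) returns a list whose
first element is thai-tis620, followed by the position code(s) of the
character in that charset.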

_______________________________________________
Cjk maillist  -  Cjk@ffii.org
https://lists.ffii.org/mailman/listinfo/cjk
