Bug#130397: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)

2005-01-10 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Agustin Martin <[EMAIL PROTECTED]> writes:

> (Handa, your patch worked better than I thought, read below)

Thank you, that's good news.

> Also Kenichi Handa provided us with a patch to ensure that all equivalent
> accented chars are mapped to the same char, if available under different
> encodings, so they are not treated as word boundaries if spell-checkable,
> but I still got misalignment errors with it. It would, however, fix
> the word-boundary problem for an iso-8859-15 buffer using an iso-8859-1
> dict.

> But I have just noticed that if I add coeur (with oe-1char) to
> the french dict (ifrench, which contained only the oe-2char version) the
> misalignment errors disappear (I only tested with coeur; I do not know which
> other words have the same char, although I guess most words with oeu do)

Sorry, I don't understand.  What is oe-1char?  U+0153 or
U+0276?  But neither of them is included in
iso-8859-1/iso-8859-15, is it?  And I have no idea why adding
coeur (with oe-1char) to the dictionary solves the
misalignment error.  Is it because of a bug in ispell?

> So, patch from Kenichi Handa seems to work well for sid emacs21, much better
> than I thought. However it uses code that has only been recently added to
> emacs21, and things that are not available for xemacs or emacs20.

Then, how about using your workaround for them, and enabling
my patch for any Emacs that has
ucs-mule-8859-to-mule-unicode?
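
A minimal sketch of the split suggested here, assuming the only test needed is whether the translation table is defined in the running Emacs. Both function names are hypothetical and are not part of ispell.el; on an Emacs without the table the check simply returns nil.

```elisp
;; Hypothetical helper, not part of ispell.el.
(defun my-ispell-unified-chars-available-p ()
  "Return non-nil if this Emacs defines `ucs-mule-8859-to-mule-unicode'."
  (boundp 'ucs-mule-8859-to-mule-unicode))

;; Sketch of how a caller might choose between the two approaches.
(defun my-ispell-charset-strategy ()
  "Return `unified-chars' or `workaround' for the running Emacs."
  (if (my-ispell-unified-chars-available-p)
      'unified-chars
    'workaround))
```

Testing with `boundp' rather than comparing version numbers keeps the check valid for any Emacs flavour that happens to ship the table.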

---
Ken'ichi HANDA
[EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#130397: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)

2005-01-11 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Agustin Martin <[EMAIL PROTECTED]> writes:
> By oe-1char I meant oe as a single char (U+0153), available
> in iso-8859-15 (octal \275 here) but not in iso-8859-1 (which
> has one half there instead), and by oe-2char the sequence of two 7-bit
> chars 'oe', available anywhere, which is the trick usually used in
> iso-8859-1 to represent that char.

Ah, I see.
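
The byte value mentioned above can be checked with a short sketch. It assumes a Unicode-based Emacs (23 or later), where `(string #x0153)' yields the oe ligature; the function name is hypothetical.

```elisp
;; Hypothetical demo function.  U+0153 is built from its code point so the
;; example does not depend on the encoding of the file it is pasted into.
(defun my-oe-ligature-demo ()
  "Return (OCTAL-BYTE LATIN-1-CHAR) for the oe ligature U+0153."
  (let* ((oe (string #x0153))
         (bytes (encode-coding-string oe 'iso-8859-15)))
    (list
     ;; The single byte that iso-8859-15 uses for the ligature.
     (format "\\%o" (aref bytes 0))
     ;; The character the same byte stands for in iso-8859-1.
     (decode-coding-string bytes 'iso-8859-1))))
```

The first element is the text \275 and the second is the one-half sign, which is why a dictionary written for iso-8859-1 cannot contain the single-char form.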

> I like the idea, and at first glance it should not be difficult to
> implement, even for someone like me, whose lisp skills are limited. It will
> help for old emacs21. For emacs20 my workaround does nothing since it has
> no iso-8859-15, and for Debian xemacs21 my workaround also does nothing,
> since xemacs21 seems to return some extra (IMHO wrong) stuff in
> buffer-file-coding-system.

> I will first retest everything with a 'good' (built to my taste) french
> dict, pen and paper, to know in detail the differences in the results for
> both systems. Yesterday I noticed that the misalignment error
> disappeared and that things worked better, but not much more. I need to test
> also with aspell and other languages.

> Even in the case both systems give mostly similar results I will try
> integrating them, since I guess your patch will be more appropriate for
> future emacs21 development, and looks somewhat more general. 

How are the ispell.el of Emacs and that of dictionaries-common
maintained?  Are they synched somehow?  Should I install my
patch in CVS Emacs, or is it better to wait for you or
some other maintainer to work on it?

---
Ken'ichi HANDA
[EMAIL PROTECTED]





Bug#130397: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)

2005-01-12 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Richard Stallman <[EMAIL PROTECTED]> writes:

> People have been discussing this issue for a while now,
> and due to the volume of mail, I could not read it all.

I'm reading it, and I think I understand what the
problem is.

> Handa, is it clear what we should do now for the coming release?

As far as I understand, there are two problems, and my patch
solves one of them for the CVS Emacs.  I asked how to proceed
with it in my previous mail.

---
Ken'ichi HANDA
[EMAIL PROTECTED]





Bug#130397: Bug 130397

2005-01-12 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Juri Linkov <[EMAIL PROTECTED]> writes:

> Agustin Martin <[EMAIL PROTECTED]> writes:
>>  *Ken*, since you are being cc'ed: I vaguely remembered some info I read
>>  somewhere about these misalignments. I finally found it,
>> 
>>   http://lists.gnu.org/archive/html/emacs-devel/2002-09/msg01007.html

> The bug reported on this URL occurs only in Emacs 21.3, not in Emacs CVS.
> It seems something was fixed already.

> However, by a strange coincidence I got the same error in Emacs CVS just
> today for the first time.  So I can describe how this bug can be reproduced
> in Emacs CVS: when the first part of a word is copied from an external
> application and gets encoded in the buffer in mule-unicode-0100-24ff,
> and the second part of the word is typed with an input method and gets
> encoded in cyrillic-iso8859-5, then calling ispell-buffer with the
> `russian' dictionary on a buffer containing that mixed-encoding word
> signals the error "Ispell misalignment".

Please try the latest ispell.el.  I think at least this
misalignment error is fixed now.

> And while on this topic, I want to remind that many Emacs users suffer
> from the inability of ispell.el to simultaneously check mixed multi-language
> texts.  So, whoever fixes ispell.el, please take that into account.
> Such combining is quite easily doable for any disjoint alphabets, as well
> as for alphabets where one alphabet is a superset of another, like e.g.
> English and some other Latin-based alphabets.  Even for overlapping
> alphabets it would be possible by using the `w' syntax to get a word
> and to feed it to different ispell instances for each dictionary.

As for this, I agree with the following statement.

Geoff Kuenning <[EMAIL PROTECTED]> writes:
> I'm not entirely sure what you mean here.  For disjoint alphabets,
> it's certainly relatively easy to figure out which word should go to
> which ispell instance.  For identical, superset, or overlapping
> alphabets, the problem is basically insoluble.  For example, "fra" is
> a misspelling in English but legal in Italian.  If it appears in a
> mixed passage, which dictionary should it be fed to?  The only
> solution would seem to be to require the user to mark passages in some
> way, as is done in HTML.
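
The disjoint-alphabet case described above can be sketched as follows. The function name and the dictionary names are hypothetical, only two scripts are handled, and `string-match-p' assumes Emacs 23 or later.

```elisp
;; Hypothetical router: pick a dictionary from the script of WORD alone.
(defun my-word-dictionary (word)
  "Return a dictionary name for WORD, or nil if its script is mixed or unknown."
  (cond
   ;; Plain ASCII letters only.
   ((string-match-p "\\`[a-zA-Z]+\\'" word) "english")
   ;; Characters from the basic Cyrillic block only.
   ((string-match-p "\\`[\u0400-\u04ff]+\\'" word) "russian")
   (t nil)))
```

A word such as "fra" is routed to "english" here simply because it is written in Latin letters, which is exactly the ambiguity noted above: the script alone cannot tell English from Italian.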

---
Ken'ichi HANDA
[EMAIL PROTECTED]





Bug#130397: Bug 130397

2005-01-13 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, David Kastrup <[EMAIL PROTECTED]> writes:
>>>  If ispell wants utf-8, it's easy enough to convert each input line to
>>>  utf-8 and deal with offsets into that in the event of a misspelling;
>> 
>>  Or account for byte offsets by (variable) multibyte length of each
>>  character, which Emacs knows.  I don't remember for the moment whether
>>  the multibyte length of the UTF-8 encoding can be gotten at by a Lisp
>>  program, but if not, we could add some primitive to do that.

> Just encode the line to utf-8, find the correct point in the byte
> string, cut off the line there, convert back and check the length of
> the string.  This works unless you are in the middle of a character.

> But it would be much saner if our conversion facilities would preserve
> markers (which they don't do right now): encode to utf-8, place a
> marker at the right byte offset, undo the conversion.

You can encode a text to utf-8, place several markers, and encode
the regions between markers one by one.
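
A minimal sketch of the round trip David Kastrup describes above (encode the line, cut at the byte offset, decode, take the length). It works on a string rather than on buffer text with markers, and the function name is hypothetical.

```elisp
;; Hypothetical helper: map a UTF-8 byte offset to a character offset.
(defun my-utf8-byte-offset-to-char-offset (line byte-offset)
  "Return the number of characters in the first BYTE-OFFSET UTF-8 bytes of LINE.
The result is only meaningful when BYTE-OFFSET falls on a character boundary."
  (let* ((bytes (encode-coding-string line 'utf-8))  ; unibyte string
         (prefix (substring bytes 0 byte-offset)))   ; cut at the byte offset
    (length (decode-coding-string prefix 'utf-8))))
```

For example, in "c\u0153ur" the ligature takes two bytes in UTF-8, so byte offset 3 corresponds to character offset 2. As noted in the thread, an offset in the middle of a character does not give a usable answer.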

---
Ken'ichi HANDA
[EMAIL PROTECTED]





Bug#130397: Bug 130397

2005-01-18 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Juri Linkov <[EMAIL PROTECTED]> writes:

> Kenichi Handa <[EMAIL PROTECTED]> writes:
>>  Please try the latest ispell.el.  I think at least this
>>  misalignment error is fixed now.

> I tried the latest ispell.el and I see that your change is a definite
> improvement since it now allows checking words in mule-unicode charsets.
> But it still doesn't fix the misalignment error.  It even makes this
> error more frequent because it now occurs in all UTF-8 texts checked
> with ispell-region (which were simply skipped before your change).

> The cause of the error is the following: a line sent by ispell.el
> to the ispell process is converted from mule-unicode charset to the
> process charset, and the accepted output gets converted from process
> coding to the internal Emacs charset iso8859.  So `search-forward' in
> `ispell-process-line' fails to find a string in iso8859 charset
> in the buffer with the same string in mule-unicode charset.

Ah! I see.  I've just installed the attached change which
should fix that misalignment error.  ispell-looking-at is
not that tuned yet, and there is probably a better way to
implement it.

---
Ken'ichi HANDA
[EMAIL PROTECTED]

Index: ispell.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/textmodes/ispell.el,v
retrieving revision 1.152
retrieving revision 1.153
diff -u -c -r1.152 -r1.153
cvs diff: conflicting specifications of output style
*** ispell.el   13 Jan 2005 04:33:05 -  1.152
--- ispell.el   18 Jan 2005 23:16:27 -  1.153
***************
*** 2794,2799 ****
--- 2794,2808 ----
  string))
  
  
+ (defun ispell-looking-at (string)
+   (let ((coding (ispell-get-coding-system))
+         (len (length string)))
+     (and (<= (+ (point) len) (point-max))
+          (equal (encode-coding-string string coding)
+                 (encode-coding-string (buffer-substring-no-properties
+                                        (point) (+ (point) len))
+                                       coding)))))
+ 
  ;;; Avoid error messages when compiling for these dynamic variables.
  (eval-when-compile
    (defvar start)
***************
*** 2842,2853 ****
  
          ;; Alignment cannot be tracked and this error will occur when
          ;; `query-replace' makes multiple corrections on the starting line.
!         (if (/= (+ word-len (point))
!                 (progn
!                   ;; NB: Search can fail with Mule coding systems that don't
!                   ;;  display properly.  Ignore the error in this case?
!                   (search-forward (car poss) (+ word-len (point)) t)
!                   (point)))
              ;; This occurs due to filter pipe problems
              (error (concat "Ispell misalignment: word "
                             "`%s' point %d; probably incompatible versions")
--- 2851,2857 ----
  
          ;; Alignment cannot be tracked and this error will occur when
          ;; `query-replace' makes multiple corrections on the starting line.
!         (or (ispell-looking-at (car poss))
              ;; This occurs due to filter pipe problems
              (error (concat "Ispell misalignment: word "
                             "`%s' point %d; probably incompatible versions")





Bug#130397: Bug 130397

2005-01-19 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Juri Linkov <[EMAIL PROTECTED]> writes:
> Now a new problem was uncovered: after selecting a correct word from
> a list of near misses returned from ispell, ispell.el replaces the
> misspelled word with a selected word, and inserts it into the buffer
> not in its original mule-unicode charset, but in iso8859.

Perhaps the following function can be used somewhere in
ispell to fix that, but, as I still don't understand the ispell
code that well, I'd like to ask someone else to modify
ispell to use it.

;; Destructively modify WORD by converting each character in it to the
;; equivalent character of CHARSET.

(defun ispell-adjust-charset (word charset)
  (let ((len (length word)))
    ;; Only a string with multibyte characters can need adjusting.
    (if (< len (string-bytes word))
        (dotimes (i len)
          (let ((c (aref word i))
                this-charset equiv-chars)
            (if (and (>= c 128)
                     (not (eq (setq this-charset (char-charset c)) charset))
                     (or (memq this-charset '(mule-unicode-0100-24ff
                                              mule-unicode-2500-34ff))
                         (setq c (aref ucs-mule-8859-to-mule-unicode c)))
                     (setq equiv-chars (aref ispell-unified-chars-table c)))
                ;; Replace the character with its first equivalent in CHARSET.
                (catch 'tag
                  (dotimes (j (length equiv-chars))
                    (when (eq (char-charset (aref equiv-chars j)) charset)
                      (aset word i (aref equiv-chars j))
                      (throw 'tag nil))))))))
    ;; Return the modified WORD.
    word))

---
Ken'ichi HANDA
[EMAIL PROTECTED]

PS. I personally feel it's a waste of time to struggle with
charset matters in ispell that much because emacs-unicode
should not have such a problem.





Bug#435452: emacs22: `set-keyboard-coding-system' fails in non-X11 mode

2007-08-29 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Ludovic Courtès) writes:

> Hi,
> Sven Joachim <[EMAIL PROTECTED]> writes:

>>>> Invoking `set-keyboard-coding-system' in an "emacs -nw" session fails.
>>>> For instance, asking it `no-conversion' (which is needed so that dead
>>>> keys work as expected) fails:
>>> 
>>>> Unsupported coding system in Encoded-kbd mode: no-conversion
>>> 
>>> I don't understand why you have to set
>>> keyboard-coding-system to no-conversion for dead keys.  Dead
>>> keys must be handled by terminal, and Emacs just receives
>>> the resulting character (encoded in your locale) from the
>>> terminal.  So, setting keyboard-coding-system to what is
>>> appropriate for your locale should work well, and that
>>> should be done automatically.

> Indeed, using "C" as my locale fixes the problem (I used to have
> "LC_CTYPE=fr_FR").

> Strangely enough, Emacs 21.4.1 with "LC_CTYPE=fr_FR" doesn't have the
> problem (i.e., dead keys are usable).

Does it mean that you have a problem with Emacs 22 with
LC_CTYPE=fr_FR?  In that locale, "emacs -nw" should
automatically set keyboard-coding-system to latin-1 and the
input mode to (t nil 0 7), and thus it should accept latin-1
characters sent from a terminal correctly.  What happens
when you type a latin-1 character using the dead-key method
under LC_CTYPE=fr_FR?

> Checking the "Meta Sends Escape" box of the xterm in which I run Emacs
> 22 also fixes the problem, even with a non-C locale.

It seems that your Emacs' input mode is set not to accept
8-bit input.  Please tell me what is shown by
  ESC : (current-input-mode) RET
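
The two settings discussed here can be collected in one call, as in the following sketch. The function name is hypothetical; `current-input-mode' returns a list whose third element describes how 8-bit input is treated and whose fourth element is the quit character.

```elisp
;; Hypothetical helper for reporting the settings discussed in this thread.
(defun my-terminal-input-report ()
  "Return a plist describing how this Emacs reads terminal input."
  (let ((mode (current-input-mode)))
    (list :keyboard-coding-system (keyboard-coding-system)
          ;; Third element of the input mode: how 8-bit input is treated.
          :meta (nth 2 mode)
          ;; Fourth element: the quit character (7 means C-g).
          :quit (nth 3 mode))))
```

With the input mode (t nil 0 7) mentioned above, :meta would be 0, meaning all 8 bits are used as the character code, and :quit would be 7.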

> I guess I'm just displaying my lack of familiarity with how terminals
> work...

>>> What other choices were tried?  utf-8, latin-X should all
>>> work.  What is your locale?

> With a "C" locale, utf-8, latin-1, and others are accepted, whereas
> `no-conversion' yields the above error message.

That is because setting keyboard-coding-system to
no-conversion is useless.

---
Kenichi Handa
[EMAIL PROTECTED]

