Bug#130397: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
In article <[EMAIL PROTECTED]>, Agustin Martin <[EMAIL PROTECTED]> writes:

> (Handa, your patch worked better than I thought, read below)

Thank you, that's good news.

> Also Kenichi Handa provided us with a patch to ensure that all equivalent
> accented chars are mapped to the same char, if available under different
> encodings, so are not considered as word boundaries if spell-checkable,
> but I still got misalignment errors with it. This would however fix
> the word boundaries problem for an iso-8859-15 buffer using an iso-8859-1
> dict.
> But I have just noticed that if I add coeur (with oe-1char) to
> the french dict (ifrench, it contained only the oe-2char version) the
> misalignment errors disappear (I only tested with coeur, do not know which
> other words have the same char although I guess that most the oeu)

Sorry, I don't understand.  What is oe-1char?  U+0153 or U+0276?  But
isn't it true that neither of them is included in iso-8859-1/iso-8859-15?
And I have no idea why adding coeur (with oe-1char) to the dictionary
solves the misalignment error.  Is it because of a bug in ispell?

> So, patch from Kenichi Handa seems to work well for sid emacs21, much better
> than I thought. However it uses code that has only been recently added to
> emacs21, and things that are not available for xemacs or emacs20.

Then, how about using your workaround for them, and enabling my patch
for an Emacs that has ucs-mule-8859-to-mule-unicode?

---
Ken'ichi HANDA
[EMAIL PROTECTED]

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
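Handa's proposal amounts to a run-time feature test: enable his patch only where the variable `ucs-mule-8859-to-mule-unicode` exists, and fall back to Agustin's workaround on XEmacs, Emacs 20 and older Emacs 21 builds. A minimal sketch of that dispatch, assuming `boundp` is an adequate test for the feature; the function name and the returned symbols are hypothetical and not part of ispell.el.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-ispell-charset-strategy ()
  "Return a hypothetical symbol naming the charset-handling strategy.
`unified-chars' when `ucs-mule-8859-to-mule-unicode' is bound, i.e. the
running Emacs has the table Handa's patch relies on; `workaround'
otherwise."
  (if (boundp 'ucs-mule-8859-to-mule-unicode)
      'unified-chars
    'workaround))
```

Testing for the variable itself, rather than parsing `emacs-version`, keeps the check correct for builds that backported the table.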
Bug#130397: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
In article <[EMAIL PROTECTED]>, Agustin Martin <[EMAIL PROTECTED]> writes:

> I meant with oe-1char oe as a single char (U+0153), available
> in iso-8859-15 (octal \275 here), but not in iso-8859-1 (you
> have one half instead), and with oe-2char the two 7bit chars sequence
> 'oe', available anywhere, and that is the trick usually used in
> iso-8859-1 to represent that char.

Ah, I see.

> I like the idea, and at a first glance it should not be difficult to
> implement, even for a person like me, whose lisp skills are limited. It will
> help for old emacs21, for emacs20 my workaround will do nothing since it has
> no iso-8859-15, and for Debian xemacs21 my workaround is also doing nothing
> since xemacs21 seems to return some extra (IMHO wrong) stuff in
> buffer-file-coding-system.
> I will first retest everything with a 'good' (built to my taste) french
> dict, pen and paper, to know in detail the differences in the results for
> both systems. Last day I could notice that the misalignment error
> disappeared and that things worked better, but not much more. Need to test
> also with aspell and other languages.
> Even in the case both systems give mostly similar results I will try
> integrating them, since I guess your patch will be more appropriate for
> future emacs21 development, and looks somewhat more general.

How are the ispell.el of Emacs and that of dictionaries-common
maintained?  Are they synchronized somehow?  Should I install my patch
in CVS Emacs, or is it better to wait for you or some other maintainer
to work on it?

---
Ken'ichi HANDA
[EMAIL PROTECTED]
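Agustin's distinction between oe-1char and oe-2char can be checked directly in a current Emacs (23 or later, where characters are Unicode internally): U+0153 encodes to the single byte octal 275 under ISO-8859-15, the same byte decodes to the one-half sign under ISO-8859-1, and the two-character spelling is plain ASCII in both. A small sketch; the helper `my-encoded-bytes` is hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-encoded-bytes (string coding)
  "Return the list of bytes of STRING encoded with CODING."
  (string-to-list (encode-coding-string string coding)))

;; oe-1char: U+0153 is one byte, #o275 (= 189), in ISO-8859-15.
(my-encoded-bytes "\u0153" 'iso-8859-15)   ; => (189)

;; The same byte read as ISO-8859-1 is the one-half sign, U+00BD.
(decode-coding-string "\275" 'iso-8859-1)  ; => "½"

;; oe-2char: "coeur" is pure ASCII, so both encodings give the same bytes.
(equal (my-encoded-bytes "coeur" 'iso-8859-1)
       (my-encoded-bytes "coeur" 'iso-8859-15)) ; => t
```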
Bug#130397: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
In article <[EMAIL PROTECTED]>, Richard Stallman <[EMAIL PROTECTED]> writes:

> People have been discussing this issue for a while now,
> and due to the volume of mail, I could not read it all.

I'm reading it, and I think I understand what the problem is.

> Handa, is it clear what we should do now for the coming release?

As far as I understand, there are two problems, and my patch solves one
of them for CVS Emacs.  I asked how to proceed with it in my previous
mail.

---
Ken'ichi HANDA
[EMAIL PROTECTED]
Bug#130397: Bug 130397
In article <[EMAIL PROTECTED]>, Juri Linkov <[EMAIL PROTECTED]> writes:

> Agustin Martin <[EMAIL PROTECTED]> writes:
>> *Ken*, since you are being cc'ed I vaguely remembered some info I somewhere
>> read about this misalignments. I finally found it,
>>
>> http://lists.gnu.org/archive/html/emacs-devel/2002-09/msg01007.html
> The bug reported on this URL occurs only in Emacs 21.3, not in Emacs CVS.
> It seems something was fixed already.
> However, with a strange coincidence I got the same error in Emacs CVS just
> today for the first time. So I can describe how this bug can be reproduced
> in Emacs CVS: when the first part of a word was copied from an external
> application and got encoded in the buffer in mule-unicode-0100-24ff,
> and the second part of the word typed with an input method and gets encoded
> in cyrillic-iso8859-5, then calling ispell-buffer on a buffer with the word
> composed with different encodings with `russian' dictionary signals the
> error "Ispell misalignment".

Please try the latest ispell.el.  I think at least this misalignment
error is fixed now.

> And while on this topic, I want to remind that many Emacs users suffer
> from the inability of ispell.el to simultaneously check mixed multi-language
> texts. So, whoever fixes ispell.el, please take that into account.
> Such combining is quite easily doable for any disjoint alphabets, as well
> as for alphabets where one alphabet is a superset of another, like e.g.
> English and some other Latin-based alphabets. Even for overlapping
> alphabets it would be possible with using the `w' syntax to get a word
> and to feed it to different ispell instances for each dictionary.

As for this, I agree with the following statement.

Geoff Kuenning <[EMAIL PROTECTED]> writes:
> I'm not entirely sure what you mean here.  For disjoint alphabets,
> it's certainly relatively easy to figure out which word should go to
> which ispell instance.  For identical, superset, or overlapping
> alphabets, the problem is basically insoluble.  For example, "fra" is
> a misspelling in English but legal in Italian.  If it appears in a
> mixed passage, which dictionary should it be fed to?  The only
> solution would seem to be to require the user to mark passages in some
> way, as is done in HTML.

---
Ken'ichi HANDA
[EMAIL PROTECTED]
Bug#130397: Bug 130397
In article <[EMAIL PROTECTED]>, David Kastrup <[EMAIL PROTECTED]> writes:

>>> If ispell wants utf-8, it's easy enough to convert each input line to
>>> utf-8 and deal with offsets into that in the event of a misspelling;
>>
>> Or account for byte offsets by (variable) multibyte length of each
>> character, which Emacs knows. I don't remember for the moment whether
>> the multibyte length of the UTF-8 encoding can be gotten at by a Lisp
>> program, but if not, we could add some primitive to do that.
> Just encode the line to utf-8, find the correct point in the byte
> string, cut off the line there, convert back and check the length of
> the string. This works unless you are in the middle of a character.
> But it would be much saner if our conversion facilities would preserve
> markers (which they don't do right now): encode to utf-8, place a
> marker at the right byte offset, undo the conversion.

You can encode a text to utf-8, place several markers, and then encode
the regions between the markers one by one.

---
Ken'ichi HANDA
[EMAIL PROTECTED]
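Kastrup's recipe (encode the line, cut the byte string at the offset ispell reports, decode the prefix, and take its length) is short to express, and so is the reverse direction the quoted question asks about. A minimal sketch for a current Emacs, assuming ispell reports offsets in bytes of the UTF-8-encoded line; both function names are hypothetical. As Kastrup notes, an offset that falls inside a multibyte character gives a wrong answer: the partial bytes decode to raw-byte characters and are counted.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-utf8-byte-to-char-offset (line byte-offset)
  "Convert BYTE-OFFSET into the UTF-8 encoding of LINE to a character offset.
Only meaningful when BYTE-OFFSET falls on a character boundary."
  (let ((bytes (encode-coding-string line 'utf-8)))
    ;; Decode just the prefix and count the characters it contains.
    (length (decode-coding-string (substring bytes 0 byte-offset) 'utf-8))))

(defun my-char-to-utf8-byte-offset (line char-offset)
  "Convert CHAR-OFFSET in LINE to a byte offset into LINE's UTF-8 encoding."
  ;; The byte offset is the encoded length of everything before it.
  (length (encode-coding-string (substring line 0 char-offset) 'utf-8)))
```

For example, in "žal ok" the letter ž takes two bytes in UTF-8, so byte offset 4 corresponds to character offset 3, and vice versa.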
Bug#130397: Bug 130397
In article <[EMAIL PROTECTED]>, Juri Linkov <[EMAIL PROTECTED]> writes:

> Kenichi Handa <[EMAIL PROTECTED]> writes:
>> Please try the latest ispell.el.  I think at least this
>> misalignment error is fixed now.
> I tried the latest ispell.el and I see that your change is a definite
> improvement since it now allows to check words in mule-unicode charsets.
> But it still doesn't fix the misalignment error.  It even makes this
> error more frequent because it now occurs in all UTF-8 texts checked
> with ispell-region (which earlier were simply skipped before your change).
> The cause of the error is the following: a line sent by ispell.el
> to the ispell process is converted from mule-unicode charset to the
> process charset, and the accepted output gets converted from process
> coding to the internal Emacs charset iso8859.  So `search-forward' in
> `ispell-process-line' fails to find a string in iso8859 charset
> in the buffer with the same string in mule-unicode charset.

Ah!  I see.  I've just installed the attached change, which should fix
that misalignment error.  ispell-looking-at is not well tuned yet, and
there will be a better way to implement it.

---
Ken'ichi HANDA
[EMAIL PROTECTED]

Index: ispell.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/textmodes/ispell.el,v
retrieving revision 1.152
retrieving revision 1.153
diff -u -c -r1.152 -r1.153
cvs diff: conflicting specifications of output style
*** ispell.el   13 Jan 2005 04:33:05 -0000   1.152
--- ispell.el   18 Jan 2005 23:16:27 -0000   1.153
***************
*** 2794,2799 ****
--- 2794,2808 ----
        string))


+ (defun ispell-looking-at (string)
+   (let ((coding (ispell-get-coding-system))
+         (len (length string)))
+     (and (<= (+ (point) len) (point-max))
+          (equal (encode-coding-string string coding)
+                 (encode-coding-string (buffer-substring-no-properties
+                                        (point) (+ (point) len))
+                                       coding)))))
+
  ;;; Avoid error messages when compiling for these dynamic variables.
  (eval-when-compile
    (defvar start)
***************
*** 2842,2853 ****
            ;; Alignment cannot be tracked and this error will occur when
            ;; `query-replace' makes multiple corrections on the starting line.
!           (if (/= (+ word-len (point))
!                   (progn
!                     ;; NB: Search can fail with Mule coding systems that don't
!                     ;; display properly.  Ignore the error in this case?
!                     (search-forward (car poss) (+ word-len (point)) t)
!                     (point)))
                ;; This occurs due to filter pipe problems
                (error (concat "Ispell misalignment: word "
                               "`%s' point %d; probably incompatible versions")
--- 2851,2857 ----
            ;; Alignment cannot be tracked and this error will occur when
            ;; `query-replace' makes multiple corrections on the starting line.
!           (or (ispell-looking-at (car poss))
                ;; This occurs due to filter pipe problems
                (error (concat "Ispell misalignment: word "
                               "`%s' point %d; probably incompatible versions")
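The idea behind `ispell-looking-at` is to compare byte strings rather than characters: two strings that Emacs stores in different internal charsets still match if they encode to the same bytes in the coding system used to talk to the ispell process. A self-contained variant, assuming the coding system is passed in explicitly instead of coming from `ispell-get-coding-system`; the name `my-looking-at-encoded` is hypothetical. In a current Emacs with unified Unicode characters the charset mismatch no longer arises, so the usage below only exercises the bounds check and the byte comparison.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-looking-at-encoded (string coding)
  "Return non-nil if the text after point encodes like STRING under CODING.
Point is not moved."
  (let ((len (length string)))
    ;; Refuse to compare past the end of the buffer.
    (and (<= (+ (point) len) (point-max))
         ;; Compare the encoded bytes, not the internal characters.
         (equal (encode-coding-string string coding)
                (encode-coding-string
                 (buffer-substring-no-properties (point) (+ (point) len))
                 coding)))))
```

In a buffer containing "naïve text" with point at the start, the function matches "naïve" under `iso-8859-1` but not "naive", and it returns nil for any STRING longer than the rest of the buffer.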
Bug#130397: Bug 130397
In article <[EMAIL PROTECTED]>, Juri Linkov <[EMAIL PROTECTED]> writes:

> Now a new problem was uncovered: after selecting a correct word from
> a list of near misses returned from ispell, ispell.el replaces the
> misspelled word with a selected word, and inserts it into the buffer
> not in its original mule-unicode charset, but in iso8859.

Perhaps the following function can be used somewhere in ispell to fix
that, but as I still don't understand the ispell code that well, I'd
like to ask someone else to modify ispell to use it.

;; Destructively modify WORD by converting each character in it to the
;; equivalent character of CHARSET.
(defun ispell-adjust-charset (word charset)
  (let ((len (length word)))
    ;; Nothing to do unless WORD contains multibyte characters.
    (if (< len (string-bytes word))
        (dotimes (i len)
          (let ((c (aref word i))
                this-charset equiv-chars)
            (if (and (>= c 128)
                     (not (eq (setq this-charset (char-charset c)) charset))
                     ;; Map an ISO-8859 character to its mule-unicode
                     ;; counterpart before looking up its equivalents.
                     (or (memq this-charset '(mule-unicode-0100-24ff
                                              mule-unicode-2500-34ff))
                         (setq c (aref ucs-mule-8859-to-mule-unicode c)))
                     (setq equiv-chars (aref ispell-unified-chars-table c)))
                ;; Replace C with the first equivalent that belongs to CHARSET.
                (catch 'tag
                  (dotimes (j (length equiv-chars))
                    (when (eq (char-charset (aref equiv-chars j)) charset)
                      (aset word i (aref equiv-chars j))
                      (throw 'tag nil))))))))))

---
Ken'ichi HANDA
[EMAIL PROTECTED]

PS. I personally feel it's a waste of time to struggle with charset
matters in ispell that much, because emacs-unicode should not have such
a problem.
Bug#435452: emacs22: `set-keyboard-coding-system' fails in non-X11 mode
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Ludovic Courtès) writes:

> Hi,
> Sven Joachim <[EMAIL PROTECTED]> writes:
>>>> Invoking `set-keyboard-coding-system' in an "emacs -nw" session fails.
>>>> For instance, asking it `no-conversion' (which is needed so that dead
>>>> keys work as expected) fails:
>>>
>>>> Unsupported coding system in Encoded-kbd mode: no-conversion
>>>
>>> I don't understand why you have to set
>>> keyboard-coding-system to no-conversion for dead keys.  Dead
>>> keys must be handled by terminal, and Emacs just receives
>>> the resulting character (encoded in your locale) from the
>>> terminal.  So, setting keyboard-coding-system to what is
>>> appropriate for your locale should work well, and that
>>> should be done automatically.
> Indeed, using "C" as my locale fixes the problem (I used to have
> "LC_CTYPE=fr_FR").
> Strangely enough, Emacs 21.4.1 with "LC_CTYPE=fr_FR" doesn't have the
> problem (i.e., dead keys are usable).

Does that mean you have a problem with Emacs 22 with LC_CTYPE=fr_FR?
In that locale, "emacs -nw" should automatically set
keyboard-coding-system to latin-1 and the input mode to (t nil 0 7), and
thus it should accept latin-1 characters sent from a terminal correctly.
What happens when you type a latin-1 character with the dead-key method
under LC_CTYPE=fr_FR?

> Checking the "Meta Sends Escape" box of the xterm in which I run Emacs
> 22 also fixes the problem, even with a non-C locale.

It seems that your Emacs' input mode is set not to accept 8-bit input.
Please tell me what is shown by

  ESC : (current-input-mode) RET

> I guess I'm just displaying my lack of familiarity with how terminals
> work...
>>> What other choices were tried?  utf-8, latin-X should all
>>> work.  What is your locale?
> With a "C" locale, utf-8, latin-1, and others are accepted, whereas
> `no-conversion' yields the above error message.

That is because setting keyboard-coding-system to no-conversion is
useless.

---
Kenichi Handa
[EMAIL PROTECTED]
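The value Handa asks for is the list (INTERRUPT FLOW META QUIT) returned by `current-input-mode`. Its third element decides what happens to 8-bit bytes from the terminal, such as the latin-1 characters that dead keys produce: t takes the 8th bit as the Meta modifier, nil strips it as parity, and other values such as 0 pass the byte through to the keyboard decoder. Newer Emacs versions also document `encoded`, which still applies the 8th bit as Meta but after decoding. The expected fr_FR value (t nil 0 7) therefore passes latin-1 input through. A small hypothetical helper that classifies such a list, written from the `set-input-mode` documentation as a sketch rather than as part of Emacs:

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-input-mode-8bit-handling (mode)
  "Classify how MODE, a `current-input-mode' list, treats 8-bit input."
  (let ((meta (nth 2 mode)))
    (cond ((eq meta t) 'meta)                      ; 8th bit read as Meta
          ((null meta) 'stripped)                  ; 8th bit dropped as parity
          ((eq meta 'encoded) 'meta-after-decoding) ; newer Emacs only
          (t 'pass-through))))                     ; byte reaches the decoder
```

If the result for `(current-input-mode)` is `meta` or `stripped`, then with `m` bound to that list, `(set-input-mode (nth 0 m) (nth 1 m) 0 (nth 3 m))` would switch to pass-through, mirroring the (t nil 0 7) setting described above.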