Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
> I don't really care one way or another, but Peter (Dyballa) suggests
> that it would be more user-friendly if non-ASCII characters
> entered/searched-for via C-q used a standard like Unicode to interpret
> the codes, rather than Emacs internal character numbers as it does now.
>
> [In Emacs 23, of course, Emacs internal character numbers will _be_
> Unicode, so the distinction will go away.]

Let's leave it alone for now.

___
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Peter Dyballa <[EMAIL PROTECTED]> writes:

> it needs to be emphasised that C-s C-q uses a Unicode search and does
> not take into account the file's proper encoding. Could be there are
> just a few that care about these encoding details.

That's misleading. There's no "Unicode search"; if the variable I added
is set to `ucs', it _converts_ a Unicode codepoint entered via C-q to
Emacs' internal representation; after that, it works exactly like the
old C-q.

I-search currently seems to handle correctly, for instance, searching
for a latin-1 ä in a latin-2 buffer -- even though the underlying buffer
representation is in fact different -- so searching should continue to
work correctly even in "Unicode C-q mode".

[However, I think that character insertion via C-q won't work as the
user expects; for instance, C-q e4 would insert a latin-1 ä even in a
latin-2 buffer -- using the default settings, this situation will get
fixed up at file write time, because unify-8859-on-encoding-mode is on
by default, but until then, the inconsistent buffer contents might
confuse a user.]

-Miles
-- 
"Whatever you do will be insignificant, but it is very important that
you do it."  Mahatma Gandhi
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 23.09.2006 at 01:25, Miles Bader wrote:

> UCS codepoints are good because they allow _all_ Emacs characters to
> be entered in a consistent way. Having C-q use the buffer's file
> encoding on the other hand seems quite annoying, because it requires
> users to use different numbers depending on what encoding the file
> they're editing was saved in (and I suspect a large portion of the
> time, users don't even _know_ what encoding their file uses).

This is a good enough method for me! (And others probably too.)

The problem I wanted to point out is that it is now not the file's
contents but their presentation forms that are found. This needs to be
documented, and it needs to be emphasised that C-s C-q uses a Unicode
search and does not take into account the file's proper encoding. Could
be there are just a few that care about these encoding details. This is
like pressing u on the keyboard and an x appears on screen ...

--
Greetings

Pete

Know thyself. Need help, call GOOGLE.
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Richard Stallman <[EMAIL PROTECTED]> writes:

> Could you give a self-contained explanation of why you propose this to
> be added now?

I don't really care one way or another, but Peter (Dyballa) suggests
that it would be more user-friendly if non-ASCII characters
entered/searched-for via C-q used a standard like Unicode to interpret
the codes, rather than Emacs internal character numbers as it does now.

[In Emacs 23, of course, Emacs internal character numbers will _be_
Unicode, so the distinction will go away.]

-Miles
-- 
The car has become... an article of dress without which we feel
uncertain, unclad, and incomplete.
[Marshall McLuhan, Understanding Media, 1964]
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
> Whether this is a serious enough problem to consider adding a patch
> this late in the release cycle, I don't know. [I think the default
> value of read-quoted-char-charset would probably have to remain nil
> though...]

Could you give a self-contained explanation of why you propose this to
be added now?
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 22.09.2006 at 13:27, Miles Bader wrote:

> Peter Dyballa <[EMAIL PROTECTED]> writes:
>> There is also the option to change the 'base' of the character code
>> notation from 8 to 16
>
> This feature is supported; see the variable `read-quoted-char-radix'.

Right, it works a bit, i.e. in the ASCII range it works well. When it
comes to ISO Latin it interprets everything as ISO Latin-1, i.e. C-s C-q
0 0 a 4 RET searches in an ISO 8859-16 encoded buffer for CURRENCY SIGN
although A4 is EURO SIGN in this case. A translation to the buffer-local
encoding obviously does not happen ...

  (setq read-quoted-char-radix 16)
  (setq read-quoted-char-charset 'ucs)

After applying your patch this behaviour does not change: it's still
assumed that the encoding is ISO Latin-1. 00A4 is categorically ``¤´´.
The improvement is that I can find an ISO Latin encoded character via a
Unicode value -- is this an improvement? The file code is A4 in any ISO
Latin case, and the character is U+20AC in Unicode when in ISO
Latin-10/ISO Latin-0 or ISO Latin-9. This looks like a Do What I Mean.
Really not bad!

But the real way should be C-s C-q 2 4 4 RET or C-s C-q A 4 RET or
C-s C-q 1 6 4 RET (decimal), because it searches for the codes one
expects in the encoded file, and that does not work.

--
Greetings

Pete

Some day we may discover how to make magnets that can point in any
direction.
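The point about A4 can be checked outside Emacs. The following is a minimal sketch in Python (not Emacs Lisp, and not part of any patch in this thread); the helper names `parse_quoted_code` and `char_in_coding` are made up for illustration. It parses the digits as C-q would in radix 8, 10 or 16, then interprets the resulting byte in the file's own 8-bit encoding, which is the behaviour asked for above.

```python
import unicodedata


def parse_quoted_code(digits: str, radix: int) -> int:
    """Parse the digits typed after C-q in the given radix (8, 10 or 16)."""
    return int(digits, radix)


def char_in_coding(code: int, coding: str) -> str:
    """Interpret CODE as a single byte of an 8-bit coding system."""
    return bytes([code]).decode(coding)


# "244" (octal), "164" (decimal) and "A4" (hex) all name the same byte.
code = parse_quoted_code("244", 8)

for coding in ("iso8859_1", "iso8859_15", "iso8859_16"):
    ch = char_in_coding(code, coding)
    print(f"{coding}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

With Python's standard codecs the same byte comes out as CURRENCY SIGN under ISO 8859-1 and as EURO SIGN under ISO 8859-15 and ISO 8859-16, so a number typed after C-q only identifies a character once an encoding has been chosen.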
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Peter Dyballa <[EMAIL PROTECTED]> writes:

> There is also the option to change the 'base' of the character code
> notation from 8 to 16

This feature is supported; see the variable `read-quoted-char-radix'.

> This might be the correct way in a GNU Emacs way, but not in the way an
> Emacs user would use it. Or can I type C-q 4245 RET to input ¥ in some
> file? (Well, it actually works ...) Having to use other numbers than
> the well-known three digits wide ones is not the usual user
> experience.

I suppose that a patch such as the following could be used to support at
least Unicode input in `read-quoted-char' (the function underlying C-q).

(Set `read-quoted-char-charset' to `ucs' to input Unicode codes.)

Whether this is a serious enough problem to consider adding a patch this
late in the release cycle, I don't know. [I think the default value of
read-quoted-char-charset would probably have to remain nil though...]

-Miles


2006-09-22  Miles Bader  <[EMAIL PROTECTED]>

	* subr.el (read-quoted-char-charset): New variable.
	(read-quoted-char): Use it.

--- orig/lisp/subr.el
+++ mod/lisp/subr.el
@@ -1539,6 +1548,17 @@
   :type '(choice (const 8) (const 10) (const 16))
   :group 'editing-basics)
 
+(defvar read-quoted-char-charset nil
+  "*The character-set used for numeric codepoints entered with `read-quoted-char'.
+If nil, Emacs' internal codepoints are used.")
+
+(custom-declare-variable-early
+ 'read-quoted-char-charset nil
+ "*The character-set used for numeric codepoints entered with `read-quoted-char'.
+If nil, Emacs' internal codepoints are used."
+  :type '(choice (const nil) (const ucs))
+  :group 'editing-basics)
+
 (defun read-quoted-char (&optional prompt)
   "Like `read-char', but do not allow quitting.
 Also, if the first character read is an octal digit,
@@ -1595,7 +1615,13 @@
 	    (t (setq code translated
 		     done t)))
       (setq first nil))
-    code))
+    (if (null read-quoted-char-charset)
+	code
+      (let ((decoded (decode-char read-quoted-char-charset code)))
+	(when (null decoded)
+	  (error "Invalid %s character: %d, #o%o, #x%x"
+		 read-quoted-char-charset code code code))
+	decoded))))
 
 (defun read-passwd (prompt &optional confirm default)
   "Read a password, prompting with PROMPT, and return it.

-- 
The secret to creativity is knowing how to hide your sources.
  --Albert Einstein
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 22.09.2006 at 12:31, Miles Bader wrote:

> Peter Dyballa <[EMAIL PROTECTED]> writes:
>> C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) -- the
>> minibuffer tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.
>
> That's because the numeric code following C-q is _not_ a Unicode code
> point, it's an Emacs character code. In Emacs 22 those two things are
> very different (in Emacs 23, I guess they are the same, as Emacs 23
> uses Unicode for its internal codes).
>
> You can see the "Emacs character code" of a character by hitting C-x =
> on top of that character in a buffer. E.g., C-x = says that ``„´´ has
> Emacs code 1234576, and indeed entering `C-s C-q 1234576 RET'
> successfully searches for „ ! Similarly, the Emacs code for ¥ is 4245,
> and that also works correctly following C-q.

This might be the correct way in a GNU Emacs way, but not in the way an
Emacs user would use it. Or can I type C-q 4245 RET to input ¥ in some
file? (Well, it actually works ...) Having to use other numbers than the
well-known three digits wide ones is not the usual user experience. The
so-called character code is a known quantity and supported by some
operating systems.

(There is also the option to change the 'base' of the character code
notation from 8 to 16, to be able to input the Unicode slot number. This
should work also IMO.)

--
Greetings

Pete

Basic, n.: A programming language. Related to certain social diseases in
that those who have it will not admit it in polite company.
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Peter Dyballa <[EMAIL PROTECTED]> writes:

> C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) -- the
> minibuffer tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.

That's because the numeric code following C-q is _not_ a Unicode code
point, it's an Emacs character code. In Emacs 22 those two things are
very different (in Emacs 23, I guess they are the same, as Emacs 23 uses
Unicode for its internal codes).

You can see the "Emacs character code" of a character by hitting C-x =
on top of that character in a buffer. E.g., C-x = says that ``„´´ has
Emacs code 1234576, and indeed entering `C-s C-q 1234576 RET'
successfully searches for „ ! Similarly, the Emacs code for ¥ is 4245,
and that also works correctly following C-q.

> Which is the formula to map octal 0156772 to a Unicode slot/position?
> Octal 0156772 is DDFA in hex, which is different from 5B57, 字's
> position in Unicode.

  (encode-char #o156772 'ucs)  =>  23383 (#o55527, #x5b57)

> Or: how can I find the octal value for a given Unicode slot (U+ABCD)?

  (decode-char 'ucs #x5b57)  =>  56826 (#o156772, #xddfa)

[There seems to be no such Unicode character #xABCD known to Emacs.]

Note that (decode-char 'ucs CODE) continues to work properly in Emacs
23, even though Emacs internal codes are completely different (in Emacs
23, of course, it basically just returns its 2nd argument), so it seems
a good function to use for code portable between Emacs 22 and 23.

-Miles
-- 
(\(\
(^.^)
(")")
*This is the cute bunny virus, please copy this into your sig so it can
spread.
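The numbers quoted in this message can be cross-checked with plain arithmetic. This is a small sketch in Python rather than Emacs Lisp; it does not reproduce Emacs 22's internal mapping (only `encode-char' and `decode-char' know that) and only confirms that the octal, decimal and hexadecimal forms given in the thread agree with each other. The variable names are illustrative.

```python
# Emacs 22 internal code of 字 as quoted in the thread, written in octal.
emacs22_code = int("156772", 8)

# Unicode code point of 字 as quoted in the thread, written in hex.
ucs_code = 0x5B57

for label, code in (("Emacs 22 code", emacs22_code), ("UCS code", ucs_code)):
    print(f"{label}: {code} (#o{code:o}, #x{code:x})")

# Only the UCS value is meaningful to Python's chr().
print(chr(ucs_code))
```

The two values are unrelated as plain numbers, which is why Emacs 22 needs a table-driven conversion instead of a formula.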
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 22.09.2006 at 03:06, Kenichi Handa wrote:

> In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:
>> OK, you're right: it really works better now, I had made some
>> mistake! I wonder whether I picked up the characters with C-s C-w ...
>> As you wrote, this won't work.
>
> It didn't work, but should work now. I attached 3 files (temp1,2,7
> encoded in iso-8859-1,2,7 respectively).
>
>   C-x C-f temp1 RET ESC < C-n C-s C-w C-x C-f temp2 C-s C-s
>
> should find " á", and
>
>   C-x C-f temp1 RET ESC < C-n C-n C-s C-w C-x C-f temp7 C-s C-s
>
> should find " °".

Yes, I can confirm: it works! It works also in my own test files --
except one: the ISO 8859-6 encoded one. I was searching for SOFT HYPHEN,
U+00AD. I'll attach my test file. It could be also useful for the
possible ISO 8859-6 bug I reported recently.

>> Anyway, what also does not work is: C-s C-q <octal code greater than
>> 177>. For those with really small keyboards this is the (almost?)
>> only chance to find some of the x times 64 K characters in Unicode ...
>
> This should work now too. For instance, " " and "á" are 0255 and 0341
> in iso-8859-1 charset. So, if your primary charset is iso-8859-1,
>
>   C-q 255 C-q 341 RET
>
> should input " á". And,
>
>   C-x C-f temp2 ESC < C-s C-q 255 C-q 341 RET
>
> should find " á" even if the characters in that buffer are from
> iso-8859-2.

I did not try this test because it is too simple: LATIN SMALL LETTER A
WITH ACUTE (U+00E1) is at 341/225/E1 in both encodings. Please use my
answer to Miles Bader as test case! I can send you my other ISO 8859-X
test files.

--
Greetings

Pete

"Eternity is a terrible thought. I mean, where's it going to end?"
	- Tom Stoppard


;;; -*- coding: iso-8859-6; -*-
;
;	Time-stamp: <2006-09-22 00:25:10 pete>
;
;	Arabic Glyphs
;
;	oct	dec	hex	UCS2	UTF-8
;=
  = 240 = 160 = A0 = U+00A0 = C2 A0 : NO-BREAK SPACE
¤ = 244 = 164 = A4 = U+00A4 = C2 A4 : CURRENCY SIGN
¬ = 254 = 172 = AC = U+060C = D8 8C : ARABIC COMMA
  = 255 = 173 = AD = U+00AD = C2 AD : SOFT HYPHEN
» = 273 = 187 = BB = U+061B = D8 9B : ARABIC SEMICOLON
¿ = 277 = 191 = BF = U+061F = D8 9F : ARABIC QUESTION MARK
Á = 301 = 193 = C1 = U+0621 = D8 A1 : ARABIC LETTER HAMZA
Â = 302 = 194 = C2 = U+0622 = D8 A2 : ARABIC LETTER ALEF WITH MADDA ABOVE
Ã = 303 = 195 = C3 = U+0623 = D8 A3 : ARABIC LETTER ALEF WITH HAMZA ABOVE
Ä = 304 = 196 = C4 = U+0624 = D8 A4 : ARABIC LETTER WAW WITH HAMZA ABOVE
Å = 305 = 197 = C5 = U+0625 = D8 A5 : ARABIC LETTER ALEF WITH HAMZA BELOW
Æ = 306 = 198 = C6 = U+0626 = D8 A6 : ARABIC LETTER YEH WITH HAMZA ABOVE
Ç = 307 = 199 = C7 = U+0627 = D8 A7 : ARABIC LETTER ALEF
È = 310 = 200 = C8 = U+0628 = D8 A8 : ARABIC LETTER BEH
É = 311 = 201 = C9 = U+0629 = D8 A9 : ARABIC LETTER TEH MARBUTA
Ê = 312 = 202 = CA = U+062A = D8 AA : ARABIC LETTER TEH
Ë = 313 = 203 = CB = U+062B = D8 AB : ARABIC LETTER THEH
Ì = 314 = 204 = CC = U+062C = D8 AC : ARABIC LETTER JEEM
Í = 315 = 205 = CD = U+062D = D8 AD : ARABIC LETTER HAH
Î = 316 = 206 = CE = U+062E = D8 AE : ARABIC LETTER KHAH
Ï = 317 = 207 = CF = U+062F = D8 AF : ARABIC LETTER DAL
Ð = 320 = 208 = D0 = U+0630 = D8 B0 : ARABIC LETTER THAL
Ñ = 321 = 209 = D1 = U+0631 = D8 B1 : ARABIC LETTER REH
Ò = 322 = 210 = D2 = U+0632 = D8 B2 : ARABIC LETTER ZAIN
Ó = 323 = 211 = D3 = U+0633 = D8 B3 : ARABIC LETTER SEEN
Ô = 324 = 212 = D4 = U+0634 = D8 B4 : ARABIC LETTER SHEEN
Õ = 325 = 213 = D5 = U+0635 = D8 B5 : ARABIC LETTER SAD
Ö = 326 = 214 = D6 = U+0636 = D8 B6 : ARABIC LETTER DAD
× = 327 = 215 = D7 = U+0637 = D8 B7 : ARABIC LETTER TAH
Ø = 330 = 216 = D8 = U+0638 = D8 B8 : ARABIC LETTER ZAH
Ù = 331 = 217 = D9 = U+0639 = D8 B9 : ARABIC LETTER AIN
Ú = 332 = 218 = DA = U+063A = D8 BA : ARABIC LETTER GHAIN
à = 340 = 224 = E0 = U+0640 = D9 80 : ARABIC TATWEEL
á = 341 = 225 = E1 = U+0641 = D9 81 : ARABIC LETTER FEH
â = 342 = 226 = E2 = U+0642 = D9 82 : ARABIC LETTER QAF
ã = 343 = 227 = E3 = U+0643 = D9 83 : ARABIC LETTER KAF
ä = 344 = 228 = E4 = U+0644 = D9 84 : ARABIC LETTER LAM
å = 345 = 229 = E5 = U+0645 = D9 85 : ARABIC LETTER MEEM
æ = 346 = 230 = E6 = U+0646 = D9 86 : ARABIC LETTER NOON
ç = 347 = 231 = E7 = U+0647 = D9 87 : ARABIC LETTER HEH
è = 350 = 232 = E8 = U+0648 = D9 88 : ARABIC LETTER WAW
é = 351 = 233 = E9 = U+0649 = D9 89 : ARABIC LETTER ALEF MAKSURA
ê = 352 = 234 = EA = U+064A = D9 8A : ARABIC LETTER YEH
ë = 353 = 235 = EB = U+064B = D9 8B : ARABIC FATHATAN
ì = 354 = 236 = EC = U+064C = D9 8C : ARABIC DAMMATAN
í = 355 = 237 = ED = U+064D = D9 8D : ARABIC KASRATAN
î = 356 = 238 = EE = U+064E = D9 8E : ARABIC FATHA
ï = 357 = 239 = EF = U+064F = D9 8F : ARABIC DAMMA
ð = 360 = 240 = F0 = U+0650 = D9 90 : ARABIC KASRA
ñ = 361 = 241 = F1 = U+0651 = D9 91 : ARABIC SHADDA
ò = 362 = 242 = F2 = U+0652 = D9 92 : ARABIC SUKUN
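The columns of the test file above (octal, decimal, hex, UCS-2, UTF-8, name) can be regenerated from the byte value alone. The sketch below uses Python's standard `iso8859_6` codec and `unicodedata` module; it is an independent cross-check, not the tool the test file was made with, and the function name `table_row` is made up for illustration. The glyph column is left out, because the archive displays the ISO 8859-6 bytes as if they were Latin-1.

```python
import unicodedata


def table_row(byte: int, coding: str = "iso8859_6") -> str:
    """Build one 'oct = dec = hex = UCS = UTF-8 : name' row for BYTE."""
    ch = bytes([byte]).decode(coding)
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return (f"{byte:o} = {byte} = {byte:02X} = U+{ord(ch):04X} "
            f"= {utf8} : {unicodedata.name(ch)}")


for byte in (0xAC, 0xAD, 0xC7, 0xE4):
    print(table_row(byte))
```

According to the Unicode character database that Python ships, byte AD maps to U+00AD, whose name is SOFT HYPHEN; HYPHEN-MINUS is the ASCII character U+002D.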
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 22.09.2006 at 02:44, Miles Bader wrote:

> Peter Dyballa <[EMAIL PROTECTED]> writes:
>> Anyway, what also does not work is: C-s C-q <octal code greater than
>> 177>. For those with really small keyboards this is the (almost?)
>> only chance to find some of the x times 64 K characters in Unicode ...
>
> Eh? It works for me:
>
> E.g., the Emacs 22 character code of "字" is octal 0156772. If I enter
> C-s C-q 0156772 (followed by some other char to terminate the octal
> code), it correctly adds that character to the search string (and
> finds it in the buffer).

OK, I did not check in the "higher" Unicode regions, I did not check in
a UTF-8 encoded buffer, and I did not input numbers so long that I
cannot compute them; I was still in my simple ISO 8859-X test files
(your example works for me too in a UTF-8 encoded buffer).

After launching GNU Emacs 22.0.50 with -Q the phenomenon seems to be
that input like C-s C-q <[23][0-7][0-7]> RET is interpreted as trying to
"name/point to" an ISO 8859-1 encoded character. For example:

  C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) -- the
  minibuffer tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.
  C-s C-q 241 RET searches for ¡.
  C-s C-q 242 RET searches for ¢.
  C-s C-q 243 RET searches for £.
  C-s C-q 244 RET searches for ¤ (CURRENCY SIGN, U+00A4).

Evaluating (unify-8859-on-decoding-mode t) does not change this specific
behaviour.

Which is the formula to map octal 0156772 to a Unicode slot/position?
Octal 0156772 is DDFA in hex, which is different from 5B57, 字's
position in Unicode. Or: how can I find the octal value for a given
Unicode slot (U+ABCD)? There is probably some function for this
purpose ...

--
Greetings

Pete

"It isn't pollution that's harming the environment. It's the impurities
in our air and water that are doing it."
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Peter Dyballa <[EMAIL PROTECTED]> writes:

> Anyway, what also does not work is: C-s C-q <octal code greater than
> 177>. For those with really small keyboards this is the (almost?) only
> chance to find some of the x times 64 K characters in Unicode ...

Eh? It works for me:

E.g., the Emacs 22 character code of "字" is octal 0156772. If I enter
C-s C-q 0156772 (followed by some other char to terminate the octal
code), it correctly adds that character to the search string (and finds
it in the buffer).

-Miles
-- 
P.S. All information contained in the above letter is false, for
reasons of military security.
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> OK, you're right: it really works better now, I had made some
> mistake! I wonder whether I picked up the characters with C-s C-w ...
> As you wrote, this won't work.

It didn't work, but should work now. I attached 3 files (temp1,2,7
encoded in iso-8859-1,2,7 respectively).

  C-x C-f temp1 RET ESC < C-n C-s C-w C-x C-f temp2 C-s C-s

should find "á", and

  C-x C-f temp1 RET ESC < C-n C-n C-s C-w C-x C-f temp7 C-s C-s

should find "°".

> Anyway, what also does not work is: C-s C-q <octal code greater than
> 177>. For those with really small keyboards this is the (almost?) only
> chance to find some of the x times 64 K characters in Unicode ...

This should work now too. For instance, "" and "á" are 0255 and 0341 in
iso-8859-1 charset. So, if your primary charset is iso-8859-1,

  C-q 255 C-q 341 RET

should input "á". And,

  C-x C-f temp2 ESC < C-s C-q 255 C-q 341 RET

should find "á" even if the characters in that buffer are from
iso-8859-2.

---
Kenichi Handa
[EMAIL PROTECTED]

Attachment: temp.tar.gz (binary data)
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 21.09.2006 at 04:13, Kenichi Handa wrote:

> In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:
>> The CVS code is from Sunday or Monday. After applying your patch
>> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
>> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
>> exists there additionally/instead of ä.
>
> Hmmm, strange, it doesn't fail for me. Are you sure that Emacs is
> re-built after isearch.el is byte-compiled?

OK, you're right: it really works better now, I had made some mistake! I
wonder whether I picked up the characters with C-s C-w ... As you wrote,
this won't work.

Anyway, what also does not work is: C-s C-q <octal code greater than
177>. For those with really small keyboards this is the (almost?) only
chance to find some of the x times 64 K characters in Unicode ...

--
Greetings

Pete

Hard Disk: A device that allows users to delete vast quantities of data
with simple mnemonic commands.
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
> So, I've just installed the attached change. But, there still exists a
> case that isearch fails. For instance, if your buffer's
> buffer-file-coding-system is iso-8859-2, and you somehow insert
> a-acute of iso-8859-1, isearch won't be able to find that a-acute. The
> fix for that case is very difficult in Emacs 22.

Do you think we should document this in the Emacs manual?
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 21.09.2006 at 04:13, Kenichi Handa wrote:

> In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:
>> The CVS code is from Sunday or Monday. After applying your patch
>> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
>> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
>> exists there additionally/instead of ä.
>
> Hmmm, strange, it doesn't fail for me. Are you sure that Emacs is
> re-built after isearch.el is byte-compiled?

Yes, I am. I made this mistake a few times, so I learned from it.
Actually I just re-made and installed GNU Emacs and then byte-compiled
isearch.el in /usr/local/share/emacs/22.0.50/lisp.

I'll cvs-update tomorrow or on Saturday and I'll check isearch.el again
(I'm a bit busy today).

--
Greetings

Pete

Time flies like an error -- but fruit flies like a banana!
(almost Groucho Marx)
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> The CVS code is from Sunday or Monday. After applying your patch
> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
> exists there additionally/instead of ä.

Hmmm, strange, it doesn't fail for me. Are you sure that Emacs is
re-built after isearch.el is byte-compiled?

---
Kenichi Handa
[EMAIL PROTECTED]
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 20.09.2006 at 10:05, Kenichi Handa wrote:

>> My test was very simple: I opened the ISO 8859-1 encoded file (starts
>> with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
>> RET. Then I opened the other ISO Latin test files, which all have a
>> coding set in the first line. Then I re-used the ä via C-s C-s.
>
> That "re-using" is also a case that the previous change didn't take
> care of. Could you please try the test with the latest code?

The CVS code is from Sunday or Monday. After applying your patch nothing
changes for my simple test (emacs-22.0.50 -Q). I did it also for °,
which can't be found in ISO 8859-7 and ISO 8859-8 although it exists
there additionally/instead of ä.

For U+00AD SOFT HYPHEN, the most common character, I get when starting
from ISO 8859-1:

  failure: ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5,
           ISO 8859-7, ISO 8859-8, ISO 8859-9, ISO 8859-14,
           ISO 8859-15
  success: ISO 8859-10, ISO 8859-13, ISO 8859-16

although in all these 13 encodings it's (oct/dec/hex) 255 - 173 - AD.
Starting the search for SOFT HYPHEN in an ISO 8859-15 encoded file, it's
found in no other ISO 8859 encoded file.

This is not satisfactory. This is not unified.

--
Greetings

Pete

Some day we may discover how to make magnets that can point in any
direction.
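The claim that byte AD means the same character in all 13 encodings can be verified independently of Emacs. This is a minimal sketch using Python's standard ISO 8859 codecs; the list `PARTS` simply collects the ISO 8859 part numbers named in the message above.

```python
import unicodedata

# ISO 8859 parts named in the message: the starting file plus the
# "failure" and "success" lists.
PARTS = (1, 2, 3, 4, 5, 7, 8, 9, 10, 13, 14, 15, 16)

# Decode the single byte AD (octal 255, decimal 173) in every part.
decoded = {n: bytes([0xAD]).decode(f"iso8859_{n}") for n in PARTS}

for n, ch in decoded.items():
    print(f"ISO 8859-{n}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Every part yields U+00AD (named SOFT HYPHEN in the Unicode character database). The search failures reported above therefore cannot come from the files' byte values; they have to come from how Emacs 22 represents the character internally for each charset.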
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> My test was very simple: I opened the ISO 8859-1 encoded file (starts
> with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
> RET. Then I opened the other ISO Latin test files, which all have a
> coding set in the first line. Then I re-used the ä via C-s C-s.

That "re-using" is also a case that the previous change didn't take care
of. Could you please try the test with the latest code?

---
Kenichi Handa
[EMAIL PROTECTED]
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
On 20.09.2006 at 09:10, Kenichi Handa wrote:

> The problem is that the change took care only of a typed character. If
> isearch-string is set from a (possibly different) buffer (e.g. by C-s
> C-w), the translation doesn't happen.

My test was very simple: I opened the ISO 8859-1 encoded file (starts
with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
RET. Then I opened the other ISO Latin test files, which all have a
coding set in the first line. Then I re-used the ä via C-s C-s.

I do not set buffer-file-coding-system; it's mule-utf-8, from UTF-8 in
LC_CTYPE or LANG. But the value is adjusted to a local value due to the
'-*- coding: iso-8859-X; -*-' in the files' first lines.

--
Greetings

Pete

"What do you think of Western Civilisation?"
"I think it would be a good idea!"
  -- Mohandas Karamchand Gandhi
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
In article <[EMAIL PROTECTED]>, Richard Stallman <[EMAIL PROTECTED]> writes:

>> A while ago, I proposed to change isearch so that it
>> translates characters by translation-table-for-input to
>> solve such a problem, but an objection was raised that
>> read-char should do that translation. RMS asked to check if
>> such a change to read-char is safe or not, but as
>> such a check is very difficult and time-consuming, no one
>> took on the job.
>>
>> So, this problem is still unfixed.
>>
>> I again propose to change isearch.
>
> Yes, let's do it that way. Could you do it now?

Oops, it seems that my brain is seriously damaged :-(. I have already
installed such a change (perhaps according to your decision).

The problem is that the change took care only of a typed character. If
isearch-string is set from a (possibly different) buffer (e.g. by C-s
C-w), the translation doesn't happen.

So, I've just installed the attached change. But, there still exists a
case where isearch fails. For instance, if your buffer's
buffer-file-coding-system is iso-8859-2, and you somehow insert a-acute
of iso-8859-1, isearch won't be able to find that a-acute. The fix for
that case is very difficult in Emacs 22.

---
Kenichi Handa
[EMAIL PROTECTED]

2006-09-20  Kenichi Handa  <[EMAIL PROTECTED]>

	* isearch.el (isearch-process-search-char): Cancel the previous
	change.
	(isearch-search-string): New function.
	(isearch-search): Use isearch-search-string.
	(isearch-lazy-highlight-search): Likewise.

Index: isearch.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/isearch.el,v
retrieving revision 1.289
retrieving revision 1.290
diff -u -r1.289 -r1.290
--- isearch.el	9 Jul 2006 11:04:18 -0000	1.289
+++ isearch.el	20 Sep 2006 06:13:43 -0000	1.290
@@ -1807,8 +1807,6 @@
 	   ((eq char ?|) (isearch-fallback t nil t)))
 
     ;; Append the char to the search string, update the message and re-search.
-    (if (char-table-p translation-table-for-input)
-	(setq char (or (aref translation-table-for-input char) char)))
     (isearch-process-search-string
      (char-to-string char)
      (if (>= char ?\200)
@@ -1993,6 +1991,36 @@
      (t
       (if isearch-forward 'search-forward 'search-backward)))))
 
+(defun isearch-search-string (string bound noerror)
+  ;; Search for the first occurance of STRING or its translation.  If
+  ;; found, move point to the end of the occurance, update
+  ;; isearch-match-beg and isearch-match-end, and return point.
+  (let ((func (isearch-search-fun))
+	(len (length string))
+	pos1 pos2)
+    (setq pos1 (save-excursion (funcall func string bound noerror)))
+    (if (and (char-table-p translation-table-for-input)
+	     (> (string-bytes string) len))
+	(let (translated match-data)
+	  (dotimes (i len)
+	    (let ((x (aref translation-table-for-input (aref string i))))
+	      (when x
+		(or translated (setq translated (copy-sequence string)))
+		(aset translated i x))))
+	  (when translated
+	    (save-match-data
+	      (save-excursion
+		(if (setq pos2 (funcall func translated bound noerror))
+		    (setq match-data (match-data t)))))
+	    (when (and pos2
+		       (or (not pos1)
+			   (if isearch-forward (< pos2 pos1) (> pos2 pos1))))
+	      (setq pos1 pos2)
+	      (set-match-data match-data)))))
+    (if pos1
+	(goto-char pos1))
+    pos1))
+
 (defun isearch-search ()
   ;; Do the search with the current search string.
   (isearch-message nil t)
@@ -2008,9 +2036,7 @@
 	(setq isearch-error nil)
 	(while retry
 	  (setq isearch-success
-		(funcall
-		 (isearch-search-fun)
-		 isearch-string nil t))
+		(isearch-search-string isearch-string nil t))
 	  ;; Clear RETRY unless we matched some invisible text
 	  ;; and we aren't supposed to do that.
 	  (if (or (eq search-invisible t)
@@ -2353,7 +2379,7 @@
 	(isearch-regexp isearch-lazy-highlight-regexp)
 	(search-spaces-regexp search-whitespace-regexp))
     (condition-case nil
-	(funcall (isearch-search-fun)
+	(isearch-search-string
 		 isearch-lazy-highlight-last-string
 		 (if isearch-forward
 		     (min (or isearch-lazy-highlight-end-limit (point-max))
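The idea behind `isearch-search-string' in the patch above can be sketched outside Emacs. The Python below is an illustration under stated assumptions, not a port of the patch: buffers are plain strings, the translation table is an ordinary dict, and the private-use code point standing in for "the latin-2 ä of Emacs 22" is hypothetical. It also compares match start offsets, whereas the patch compares the positions returned by the search function, and it omits the patch's multibyte check. What it keeps is the core logic: search for the string, search for its translation, and keep whichever match is nearer in the search direction.

```python
from typing import Mapping, Optional, Tuple

Match = Tuple[int, int]  # (start, end) offsets of a match


def search_string_or_translation(
    text: str,
    string: str,
    translation: Mapping[str, str],
    start: int,
    forward: bool = True,
) -> Optional[Match]:
    """Return the match of STRING or of its translation nearest to START."""

    def find(needle: str) -> Optional[Match]:
        if forward:
            pos = text.find(needle, start)
        else:
            pos = text.rfind(needle, 0, start)
        return None if pos < 0 else (pos, pos + len(needle))

    best = find(string)
    # Translate character by character; unmapped characters stay as-is.
    translated = "".join(translation.get(ch, ch) for ch in string)
    if translated != string:
        other = find(translated)
        if other is not None and (
            best is None
            or (other[0] < best[0] if forward else other[0] > best[0])
        ):
            best = other
    return best


# Hypothetical stand-ins for two internal representations of the same
# letter; the private-use code point is illustrative only.
LATIN1_A_UML = "\u00e4"
LATIN2_A_UML = "\ue2e4"
TABLE = {LATIN1_A_UML: LATIN2_A_UML}

buffer_text = "B" + LATIN2_A_UML + "r"
print(search_string_or_translation(buffer_text, LATIN1_A_UML, TABLE, 0))
# prints (1, 2)
```

Searching for both forms, instead of translating the search string once, is what lets a string yanked from another buffer with C-s C-w still match, since that string never passes through the input translation.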
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
> A while ago, I proposed to change isearch so that it translates
> characters by translation-table-for-input to solve such a problem, but
> an objection was raised that read-char should do that translation. RMS
> asked to check if such a change to read-char is safe or not, but as
> such a check is very difficult and time-consuming, no one took on the
> job. So, this problem is still unfixed. I again propose to change
> isearch.

Yes, let's do it that way. Could you do it now?

David Kastrup wrote:

> "in the future", namely after the release, we are going to switch to
> the unicode2 branch and presumably the problem will go away.

That is true, so we already have our long-term solution. For now,
fixing isearch is sufficient.
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Kenichi Handa <[EMAIL PROTECTED]> writes:

> Peter Dyballa <[EMAIL PROTECTED]> writes:
>
>> Launched with -Q
>>
>> unify-8859-on-decoding-mode is nil
>> unify-8859-on-encoding-mode is t
>>
>> I start i-search in a Unicode-encoded buffer (*Help*). In an ISO
>> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
>> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
>> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
>> and ISO 8859-15. It is similar for ö and ü, except that these are not
>> found in the ISO 8859-14 encoded buffer. Changing
>> unify-8859-on-decoding-mode's value makes no difference.
>
> This is the story I remember.
>
> A while ago, I proposed to change isearch so that it
> translates characters by translation-table-for-input to
> solve such a problem, but an objection was raised that
> read-char should do that translation. RMS asked to check
> whether such a change to read-char is safe, but as
> such a check is very difficult and time-consuming, no one
> took on the job.
>
> So, this problem is still unfixed.
>
> I again propose to change isearch. When we know that
> changing read-char is safe in the future, we can cancel that
> change in isearch.

"in the future", namely after the release, we are going to switch to
the unicode2 branch and presumably the problem will go away.

So it does not sound like we should attempt any complicated fix for
Emacs 22 that is not going to stay around anyway. If the one problem
people are complaining about is search-and-replace, we should fix that
case for Emacs 22 and leave it at that, in my opinion.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> Hello!
>
> Launched with -Q
>
> unify-8859-on-decoding-mode is nil
> unify-8859-on-encoding-mode is t
>
> I start i-search in a Unicode-encoded buffer (*Help*). In an ISO
> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
> and ISO 8859-15. It is similar for ö and ü, except that these are not
> found in the ISO 8859-14 encoded buffer. Changing
> unify-8859-on-decoding-mode's value makes no difference.

This is the story I remember.

A while ago, I proposed to change isearch so that it translates
characters by translation-table-for-input to solve such a problem,
but an objection was raised that read-char should do that
translation. RMS asked to check whether such a change to read-char
is safe, but as such a check is very difficult and time-consuming,
no one took on the job.

So, this problem is still unfixed.

I again propose to change isearch. When we know that changing
read-char is safe in the future, we can cancel that change in
isearch.

---
Kenichi Handa
[EMAIL PROTECTED]
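For readers unfamiliar with the mechanism, the per-character translation discussed here amounts to a char-table lookup that falls back to the original character when the table has no entry. The sketch below is only an illustration: the helper name my-translate-for-input is hypothetical, and the toy table (mapping ?a to ?b) merely stands in for the real tables that unify the ISO 8859 character sets.

```elisp
;; Hypothetical helper for illustration only.
(defun my-translate-for-input (char)
  "Return CHAR mapped through `translation-table-for-input', else CHAR."
  (or (and (char-table-p translation-table-for-input)
           (aref translation-table-for-input char))
      char))

;; Usage with a toy table that maps ?a to ?b:
(let ((translation-table-for-input (make-char-table 'translation-table)))
  (aset translation-table-for-input ?a ?b)
  (list (my-translate-for-input ?a)    ; mapped entry
        (my-translate-for-input ?z)))  ; no entry, returned unchanged
```

Doing this lookup inside isearch, rather than in read-char, keeps the change local to searching, which is the trade-off the thread is weighing.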
Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
Peter Dyballa <[EMAIL PROTECTED]> writes:

> Hello!
>
> Launched with -Q
>
> unify-8859-on-decoding-mode is nil
> unify-8859-on-encoding-mode is t
>
> I start i-search in a Unicode-encoded buffer (*Help*). In an ISO
> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
> and ISO 8859-15. It is similar for ö and ü, except that these are not
> found in the ISO 8859-14 encoded buffer. Changing
> unify-8859-on-decoding-mode's value makes no difference.

Could you supply a precise, step-by-step recipe for reproducing this
problem?