Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-23 Thread Richard Stallman
> I don't really care one way or another, but Peter (Dyballa) suggests
> that it would be more user-friendly if non-ASCII characters
> entered/searched-for via C-q plus a numeric code used a standard like
> unicode to interpret that code, rather than Emacs internal character
> numbers as it does now.
>
> [In Emacs 23, of course, Emacs internal character numbers will _be_
> unicode, so the distinction will go away.]

Let's leave it alone for now.


___
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug


Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-23 Thread Miles Bader
Peter Dyballa <[EMAIL PROTECTED]> writes:
> it needs to be emphasised that C-s C-q uses a Unicode search and does
> not take into account the file's proper encoding. Could be there are
> just a few that care about these encoding details.

That's misleading.  There's no "unicode search"; if the variable I added
is set to `ucs', it _converts_ a unicode codepoint entered via C-q to
Emacs' internal representation; after that, it works exactly like the
old C-q.

Since I-search currently seems to correctly handle, for instance,
searching for a latin-1 ä in a latin-2 buffer -- even though the
underlying buffer representation is in fact different -- searching
should continue to work correctly even in "unicode C-q mode".

[However, I think that character insertion via C-q won't work as the
user expects; for instance, C-q e4 would insert a latin-1 ä even in a
latin-2 buffer -- using the default settings, this situation will get
fixed up at file write time, because unify-8859-on-encoding-mode is on
by default, but until then, the inconsistent buffer contents might
confuse a user.]

-Miles
-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Gandhi




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-23 Thread Peter Dyballa


Am 23.09.2006 um 01:25 schrieb Miles Bader:


> UCS codepoints are good because they allow _all_ emacs characters to
> be entered in a consistent way.  Having C-q use the buffer's file
> encoding on the other hand seems quite annoying, because it requires
> users to use different numbers depending on what the file they're
> editing was saved in (and I suspect a large portion of the time, users
> don't even _know_ what encoding their file uses).


This is a good enough method for me! (And others probably too.) The  
problem I wanted to point out is that not the file's contents but its  
presentation forms are now found. This needs to be documented, and it  
needs to be emphasised that C-s C-q uses a Unicode search and does  
not take into account the file's proper encoding. Could be there are  
just a few that care about these encoding details.


This is like pressing u on the keyboard and an x appears on screen ...

--
Greetings

  Pete

Know thyself. Need help, call GOOGLE.






Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Miles Bader
Richard Stallman <[EMAIL PROTECTED]> writes:
> Could you give a self-contained explanation of why you propose this to
> be added now?

I don't really care one way or another, but Peter (Dyballa) suggests
that it would be more user-friendly if non-ASCII characters
entered/searched-for via C-q plus a numeric code used a standard like
unicode to interpret that code, rather than Emacs internal character
numbers as it does now.

[In Emacs 23, of course, Emacs internal character numbers will _be_
unicode, so the distinction will go away.]

-Miles

-- 
The car has become... an article of dress without which we feel uncertain,
unclad, and incomplete.  [Marshall McLuhan, Understanding Media, 1964]




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Richard Stallman
> Whether this is a serious enough problem to consider adding a patch this
> late in the release cycle, I don't know.  [I think the
> default value of read-quoted-char-charset would probably have to remain
> nil though...]

Could you give a self-contained explanation of why you propose this to
be added now?





Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Peter Dyballa


Am 22.09.2006 um 13:27 schrieb Miles Bader:


> Peter Dyballa <[EMAIL PROTECTED]> writes:
>> There is also the option to change the 'base' of the character code
>> notation from 8 to 16
>
> This feature is supported; see the variable `read-quoted-char-radix'.


Right, it works a bit, i.e. in the ASCII range it works well. When it
comes to ISO Latin it interprets everything as ISO Latin-1, i.e. C-s C-q 0 0
a 4 RET searches in an ISO 8859-16 encoded buffer for CURRENCY SIGN
although the character there is EURO SIGN. A translation to the buffer-local
encoding obviously does not happen ...



(setq read-quoted-char-radix 16)
(setq read-quoted-char-charset 'ucs)

After applying your patch this behaviour does not change, it's still
assumed that the encoding is ISO Latin-1. 00A4 is categorically ``¤´´.
The improvement is that I can find an ISO Latin encoded character via
a Unicode value – is this an improvement? The file code is A4
in any ISO Latin case, and the character is U+20AC in Unicode when in
ISO Latin-10/ISO Latin-0 or ISO Latin-9. This looks like a Do What I
Mean. Really not bad! But the real way should be C-s C-q 2 4 4 RET or
C-s C-q A 4 RET or C-s C-q 1 6 4 RET (decimal), because it searches
for the codes one expects in the encoded file, and that does not work.
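
The difference between the file byte and the character it stands for can
be checked from Lisp. The following is only a sketch and not part of the
patch under discussion: `my-byte-to-ucs' is a hypothetical helper name, and
it assumes Emacs 23 or later, where `unibyte-string' exists and the
`iso-8859-15' and `iso-8859-16' coding systems are defined. It decodes one
byte under a given coding system and returns the Unicode code point of the
resulting character.

```elisp
;; Hypothetical helper, for illustration only (assumes Emacs 23 or later).
(defun my-byte-to-ucs (byte coding-system)
  "Decode the single BYTE under CODING-SYSTEM; return its Unicode code point."
  (let ((decoded (decode-coding-string (unibyte-string byte) coding-system)))
    ;; `encode-char' with `ucs' maps an Emacs character to its code point.
    (encode-char (aref decoded 0) 'ucs)))

;; The same byte A4 names different characters in different encodings:
(my-byte-to-ucs #xa4 'iso-8859-1)   ; CURRENCY SIGN, U+00A4
(my-byte-to-ucs #xa4 'iso-8859-15)  ; EURO SIGN, U+20AC
(my-byte-to-ucs #xa4 'iso-8859-16)  ; EURO SIGN, U+20AC
```

This is why a number typed after C-q cannot mean both "the byte in the
file" and "the Unicode slot" at the same time: for A4 the two readings
agree only in ISO 8859-1.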


--
Greetings

  Pete

Some day we may discover how to make magnets that can point in any  
direction.








Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Miles Bader
Peter Dyballa <[EMAIL PROTECTED]> writes:
> There is also the option to change the 'base' of the character code
> notation from 8 to 16

This feature is supported; see the variable `read-quoted-char-radix'.

> This might be the correct way in a GNU Emacs way, but not in the way  an
> Emacs user would use it. Or can I type C-q 4245 RET to input ¥ in  some
> file? (Well, it actually works ...) Having to use other numbers  than
> the well-known three digits wide ones is not the usual user
> experience.

I suppose that a patch such as the following could be used to support
at least unicode input in `read-quoted-char' (the function underlying C-q).

(set `read-quoted-char-charset' to `ucs' to input unicode-codes)

Whether this is a serious enough problem to consider adding a patch this
late in the release cycle, I don't know.  [I think the
default value of read-quoted-char-charset would probably have to remain
nil though...]

-Miles


2006-09-22  Miles Bader  <[EMAIL PROTECTED]>

* subr.el (read-quoted-char-charset): New variable.
(read-quoted-char): Use it.

--- orig/lisp/subr.el
+++ mod/lisp/subr.el
@@ -1539,6 +1548,17 @@
   :type '(choice (const 8) (const 10) (const 16))
   :group 'editing-basics)
 
+(defvar read-quoted-char-charset nil
+  "*The character-set used for numeric codepoints entered with `read-quoted-char'.
+If nil, Emacs' internal codepoints are used.")
+
+(custom-declare-variable-early
+ 'read-quoted-char-charset nil
+ "*The character-set used for numeric codepoints entered with `read-quoted-char'.
+If nil, Emacs' internal codepoints are used."
+  :type '(choice (const nil) (const ucs))
+  :group 'editing-basics)
+
 (defun read-quoted-char (&optional prompt)
   "Like `read-char', but do not allow quitting.
 Also, if the first character read is an octal digit,
@@ -1595,7 +1615,13 @@
(t (setq code translated
 done t)))
   (setq first nil))
-    code))
+    (if (null read-quoted-char-charset)
+        code
+      (let ((decoded (decode-char read-quoted-char-charset code)))
+        (when (null decoded)
+          (error "Invalid %s character: %d, #o%o, #x%x"
+                 read-quoted-char-charset code code code))
+        decoded))))
 
 (defun read-passwd (prompt &optional confirm default)
   "Read a password, prompting with PROMPT, and return it.


-- 
The secret to creativity is knowing how to hide your sources.
  --Albert Einstein




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Peter Dyballa


Am 22.09.2006 um 12:31 schrieb Miles Bader:


> Peter Dyballa <[EMAIL PROTECTED]> writes:
>> C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) – minibuffer
>> tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.
>
> That's because the numeric code following C-q is _not_ a unicode code
> point, it's an Emacs character code.  In Emacs 22 those two things are
> very different (in Emacs 23, I guess they are the same, as Emacs 23 uses
> unicode for its internal codes).
>
> You can see the "Emacs character code" of a character by hitting C-x =
> on top of that character in a buffer.
>
> E.g., C-x = says that ``„´´ has Emacs code 1234576, and indeed entering
> `C-s C-q 1234576 RET' successfully searches for „ !  Similarly, the
> Emacs code for ¥ is 4245, and that also works correctly following C-q.


This might be the correct way in a GNU Emacs way, but not in the way  
an Emacs user would use it. Or can I type C-q 4245 RET to input ¥ in  
some file? (Well, it actually works ...) Having to use other numbers  
than the well-known three digits wide ones is not the usual user  
experience. The so-called character code is a known quantity and  
supported by some operating systems. (There is also the option to  
change the 'base' of the character code notation from 8 to 16, to be  
able to input the Unicode slot number. This should work also IMO.)


--
Greetings

  Pete

  Basic, n.:
A programming language.  Related to certain social diseases in
that those who have it will not admit it in polite company.






Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Miles Bader
Peter Dyballa <[EMAIL PROTECTED]> writes:
>   C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) – minibuffer
> tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.

That's because the numeric code following C-q is _not_ a unicode code
point, it's an Emacs character code.  In Emacs 22 those two things are
very different (in Emacs 23, I guess they are the same, as Emacs 23 uses
unicode for its internal codes).

You can see the "Emacs character code" of a character by hitting C-x =
on top of that character in a buffer.

E.g., C-x = says that ``„´´ has Emacs code 1234576, and indeed entering
`C-s C-q 1234576 RET' successfully searches for „ !  Similarly, the
Emacs code for ¥ is 4245, and that also works correctly following C-q.

> Which is the formula to map octal 0156772 to a Unicode slot/position?
> Octal 0156772 is DDFA in hex, which is different from 5B57, 字's
> position in Unicode.

(encode-char #o156772 'ucs)
  => 23383 (#o55527, #x5b57)

> Or: how can I find the octal value for a given Unicode slot (U+ABCD)?

(decode-char 'ucs #x5b57)
  => 56826 (#o156772, #xddfa)

[There seems to be no such unicode character #xABCD known to Emacs.]

Note that (decode-char 'ucs CODE) continues to work properly in Emacs
23, even though Emacs internal codes are completely different (in Emacs
23, of course, it basically just returns its 2nd argument), so it seems
a good function to use for code portable between Emacs 22 and 23.
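
[As a small illustration of the two functions above, here is a sketch that
shows a character's internal code next to its Unicode code point.  The name
`my-describe-char-codes' is hypothetical, not an Emacs function.  In Emacs
22 the two numbers differ for most non-ASCII characters; in Emacs 23 they
should coincide.]

```elisp
;; Hypothetical helper, for illustration only.
(defun my-describe-char-codes (char)
  "Return a string showing CHAR's internal code and its Unicode code point."
  ;; `encode-char' returns nil when CHAR has no Unicode mapping.
  (let ((ucs (encode-char char 'ucs)))
    (if ucs
        (format "internal #o%o, Unicode U+%04X" char ucs)
      (format "internal #o%o, no Unicode mapping" char))))

;; Typical use: describe the character at point, much as C-x = does.
;; (message "%s" (my-describe-char-codes (char-after)))
```

For ASCII characters both numbers are the same in every Emacs version, so
(my-describe-char-codes ?A) gives "internal #o101, Unicode U+0041".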

-Miles

-- 
(\(\
(^.^)
(")")
*This is the cute bunny virus, please copy this into your sig so it can spread.




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Peter Dyballa


Am 22.09.2006 um 03:06 schrieb Kenichi Handa:

> In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:
>
>> OK, you're right: it really works better now, I had made some
>> mistake! I wonder whether I picked up the characters with C-s C-w ...
>> As you wrote, this won't work.
>
> It didn't work, but should work now.  I attached 3 files
> (temp1,2,7 encoded in iso-8859-1,2,7 respectively).
>   C-x C-f temp1 RET ESC < C-n C-s C-w C-x C-f temp2 C-s C-s
> should find " á", and
>   C-x C-f temp1 RET ESC < C-n C-n C-s C-w C-x C-f temp7 C-s C-s
> should find " °".


Yes, I can confirm: it works! It works also in my own test files –
except one: the ISO 8859-6 encoded one. I was searching for SOFT
HYPHEN, U+00AD. I'll attach my test file. It could be also useful in
the ISO 8859-6 possible bug I reported recently.






>> Anyway, what also does not work is: C-s C-q <greater 177 octal code>.
>> For those with really small keyboards this is the (almost?) only
>> chance to find some of the x times 64 K characters in Unicode ...
>
> This should work now too.  For instance, " " and "á" are
> 0255 and 0341 in iso-8859-1 charset.  So, if your primary
> charset is iso-8859-1, C-q 255 C-q 341 RET should input
> " á".  And,
>   C-x C-f temp2 ESC < C-s C-q 255 C-q 341 RET
> should find " á" even if the characters in that buffer are
> from iso-8859-2.



I did not try this test because it is too simple: LATIN SMALL LETTER  
A WITH ACUTE (U+00E1) is in the two encodings on 341/225/E1.


Please use my answer to Miles Bader as test case! I can send you my  
other ISO 8859-X test files.



--
Greetings

  Pete

"Eternity is a terrible thought. I mean, where's it going to end?"
- Tom Stoppard

;;; -*- coding: iso-8859-6; -*-
;
;   Time-stamp: <2006-09-22 00:25:10 pete>
;
;   Arabic Glyphs
;
;   oct   dec   hex  UCS2     UTF-8
;=
  = 240 = 160 = A0 = U+00A0 =C2 A0 : NO-BREAK SPACE
¤ = 244 = 164 = A4 = U+00A4 =C2 A4 : CURRENCY SIGN
¬ = 254 = 172 = AC = U+060C =D8 8C : ARABIC COMMA
­ = 255 = 173 = AD = U+00AD =C2 AD : SOFT HYPHEN
» = 273 = 187 = BB = U+061B =D8 9B : ARABIC SEMICOLON
¿ = 277 = 191 = BF = U+061F =D8 9F : ARABIC QUESTION MARK
Á = 301 = 193 = C1 = U+0621 =D8 A1 : ARABIC LETTER HAMZA
 = 302 = 194 = C2 = U+0622 =D8 A2 : ARABIC LETTER ALEF WITH MADDA ABOVE
à = 303 = 195 = C3 = U+0623 =D8 A3 : ARABIC LETTER ALEF WITH HAMZA ABOVE
Ä = 304 = 196 = C4 = U+0624 =D8 A4 : ARABIC LETTER WAW WITH HAMZA ABOVE
Å = 305 = 197 = C5 = U+0625 =D8 A5 : ARABIC LETTER ALEF WITH HAMZA BELOW
Æ = 306 = 198 = C6 = U+0626 =D8 A6 : ARABIC LETTER YEH WITH HAMZA ABOVE
Ç = 307 = 199 = C7 = U+0627 =D8 A7 : ARABIC LETTER ALEF
È = 310 = 200 = C8 = U+0628 =D8 A8 : ARABIC LETTER BEH
É = 311 = 201 = C9 = U+0629 =D8 A9 : ARABIC LETTER TEH MARBUTA
Ê = 312 = 202 = CA = U+062A =D8 AA : ARABIC LETTER TEH
Ë = 313 = 203 = CB = U+062B =D8 AB : ARABIC LETTER THEH
Ì = 314 = 204 = CC = U+062C =D8 AC : ARABIC LETTER JEEM
Í = 315 = 205 = CD = U+062D =D8 AD : ARABIC LETTER HAH
Î = 316 = 206 = CE = U+062E =D8 AE : ARABIC LETTER KHAH
Ï = 317 = 207 = CF = U+062F =D8 AF : ARABIC LETTER DAL
Ð = 320 = 208 = D0 = U+0630 =D8 B0 : ARABIC LETTER THAL
Ñ = 321 = 209 = D1 = U+0631 =D8 B1 : ARABIC LETTER REH
Ò = 322 = 210 = D2 = U+0632 =D8 B2 : ARABIC LETTER ZAIN
Ó = 323 = 211 = D3 = U+0633 =D8 B3 : ARABIC LETTER SEEN
Ô = 324 = 212 = D4 = U+0634 =D8 B4 : ARABIC LETTER SHEEN
Õ = 325 = 213 = D5 = U+0635 =D8 B5 : ARABIC LETTER SAD
Ö = 326 = 214 = D6 = U+0636 =D8 B6 : ARABIC LETTER DAD
× = 327 = 215 = D7 = U+0637 =D8 B7 : ARABIC LETTER TAH
Ø = 330 = 216 = D8 = U+0638 =D8 B8 : ARABIC LETTER ZAH
Ù = 331 = 217 = D9 = U+0639 =D8 B9 : ARABIC LETTER AIN
Ú = 332 = 218 = DA = U+063A =D8 BA : ARABIC LETTER GHAIN
à = 340 = 224 = E0 = U+0640 =D9 80 : ARABIC TATWEEL
á = 341 = 225 = E1 = U+0641 =D9 81 : ARABIC LETTER FEH
â = 342 = 226 = E2 = U+0642 =D9 82 : ARABIC LETTER QAF
ã = 343 = 227 = E3 = U+0643 =D9 83 : ARABIC LETTER KAF
ä = 344 = 228 = E4 = U+0644 =D9 84 : ARABIC LETTER LAM
å = 345 = 229 = E5 = U+0645 =D9 85 : ARABIC LETTER MEEM
æ = 346 = 230 = E6 = U+0646 =D9 86 : ARABIC LETTER NOON
ç = 347 = 231 = E7 = U+0647 =D9 87 : ARABIC LETTER HEH
è = 350 = 232 = E8 = U+0648 =D9 88 : ARABIC LETTER WAW
é = 351 = 233 = E9 = U+0649 =D9 89 : ARABIC LETTER ALEF MAKSURA
ê = 352 = 234 = EA = U+064A =D9 8A : ARABIC LETTER YEH
ë = 353 = 235 = EB = U+064B =D9 8B : ARABIC FATHATAN
ì = 354 = 236 = EC = U+064C =D9 8C : ARABIC DAMMATAN
í = 355 = 237 = ED = U+064D =D9 8D : ARABIC KASRATAN
î = 356 = 238 = EE = U+064E =D9 8E : ARABIC FATHA
ï = 357 = 239 = EF = U+064F =D9 8F : ARABIC DAMMA
ð = 360 = 240 = F0 = U+0650 =D9 90 : ARABIC KASRA
ñ = 361 = 241 = F1 = U+0651 =D9 91 : ARABIC SHADDA
ò = 362 = 242 = F2 = U+0652 =D9 92 : ARABIC SUKUN



Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-22 Thread Peter Dyballa


Am 22.09.2006 um 02:44 schrieb Miles Bader:


> Peter Dyballa <[EMAIL PROTECTED]> writes:
>> Anyway, what also does not work is: C-s C-q <greater 177 octal code>.
>> For those with really small keyboards this is the (almost?) only
>> chance to find some of the x times 64 K characters in Unicode ...
>
> Eh?  It works for me:
>
> E.g., the Emacs 22 character code of "字" is octal 0156772.
>
> If I enter C-s C-q 0156772 (followed by some other char to terminate the
> octal code), it correctly adds that character to the search string (and
> finds in the buffer).



OK, I did not check in the "higher" Unicode regions, and I did not
check in a UTF-8 encoded buffer, and I did not enter numbers so long
that I cannot compute them; I was still in my simple ISO 8859-X test files
(your example works for me too in a UTF-8 encoded buffer). After
launching GNU Emacs 22.0.50 with -Q the phenomenon seems to be that
input like


C-s C-q <[23][0-7][0-7]> RET

is interpreted as trying to "name/point to" an ISO 8859-1 encoded  
character. For example:


	C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) – minibuffer
tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.


C-s C-q 241 RET searches for ¡.
C-s C-q 242 RET searches for ¢.
C-s C-q 243 RET searches for £.
C-s C-q 244 RET searches for ¤ (CURRENCY SIGN, U+00A4).

Evaluating (unify-8859-on-decoding-mode t) does not change this  
specific behaviour.




Which is the formula to map octal 0156772 to a Unicode slot/position?  
Octal 0156772 is DDFA in hex, which is different from 5B57, 字's  
position in Unicode. Or: how can I find the octal value for a given  
Unicode slot (U+ABCD)? There is probably some function for this  
purpose ...


--
Greetings

  Pete

"It isn't pollution that's harming the environment. It's the  
impurities in our air and water that are doing it."








Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-21 Thread Miles Bader
Peter Dyballa <[EMAIL PROTECTED]> writes:
> Anyway, what also does not work is: C-s C-q <greater 177 octal code>.
> For those with really small keyboards this is the (almost?) only
> chance to find some of the x times 64 K characters in Unicode ...

Eh?  It works for me:

E.g., the Emacs 22 character code of "字" is octal 0156772.

If I enter C-s C-q 0156772 (followed by some other char to terminate the
octal code), it correctly adds that character to the search string (and
finds in the buffer).

-Miles

-- 
P.S.  All information contained in the above letter is false,
  for reasons of military security.




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-21 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> OK, you're right: it really works better now, I had made some
> mistake! I wonder whether I picked up the characters with C-s C-w ...
> As you wrote, this won't work.

It didn't work, but should work now.  I attached 3 files
(temp1,2,7 encoded in iso-8859-1,2,7 respectively).
  C-x C-f temp1 RET ESC < C-n C-s C-w C-x C-f temp2 C-s C-s
should find "­á", and
  C-x C-f temp1 RET ESC < C-n C-n C-s C-w C-x C-f temp7 C-s C-s
should find "­°".

> Anyway, what also does not work is: C-s C-q <greater 177 octal code>.
> For those with really small keyboards this is the (almost?) only
> chance to find some of the x times 64 K characters in Unicode ...

This should work now too.  For instance, "­" and "á" are
0255 and 0341 in iso-8859-1 charset.  So, if your primary
charset is iso-8859-1, C-q 255 C-q 341 RET should input
"­á".  And,
  C-x C-f temp2 ESC < C-s C-q 255 C-q 341 RET
should find "­á" even if the characters in that buffer are
from iso-8859-2.

---
Kenichi Handa
[EMAIL PROTECTED]



temp.tar.gz
Description: Binary data


Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-21 Thread Peter Dyballa


Am 21.09.2006 um 04:13 schrieb Kenichi Handa:

> In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:
>
>> The CVS code is from Sunday or Monday. After applying your patch
>> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
>> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
>> exists there additionally/instead of ä.
>
> Hmmm, strange, it doesn't fail for me.  Are you sure that
> Emacs is re-built after isearch.el is byte-compiled?



OK, you're right: it really works better now, I had made some
mistake! I wonder whether I picked up the characters with C-s C-w ...
As you wrote, this won't work.


Anyway, what also does not work is: C-s C-q <greater 177 octal code>.
For those with really small keyboards this is the (almost?) only
chance to find some of the x times 64 K characters in Unicode ...


--
Greetings

  Pete

Hard Disk:  A device that allows users to delete vast quantities of  
data with

simple mnemonic commands.







Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-21 Thread Richard Stallman
> So, I've just installed the attached change.  But, there
> still exists a case that isearch fails.  For instance, if
> your buffer's buffer-file-coding-system is iso-8859-2, and
> you somehow insert a-acute of iso-8859-1, isearch won't be
> able to find that a-acute.  The fix for that case is very
> difficult in Emacs 22.

Do you think we should document this in the Emacs manual?




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-21 Thread Peter Dyballa


Am 21.09.2006 um 04:13 schrieb Kenichi Handa:

> In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:
>
>> The CVS code is from Sunday or Monday. After applying your patch
>> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
>> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
>> exists there additionally/instead of ä.
>
> Hmmm, strange, it doesn't fail for me.  Are you sure that
> Emacs is re-built after isearch.el is byte-compiled?



Yes, I did. I made this mistake a few times so I learned from it.
Actually I just re-made and installed GNU Emacs and then
byte-compiled isearch.el in /usr/local/share/emacs/22.0.50/lisp.


I'll cvs-update tomorrow or on Saturday and I'll check isearch.el  
again (I'm a bit busy today).


--
Greetings

  Pete

Time flies like an error -- but fruit flies like a banana!
 (almost Groucho Marx)






Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-20 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> The CVS code is from Sunday or Monday. After applying your patch  
> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also  
> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it  
> exists there additionally/instead of ä.

Hmmm, strange, it doesn't fail for me.  Are you sure that
Emacs is re-built after isearch.el is byte-compiled?

---
Kenichi Handa
[EMAIL PROTECTED]




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-20 Thread Peter Dyballa


Am 20.09.2006 um 10:05 schrieb Kenichi Handa:


>> My test was very simple: I opened the ISO 8859-1 encoded file (starts
>> with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
>> RET. Then I opened the other ISO Latin test files, which all have a
>> coding set in the first line. Then I re-used the ä via C-s C-s.
>
> That "re-using" is also the case that the previous change
> didn't take care of.  Could you please try the test with the
> latest code?


The CVS code is from Sunday or Monday. After applying your patch  
nothing changes for my simple test (emacs-22.0.50 -Q). I did it also  
for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it  
exists there additionally/instead of ä.


For ­, U+00AD SOFT HYPHEN, the most common character, I get when
starting from ISO 8859-1:


	failure: ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5, ISO 8859-7,  
ISO 8859-8, ISO 8859-9, ISO 8859-14, ISO 8859-15

success: ISO 8859-10, ISO 8859-13, ISO 8859-16

although in all these 13 encodings it's (oct/dec/hex) 255 - 173 - AD.
When the search for SOFT HYPHEN starts in an ISO 8859-15 encoded file,
it's found in no other ISO 8859 encoded file.



This is not satisfactory. This is not unified.

--
Greetings

  Pete

Some day we may discover how to make magnets that can point in any  
direction.








Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-20 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> My test was very simple: I opened the ISO 8859-1 encoded file (starts
> with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
> RET. Then I opened the other ISO Latin test files, which all have a
> coding set in the first line. Then I re-used the ä via C-s C-s.

That "re-using" is also the case that the previous change
didn't take care of.  Could you please try the test with the
latest code?

---
Kenichi Handa
[EMAIL PROTECTED]




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-20 Thread Peter Dyballa


Am 20.09.2006 um 09:10 schrieb Kenichi Handa:


> The problem is that the change took care only of a typed
> character.  If isearch-string is set from a (possibly
> different) buffer (e.g. by C-s C-w), the translation doesn't
> happen.


My test was very simple: I opened the ISO 8859-1 encoded file (starts
with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
RET. Then I opened the other ISO Latin test files, which all have a
coding set in the first line. Then I re-used the ä via C-s C-s.


I do not set buffer-file-coding-system, it's mule-utf-8 from UTF-8 in  
LC_CTYPE or LANG. But the value is adjusted to a local value due to  
the '-*- coding: iso-8859-X; -*-' in the files' first lines.


--
Greetings

  Pete

"What do you think of Western Civilisation?"
"I think it would be a good idea!"
 -- Mohandas Karamchand Gandhi






Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-20 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Richard Stallman <[EMAIL PROTECTED]> writes:

>> A while ago, I proposed to change isearch so that it
>> translates characters by translation-table-for-input to
>> solve such a problem, but an objection was raised that
>> read-char should do that translation.  RMS asked to check if
>> such a change to read-char is surely safe or not, but as
>> such a check is very difficult and time-consuming, no one
>> took on the job.
>
>> So, this problem is still unfixed.
>
>> I again propose to change isearch.
>
> Yes, let's do it that way.  Could you do it now?

Oops, it seems that my brain is seriously damaged :-(.  I
have already installed such a change (perhaps according to
your decision).

The problem is that the change took care only of a typed
character.  If isearch-string is set from a (possibly
different) buffer (e.g. by C-s C-w), the translation doesn't
happen.

So, I've just installed the attached change.  But, there
still exists a case that isearch fails.  For instance, if
your buffer's buffer-file-coding-system is iso-8859-2, and
you somehow insert a-acute of iso-8859-1, isearch won't be
able to find that a-acute.  The fix for that case is very
difficult in Emacs 22.

---
Kenichi Handa
[EMAIL PROTECTED]

2006-09-20  Kenichi Handa  <[EMAIL PROTECTED]>

* isearch.el (isearch-process-search-char): Cancel the previous
change.
(isearch-search-string): New function.
(isearch-search): Use isearch-search-string.
(isearch-lazy-highlight-search): Likewise.

Index: isearch.el
===
RCS file: /cvsroot/emacs/emacs/lisp/isearch.el,v
retrieving revision 1.289
retrieving revision 1.290
diff -u -r1.289 -r1.290
--- isearch.el  9 Jul 2006 11:04:18 -   1.289
+++ isearch.el  20 Sep 2006 06:13:43 -  1.290
@@ -1807,8 +1807,6 @@
((eq   char ?|)   (isearch-fallback t nil t)))
 
   ;; Append the char to the search string, update the message and re-search.
-  (if (char-table-p translation-table-for-input)
-  (setq char (or (aref translation-table-for-input char) char)))
   (isearch-process-search-string
(char-to-string char)
(if (>= char ?\200)
@@ -1993,6 +1991,36 @@
  (t
   (if isearch-forward 'search-forward 'search-backward)
 
+(defun isearch-search-string (string bound noerror)
+  ;; Search for the first occurance of STRING or its translation.  If
+  ;; found, move point to the end of the occurance, update
+  ;; isearch-match-beg and isearch-match-end, and return point.
+  (let ((func (isearch-search-fun))
+        (len (length string))
+        pos1 pos2)
+    (setq pos1 (save-excursion (funcall func string bound noerror)))
+    (if (and (char-table-p translation-table-for-input)
+             (> (string-bytes string) len))
+        (let (translated match-data)
+          (dotimes (i len)
+            (let ((x (aref translation-table-for-input (aref string i))))
+              (when x
+                (or translated (setq translated (copy-sequence string)))
+                (aset translated i x))))
+          (when translated
+            (save-match-data
+              (save-excursion
+                (if (setq pos2 (funcall func translated bound noerror))
+                    (setq match-data (match-data t)))))
+            (when (and pos2
+                       (or (not pos1)
+                           (if isearch-forward (< pos2 pos1) (> pos2 pos1))))
+              (setq pos1 pos2)
+              (set-match-data match-data)))))
+    (if pos1
+        (goto-char pos1))
+    pos1))
+
 (defun isearch-search ()
   ;; Do the search with the current search string.
   (isearch-message nil t)
@@ -2008,9 +2036,7 @@
(setq isearch-error nil)
(while retry
  (setq isearch-success
-   (funcall
-(isearch-search-fun)
-isearch-string nil t))
+   (isearch-search-string isearch-string nil t))
  ;; Clear RETRY unless we matched some invisible text
  ;; and we aren't supposed to do that.
  (if (or (eq search-invisible t)
@@ -2353,7 +2379,7 @@
(isearch-regexp isearch-lazy-highlight-regexp)
(search-spaces-regexp search-whitespace-regexp))
 (condition-case nil
-   (funcall (isearch-search-fun)
+   (isearch-search-string
 isearch-lazy-highlight-last-string
 (if isearch-forward
 (min (or isearch-lazy-highlight-end-limit (point-max))
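
For readers who want to try the idea outside isearch, here is a minimal
sketch of the two steps the new function combines: translating the search
string through a char-table, and keeping whichever match is reached first.
The names my-translate-search-string and my-nearer-match are hypothetical
and not part of isearch.el, and any table passed in is an assumption of the
caller, not necessarily the real translation-table-for-input.

```elisp
;; Sketch only: both function names below are hypothetical helpers,
;; not part of isearch.el.

(defun my-translate-search-string (string table)
  "Return a copy of STRING with its characters mapped through TABLE.
TABLE is a char-table; characters without an entry are left alone.
Return nil when no character was translated."
  (let (translated)
    (dotimes (i (length string))
      (let ((x (aref table (aref string i))))
        (when x
          ;; Copy lazily, so an untranslated string costs nothing.
          (or translated (setq translated (copy-sequence string)))
          (aset translated i x))))
    translated))

(defun my-nearer-match (pos1 pos2 forward)
  "Return whichever of POS1 and POS2 a search would reach first.
Either position may be nil, meaning no match.  FORWARD non-nil
means a forward search, so the smaller position wins."
  (cond ((null pos2) pos1)
        ((null pos1) pos2)
        (forward (min pos1 pos2))
        (t (max pos1 pos2))))
```

A caller would search once for the original string and once for the
translated copy (when there is one), then pass both positions to
my-nearer-match; the patch above additionally restores the match data of
whichever search won.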




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-19 Thread Richard Stallman
    A while ago, I proposed to change isearch so that it
    translates characters by translation-table-for-input to
    solve such a problem, but an objection was raised that
    read-char should do that translation.  RMS asked to check
    whether such a change to read-char is safe, but as
    such a check is very difficult and time-consuming, no one
    took on the job.

    So, this problem is still unfixed.

    I again propose to change isearch.

Yes, let's do it that way.  Could you do it now?

David Kastrup wrote:

    "in the future", namely after the release, we are going to switch to
    the unicode2 branch and presumably the problem will go away.

That is true, so we already have our long-term solution.
For now, fixing isearch is sufficient.





Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-18 Thread David Kastrup
Kenichi Handa <[EMAIL PROTECTED]> writes:

> Peter Dyballa <[EMAIL PROTECTED]> writes:
>
>> Launched with -Q
>
>>  unify-8859-on-decoding-mode is nil
>>  unify-8859-on-encoding-mode is t
>
>> I start i-search in a Unicode-encoded buffer (*Help*). In an ISO
>> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
>> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
>> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
>> and ISO 8859-15. It is similar for ö and ü, except that these are not
>> found in the ISO 8859-14 encoded buffer. Changing the value of
>> unify-8859-on-decoding-mode makes no difference.
>
> This is the story I remember.
>
> A while ago, I proposed to change isearch so that it
> translates characters by translation-table-for-input to
> solve such a problem, but an objection was raised that
> read-char should do that translation.  RMS asked to check
> whether such a change to read-char is safe, but as
> such a check is very difficult and time-consuming, no one
> took on the job.
>
> So, this problem is still unfixed.
>
> I again propose to change isearch.  When we know that
> changing read-char is safe in the future, we can cancel that
> change in isearch.

"in the future", namely after the release, we are going to switch to
the unicode2 branch and presumably the problem will go away.  So it
does not sound like we should attempt any complicated fix for Emacs 22
that is not going to stay around, anyway.  If the one problem where
people are complaining is search-and-replace, we should fix that case
for Emacs 22 and that's it for Emacs 22, in my opinion.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-18 Thread Kenichi Handa
In article <[EMAIL PROTECTED]>, Peter Dyballa <[EMAIL PROTECTED]> writes:

> Hello!
> Launched with -Q

>   unify-8859-on-decoding-mode is nil
>   unify-8859-on-encoding-mode is t

> I start i-search in a Unicode-encoded buffer (*Help*). In an ISO
> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
> and ISO 8859-15. It is similar for ö and ü, except that these are not
> found in the ISO 8859-14 encoded buffer. Changing the value of
> unify-8859-on-decoding-mode makes no difference.

This is the story I remember.

A while ago, I proposed to change isearch so that it
translates characters by translation-table-for-input to
solve such a problem, but an objection was raised that
read-char should do that translation.  RMS asked to check
whether such a change to read-char is safe, but as
such a check is very difficult and time-consuming, no one
took on the job.

So, this problem is still unfixed.

I again propose to change isearch.  When we know that
changing read-char is safe in the future, we can cancel that
change in isearch.

---
Kenichi Handa
[EMAIL PROTECTED]




Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings

2006-09-18 Thread Chong Yidong
Peter Dyballa <[EMAIL PROTECTED]> writes:

> Hello!
>
> Launched with -Q
>
>   unify-8859-on-decoding-mode is nil
>   unify-8859-on-encoding-mode is t
>
> I start i-search in a Unicode-encoded buffer (*Help*). In an ISO
> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
> and ISO 8859-15. It is similar for ö and ü, except that these are not
> found in the ISO 8859-14 encoded buffer. Changing the value of
> unify-8859-on-decoding-mode makes no difference.

Could you supply a precise, step-by-step recipe for reproducing this
problem?


