On Mon, Sep 01, 2003 at 03:02:21PM -0400, Vadim Vygonets wrote:
> Quoth Tzafrir Cohen on Mon, Sep 01, 2003:
> > A small test (I hope you won't mind the Hebrew):
>
> [snip -- can't do Hebrew ATM]
>
> > It should have given the same output. Indeed the range between the Yud
> > and the Tav worked, so the regex worked on multibyte Hebrew chars.
>
> No it didn't.  It replaced vav, which is not between yud and tav.
> I tried replacing the range yud-to-lamed, and it happily gave me
> the same output (i.e., it replaced shin as well).  Something is
> wrong here; and if you think for a second how sed works and how
> UTF-8 is encoded, you will immediately see what it is.

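To make the hint concrete, here is a minimal sketch in Python (not sed
itself). It assumes a tool that reads the UTF-8 encoded pattern one byte
at a time. The sample word (shin, lamed, vav, mem sofit) and the helper
names byte_class and six_dots are made up for illustration; they are not
taken from the original test. Each Hebrew letter encodes as two bytes
starting with 0xd7, so a bracket range from yud to tav becomes the bytes
d7 99 - d7 aa: the literal 0xd7, the byte range 0x99-0xd7, and the
literal 0xaa.

```python
import re

# Assumed sample word for illustration: shin, lamed, vav, mem sofit.
WORD = "\u05e9\u05dc\u05d5\u05dd"
YUD, LAMED, TAV = "\u05d9", "\u05dc", "\u05ea"


def byte_class(first, last):
    """Return the set of byte values that '[first-last]' matches when the
    UTF-8 encoded pattern is interpreted one byte at a time."""
    pattern = re.compile(("[" + first + "-" + last + "]").encode("utf-8"))
    return {b for b in range(256) if pattern.fullmatch(bytes([b]))}


def six_dots(text, as_bytes):
    """Emulate `sed s/....../foo/` on raw bytes or on decoded characters."""
    if as_bytes:
        raw = re.sub(b"......", b"foo", text.encode("utf-8"), count=1)
        return raw.decode("utf-8")
    return re.sub("......", "foo", text, count=1)


if __name__ == "__main__":
    # Both ranges collapse to the same byte set, 0x99..0xd7.
    print(byte_class(YUD, TAV) == byte_class(YUD, LAMED))
    # Bytes: six dots eat three two-byte letters, leaving mem sofit.
    print(six_dots(WORD, as_bytes=True))
    # Characters: the word has only four, so nothing matches.
    print(six_dots(WORD, as_bytes=False))
```

Under these assumptions the yud-to-tav and yud-to-lamed ranges are the
same set of bytes, which would explain identical output for both, and
the six-dot substitution on bytes leaves "foo" followed by mem sofit.
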
So it seems it was working at the byte level after all (and not
replacing the Vav). OTOH, even when I put in multiple Lameds, I got the
same output.

> Try to do "| sed s/....../foo/" and see what happens -- you will
> get "fooM", where M is mem sofit.

I'm not exactly sure what should happen. On a Red Hat 9 computer I get
different results. I figure the behaviour here is still "undefined".

> > And I had hell of a time editing this: I practically couldn't insert
> > text, because bash calculated internally Hebrew chars as taking two
> > places (assumed here char==byte).
>
> I used mlterm to test it, and my zsh had problems as well.
> (mlterm 2.7.0, zsh 4.0.6, FreeBSD 4.8-STABLE)

tcsh and zsh on RH7.3 simply don't support multi-byte chars. They
display each UTF-8 character as two different chars. The same goes for
tcsh on RH9; I couldn't check zsh there. Is this a matter of missing
some compile-time switches?

> > But this is RedHat 7.3, and the version of bash doesn't support UTF-8
> > well enough. In RH9 it seems much better.

I checked it, and it is indeed working well (allows editing). Consider
7.3 a sort of "pre-release" regarding Unicode support.

> That's exactly what I'm talking about.  That thing supports this
> encoding, this thing doesn't, and what you have *in the end* is a
> system which, in some rare situations, can take Unicode text and
> deal with it, but mostly it can't.  The assumption of single-byte
> characters shines through, and if you're not careful it bites
> you.

When you have a file name on your system, what exactly does it mean?

> > > Good to know, thanks.  Will mutt re-code text from anything to
> > > Unicode?
> >
> > Yes. (Thus it is generally more "sensitive" than most GUI clients to bad
> > encoding, as overriding bad encoding tends to be a less than trivial
> > operation)
>
> You lost me here.  What do you mean by overriding bad encoding,
> and what do other apps do?

Look at the title of this thread.
(Though I know of no easy way to override the subject and
sender/recipient names in standard GUIs.) Now think what happens when
the same is done to the content.

--
Tzafrir Cohen                       +---------------------------+
http://www.technion.ac.il/~tzafrir/ |vim is a mutt's best friend|
mailto:[EMAIL PROTECTED]            +---------------------------+