On Tue, Feb 26, 2008 at 9:24 PM, erik quanstrom <[EMAIL PROTECTED]> wrote:
> i think the comments about this problem are missing the point
> a bit. utf8 should be transparent to awk unless the situation demands

No. It is not transparent at all. It is semi-translucent, because someone did the work only partway, and I have been bitten hard by this in different situations (I am not complaining, just saying that this may not be the right approach to take in the future).

What someone did is fix the regexp part of awk, so that

    /a.j/ matches a☺j

Because the regexp code already understands UTF-8, I originally (and falsely) thought that awk as a whole handled it, and that conned me into the bug. But there are split and other functions that do not. For example:

    toupper("aí") gives Aí

with the í left in lower case. My guess is that there are many more corners, little or not, where it doesn't work.

We can (1) go on and on looking for crevices, hiding the bugs further under the rug so that they are not evident and catch everyone completely unaware, (2) leave awk as it is now, or (3) really fix the problem. The first approach doesn't work. I am going to take the second until I have time to take the third, which means using runes, or at least revising all the code so that it is uniformly aware of the existence of non-ASCII characters.

-- 
- curiosity sKilled the cat
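
PS: to make the difference between byte-wise and rune-aware handling concrete, here is a minimal sketch in Python. It is not awk's code and says nothing about how awk is actually implemented; the helper names (toupper_bytes, toupper_runes, match_bytes, match_runes) are made up for illustration. The byte-wise versions treat the input as a string of UTF-8 bytes, which is roughly what a function that was never made UTF-8 aware would do.

```python
import re


def toupper_bytes(s: str) -> str:
    """Uppercase byte by byte, changing only ASCII a-z (hypothetical helper)."""
    raw = s.encode("utf-8")
    # Only bytes 0x61..0x7A are shifted; the bytes of multi-byte characters
    # are all >= 0x80 and pass through untouched.
    out = bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in raw)
    return out.decode("utf-8")


def toupper_runes(s: str) -> str:
    """Uppercase one code point (rune) at a time (hypothetical helper)."""
    return "".join(ch.upper() for ch in s)


def match_bytes(pattern: str, s: str) -> bool:
    """Regexp match where '.' means one byte (hypothetical helper)."""
    return re.search(pattern.encode("utf-8"), s.encode("utf-8")) is not None


def match_runes(pattern: str, s: str) -> bool:
    """Regexp match where '.' means one code point (hypothetical helper)."""
    return re.search(pattern, s) is not None


if __name__ == "__main__":
    print(toupper_bytes("aí"))         # Aí  (the í is two bytes, left alone)
    print(toupper_runes("aí"))         # AÍ
    print(match_bytes("a.j", "a☺j"))   # False (☺ is three bytes, '.' eats one)
    print(match_runes("a.j", "a☺j"))   # True
```

A program that mixes the two styles, rune-aware in the regexp code and byte-wise elsewhere, looks fine on a first test and then fails in the other functions, which is the trap described above.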