* Alex Malinovich ([EMAIL PROTECTED]) [030614 03:59]:
> I've been working on converting my system over to using UTF-8 wherever
> possible. I've already configured galeon, evolution, gnome-terminal and
> just about every other graphical application to use UTF-8 by default.
> I've set my locale to "en_US.UTF-8". And just about everything works
> just fine. Unfortunately, as I'm not all that familiar with all of the
> details of an i18n interface, there are a few things that still elude
> me.
>
> 1) I've set up an .Xmodmap file to map my left Windows key to Multi_key
> so that I can type extended characters. However, I have to run "xmodmap
> .Xmodmap" manually every time I restart X. I'm guessing that I should
> put this in an X startup script. A .bashrc equivalent for X.
> Unfortunately, I'm not sure what the proper file to put it in is.

If this is all your .Xmodmap file does, you might think about just using

    Option "XkbModel" "pc104compose"

in your /etc/X11/XF86Config-4. This will make the change "global": every
time the X server starts, the right windows key will be Multi_key. No
futzing with xmodmap required. See /etc/X11/xkb/symbols/us (and other
files in that directory, if not using us) for different things you can
use for your XkbModel.

In fact, even if you are using xmodmap for other things, you might
consider making the above change and removing the rwin->multi_key
mapping from your ~/.Xmodmap. That's up to you.

As to getting xmodmap to load your ~/.Xmodmap each time you start X, you
might want to craft your own ~/.xsession. Assuming
/etc/X11/Xsession.options contains allow-user-xsession (which it should,
by default), all you need to do is create a ~/.xsession file, and the
global Xsession (/etc/X11/Xsession) will exec it by default, after
setting up any other neat tricks the debian packages have added to
/etc/X11/Xsession.d (e.g. starting an ssh-agent).

I probably shouldn't have mentioned so many files above; it may have
been confusing. The short of it is that you create a ~/.xsession file
and put something like this in it:

    xmodmap ~/.Xmodmap
    exec x-session-manager
    # EOF

For another example, on my laptop I have no x-session-manager, I just
use WindowMaker. My ~/.xsession looks like this:

    screensaver -nosplash &
    exec x-window-manager

> 5) Just to satisfy my own curiosity, could someone explain the
> difference between all of the different UTF flavors? I've seen UTF-7,
> UTF-8, UTF-16, etc. My first guess would be that the number represents
> the number of bits used to represent any single character. Yet that
> seems unlikely since UTF-8 has WELL over 255 characters. Could anyone
> enlighten me?

Before UTF-8 came along, there were UCS-2 and UCS-4, which used 2 and 4
bytes per character respectively.
The downsides were that files consisting of only ASCII characters,
encoded in UCS-4 for example, would be 4 times larger and incompatible
with non-Unicode-aware tools. UCS-2 could represent U0000-UFFFF, and
UCS-4 U00000000-U7FFFFFFF.

I don't know much about UTF-16 and UTF-32, but I know that they're
compatible with UCS-2 and UCS-4 respectively. I believe UTF-16 has a
21-bit capacity.

UTF-8 is a variable-length encoding. (As I type that, I think to myself
that I should point out that I'm no expert in this field, and I may not
be using the canonical terminology.) UTF-8 can represent up to
U7FFFFFFF, which means a whole heckuvalot of characters. It works by
using 1-6 bytes per character. The first 128 code points are simply the
ASCII character set, each stored as a single byte. This is one of the
reasons UTF-8 is great; it's backwards-compatible with ASCII, but it's
not limited to 256 characters.

Let me switch back to hex, since it's easier on my brain. Then check out
this table:

U00000000-U0000007F  0xxxxxxx
U00000080-U000007FF  110xxxxx 10xxxxxx
U00000800-U0000FFFF  1110xxxx 10xxxxxx 10xxxxxx
U00010000-U001FFFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U00200000-U03FFFFFF  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U04000000-U7FFFFFFF  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

(that was a fun little exercise in hex arithmetic!)

The 'x' characters represent bits used in encoding the character data.
The others are the overhead. 10xxxxxx is used as a continuation byte,
and any byte starting with 11 is the start of a multi-byte sequence. The
number of initial ones shows how many bytes long this sequence will be.
The largest sequences, starting with 1111110x, can represent a 31-bit
character.

I believe that there's a Unicode HOWTO around. I learned what I know
from Markus Kuhn's web site. He also taught me about the ISO paper
sizes, which made me want to go out and buy A4 next time I run out of
paper! (stupid bass-ackwards US ...)
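
To make the UTF-8 table above a bit more concrete, here is a minimal
Python sketch of the bit-packing it describes. The function name
utf8_encode is made up for this illustration, and it follows the
original 1-6 byte scheme from the table; current UTF-8 (RFC 3629) stops
at 4 bytes and U+10FFFF, so treat this as a rough illustration rather
than a reference encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point using the 1-6 byte scheme from the table.

    Hypothetical helper for illustration only; it does not reject
    surrogates or values above U+10FFFF as a modern encoder would.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if code_point <= 0x7F:
        return bytes([code_point])  # plain ASCII, no overhead bits

    # (upper limit of the range, marker bits of the lead byte),
    # one entry per row of the table for 2- to 6-byte sequences.
    rows = [
        (0x7FF, 0xC0),       # 110xxxxx
        (0xFFFF, 0xE0),      # 1110xxxx
        (0x1FFFFF, 0xF0),    # 11110xxx
        (0x3FFFFFF, 0xF8),   # 111110xx
        (0x7FFFFFFF, 0xFC),  # 1111110x
    ]
    for n_continuation, (limit, marker) in enumerate(rows, start=1):
        if code_point <= limit:
            break

    # Peel off 6 bits at a time, lowest first, into 10xxxxxx bytes.
    tail = []
    for _ in range(n_continuation):
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6

    # Whatever bits remain fit in the lead byte next to its marker.
    return bytes([marker | code_point] + tail[::-1])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} ->", utf8_encode(cp).hex(" "))
```

For code points that modern UTF-8 also allows, the result should match
what Python's own str.encode("utf-8") produces, which is an easy way to
sanity-check the sketch.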

good times,
Vineet
--
http://www.doorstop.net/
--
Microsoft has argued that open source is bad for business, but you
have to ask, "Whose business? Theirs, or yours?" --Tim O'Reilly