Re: [Tutor] character format
On May 12, 2005, at 02:42, Tony Meyer wrote:

> From the email address, chances are that this was a New Zealand cultural assumption. Ah, the French, lumping all English speakers under the American banner <wink>.

Touché. :D

-- Max

( What makes it even more unforgivable is that I'm living in the UK at the moment -- one would have thought that by now I'd have realized that not all English speakers are American. ^^ )

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] character format
On May 12, 2005, at 03:00, [EMAIL PROTECTED] wrote:

> As was pointed out, I'm not American. I guess the problem stems from an American cultural assumption, though, in that Americans (I think) developed the ASCII character set without any thought for other languages.

At that time, it was a reasonable decision. Written French can still be understood without accented characters. It's just a bit harder, since the only convention we have for this is to replace accented characters with their non-accented versions (e.g. é, è and ë become e), but it rarely if ever causes any trouble. The Germans are better in that regard (ä -> ae, ß -> ss...).

The true problem comes from the lateness in standardizing extended ASCII (characters 128 to 255) -- which, I guess, does in some way stem from the ACA (as in "we already have what we need to write English, so we'll worry about that later").

Opening a text file that contains extended chars in an editor is usually followed by up to 5 minutes of "guess the encoding", as the very nature of text files makes it virtually impossible for an editor to do it automatically and reliably.

Now, I only write Unicode text files (most real text editors support this), but notepad.exe, as far as I know, only writes Windows-encoded files, which themselves are different from DOS-encoded files (I still have some of those lying around on some of my hard drives, written with edit.exe or e.com)... It gets very messy, very quickly.

> Will a standard xterm display chr(130) as é in linux for you, Max? Or under Mac OS X?

I just made a few tests. So far, it seems that it depends on the character encoding in use:

- If it's Western Latin 1, yes, but cat'ing Unicode text files doesn't work properly (which is to be expected).
- If it's Unicode, it then depends on the application in use (although most of them just fail). bash 2.05 and zsh 4.2.3 get in big trouble when I type accented characters. ksh seems a bit more tolerant, but backspace behavior becomes erratic. vim sprouts random garbage, and emacs beeps at me angrily. Unicode text files cat nicely, though.

> Anyway, in Python3000 all strings will be unicode, so it won't matter then :-)

As should be evident from the above post, I'm *really* looking forward to that. ;)

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting and sweating as you run through my corridors... How can you challenge a perfect, immortal machine?"
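To make the "guess the encoding" problem above concrete, here is a minimal sketch in Python 3 (the thread itself uses Python 2). It decodes the single byte 130 under three single-byte codecs. Mapping "DOS-encoded" to cp437 and "Windows-encoded" to cp1252 is an assumption on my part; the message does not name the code pages.

```python
# One byte, value 130 (hex 82), as it might sit in a text file.
raw = bytes([130])

# The same byte means something different under each codec.
# Assumed mapping: cp437 for "DOS", cp1252 for "Windows", latin-1 for "Western Latin 1".
decoded = {name: raw.decode(name) for name in ("cp437", "latin-1", "cp1252")}
for name, char in decoded.items():
    print(name, repr(char))

# Going the other way: e-acute is one byte in Latin-1 but two bytes in UTF-8,
# which is why a Latin-1 terminal garbles UTF-8 files and vice versa.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")
utf8_bytes = e_acute.encode("utf-8")
print(latin1_bytes, utf8_bytes)
```

Only the cp437 decoding yields é here; under Latin-1 the byte is a control character, and under cp1252 it is a low quotation mark. Nothing in the byte itself says which reading is intended.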
Re: [Tutor] character format
Quoting D. Hartley [EMAIL PROTECTED]:

> Does anyone have a hint as to what things like this:
> \xaf\x82\r\x00\x00\x01\
> refer to?

Basically, they are unprintable characters.

    >>> ord('\x82')
    130
    >>> chr(130)
    '\x82'

If you look at http://asciitable.com/, you will see that ascii character 130 is an e with a tick on its head. This is not something you can find on your keyboard, so python can't/won't display it.

Also, if you do some maths, you will see that 82 in hexadecimal is 130 in decimal:

    >>> 8*16 + 2
    130

So that explains why it is x82 :-) (with the backslash to indicate an escape sequence, and an x to indicate hexadecimal)

'\r' is the carriage return character [1]. Looking at asciitable.com, I can see that it is hex d / ascii 13.

    >>> '\x0d' == '\r'
    True

Hope this helps!

-- John.

[1] Historical note: In the days of typewriters [2], a carriage return would send the carriage (with the ink in it) back to the left, so you could start typing a new line. A line feed would advance the paper by one line. These were separate operations 'cause sometimes you wouldn't want to do both. The characters entered ASCII (because of early printers?) and different operating systems used them to end lines in different ways: in UNIX, the end-of-line character is a newline character ('\n'); in MSDOS/Windows it is both ('\r\n'); and I think Macintoshes just use a '\r'. Fortunately, python has so-called "universal newline" support, so you don't need to worry about all that :-)

[2] Before my time, so there could be errors of fact...
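The arithmetic and the line-ending note above can be tried in a few lines. This is a sketch in Python 3, where a bytes literal plays the role of the Python 2 string in the thread. The sample is just the first six bytes of the fragment quoted in the question, since the original is cut off after the trailing backslash.

```python
# First six bytes of the fragment quoted in the question (the rest is truncated).
sample = b"\xaf\x82\r\x00\x00\x01"

# Each \xNN escape is one byte; listing the bytes shows their decimal values.
values = list(sample)
print(values)

# Hex 82 is 8*16 + 2 = 130, and hex 0d is 13, the carriage return.
print(int("82", 16), hex(130), b"\x0d" == b"\r")

# splitlines() accepts all three line-ending conventions mentioned in the footnote.
lines = "unix\nwindows\r\nold mac\rlast".splitlines()
print(lines)
```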
Re: [Tutor] character format
Max - yep, and the hint was BUSY (... BZ...)... Unfortunately that hint doesn't lead me anywhere (except to bz2, which involves compression, and didn't seem very likely).

I went through and removed all the \x## 's that represented 'unprintable'/carriage return/etc characters, but that wasn't it, ha ha.

It just may be that I don't know enough python to recognize the module I need! (I got pickle right away, but bz leaves me blinking).

Am I missing something obvious?

~Denise

On 5/11/05, Max Noel [EMAIL PROTECTED] wrote:

> On May 12, 2005, at 01:50, [EMAIL PROTECTED] wrote:
>
> > >>> chr(130)
> > '\x82'
> >
> > If you look at http://asciitable.com/, you will see that ascii character 130 is an e with a tick on its head. This is not something you can find on your keyboard, so python can't/won't display it.
>
> You mean é? Oh, it is perfectly printable. It's even on my keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American cultural assumption... ^^
>
> Denise: If it is what I think it is, you may want to have a look at the first 2 characters of the string, which if I recall are printable, and should point you toward the module you have to use to solve that one.
>
> -- Max
Re: [Tutor] character format
[D. Hartley]

> Max - yep, and the hint was BUSY (... BZ...)... Unfortunately that hint doesn't lead me anywhere (except to bz2, which involves compression, and didn't seem very likely).
>
> I went through and removed all the \x## 's that represented 'unprintable'/carriage return/etc characters, but that wasn't it, ha ha.
>
> It just may be that I don't know enough python to recognize the module I need! (I got pickle right away, but bz leaves me blinking).
>
> Am I missing something obvious?

Not at all. The only problem is a propensity toward dismissing the obvious as being not very likely <wink>.

Try reading those docs from the end instead of from the beginning.
Re: [Tutor] character format
On May 12, 2005, at 02:22, D. Hartley wrote:

> Max - yep, and the hint was BUSY (... BZ...)... Unfortunately that hint doesn't lead me anywhere (except to bz2, which involves compression, and didn't seem very likely).
>
> I went through and removed all the \x## 's that represented 'unprintable'/carriage return/etc characters, but that wasn't it, ha ha.
>
> It just may be that I don't know enough python to recognize the module I need! (I got pickle right away, but bz leaves me blinking).
>
> Am I missing something obvious?
>
> ~Denise

Think again.

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
Re: [Tutor] character format
> You mean é? Oh, it is perfectly printable. It's even on my keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American cultural assumption... ^^

From the email address, chances are that this was a New Zealand cultural assumption. Ah, the French, lumping all English speakers under the American banner <wink>.

Anyway, the explanation was right, if the label wasn't. They are simply hexidecimal representations of characters.

Denise: there are many uses for this - to know what you need to do, we need to know what you are trying to do. Where are you finding these characters? Are they in a file? If so, what type of file is it, and what do you want to do with the file? Those questions are more likely to lead you to the module you're after.

I believe Max's guess was that the file is compressed with bzip (the first two characters will be BZ, as you found). Try doing:

    import bz2
    print bz2.decompress(data)

Where data is a string containing the characters you have. (Although you say that compression is unlikely, the BZ characters would be a big co-incidence).

=Tony.Meyer
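The suggestion above, checking for the leading BZ and then decompressing, can be sketched as follows in Python 3. The function name `maybe_decompress` and the payload are hypothetical; the real puzzle data is never shown in full in this thread, so the example compresses its own stand-in first.

```python
import bz2


def maybe_decompress(data: bytes) -> bytes:
    """Decompress data if it carries the bzip2 signature, else return it unchanged.

    Hypothetical helper for illustration: a bzip2 stream begins with b"BZh"
    followed by a compression-level digit from 1 to 9.
    """
    if data[:3] == b"BZh" and data[3:4].isdigit():
        return bz2.decompress(data)
    return data


# Stand-in for the puzzle data (an assumption, not the real bytes).
packed = bz2.compress(b"hypothetical payload")
print(packed[:4])                   # the printable signature at the front
print(maybe_decompress(packed))     # the original bytes come back
print(maybe_decompress(b"plain"))   # no signature, returned as-is
```

In Python 3 the data has to be a bytes object rather than a text string, so escapes copied from a page would go into a `b"..."` literal.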
Re: [Tutor] character format
Quoting Max Noel [EMAIL PROTECTED]:

> You mean é? Oh, it is perfectly printable. It's even on my keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American cultural assumption... ^^

I was waiting for someone to call me on that ...

As was pointed out, I'm not American. I guess the problem stems from an American cultural assumption, though, in that Americans (I think) developed the ASCII character set without any thought for other languages.

Will a standard xterm display chr(130) as é in linux for you, Max? Or under Mac OS X?

Anyway, in Python3000 all strings will be unicode, so it won't matter then :-)

-- John.
Re: [Tutor] character format
At 06:42 PM 5/11/2005, Tony Meyer wrote:

> > You mean é? Oh, it is perfectly printable. It's even on my keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American cultural assumption... ^^
>
> From the email address, chances are that this was a New Zealand cultural assumption. Ah, the French, lumping all English speakers under the American banner <wink>.
>
> Anyway, the explanation was right, if the label wasn't. They are simply hexidecimal representations of characters.

Did you mean hexadecimal?

> Denise: there are many uses for this - to know what you need to do, we need to know what you are trying to do. Where are you finding these characters? Are they in a file? If so, what type of file is it, and what do you want to do with the file? Those questions are more likely to lead you to the module you're after.
>
> I believe Max's guess was that the file is compressed with bzip (the first two characters will be BZ, as you found). Try doing:
>
>     import bz2
>     print bz2.decompress(data)
>
> Where data is a string containing the characters you have. (Although you say that compression is unlikely, the BZ characters would be a big co-incidence).
>
> =Tony.Meyer

Bob Gailer
mailto:[EMAIL PROTECTED]
510 558 3275 home
720 938 2625 cell
Re: [Tutor] character format
[me, typo'ing]

> hexidecimal representations of characters.

[Bob Gailer]

> Did you mean hexadecimal?

Sigh. Yes. I did a one character typo. Please forgive me.

=Tony.Meyer
Re: [Tutor] character format
On Wednesday, May 11, 2005, at 20:43 America/Chicago, [EMAIL PROTECTED] wrote:

> I believe Max's guess was that the file is compressed with bzip (the first two characters will be BZ, as you found). Try doing:
>
>     import bz2
>     print bz2.decompress(data)
>
> Where data is a string containing the characters you have. (Although you say that compression is unlikely, the BZ characters would be a big co-incidence).

That interactive mode is *very* helpful. If you import a module and then do a directory on it to see what it has for tools and then start playing with them, you can learn some interesting things without a lot of overhead:

###
    >>> import bz2
    >>> dir(bz2)
    ['BZ2Compressor', 'BZ2Decompressor', 'BZ2File', '__author__', '__doc__', '__file__', '__name__', 'compress', 'decompress']
    >>> bz2.compress('foo')
    "BZh91AY&SYI\xfe\xc4\xa5\x00\x00\x00\x01\x00\x01\x00\xa0\x00!\x00\x82,]\xc9\x14\xe1BA'\xfb\x12\x94"
    >>> bz2.decompress(_) #underscore to reference last thing
    'foo'
###

Hmmm...

/c