Re: [Tutor] character format

2005-05-12 Thread Max Noel

On May 12, 2005, at 02:42, Tony Meyer wrote:


> From the email address, chances are that this was a New Zealand cultural
> assumption.  Ah, the French, lumping all English speakers under the American
> banner <wink>.

 Touché. :D

-- Max
( What makes it even more unforgivable is that I'm living in the UK
at the moment -- one would have thought that by now I'd have realized
that not all English speakers are American. ^^ )
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] character format

2005-05-12 Thread Max Noel

On May 12, 2005, at 03:00, [EMAIL PROTECTED] wrote:

> As was pointed out, I'm not American.  I guess the problem stems from an
> American cultural assumption, though, in that Americans (I think) developed
> the ASCII character set without any thought for other languages.

 At that time, it was a reasonable decision. Written French can
still be understood without accented characters. It's just a bit
harder, since the only convention we have for this is to replace
accented characters with their non-accented versions (e.g. é, è and ë
become e), but that rarely if ever causes any trouble. The Germans are
better in that regard (ä -> ae, ß -> ss...).
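
For anyone reading this today, the two conventions above can be sketched roughly as follows in Python 3 (which postdates this thread). The helper names `strip_accents` and `germanize` are made up for illustration, and the German table is deliberately incomplete.

```python
import unicodedata

def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFD splits an accented letter into the base letter followed by a
    # combining mark; dropping the combining marks leaves the plain letter.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical, incomplete table for the German spelling-out convention.
GERMAN = str.maketrans({
    "\u00e4": "ae",  # a with umlaut
    "\u00f6": "oe",  # o with umlaut
    "\u00fc": "ue",  # u with umlaut
    "\u00df": "ss",  # sharp s
})

def germanize(text):
    """Spell out umlauts and sharp s using plain ASCII letters."""
    return text.translate(GERMAN)

print(strip_accents("Touch\u00e9"))  # Touche
print(germanize("Stra\u00dfe"))      # Strasse
```

The French-style version loses information (é, è and ë all collapse to e), while the German-style table keeps the distinction at the cost of longer words.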

 The true problem comes from the lateness in standardizing
extended ASCII (characters 128 to 255) -- which, I guess, does in
some way stem from the American cultural assumption (as in "we
already have what we need to write English, so we'll worry about
that later").
 Opening a text file that contains extended chars in an editor is
usually followed by up to 5 minutes of "guess the encoding", as the
very nature of text files makes it virtually impossible for an editor
to do it automatically and reliably.
 Now, I only write Unicode text files (most real text editors
support this), but notepad.exe, as far as I know, only writes
Windows-encoded files, which themselves are different from DOS-encoded
files (I still have some of those lying around on some of my hard
drives, written with edit.exe or e.com)... It gets very messy, very
quickly.
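
A small Python 3 sketch of why the guessing game is hard: the same byte means different things under different 8-bit encodings. It assumes the DOS files used code page 437 and the Windows ones code page 1252, which the thread does not actually state.

```python
raw = b"\x82"  # the single byte behind chr(130) discussed in this thread

# The same byte read under three different 8-bit encodings.
for encoding in ("cp437", "cp1252", "latin-1"):
    print(encoding, ascii(raw.decode(encoding)))

# Going the other way: one character, different bytes per encoding.
e_acute = "\u00e9"  # e with acute accent
encoded = {enc: e_acute.encode(enc)
           for enc in ("cp437", "cp1252", "latin-1", "utf-8")}
print(encoded)
```

Nothing in the byte itself says which reading is right, so an editor can only guess from context.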

> Will a standard xterm display chr(130) as é in linux for you, Max?
> Or under Mac OS X?

 I just made a few tests. So far, it seems that it depends on the  
character encoding in use:
- If it's Western Latin 1, yes, but cat'ing Unicode text files  
doesn't work properly (which is to be expected).
- If it's Unicode, it then depends on the application in use  
(although most of them just fail). bash 2.05 and zsh 4.2.3 get in big  
trouble when I type accented characters. ksh seems a bit more  
tolerant, but backspace behavior becomes erratic. vim sprouts random  
garbage, and emacs beeps at me angrily. Unicode text files cat  
nicely, though.

> Anyway, in Python3000 all strings will be unicode, so it won't
> matter then :-)

 As should be evident from the above post, I'm *really* looking  
forward to that. ;)

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
Look at you hacker... A pathetic creature of meat and bone, panting  
and sweating as you run through my corridors... How can you challenge  
a perfect, immortal machine?



Re: [Tutor] character format

2005-05-11 Thread jfouhy
Quoting D. Hartley [EMAIL PROTECTED]:

> Does anyone have a hint as to what things like this:
> \xaf\x82\r\x00\x00\x01\
>
> refer to?

Basically, they are unprintable characters.

>>> ord('\x82')
130
>>> chr(130)
'\x82'

If you look at http://asciitable.com/, you will see that ascii character 130 is
an e with a tick on its head.  This is not something you can find on your
keyboard, so python can't/won't display it.

Also, if you do some maths, you will see that 82 in hexadecimal is 130 in 
decimal:

>>> 8*16 + 2
130

So that explains why it is x82 :-) (with the backslash to indicate an escape
sequence, and an x to indicate hexadecimal)

'\r' is the carriage return character [1].  Looking at asciitable.com, I can see
that it is hex d / ascii 13.

>>> '\x0d' == '\r'
True
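
The same checks can be run against the whole string from the question. This is only a sketch in Python 3, where raw byte values live in a `bytes` object rather than a `str`; the sessions in this thread are Python 2.

```python
# '\x82' is an escape sequence: backslash, 'x', then two hexadecimal digits.
byte_value = int("82", 16)   # same arithmetic as 8*16 + 2
print(byte_value)            # 130
print(hex(130))              # 0x82

# A bytes literal holds the raw values; list() shows them as integers.
data = b"\xaf\x82\r\x00\x00\x01"
print(list(data))            # [175, 130, 13, 0, 0, 1]

# '\r' is just a shorter spelling of '\x0d'.
print("\r" == "\x0d")        # True
```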

Hope this helps!

-- 
John.

[1] Historical note: In the days of typewriters [2], a carriage return would
send the carriage (with the ink in it) back to the left, so you could start
typing a new line.  A line feed would advance the paper by one line.  These were
separate operations 'cause sometimes you wouldn't want to do both.

The characters entered ASCII (because of early printers?) and different
operating systems used them to end lines in different ways: in UNIX, the
end-of-line character is a newline character ('\n'); in MSDOS/Windows it is both
('\r\n'); and classic Mac OS (before OS X) uses just a '\r'.

Fortunately, python has so-called universal newline support, so you don't need
to worry about all that :-)
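
A rough illustration of what universal newline handling does, written for Python 3, where it is the default for files opened in text mode. The sample bytes are invented for the example.

```python
import io

# One text with all three line-ending conventions mixed together.
raw = b"unix\nwindows\r\nold mac\rend"

# newline=None turns on universal newlines: '\n', '\r\n' and '\r'
# all come back as '\n' when reading in text mode.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
text = reader.read()
print(repr(text))

# str.splitlines() also accepts any of the three endings.
lines = raw.decode("ascii").splitlines()
print(lines)
```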

[2] Before my time, so there could be errors of fact...


Re: [Tutor] character format

2005-05-11 Thread D. Hartley
Max - yep, and the hint was BUSY (... BZ...)...

Unfortunately that hint doesn't lead me anywhere (except to bz2, which
involves compression, and didn't seem very likely).

I went through and removed all the \x## 's that represented
'unprintable'/carriage return/etc characters, but that wasn't it, ha ha.

It just may be that I don't know enough python to recognize the module
I need! (I got pickle right away, but bz leaves me blinking).

Am I missing something obvious?

~Denise

On 5/11/05, Max Noel [EMAIL PROTECTED] wrote:
 
> On May 12, 2005, at 01:50, [EMAIL PROTECTED] wrote:
>
>> >>> chr(130)
>> '\x82'
>>
>> If you look at http://asciitable.com/, you will see that ascii
>> character 130 is an e with a tick on its head.  This is not
>> something you can find on your keyboard, so python can't/won't
>> display it.
>
> You mean é? Oh, it is perfectly printable. It's even on my
> keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American
> cultural assumption... ^^
>
> Denise: If it is what I think it is, you may want to have a look
> at the first 2 characters of the string, which if I recall are
> printable, and should point you toward the module you have to use to
> solve that one.
>
> -- Max



Re: [Tutor] character format

2005-05-11 Thread Tim Peters
[D. Hartley]
> Max - yep, and the hint was BUSY (... BZ...)...
>
> Unfortunately that hint doesn't lead me anywhere (except to bz2, which
> involves compression, and didn't seem very likely).
>
> I went through and removed all the \x## 's that represented
> 'unprintable'/carriage return/etc characters, but that wasn't it, ha ha.
>
> It just may be that I don't know enough python to recognize the module
> I need! (I got pickle right away, but bz leaves me blinking).
>
> Am I missing something obvious?

Not at all.  The only problem is a propensity toward dismissing the
obvious as being "not very likely" <wink>.  Try reading those docs from
the end instead of from the beginning.


Re: [Tutor] character format

2005-05-11 Thread Max Noel

On May 12, 2005, at 02:22, D. Hartley wrote:

> Max - yep, and the hint was BUSY (... BZ...)...
>
> Unfortunately that hint doesn't lead me anywhere (except to bz2, which
> involves compression, and didn't seem very likely).
>
> I went through and removed all the \x## 's that represented
> 'unprintable'/carriage return/etc characters, but that wasn't it, ha ha.
>
> It just may be that I don't know enough python to recognize the module
> I need! (I got pickle right away, but bz leaves me blinking).
>
> Am I missing something obvious?
>
> ~Denise


 Think again.

-- Max


Re: [Tutor] character format

2005-05-11 Thread Tony Meyer
> You mean é? Oh, it is perfectly printable. It's even on my
> keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American
> cultural assumption... ^^

From the email address, chances are that this was a New Zealand cultural
assumption.  Ah, the French, lumping all English speakers under the American
banner <wink>.

Anyway, the explanation was right, if the label wasn't.  They are simply
hexidecimal representations of characters.

Denise: there are many uses for this - to know what you need to do, we need
to know what you are trying to do.  Where are you finding these characters?
Are they in a file?  If so, what type of file is it, and what do you want to
do with the file?  Those questions are more likely to lead you to the module
you're after.

I believe Max's guess was that the file is compressed with bzip (the first
two characters will be BZ, as you found).  Try doing:

>>> import bz2
>>> print bz2.decompress(data)

Where data is a string containing the characters you have.  (Although you
say that compression is unlikely, the BZ characters would be a big
co-incidence).
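
For readers on Python 3, the same idea looks roughly like this. It is only a sketch: the function name `maybe_decompress` is made up, and because the actual string from the original question is not reproduced in this thread, the example compresses a stand-in payload of its own.

```python
import bz2

def maybe_decompress(data):
    """Decompress data if it starts with the bzip2 signature, else return it as is."""
    # A bzip2 stream starts with the two ASCII letters 'BZ'.
    if data[:2] == b"BZ":
        return bz2.decompress(data)
    return data

# Stand-in payload, since the real data is not available here.
sample = bz2.compress(b"hello tutor")
print(sample[:2])                # b'BZ'
print(maybe_decompress(sample))  # b'hello tutor'
```

Note that in Python 3 the data must be a `bytes` object, so a string copied from a web page would first need converting back to the raw bytes it stands for.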

=Tony.Meyer



Re: [Tutor] character format

2005-05-11 Thread jfouhy
Quoting Max Noel [EMAIL PROTECTED]:

> You mean é? Oh, it is perfectly printable. It's even on my
> keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American
> cultural assumption... ^^

I was waiting for someone to call me on that ...

As was pointed out, I'm not American.  I guess the problem stems from an
American cultural assumption, though, in that Americans (I think) developed the
ASCII character set without any thought for other languages.

Will a standard xterm display chr(130) as é in linux for you, Max?  Or under Mac
OS X?

Anyway, in Python3000 all strings will be unicode, so it won't matter then :-)

-- 
John.


Re: [Tutor] character format

2005-05-11 Thread Bob Gailer


At 06:42 PM 5/11/2005, Tony Meyer wrote:
>> You mean é? Oh, it is perfectly printable. It's even on my
>> keyboard (as unshifted 2), along with è, ç, à and ù. Ah, American
>> cultural assumption... ^^
>
> From the email address, chances are that this was a New Zealand cultural
> assumption.  Ah, the French, lumping all English speakers under the American
> banner <wink>.
>
> Anyway, the explanation was right, if the label wasn't.  They are simply
> hexidecimal representations of characters.

Did you mean hexadecimal?

> Denise: there are many uses for this - to know what you need to do, we need
> to know what you are trying to do.  Where are you finding these characters?
> Are they in a file?  If so, what type of file is it, and what do you want to
> do with the file?  Those questions are more likely to lead you to the module
> you're after.
>
> I believe Max's guess was that the file is compressed with bzip (the first
> two characters will be BZ, as you found).  Try doing:
>
> >>> import bz2
> >>> print bz2.decompress(data)
>
> Where data is a string containing the characters you have.  (Although you
> say that compression is unlikely, the BZ characters would be a big
> co-incidence).
>
> =Tony.Meyer


Bob Gailer
mailto:[EMAIL PROTECTED]
510 558 3275 home
720 938 2625 cell 



Re: [Tutor] character format

2005-05-11 Thread Tony Meyer
[me, typo'ing]
> hexidecimal representations of characters.

[Bob Gailer]
> Did you mean hexadecimal?

Sigh.  Yes.  I did a one character typo.  Please forgive me.

=Tony.Meyer



Re: [Tutor] character format

2005-05-11 Thread Chris Smith

On Wednesday, May 11, 2005, at 20:43 America/Chicago,  
[EMAIL PROTECTED] wrote:

> I believe Max's guess was that the file is compressed with bzip (the
> first two characters will be BZ, as you found).  Try doing:
>
> >>> import bz2
> >>> print bz2.decompress(data)
>
> Where data is a string containing the characters you have.  (Although
> you say that compression is unlikely, the BZ characters would be a big
> co-incidence).


That interactive mode is *very* helpful.  If you import a module and  
then do a directory on it to see what it has for tools and then start  
playing with them, you can learn some interesting things without a lot  
of overhead:

###
>>> import bz2
>>> dir(bz2)
['BZ2Compressor', 'BZ2Decompressor', 'BZ2File', '__author__',
'__doc__', '__file__', '__name__', 'compress', 'decompress']
>>> bz2.compress('foo')
"BZh91AY&SYI\xfe\xc4\xa5\x00\x00\x00\x01\x00\x01\x00\xa0\x00!\x00\x82,]\xc9\x14\xe1BA'\xfb\x12\x94"
>>> bz2.decompress(_) #underscore to reference last thing
'foo'
###

Hmmm...

/c
