Re: [Tutor] String encoding

2011-08-26 Thread Prasad, Ramit
> Think about it this way... if I gave you a block of data as hex bytes:
>
> 240F91BC03...FF90120078CD45
>
> and then asked you whether that was a bitmap image or a sound file or
> something else, how could you tell? It's just *bytes*, it could be anything.

Yes, but if you give me data and then tell me it is a sound file then I might 
be able to reverse engineer or reconstruct it. I know what the character 
does/should look like. I just need the equivalent to the ASCII table for the 
various encodings; once I have the table I can compare different characters at 
\311 and see if they are the correct character. I have not been able to find an 
encoding table (other than ASCII).

Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] String encoding

2011-08-26 Thread Jerry Hill
On Thu, Aug 25, 2011 at 7:07 PM, Prasad, Ramit
<ramit.pra...@jpmorgan.com> wrote:
> Nice catch! Yeah, I am stuck on the encoding mechanism as well. I know how to
> encode/decode...but not what encoding to use. Is there a reference that I can
> look up to find what encoding that would correspond to? I know what the
> character looks like if that helps. I know that Python does display the
> correct character sometimes, but not sure when or why.

In this case, the encoding is almost certainly latin-1.  I know that
from playing around at the interactive interpreter, like this:

>>> s = 'M\xc9XICO'
>>> print s.decode('latin-1')
MÉXICO

If you want to see charts of various encodings, Wikipedia has a bunch.
For instance, the Latin-1 encoding is here:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1 and UTF-8 is here:
http://en.wikipedia.org/wiki/Utf-8
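
The kind of chart asked for earlier in the thread can also be generated from Python's own codecs. What follows is only a sketch in Python 3 syntax (the thread itself uses Python 2), and the helper name `high_byte_table` is made up for illustration; it is not part of any library.

```python
# Python 3 sketch. `high_byte_table` is a hypothetical helper, not a library function.
def high_byte_table(encoding):
    """Map each byte 0x80-0xFF to the character the codec assigns to it."""
    table = {}
    for value in range(0x80, 0x100):
        try:
            table[value] = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            table[value] = None  # this codec cannot decode the byte on its own
    return table

latin1 = high_byte_table('latin-1')
cp1252 = high_byte_table('cp1252')

# ascii() keeps the output printable on any terminal.
print(hex(0xC9), ascii(latin1[0xC9]), ascii(cp1252[0xC9]))
```

Comparing two such tables entry by entry is one way to see where a pair of encodings agree and where they do not.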

As the other respondents have said, it's really hard to figure this
out just in code.  The chardet module mentioned by Steven D'Aprano is
probably the best bet if you really *have* to guess the encoding of an
arbitrary sequence of bytes, but it is much, much better to actually know
the encoding of your inputs.

Good luck!

-- 
Jerry


Re: [Tutor] String encoding

2011-08-26 Thread Prasad, Ramit
> In this case, the encoding is almost certainly latin-1.  I know that
> from playing around at the interactive interpreter, like this:
>
> >>> s = 'M\xc9XICO'
> >>> print s.decode('latin-1')
> MÉXICO
>
> If you want to see charts of various encodings, Wikipedia has a bunch.
> For instance, the Latin-1 encoding is here:
> http://en.wikipedia.org/wiki/ISO/IEC_8859-1 and UTF-8 is here:
> http://en.wikipedia.org/wiki/Utf-8

Yep, it is. Thanks, those charts are exactly what I wanted! Now I have another
question. What is the difference between what print shows and what the
interpreter shows?

>>> print s.decode('latin-1')
MÉXICO
>>> s.decode('latin-1')
u'M\xc9XICO'
>>> print repr(s)
'M\xc9XICO'
>>> repr(s)
"'M\\xc9XICO'"


Ramit




Re: [Tutor] String encoding

2011-08-26 Thread Steven D'Aprano

Prasad, Ramit wrote:

>> Think about it this way... if I gave you a block of data as hex
>> bytes:
>>
>> 240F91BC03...FF90120078CD45
>>
>> and then asked you whether that was a bitmap image or a sound file
>> or something else, how could you tell? It's just *bytes*, it could
>> be anything.
>
> Yes, but if you give me data and then tell me it is a sound file then
> I might be able to reverse engineer or reconstruct it. I know what
> the character does/should look like. I just need the equivalent to
> the ASCII table for the various encodings; once I have the table I
> can compare different characters at \311 and see if they are the
> correct character. I have not been able to find an encoding table
> (other than ASCII).


In practice, you can often guess the encoding by trying the most common 
ones (such as Latin-1 and UTF-8) and seeing if the strings you get make 
sense.


But note that more than one encoding may give sensible results for a 
specific string:


>>> b = 'M\311XICO'  # byte-string
>>> print b.decode('latin-1')
MÉXICO
>>> print b.decode('iso 8859-9')  # Turkish
MÉXICO


So was M\311XICO encoded using the Latin-1 or Turkish encoding, or 
something else? There is no way to tell. Many encodings overlap.


If you have arbitrary byte-strings, and no context to tell what makes 
sense, then all bets are off. Just because something *can* be decoded 
doesn't make it meaningful:


>>> b = '...\xf7...'
>>> print b.decode('macroman')
...˜...
>>> print b.decode('latin-1')
...÷...

Which is the right encoding to use and which string is intended?

So guessing can sometimes work, but guesses can be wrong because 
encodings overlap. In general, you must know the encoding to be sure. 
But if you have to guess, try to guess using the largest byte-string 
that you can.
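
To see how much two single-byte encodings overlap, one can decode every possible byte with both and list the disagreements. This is a rough sketch in Python 3 syntax (the thread uses Python 2); `differing_bytes` is an illustrative name, not an existing function.

```python
# Python 3 sketch. `differing_bytes` is a hypothetical helper name.
def differing_bytes(enc_a, enc_b):
    """Return the byte values that the two codecs decode to different text."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors='replace' keeps the loop going if a codec leaves a byte undefined
        if raw.decode(enc_a, errors='replace') != raw.decode(enc_b, errors='replace'):
            diffs.append(value)
    return diffs

turkish = differing_bytes('latin-1', 'iso8859-9')
mac = differing_bytes('latin-1', 'mac_roman')

print([hex(v) for v in turkish])     # the few bytes where Latin-1 and Turkish disagree
print(0xC9 in turkish, 0xF7 in mac)  # 0xC9 is not among them; 0xF7 differs for Mac Roman
```

A byte-string that happens to avoid the differing positions, as M\311XICO does for Latin-1 and ISO 8859-9, cannot reveal which of the two encodings produced it.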


Python 2.7 comes with 108 encodings:

http://docs.python.org/library/codecs.html#standard-encodings

Since anyone can define their own encoding, there is no upper limit to 
the number of encodings, and no promise that Python will include them 
all. There are even two joke encodings, invented for April Fools' Day,
that use nine-bit nonets instead of eight-bit octets (bytes): UTF-9 and 
UTF-18.




--
Steven



Re: [Tutor] String encoding

2011-08-26 Thread Dave Angel

On 08/26/2011 11:49 AM, Prasad, Ramit wrote:

> <snip>
> Yep, it is. Thanks, those charts are exactly what I wanted! Now I have another
> question. What is the difference between what print shows and what the
> interpreter shows?
>
> >>> print s.decode('latin-1')
> MÉXICO

The decoded characters are a Unicode string.  Python prints that string
by encoding it with whatever encoding sys.stdout defaults to. If that
matches your actual terminal, then you see it properly.

> >>> s.decode('latin-1')
> u'M\xc9XICO'

Here, because you don't assign it to anything, the interpreter is
printing a repr() of the object.

> >>> print repr(s)
> 'M\xc9XICO'

Here your code is doing the same thing, but explicitly this time.

> >>> repr(s)
> "'M\\xc9XICO'"

Here, the repr() is created (which is a string containing single
quotes), but then you don't print it, you just leave it.  So the
interpreter shows you the repr() of that object, enclosing it in double
quotes because the string itself contains single quotes.
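
The same distinction can be reproduced outside the interactive prompt. The lines below are a sketch in Python 3 syntax, where `ascii()` is the closest match to the escaped repr() that Python 2 shows; the variable names are made up for illustration.

```python
# Python 3 sketch; the thread's session is Python 2.
s = 'M\xc9XICO'                        # six characters, the second is U+00C9

shown_by_prompt = ascii(s)             # escaped form, like Python 2's repr()
repr_of_repr = repr(shown_by_prompt)   # repr of a string that itself holds quotes and a backslash

print(len(s))
print(shown_by_prompt)   # single quotes, one backslash
print(repr_of_repr)      # double quotes outside, backslash doubled
```

Each extra repr() adds one more layer of quoting and escaping, which is where the doubled backslash comes from.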




--

DaveA



[Tutor] String encoding

2011-08-25 Thread Prasad, Ramit
I have a string question for Python 2. Basically I have two strings with
non-ASCII characters and I would like to have a better understanding of what 
the escapes are from and how to possibly remove/convert/encode the string to 
something else. If the description of my intended action is vague it is because 
my intent at this point is vague until I understand the situation better. 

' M\xc9XICO' and ' M\311XICO'

Ramit




Re: [Tutor] String encoding

2011-08-25 Thread Alan Gauld

On 25/08/11 15:36, Prasad, Ramit wrote:

> I have a string question for Python2. Basically I have two strings with
> non-ASCII characters and I would like to have a better understanding
> of what the escapes are from
>
> ' M\xc9XICO' and ' M\311XICO'


I don't know what they are from but they are both the same value, one in 
hex and one in octal.


0xC9 == 0311
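
A quick check of that equality, written in Python 3 syntax as a sketch (Python 3 spells the octal literal 0o311, while the Python 2 of this thread also accepts 0311):

```python
# Python 3 sketch: the same number written in hex and in octal.
hex_value = 0xC9
octal_value = 0o311
print(hex_value, octal_value)

# The two escape styles therefore produce identical byte-strings.
hex_escaped = b'M\xc9XICO'
octal_escaped = b'M\311XICO'
print(hex_escaped == octal_escaped)
```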

As for the encoding mechanisms I'm afraid I can't help there!

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] String encoding

2011-08-25 Thread Prasad, Ramit
> I don't know what they are from but they are both the same value, one in
> hex and one in octal.
>
> 0xC9 == 0311
>
> As for the encoding mechanisms I'm afraid I can't help there!

Nice catch! Yeah, I am stuck on the encoding mechanism as well. I know how to 
encode/decode...but not what encoding to use. Is there a reference that I can 
look up to find what encoding that would correspond to? I know what the 
character looks like if that helps. I know that Python does display the correct 
character sometimes, but not sure when or why.

Ramit




Re: [Tutor] String encoding

2011-08-25 Thread Steven D'Aprano

Prasad, Ramit wrote:

>> I don't know what they are from but they are both the same value,
>> one in hex and one in octal.
>>
>> 0xC9 == 0311
>>
>> As for the encoding mechanisms I'm afraid I can't help there!
>
> Nice catch! Yeah, I am stuck on the encoding mechanism as well. I
> know how to encode/decode...but not what encoding to use. Is there a
> reference that I can look up to find what encoding that would
> correspond to? I know what the character looks like if that helps. I
> know that Python does display the correct character sometimes, but
> not sure when or why.


In general, no. The same byte value (0xC9) could correspond to many 
different encodings. In general, you *must* know what the encoding is in 
order to tell how to decode the bytes.


Think about it this way... if I gave you a block of data as hex bytes:

240F91BC03...FF90120078CD45

and then asked you whether that was a bitmap image or a sound file or 
something else, how could you tell? It's just *bytes*, it could be anything.


All is not quite lost though. You could try decoding the bytes and see 
what you get, and see if it makes sense. Start with ASCII, Latin-1, 
UTF-8, UTF-16 and any other encodings in common use. (This would be like 
pretending the bytes were a bitmap, and looking at it, and trying to 
decide whether it looked like an actual picture or like a bunch of 
random pixels. Hopefully it wasn't meant to look like a bunch of random 
pixels.)
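
The trial-and-error approach can be sketched as a small loop. This is Python 3 syntax (the thread uses Python 2), the helper name `try_encodings` is invented for the example, and the candidate list is just the handful of encodings named above.

```python
# Python 3 sketch. `try_encodings` is a hypothetical helper name.
def try_encodings(raw, candidates):
    """Decode raw with each candidate; None marks a codec that rejects the bytes."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

raw = b'M\xc9XICO'
results = try_encodings(raw, ['ascii', 'utf-8', 'latin-1', 'utf-16-le'])
for name, text in results.items():
    print(name, ascii(text))   # ascii() keeps the output printable anywhere
```

ASCII and UTF-8 reject these bytes outright, Latin-1 yields a plausible word, and UTF-16 decodes without complaint but to text that a human reader would not take for a country name. Only that last judgment, made by a person, separates a sensible decoding from a merely successful one.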


Web browsers such as Internet Explorer and Mozilla will try to guess the 
encoding by doing frequency analysis of the bytes. Mozilla's encoding 
guesser has been ported to Python:


http://chardet.feedparser.org/

But any sort of guessing algorithm is just a nasty hack. You are always 
better off ensuring that you accurately know the encoding.



--
Steven


Re: [Tutor] string encoding

2010-06-18 Thread Lie Ryan
On 06/18/10 14:21, Rick Pasotto wrote:
>> Remember, even if your terminal display is restricted to ASCII, you can
>> still use Beautiful Soup to parse, process, and write documents in UTF-8
>> and other encodings. You just can't print certain strings with print.
>
> I can print the string fine. It's f.write(string_with_unicode) that fails
> with:
>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32:
> ordinal not in range(128)
>
> Shouldn't I be able to f.write() *any* 8bit byte(s)?
>
> repr() gives: u"Realtors\\xc2\\xae"
>
> BTW, I'm running python 2.5.5 on debian linux.

The FAQ explains half of it, except that in your case, substitute what
it says about "terminal" with "file object". Python plays it safe: when
writing to a file it will only implicitly encode a unicode string as ASCII. If
you have a unicode string and you want to .write() that unicode string
to a file, you need to .encode() the string first, so:

string_with_unicode = u"Realtors\xc2\xae"
f.write(string_with_unicode.encode('utf-8'))

otherwise, you can use the codecs module to wrap the file object:

import codecs
f = codecs.open('filename.txt', 'w', encoding="utf-8")
f.write(string_with_unicode) # now you can send unicode string to f
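
For readers on Python 3, the built-in open() accepts an encoding argument and plays the role of codecs.open here. The following is a sketch only; the sample text and the temporary path are chosen for illustration and do not come from the thread.

```python
# Python 3 sketch: let the file object do the encoding.
import os
import tempfile

text = 'Realtors\xae'   # illustrative sample ending in U+00AE REGISTERED SIGN

path = os.path.join(tempfile.mkdtemp(), 'out.txt')   # throwaway location
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)       # encoded to UTF-8 on the way out

with open(path, 'rb') as f:
    raw = f.read()      # read back as raw bytes to see what was stored
print(raw)
```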




Re: [Tutor] string encoding

2010-06-18 Thread Dave Angel

Rick Pasotto wrote:

> <snip>
> I can print the string fine. It's f.write(string_with_unicode) that fails with:
>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32:
> ordinal not in range(128)
>
> Shouldn't I be able to f.write() *any* 8bit byte(s)?
>
> repr() gives: u"Realtors\\xc2\\xae"
>
> BTW, I'm running python 2.5.5 on debian linux.

  
You can write any 8 bit string.  But you have a Unicode string, which is 
16 or 32 bits per character.  To write it to a file, it must be encoded, 
and the default encoder is ASCII.  The cure is to encode it yourself, 
using the encoding that your spec calls for.  I'll assume utf8 below:


>>> name = u"Realtors\xc2\xae"
>>> repr(name)
"u'Realtors\\xc2\\xae'"
>>> outfile = open("junk.txt", "w")
>>> outfile.write(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128)
>>> outfile.write(name.encode("utf8"))
>>> outfile.close()
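
The positions reported in the error can be inspected directly on the exception object. This is a sketch in Python 3 syntax, where the implicit ASCII step has to be spelled out as an explicit encode('ascii'); the variable names are illustrative.

```python
# Python 3 sketch: reproduce the failure and look at what the exception reports.
name = 'Realtors\xc2\xae'   # same ten characters as in the session above

failure = None
try:
    name.encode('ascii')
except UnicodeEncodeError as exc:
    # start is the first offending index, end is one past the last
    failure = (exc.encoding, exc.start, exc.end)
print(failure)

utf8_bytes = name.encode('utf-8')   # an encoding that covers these characters
print(utf8_bytes)
```

The range 8-9 in the message corresponds to start 8 and end 10 on the exception: the two non-ASCII characters after the eight letters of "Realtors".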


DaveA



[Tutor] string encoding

2010-06-17 Thread Rick Pasotto
I'm using BeautifulSoup to process a webpage. One of the fields has a
unicode character in it. (It's the 'registered trademark' symbol.) When
I try to write this string to another file I get this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: 
ordinal not in range(128)

In the interpreter the offending string portion shows as: 'Realtors\xc2\xae'.

How can I deal with this single string? The rest of the document works
fine.

-- 
Freedom can't be kept for nothing. If you set a high value on liberty,
you must set a low value on everything else. -- Lucius Annaeus Seneca, 65
A.D.
Rick Pasotto    r...@niof.net    http://www.niof.net


Re: [Tutor] string encoding

2010-06-17 Thread Lie Ryan
On 06/18/10 06:41, Rick Pasotto wrote:
> I'm using BeautifulSoup to process a webpage. One of the fields has a
> unicode character in it. (It's the 'registered trademark' symbol.) When
> I try to write this string to another file I get this error:
>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32:
> ordinal not in range(128)
>
> In the interpreter the offending string portion shows as: 'Realtors\xc2\xae'.
>
> How can I deal with this single string? The rest of the document works
> fine.

You need to tell BeautifulSoup the encoding of the HTML document. The
encoding can be declared in any of these places:

- (preferred) Externally, in the HTTP Content-Type header, e.g.:
Content-Type: text/html; charset=utf-8

- In an HTML Content-Type declaration, e.g.:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

- In an XML declaration, for an XHTML document that will be parsed with an
XML parser (hint: BeautifulSoup isn't an XML/XHTML parser), e.g.:
<?xml version="1.0" encoding="utf-8"?>

However, BeautifulSoup will also use some heuristics to *guess* the
encoding of a tag soup that doesn't have a proper encoding.

So, the most likely reason is this, from Beautiful Soup's FAQ:
http://www.crummy.com/software/BeautifulSoup/documentation.html#Why
can't Beautiful Soup print out the non-ASCII characters I gave it?

Why can't Beautiful Soup print out the non-ASCII characters I gave it?

If you're getting errors that say: 'ascii' codec can't encode character
'x' in position y: ordinal not in range(128), the problem is probably
with your Python installation rather than with Beautiful Soup. Try
printing out the non-ASCII characters without running them through
Beautiful Soup and you should have the same problem. For instance, try
running code like this:

latin1word = 'Sacr\xe9 bleu!'
unicodeword = unicode(latin1word, 'latin-1')
print unicodeword

If this works but Beautiful Soup doesn't, there's probably a bug in
Beautiful Soup. However, if this doesn't work, the problem's with your
Python setup. Python is playing it safe and not sending non-ASCII
characters to your terminal. There are two ways to override this behavior.

1. The easy way is to remap standard output to a converter that's not
afraid to send ISO-Latin-1 or UTF-8 characters to the terminal.

import codecs
import sys
streamWriter = codecs.lookup('utf-8')[-1]
sys.stdout = streamWriter(sys.stdout)

codecs.lookup returns a number of bound methods and other objects
related to a codec. The last one is a StreamWriter object capable of
wrapping an output stream.

2. The hard way is to create a sitecustomize.py file in your Python
installation which sets the default encoding to ISO-Latin-1 or to UTF-8.
Then all your Python programs will use that encoding for standard
output, without you having to do something for each program. In my
installation, I have a /usr/lib/python/sitecustomize.py which looks like
this:

import sys
sys.setdefaultencoding("utf-8")

For more information about Python's Unicode support, look at "Unicode for
Programmers" or "End to End Unicode Web Applications in Python". Recipes
1.20 and 1.21 in the Python Cookbook are also very helpful.

Remember, even if your terminal display is restricted to ASCII, you can
still use Beautiful Soup to parse, process, and write documents in UTF-8
and other encodings. You just can't print certain strings with print.
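
The StreamWriter wrapping described in the FAQ's "easy way" can be tried without touching sys.stdout at all. This is a sketch in Python 3 syntax, with an in-memory io.BytesIO standing in for the output stream; that stand-in is an assumption made for the example, not something the FAQ does.

```python
# Python 3 sketch: wrap a byte stream in a UTF-8 StreamWriter.
import codecs
import io

byte_stream = io.BytesIO()                        # stand-in for a binary output stream
writer = codecs.getwriter('utf-8')(byte_stream)   # StreamWriter that encodes on write
writer.write('Sacr\xe9 bleu!')

written = byte_stream.getvalue()
print(written)   # the e-acute arrives as two UTF-8 bytes
```

codecs.getwriter('utf-8') returns the same StreamWriter class that the FAQ picks out of codecs.lookup('utf-8').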




Re: [Tutor] string encoding

2010-06-17 Thread Rick Pasotto
On Fri, Jun 18, 2010 at 12:24:25PM +1000, Lie Ryan wrote:
> On 06/18/10 06:41, Rick Pasotto wrote:
>> I'm using BeautifulSoup to process a webpage. One of the fields has a
>> unicode character in it. (It's the 'registered trademark' symbol.) When
>> I try to write this string to another file I get this error:
>>
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position
>> 31-32: ordinal not in range(128)
>>
>> In the interpreter the offending string portion shows as:
>> 'Realtors\xc2\xae'.
>>
>> How can I deal with this single string? The rest of the document works
>> fine.
>
> You need to tell BeautifulSoup the encoding of the HTML document. You
> can encode this information in either the:
>
> - (preferred) Encoding is specified externally from HTTP Header
> ContentType declaration, e.g.:
> Content-Type: text/html; charset=utf-8
>
> - HTML ContentType declaration: e.g.
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The document has:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

When I look at the document in vim and when I 'print' in python I see
the two characters of an accented capital A and the circled 'r'.
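
One possible reading of that symptom, which nobody in the thread confirms, is that the page's bytes are really UTF-8 while the meta tag declares iso-8859-1: a registered-trademark sign stored as UTF-8 and then decoded as Latin-1 comes out as exactly an accented capital A followed by the circled 'r'. Under that assumption, a sketch in Python 3 syntax of undoing the mis-decoding looks like this:

```python
# Python 3 sketch, assuming (unconfirmed) that UTF-8 bytes were decoded as Latin-1.
garbled = 'Realtors\xc2\xae'                 # capital A with circumflex, then the (R) sign

original_bytes = garbled.encode('latin-1')   # recover the raw bytes
repaired = original_bytes.decode('utf-8')    # read them as UTF-8 instead

print(ascii(repaired), len(garbled), len(repaired))
```

If the assumption is wrong and the page really is Latin-1, the decode step would be changing correct text, so this is something to verify against the actual document before applying it.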

> latin1word = 'Sacr\xe9 bleu!'
> unicodeword = unicode(latin1word, 'latin-1')
> print unicodeword

TypeError: decoding Unicode is not supported

> If this works but Beautiful Soup doesn't, there's probably a bug in
> Beautiful Soup. However, if this doesn't work, the problem's with your
> Python setup. Python is playing it safe and not sending non-ASCII
> characters to your terminal. There are two ways to override this behavior.
>
> 1. The easy way is to remap standard output to a converter that's not
> afraid to send ISO-Latin-1 or UTF-8 characters to the terminal.
>
> import codecs
> import sys
> streamWriter = codecs.lookup('utf-8')[-1]
> sys.stdout = streamWriter(sys.stdout)
>
> codecs.lookup returns a number of bound methods and other objects
> related to a codec. The last one is a StreamWriter object capable of
> wrapping an output stream.

Those four lines executed but I still get

TypeError: decoding Unicode is not supported

> Remember, even if your terminal display is restricted to ASCII, you can
> still use Beautiful Soup to parse, process, and write documents in UTF-8
> and other encodings. You just can't print certain strings with print.

I can print the string fine. It's f.write(string_with_unicode) that fails with:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32:
ordinal not in range(128)

Shouldn't I be able to f.write() *any* 8bit byte(s)?

repr() gives: u"Realtors\\xc2\\xae"

BTW, I'm running python 2.5.5 on debian linux.

-- 
Making fun of born-again christians is like hunting dairy cows with a
 high powered rifle and scope. -- P.J. O'Rourke
Rick Pasotto    r...@niof.net    http://www.niof.net


[Tutor] String Encoding problem

2009-04-20 Thread Matt
Hey everyone,

I'm hoping someone here can help me solve an odd problem (bug?). I'm having
trouble with string encoding, object deletion, and the xml.etree library. If
this isn't the right list to be posting this question, please let me know.
I'm new to Python and don't know of any other "help me" Python mailing
lists. I have tried debugging this ad infinitum. Anyway, at the bottom of
this e-mail you will find the code of a python file. This is a gross
over-simplification of my code, with little exception handling so that the
errors are obvious.

Running this interactively, if you finish off with 'del db', it exits fine
and creates a skeleton xml file called 'db.xml' with text '<root />'.
However, if you instead CTRL-D, it throws an exception while quitting and
then leaves an empty 'db.xml' which won't work. Can anyone here help me
figure out why this is?

Stuff I've done:
I've traced this down to the self.commit() call in __del__. The stacktrace
and a few print statements injected into xml.etree lead me to the call
'root'.encode('us-ascii') throwing a LookupError on line 751 of
xml.etree.ElementTree. This makes no sense to me, since it works fine
normally.

Thank you very much. Any and all help or pointers are appreciated.

~Matt

### db.py ###
from xml.etree import ElementTree as ET
import os

class Database(object):
    def __init__(self, path):
        self.__dbpath = path    ## Path to the database
        self.load()
    def __del__(self):
        ## FIXME: Known bug:
        ##  del db at command line works properly
        ##  Ctrl-D, when there is no db file present, results in a LookupError
        ##    and empty xml file
        from StringIO import StringIO
        from traceback import print_exc
        trace = StringIO()
        try:
            print 5
            self.commit()
            print 7
        except Exception:
            print_exc(100, trace)
            print trace.getvalue()
    def load(self):
        if os.path.exists(self.__dbpath):
            self.root = ET.parse(self.__dbpath).getroot()
        else:
            self.root = ET.Element("root")
    def commit(self):
        ET.ElementTree(self.root).write(self.__dbpath)
db = Database('db.xml')


Re: [Tutor] String Encoding problem

2009-04-20 Thread spir
On Mon, 20 Apr 2009 10:46:47 -0400,
Matt <hellzfury+pyt...@gmail.com> wrote:

> <snip>

Actually, it all runs well for me -- after the following modification:

    def __del__(self):
        ## FIXME: Known bug:
        ##  del db at command line works properly
        ##  Ctrl-D, when there is no db file present, results in a LookupError
        ##    and empty xml file
        try:
            print 5
            self.commit()
            print 7
        except Exception:
            raise

Notes:
* I don't know for what reason you needed such a complicated traceback
  construct.
* Before I made this modification, I did get a weird exception about StringIO.
* __del__() seems to do the contrary: it writes back to the file through commit()???
* del db works fine, anyway
* When I run without any db.xml, it properly creates one with text <root />.
* When I run with an empty db.xml, I have the following exception message:

Traceback (most recent call last):
  File "xmlTree.py", line 29, in <module>
    db = Database('db.xml')
  File "xmlTree.py", line 10, in __init__
    self.load()
  File "xmlTree.py", line 24, in load
    self.root = ET.parse(self.__dbpath).getroot()
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 587, in parse
    self._root = parser.close()
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1254, in close
    self._parser.Parse("", 1) # end of data
xml.parsers.expat.ExpatError: no element found: line 2, column 0
5
Exception exceptions.AttributeError: AttributeError("'Database' object has no
attribute 'root'",) in <bound method Database.__del__ of <__main__.Database
object at 0xb7e78fec>> ignored

--
la vita e estrany


Re: [Tutor] String Encoding problem

2009-04-20 Thread Strax-Haber, Matthew (LARC-D320)



> From: spir <denis.s...@free.fr>
> Date: Mon, 20 Apr 2009 12:22:59 -0500
> To: Python Tutor <tutor@python.org>
> Subject: Re: [Tutor] String Encoding problem

> Actually, it all runs well for me -- after the following modification:
>
>     def __del__(self):
>         ## FIXME: Known bug:
>         ##  del db at command line works properly
>         ##  Ctrl-D, when there is no db file present, results in a LookupError
>         ##    and empty xml file
>         try:
>             print 5
>             self.commit()
>             print 7
>         except Exception:
>             raise

I must be missing something. I ran the following code (in DB.py) without any
other files in the current directory:

from xml.etree import ElementTree as ET
import os

class Database(object):
    def __init__(self, path):
        self.dbpath = path    ## Path to the database
        self.load()
    def __del__(self):
        try:
            print 5
            self.commit()
            print 7
        except Exception:
            raise
    def load(self):
        if os.path.exists(self.dbpath):
            self.root = ET.parse(self.dbpath).getroot()
        else:
            self.root = ET.Element("root")
    def commit(self):
        ET.ElementTree(self.root).write(self.dbpath)
db = Database('db.xml')

Output:
5
Exception LookupError: LookupError('unknown encoding: us-ascii',) in <bound
method Database.__del__ of <__main__.Database object at 0x87870>> ignored

If you're not getting the same output, please let me know what your environment
is. Perhaps this is an implementation difference across platforms.



 Notes:
 * I don't know for what reason you needed such a complicated traceback
 construct.

That was only to demonstrate the error. Without that, you see a LookupError 
without any trace.

 * Before I made this modification, I did get a weird exception about StringIO.
Top-level imports are not consistently available in __del__. That shouldn't be 
necessary with the code I have above.

 * __del__() seems to do the contrary: it writes back to file through 
 commit()???
Yes, I know. In my actual code, there is a flag that is set when certain 
run-time conditions are met or when the user wants the DB to be saved on quit. 
Most of the time, however, modifications to the database need to be done in 
memory because they are not intended to be saved.

 * del db works fine, anyway
 * When I run without any db.xml, it properly creates one with text <root />.
 * When I run with an empty db.xml, I have the following exception message:

 Traceback (most recent call last):
   File "xmlTree.py", line 29, in <module>
 db

Re: [Tutor] String Encoding problem

2009-04-20 Thread Martin Walsh
Matt wrote:
 Hey everyone,
 
 I'm hoping someone here can help me solve an odd problem (bug?). I'm
 having trouble with string encoding, object deletion, and the xml.etree
 library. If this isn't the right list to be posting this question,
 please let me know. I'm new to Python and don't know of any other "help
 me" Python mailing lists. I have tried debugging this ad infinitum.
 Anyway, at the bottom of this e-mail you will find the code of a python
 file. This is a gross over-simplification of my code, with little
 exception handling so that the errors are obvious.
 
 Running this interactively, if you finish off with 'del db', it exits
 fine and creates a skeleton xml file called 'db.xml' with text '<root
 />'. However, if you instead CTRL-D, it throws an exception while
 quitting and then leaves an empty 'db.xml' which won't work. Can anyone
 here help me figure out why this is?
 
 Stuff I've done:
 I've traced this down to the self.commit() call in __del__. The
 stacktrace and a few print statements injected into xml.etree lead me
 to the call 'root'.encode('us-ascii') throwing a LookupError on line 751
 of xml.etree.ElementTree. This makes no sense to me, since it works fine
 normally.

The environment available to __del__ methods during program termination
is wonky, and apparently not very consistent either. I can't say that I
completely understand it myself, perhaps someone else can provide a
better explanation for both of us, but some of the causes are described
in the documentation:

http://docs.python.org/reference/datamodel.html#object.__del__
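
One way to sidestep the shutdown problem is to register the commit with the standard atexit module instead of relying on __del__. This is only a sketch under assumptions, not what the original code did. As far as I know, atexit handlers run at normal interpreter exit, before module globals are torn down. The sketch uses Python 3 syntax; the Database class is a cut-down stand-in for the one in the original post, and the temporary path is hypothetical.

```python
import atexit
import os
import tempfile
from xml.etree import ElementTree as ET


class Database(object):
    """Cut-down stand-in for the Database class from the original post."""

    def __init__(self, path):
        self._dbpath = path
        if os.path.exists(path):
            self.root = ET.parse(path).getroot()
        else:
            self.root = ET.Element('root')
        # Assumption: save at normal interpreter exit instead of in __del__.
        atexit.register(self.commit)

    def commit(self):
        # Write the in-memory tree back to the file.
        ET.ElementTree(self.root).write(self._dbpath)


# Hypothetical path, used only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), 'db.xml')
db = Database(db_path)
db.commit()  # an explicit commit still works at any time
```

Note that registering a bound method keeps the instance alive until exit, and that atexit handlers do not run if the process is killed by a signal or leaves through os._exit().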

What is your rationale for using __del__? Are you trying to force a
'commit()' call on Database instances when your program terminates -- in
the case of an unhandled exception, for example?

HTH,
Marty

 



Re: [Tutor] String Encoding problem

2009-04-20 Thread Kent Johnson
On Mon, Apr 20, 2009 at 10:46 AM, Matt hellzfury+pyt...@gmail.com wrote:
 Running this interactively, if you finish off with 'del db', it exits fine
 and creates a skeleton xml file called 'db.xml' with text '<root />'.
 However, if you instead CTRL-D, it throws an exception while quitting and
 then leaves an empty 'db.xml' which won't work. Can anyone here help me
 figure out why this is?

 Stuff I've done:
 I've traced this down to the self.commit() call in __del__. The stacktrace
 and a few print statements injected into xml.etree lead me to the call
 'root'.encode('us-ascii') throwing a LookupError on line 751 of
 xml.etree.ElementTree. This makes no sense to me, since it works fine
 normally.

Please show the exact error message and stack trace when you post
errors, it can be very helpful.

What you are doing with __del__ is unusual. A better way to ensure
cleanup is to use a close() method which a client must call, or to use
a context manager and 'with' statement.

I think the reason your code is failing is because some module needed
by the encode() call has already been unloaded before your __del__()
method is called.

 Thank you very much. Any and all help or pointers are appreciated.

If you defined a close() method, you could write client code like this:

from contextlib import closing
with closing(Database('db.xml')) as db:
    pass  # do something with db
# when the block exits, db.close() is called
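
To make the close() suggestion concrete, here is a minimal sketch. It assumes that "closing" the database simply means committing it once. The cut-down Database class, the closed flag, the 'entry' element and the temporary path are all hypothetical, and the code uses Python 3 syntax.

```python
from contextlib import closing
from xml.etree import ElementTree as ET
import os
import tempfile


class Database(object):
    """Cut-down stand-in for the Database class from the original post."""

    def __init__(self, path):
        self._dbpath = path
        self.closed = False
        if os.path.exists(path):
            self.root = ET.parse(path).getroot()
        else:
            self.root = ET.Element('root')

    def commit(self):
        ET.ElementTree(self.root).write(self._dbpath)

    def close(self):
        # Assumption: closing means "save once"; repeated calls do nothing.
        if not self.closed:
            self.commit()
            self.closed = True


# Hypothetical path, used only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), 'db.xml')
with closing(Database(db_path)) as db:
    ET.SubElement(db.root, 'entry')  # hypothetical change to the data
# close() has been called here, even if the block had raised
```

Because close() runs while the program is still alive, the encoding lookup inside write() happens long before interpreter shutdown.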

It's also not too hard to make an openDatabase() function so you could write:

with openDatabase('db.xml') as db:
    pass  # etc

though that is not really a beginner challenge. Some notes and further
pointers here:
http://personalpages.tds.net/~kent37/kk/00015.html
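
A possible shape for such an openDatabase() function, built on contextlib.contextmanager, is sketched below. It is a guess at the idea rather than the code from the page linked above: it yields the root element directly instead of a Database instance, always saves on exit, and uses a hypothetical temporary path and attribute. Python 3 syntax.

```python
from contextlib import contextmanager
from xml.etree import ElementTree as ET
import os
import tempfile


@contextmanager
def openDatabase(path):
    """Yield the root element for path and write it back when the block ends."""
    if os.path.exists(path):
        root = ET.parse(path).getroot()
    else:
        root = ET.Element('root')
    try:
        yield root
    finally:
        # Assumption: always save, even if the block raised.
        ET.ElementTree(root).write(path)


# Hypothetical path and attribute, used only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), 'db.xml')
with openDatabase(db_path) as root:
    root.set('version', '1')
```

If saving after an error is not wanted, the write can be moved out of the finally clause so that it only runs when the block finishes normally.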

Kent


Re: [Tutor] String Encoding problem

2009-04-20 Thread Strax-Haber, Matthew (LARC-D320)
Sorry about that. Hopefully this is better:
In operator:
    def __init__(self, path, saveDB=True, cleanUp=True):
        '''Constructor'''
        ## Calculate filesystem paths
        self.WORK_DIR = path + '.tmp'
        DB_PATH = path + '.xml'
        self.SAVE_DB = saveDB    ## finish(): Delete unnecessary files created by run?
        self.CLEANUP = cleanUp   ## finish(): Delete database at end of run?

        ## Make sure we have a working directory (exception on failed write)
        if not os.path.isdir(self.WORK_DIR):
            os.mkdir(self.WORK_DIR)

        self._db = DB.Database(DB_PATH)
        ## SOME OTHER ENVIRONMENT SETUP STUFF

    def _cleanUpEnvironment(self):
        '''Delete temp files created for this run'''
        try:
            for path, dirs, files in os.walk(self.WORK_DIR, topdown=False):
                for f in files:
                    os.unlink(os.path.join(path, f))
                for d in dirs:
                    os.rmdir(os.path.join(path, d))
            os.rmdir(self.WORK_DIR)
        except:
            print >>sys.stderr, 'Could not delete temp files; left at:'
            print >>sys.stderr, self.WORK_DIR

    def finish(self):
        '''Clean up and finish the run (write out to the database)'''
        if self.SAVE_DB:
            self._db.commit()
        if self.CLEANUP:
            self._cleanUpEnvironment()

    def __del__(self):
        ## FIXME: Known bug:
        ##  del t at command line works properly
        ##  Ctrl-D, when there is no db file present, results in a LookupError
        self.finish()

if __name__ == '__main__':
    printHelp()
    ## Provide tab completion to the user
    import readline, rlcompleter
    readline.parse_and_bind('tab: complete')
    t = OperatorClassName(os.path.splitext(__file__)[0])


In database:
    def __init__(self, path):
        '''Constructor'''
        self.__dbpath = path    ## Path to the database
        self.load()

    def load(self):
        '''Read the database out from the file'''
        from xml.parsers.expat import ExpatError

        if os.path.exists(self.__dbpath):
            ## Noticed exceptions: IOError, ExpatError
            try:
                self.root = ET.parse(self.__dbpath).getroot()
            except ExpatError:
                raise ExpatError('Invalid XML in ' + self.__dbpath)
        else:
            self.root = ET.Element('root')

    def commit(self):
        '''Write the database back to the file'''
        ## Noticed exceptions: IOError
        ET.ElementTree(self.root).write(self.__dbpath)
--
~Matthew Strax-Haber
National Aeronautics and Space Administration
Langley Research Center (LaRC)
Co-op, Safety-Critical Avionics Systems Branch
W: 757-864-7378; C: 561-704-0029
Mail Stop 130
matthew.strax-ha...@nasa.gov



From: Martin Walsh mwa...@mwalsh.org
Date: Mon, 20 Apr 2009 16:05:01 -0500
To: Python Tutor tutor@python.org
Cc: Strax-Haber, Matthew (LARC-D320) matthew.strax-ha...@nasa.gov
Subject: Re: [Tutor] String Encoding problem

Forwarding to the list. Matt, perhaps you can repost in plain text, my
mail client seems to have mangled your source ...

Strax-Haber, Matthew (LARC-D320) wrote:
 *From: *Martin Walsh mwa...@mwalsh.org

 The environment available to __del__ methods during program termination
 is wonky, and apparently not very consistent either. I can't say that I
 completely understand it myself, perhaps someone else can provide a
 better explanation for both of us, but some of the causes are described
 in the documentation:

 http://docs.python.org/reference/datamodel.html#object.__del__

 What is your rationale for using __del__? Are you trying to force a
 'commit()' call on Database instances when your program terminates -- in
 the case of an unhandled exception, for example?

 Perhaps I oversimplified a bit. In my actual code, there is a database
 class and an operator class. The actual structure is this:

 In operator:
 def __init__(self, path, saveDB=True, cleanUp=True):
'''Constructor'''## Calculate filesystem paths
self.WORK_DIR= path + '.tmp'DB_PATH= path
 + '.xml'self.SAVE_DB= saveDB## finish(): Delete
 unnecessary files created by run?self.CLEANUP= cleanUp##
 finish(): Delete database at end of run?## Make sure we
 have a working directory (exception on failed write)if not
 os.path.isdir(self.WORK_DIR):os.mkdir(self.WORK_DIR)

self._db = DB.Database(DB_PATH)
 ## SOME OTHER ENVIRONMENT SETUP STUFF
 def _cleanUpEnvironment(self):  try:## Delete
 temp files created for this runfor path,dirs,files in
 os.walk(self.WORK_DIR, topdown=False):for f in files:
os.unlink(os.path.join(path,f))for d in dirs:
os.rmdir(os.path.join(path,d))os.rmdir(self.WORK_DIR)
except:print sys.stderr, 'Could not delete temp
 files; left at:'print sys.stderr, self.WORK_DIRdef
 finish(self):'''Clean up and finish the run (write out to
 the database)'''if self.SAVE_DB

Re: [Tutor] String Encoding problem

2009-04-20 Thread Strax-Haber, Matthew (LARC-D320)
I've solved the problem by passing on the work of deciding when to commit to 
client code. This isn't ideal but it will do what is necessary and 
unfortunately I don't have any more time to dedicate to this. I hate not being 
able to find a reasonable workaround :/.
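
For readers wondering what "passing the decision to client code" might look like, here is a rough sketch. It is not the actual project code: the run() function, its save flag, the 'result' element and the file names are all hypothetical. Python 3 syntax.

```python
from xml.etree import ElementTree as ET
import os
import tempfile


def run(db_path, save=True):
    """Hypothetical client entry point: it, not the database, decides when to save."""
    root = ET.Element('root')
    try:
        ET.SubElement(root, 'result')  # stand-in for the real work
    finally:
        # The client commits explicitly, while the interpreter is still fully alive.
        if save:
            ET.ElementTree(root).write(db_path)
    return root


# Hypothetical paths, used only for this illustration.
workdir = tempfile.mkdtemp()
saved = os.path.join(workdir, 'saved.xml')
skipped = os.path.join(workdir, 'skipped.xml')
run(saved, save=True)
run(skipped, save=False)
```

The try/finally gives the same "save on the way out" behaviour that __del__ was meant to provide, without depending on what is still loaded at shutdown.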



From: Kent Johnson ken...@tds.net
Date: Mon, 20 Apr 2009 16:55:16 -0500
To: Strax-Haber, Matthew (LARC-D320) matthew.strax-ha...@nasa.gov
Cc: Python Tutor tutor@python.org
Subject: Re: [Tutor] String Encoding problem

Can you give us a simple description of what you are trying to do? And
if you can post in plain text instead of HTML that would be helpful.

Maybe this will give you some ideas - you can trap the control-D and
do your cleanup:
http://openbookproject.net/pybiblio/tips/wilson/simpleExceptions.php
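
As a rough illustration of trapping end-of-input, assuming the program runs its own prompt loop rather than the bare interactive interpreter: at a terminal, input() raises EOFError on Ctrl-D, and the cleanup can go in a finally clause. The command_loop() function and its arguments are hypothetical, and an in-memory stream stands in for the keyboard so the sketch runs unattended. Python 3 syntax.

```python
import io


def command_loop(stream, on_exit):
    """Read lines until end of input, then run the cleanup callback.

    An empty read from the stream plays the role that EOFError from
    input() plays at a real terminal when the user presses Ctrl-D.
    """
    commands = []
    try:
        while True:
            line = stream.readline()
            if not line:
                raise EOFError
            commands.append(line.strip())
    except EOFError:
        pass  # end of input is the normal way out of the loop
    finally:
        on_exit()  # cleanup runs whether the loop ended normally or not
    return commands


cleaned = []
result = command_loop(io.StringIO('load\nsave\n'), lambda: cleaned.append(True))
```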

Kent
