[Tutor] parsing XML into a python dictionary

2009-11-13 Thread Christopher Spears
I've been working on a way to parse an XML document and convert it into a 
python dictionary.  I want to maintain the hierarchy of the XML.  Here is the 
sample XML I have been working on:


  
Neil Gaiman
Glyn Dillon
Charles Vess
  


This is my first stab at this:

#!/usr/bin/env python

from lxml import etree

def generateKey(element):
    if element.attrib:
        key = (element.tag, element.attrib)
    else:
        key = element.tag
    return key

class parseXML(object):
    def __init__(self, xmlFile = 'test.xml'):
        self.xmlFile = xmlFile

    def parse(self):
        doc = etree.parse(self.xmlFile)
        root = doc.getroot()
        key = generateKey(root)
        dictA = {}
        for r in root.getchildren():
            keyR = generateKey(r)
            if r.text:
                dictA[keyR] = r.text
            if r.getchildren():
                dictA[keyR] = r.getchildren()

        newDict = {}
        newDict[key] = dictA
        return newDict

if __name__ == "__main__":
    px = parseXML()
    newDict = px.parse()
    print newDict

This is the output:
163>./parseXML.py
{'collection': {('comic', {'number': '62', 'title': 'Sandman'}): [, , ]}}

The script doesn't descend all of the way down because I'm not sure how to 
handle an XML document that may have multiple layers.  Any advice?  Would this 
be a job for recursion?

Thanks!
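
For what it's worth, recursion does seem a natural fit here. What follows is 
only a minimal sketch, written for Python 3 and using the standard library's 
xml.etree.ElementTree rather than lxml. The function name element_to_dict, the 
"attrib"/"text"/"children" keys, and the child tag names in SAMPLE are all 
assumptions made for illustration; the real tag names were lost from the sample 
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the child tag names are made up for illustration.
SAMPLE = """
<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller>Glyn Dillon</penciller>
    <penciller>Charles Vess</penciller>
  </comic>
</collection>
"""

def element_to_dict(element):
    """Recursively convert an element into a nested dict keyed by its tag."""
    node = {}
    if element.attrib:
        node["attrib"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    # Recurse into each child; a list keeps repeated tags and their order.
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return {element.tag: node}

result = element_to_dict(ET.fromstring(SAMPLE))
```

Children are kept in a list rather than a dict so that repeated tags do not 
overwrite each other.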
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Iterable Understanding

2009-11-13 Thread spir
On Fri, 13 Nov 2009 17:58:30 +,
Stephen Nelson-Smith  wrote:

> I think I'm having a major understanding failure.
> 
> So having discovered that my Unix sort breaks on the last day of the
> month, I've gone ahead and implemented a per log search, using heapq.
> 
> I've tested it with various data, and it produces a sorted logfile, per log.
> 
> So in essence this:
> 
> logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
>  LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
>  LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
> 
> Gives me a list of LogFiles - each of which has a getline() method,
> which returns a tuple.
> 
> I thought I could merge iterables using Kent's recipe, or just with
> heapq.merge()
> 
> But how do I get from a method that can produce a tuple, to some
> mergable iterables?
> 
> for log in logs:
>   l = log.getline()
>   print l
> 
> This gives me three loglines.  How do I get more?  Other than while True:

I'm not 100% sure I understand your needs and intention; this is just a try. 
Maybe what you actually want is rather:

for log in logs:
    for line in log:
        print line

This means your log objects need to be iterable. To do this, they must have an 
__iter__ method that returns an iterator; the simplest way is to return the 
object itself and give it a next method that calls getline (or replaces it 
altogether). Then, when walking the log with for...in, python will silently 
call next until it raises an error. This means getline must raise StopIteration 
when the log is "empty", and __iter__ must "reset" it.
Another solution may be to subtype "file", for a file is precisely an iterator 
over lines, and you really get your data from a file. Simply (sic), there must 
be some job done about this issue of time stamps (I haven't studied it in 
detail). Still, I guess this track may be worth a little study.
Once you get the logs iterable, you may subtype list for your overall log 
collection and give it an __iter__ method like:

def __iter__(self):
    # list.__iter__ avoids re-entering this method recursively
    for log in list.__iter__(self):
        for line in log:
            yield line

(The trick is not from me.)
Then you can write:
for line in my_log_collection:
    print line
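
To make the idea concrete, here is a rough sketch of iterable logs fed to 
heapq.merge, written for Python 3. IterableLog is a hypothetical stand-in for 
LogFile that reads from a list of lines instead of a gzip file, and the sample 
lines are invented. It assumes each log is already sorted, and comparing 
timestamps as plain strings like this is only safe within a single day.

```python
import heapq
import re

STAMP = re.compile(r'\[(.*?)\]')   # same pattern as LogFile.timestamp

class IterableLog:
    """Hypothetical stand-in for LogFile, fed from a list of lines."""

    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        # A generator method: each call returns a fresh iterator, and
        # StopIteration is raised automatically when the lines run out.
        for line in self.lines:
            match = STAMP.search(line)
            if match:                      # skip lines without a timestamp
                yield match.group(1), line

log_a = IterableLog(["[04/Nov/2009:00:00:01] a1", "[04/Nov/2009:00:00:04] a2"])
log_b = IterableLog(["[04/Nov/2009:00:00:02] b1", "[04/Nov/2009:00:00:03] b2"])

# heapq.merge is lazy: it keeps only one pending item per input in memory.
merged = [line for stamp, line in heapq.merge(log_a, log_b)]
```

Because the tuples start with the timestamp, heapq.merge orders the lines by 
time without loading whole files.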

> Of course tuples are iterables, but that doesn't help, as I want to
> sort on timestamp... so a list of tuples would be ok  But how do I
> construct that, bearing in mind I am trying not to use up too much
> memory?
> 
> I think there's a piece of the jigsaw I just don't get.  Please help!
> 
> The code in full is here:
> 
> import gzip, heapq, re
>
> class LogFile:
>     def __init__(self, filename, date):
>         self.logfile = gzip.open(filename, 'r')
>         for logline in self.logfile:
>             self.line = logline
>             self.stamp = self.timestamp(self.line)
>             if self.stamp.startswith(date):
>                 break
>         self.initialise_heap()
>
>     def timestamp(self, line):
>         stamp = re.search(r'\[(.*?)\]', line).group(1)
>         return stamp
>
>     def initialise_heap(self):
>         initlist=[]
>         self.heap=[]
>         for x in xrange(10):
>             self.line=self.logfile.readline()
>             self.stamp=self.timestamp(self.line)
>             initlist.append((self.stamp,self.line))
>         heapq.heapify(initlist)
>         self.heap=initlist
>
>
>     def getline(self):
>         self.line=self.logfile.readline()
>         stamp=self.timestamp(self.line)
>         heapq.heappush(self.heap, (stamp, self.line))
>         pop = heapq.heappop(self.heap)
>         return pop
>
> logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
> 



* la vita e estrany *

http://spir.wikidot.com/





Re: [Tutor] Question re: hangman.py

2009-11-13 Thread biboy mendz

thanks a lot for the clarification Alan and all.

--
Regards,
bibs M.

Host/Kernel/OS  "cc02695" running Linux 2.6.31-5.slh.4-sidux-686 
[sidux 2009-02 Αιθήρ - kde-full - (200907141427) ]

www.sidux.com



Alan Gauld wrote:


"biboy mendz"  wrote


chapter 8: hangman.py

expression is: print(letter, end=' ')

it explained:
end keyword argument in print() call makes the print() function put a space
character at the end of the string instead of a newline.

however when run it gives error: SyntaxError: invalid syntax.

What gives? This is the first time i saw such expression inside print
function. Is this version-specific of python? I'm running version 2.5.4.


Your tutorial is using version 3.
In Python 2 (including your 2.5.4) simply put a comma after the string:

print letter,

to do the same thing.



Re: [Tutor] COM server: cannot assign property

2009-11-13 Thread Emile van Sebille

On 11/12/2009 4:41 PM Yashwin Kanchan said...

Hi Guys

I am trying to create a simple test COM server, but am having trouble 
assigning any property value to it through Excel VBA.


Please point out where i am going wrong.


#COM server
class test(object):

    _reg_clsid_ = "{B5901450-F9A1-4F76-8FCF-2BFFA96ED210}"
    _reg_progid_ = "Python.Test"
    _public_methods_ = ["arg"]


My bet is that the problem is here.  Do you need to expose t as well?

I wrote one of these six or seven years ago -- here're the critical bits 
(where EMSpecs is a class defined in the same module):


class fenxUtilities:
    _public_methods_ = [ 'FirstPartInsp' ]
    _reg_progid_ = "fenxDCom.Util"
    _reg_clsid_ = "{3EAD7AB4-2978-4360-8F7D-33FB36E9E146}"
    def FirstPartInsp(self, nomDiam, numFlutes, nomOAL, nomLOC):
        return EMSpecs(nomDiam, numFlutes, nomOAL, nomLOC).retvals


if __name__=='__main__':
    print "Registering COM server..."
    import win32com.server.register
    win32com.server.register.UseCommandLine(fenxUtilities)

HTH,

Emile



    _public_attrs_ = ["t"]

    def __init__(self):
        self._t=0

    def arg(self,s,r):
        return (s,r)

    def get_t(self):
        return self._t

    def set_t(self,value):
        self._t = str(value)

    t = property(get_t,set_t)

if __name__=='__main__':
    import win32com.server.register
    win32com.server.register.UseCommandLine(test)
    print "done"

VBA Code:

Sub Button1_Click()
    Set test = CreateObject("Python.Test")
    test.arg 2, 5
    test.t = "hello"
    MsgBox test.t
End Sub


Error: "Object doesn't support this property or method" at test.t = "hello"

Thanks
Yashwin Kanchan






[Tutor] Iterable Understanding

2009-11-13 Thread Stephen Nelson-Smith
I think I'm having a major understanding failure.

So having discovered that my Unix sort breaks on the last day of the
month, I've gone ahead and implemented a per log search, using heapq.

I've tested it with various data, and it produces a sorted logfile, per log.

So in essence this:

logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
 LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
 LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]

Gives me a list of LogFiles - each of which has a getline() method,
which returns a tuple.

I thought I could merge iterables using Kent's recipe, or just with
heapq.merge()

But how do I get from a method that can produce a tuple, to some
mergable iterables?

for log in logs:
  l = log.getline()
  print l

This gives me three loglines.  How do I get more?  Other than while True:

Of course tuples are iterables, but that doesn't help, as I want to
sort on timestamp... so a list of tuples would be ok  But how do I
construct that, bearing in mind I am trying not to use up too much
memory?

I think there's a piece of the jigsaw I just don't get.  Please help!

The code in full is here:

import gzip, heapq, re

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        for logline in self.logfile:
            self.line = logline
            self.stamp = self.timestamp(self.line)
            if self.stamp.startswith(date):
                break
        self.initialise_heap()

    def timestamp(self, line):
        stamp = re.search(r'\[(.*?)\]', line).group(1)
        return stamp

    def initialise_heap(self):
        initlist=[]
        self.heap=[]
        for x in xrange(10):
            self.line=self.logfile.readline()
            self.stamp=self.timestamp(self.line)
            initlist.append((self.stamp,self.line))
        heapq.heapify(initlist)
        self.heap=initlist


    def getline(self):
        self.line=self.logfile.readline()
        stamp=self.timestamp(self.line)
        heapq.heappush(self.heap, (stamp, self.line))
        pop = heapq.heappop(self.heap)
        return pop

logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
         LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
         LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
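
One way to pull more than one line from a log without writing while True is 
the two-argument form of the built-in iter(), which keeps calling a function 
until it returns a sentinel value. The sketch below is for Python 3 and rests 
on an assumption: FakeLog is a hypothetical stand-in whose getline() returns 
None when nothing is left, which the getline() above does not do yet.

```python
class FakeLog:
    """Hypothetical stand-in for LogFile; getline() returns None when exhausted."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._pos = 0

    def getline(self):
        if self._pos >= len(self._entries):
            return None          # sentinel: nothing left to read
        entry = self._entries[self._pos]
        self._pos += 1
        return entry

log = FakeLog([("04/Nov/2009:00:00:01", "first"),
               ("04/Nov/2009:00:00:02", "second")])

# iter(callable, sentinel) calls log.getline() until it returns None.
entries = list(iter(log.getline, None))
```

The iterator made this way is lazy, so it could be handed to heapq.merge() 
instead of being turned into a list.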


Re: [Tutor] Question re: hangman.py

2009-11-13 Thread Alan Gauld


"Hugo Arts"  wrote 


print letter, ' ',



You don't need the space; Python automatically inserts 
a space instead of the newline when you use the comma.
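
In Python 3 terms the difference can be sketched roughly as below. This is only 
an approximation of the Python 2 behaviour (Python 2 writes its space before 
the next item rather than after the current one), and render is a helper name 
made up for this illustration.

```python
import io

def render(*items):
    """Return what print() writes for the items, with a space as terminator."""
    out = io.StringIO()
    print(*items, end=' ', file=out)
    return out.getvalue()

with_extra = render('a', ' ')   # roughly like: print letter, ' ',
without = render('a')           # roughly like: print letter,
```

Here with_extra holds the letter followed by three spaces, while without holds 
the letter followed by one.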



--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/l2p/



Re: [Tutor] Question re: hangman.py

2009-11-13 Thread Alan Gauld


"biboy mendz"  wrote


chapter 8: hangman.py

expression is: print(letter, end=' ')

it explained:
end keyword argument in print() call makes the print() function put a space
character at the end of the string instead of a newline.

however when run it gives error: SyntaxError: invalid syntax.

What gives? This is the first time i saw such expression inside print
function. Is this version-specific of python? I'm running version 2.5.4.


Your tutorial is using version 3.
In Python 2 (including your 2.5.4) simply put a comma after the string:

print letter,

to do the same thing. 





Re: [Tutor] How to call a method with a print statement?

2009-11-13 Thread Modulok
List,

 __repr__() is exactly what I was looking for :)

You guys rock! Thank you.
-Modulok-

On 11/12/09, Dave Angel  wrote:
>
>
> Kent Johnson wrote:
>> On Thu, Nov 12, 2009 at 6:35 AM, Luke Paireepinart
>>  wrote:
>>
>>> On Thu, Nov 12, 2009 at 5:29 AM, Jeff R. Allen  wrote:
>>>
>>>> You are looking for the __str__ method. See
>>>> http://docs.python.org/reference/datamodel.html#object.__str__


>>> Can't you also implement __repr__?
>>>
>>
>> Yes, in fact if you are only going to implement one of __str__ and
>> __repr__, arguably __repr__ is a better choice. __repr__() is called
>> by the interactive interpreter when it displays an object. __str__ is
>> called by print, and if you don't define __str__ it will call
>> __repr__. So defining only __str__ will not give a custom
>> representation unless you print:
>>
>> In [1]: class Foo():
>>    ...:     def __str__(self):
>>    ...:         return "I'm a Foo"
>>
>> In [2]: f = Foo()
>>
>> In [3]: f
>> Out[3]: <__main__.Foo instance at 0x1433468>
>>
>> In [4]: print f
>> I'm a Foo
>>
>>
>> Defining __repr__ will give the custom representation when you just
>> give the name of the object:
>>
>> In [5]: class Foo2():
>>    ...:     def __repr__(self):
>>    ...:         return "I'm a Foo2"
>>    ...:
>>    ...:
>>
>> In [6]: f2=Foo2()
>>
>> In [7]: f2
>> Out[7]: I'm a Foo2
>>
>> In [8]: print f2
>> I'm a Foo2
>>
>> Kent
>>
>>
> And one other important place that uses __repr__() is the printing of
> containers.  So if you have a list of Foo2 objects, and you want to just say
> print mylist
>
> it's better to have __repr__().
>
>
>
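
A small sketch of the container point above, written for Python 3 (where 
print is a function). Foo and Foo2 mirror the classes from the quoted session; 
the list names are just illustrative.

```python
class Foo:
    def __str__(self):
        return "I'm a Foo"

class Foo2:
    def __repr__(self):
        return "I'm a Foo2"

str_only = [Foo()]     # list display ignores __str__ of its items
repr_only = [Foo2()]   # list display uses __repr__ of its items

foo_text = str(Foo())            # uses __str__
foo2_text = str(Foo2())          # no __str__, so falls back to __repr__
list_text = str(repr_only)       # "[I'm a Foo2]"
default_text = str(str_only)     # default "<... Foo object at 0x...>" form
```

Printing str_only shows the default object representation, while printing 
repr_only shows the custom text.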


Re: [Tutor] Question re: hangman.py

2009-11-13 Thread Hugo Arts
On Fri, Nov 13, 2009 at 2:17 PM, biboy mendz  wrote:
> http://inventwithpython.com
>
> chapter 8: hangman.py
>
> expression is: print(letter, end=' ')
>
> it explained:
> end keyword argument in print() call makes the print() function put a space
> character at the end of the string instead of a newline.
>
> however when run it gives error: SyntaxError: invalid syntax.
>
> What gives? This is the first time i saw such expression inside print
> function. Is this version-specific of python? I'm running version 2.5.4.
>

Yes, the print function is a new feature of python 3. The equivalent
statement in python 2.5 is probably something like this:

print letter, ' ',

print is a statement, not a function, in python before 3.0. It takes a
comma-separated list of things to print. The trailing comma prevents
python from appending a newline.
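
For comparison, this is roughly how the tutorial's line behaves under Python 3. 
It is only a sketch: show_letters is a made-up name, and the output is captured 
into a string so it can be inspected. Python 2.6 and later can also get the 
function form with "from __future__ import print_function", but 2.5 cannot.

```python
import io
from contextlib import redirect_stdout

def show_letters(word):
    """Print each letter followed by a space instead of a newline."""
    for letter in word:
        print(letter, end=' ')
    print()  # finish the line

buffer = io.StringIO()
with redirect_stdout(buffer):
    show_letters("cat")
output = buffer.getvalue()
```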

Hugo


[Tutor] Question re: hangman.py

2009-11-13 Thread biboy mendz

http://inventwithpython.com

chapter 8: hangman.py

expression is: print(letter, end=' ')

it explained:
end keyword argument in print() call makes the print() function put a space
character at the end of the string instead of a newline.

however when run it gives error: SyntaxError: invalid syntax.

What gives? This is the first time i saw such expression inside print
function. Is this version-specific of python? I'm running version 2.5.4.

--
Regards,
bibs M.

Host/Kernel/OS  "cc02695" running Linux 2.6.31-5.slh.4-sidux-686 
[sidux 2009-02 Αιθήρ - kde-full - (200907141427) ]

www.sidux.com
