[Tutor] parsing XML into a python dictionary
I've been working on a way to parse an XML document and convert it into a Python dictionary. I want to maintain the hierarchy of the XML. Here is the sample XML I have been working on:

Neil Gaiman Glyn Dillon Charles Vess

This is my first stab at this:

    #!/usr/bin/env python

    from lxml import etree

    def generateKey(element):
        if element.attrib:
            key = (element.tag, element.attrib)
        else:
            key = element.tag
        return key

    class parseXML(object):
        def __init__(self, xmlFile = 'test.xml'):
            self.xmlFile = xmlFile

        def parse(self):
            doc = etree.parse(self.xmlFile)
            root = doc.getroot()
            key = generateKey(root)
            dictA = {}
            for r in root.getchildren():
                keyR = generateKey(r)
                if r.text:
                    dictA[keyR] = r.text
                if r.getchildren():
                    dictA[keyR] = r.getchildren()

            newDict = {}
            newDict[key] = dictA
            return newDict

    if __name__ == "__main__":
        px = parseXML()
        newDict = px.parse()
        print newDict

This is the output:

    163>./parseXML.py
    {'collection': {('comic', {'number': '62', 'title': 'Sandman'}): [, , ]}}

The script doesn't descend all of the way down because I'm not sure how to handle an XML document that may have multiple layers. Advice anyone? Would this be a job for recursion?

Thanks!

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
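Recursion does seem a natural fit here, since every element can be handled the same way as the root. Below is a minimal sketch, not the poster's method. It is written in Python 3 and uses the standard library's xml.etree.ElementTree rather than lxml, whose element API is similar. The sample markup is a guess, because the original tags are not shown, and the function name element_to_dict and the tag/attrib/text/children layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tag names inside <comic> are assumptions.
SAMPLE = """
<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller>Glyn Dillon</penciller>
    <penciller>Charles Vess</penciller>
  </comic>
</collection>
"""

def element_to_dict(element):
    """Recursively convert an element and all its descendants to a dict."""
    return {
        "tag": element.tag,
        "attrib": dict(element.attrib),
        "text": (element.text or "").strip(),
        # Iterating an element yields its direct children; recursing on
        # each child handles any number of nested layers.
        "children": [element_to_dict(child) for child in element],
    }

tree = element_to_dict(ET.fromstring(SAMPLE.strip()))
print(tree)
```

Children are kept in a list rather than keyed by tag, so repeated tags (two pencillers here) do not overwrite each other, which a dictionary keyed on the tag alone would do.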
Re: [Tutor] Iterable Understanding
On Fri, 13 Nov 2009 17:58:30 +, Stephen Nelson-Smith wrote:

> I think I'm having a major understanding failure.
>
> So having discovered that my Unix sort breaks on the last day of the
> month, I've gone ahead and implemented a per log search, using heapq.
>
> I've tested it with various data, and it produces a sorted logfile, per log.
>
> So in essence this:
>
> logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
>
> Gives me a list of LogFiles - each of which has a getline() method,
> which returns a tuple.
>
> I thought I could merge iterables using Kent's recipe, or just with
> heapq.merge()
>
> But how do I get from a method that can produce a tuple, to some
> mergable iterables?
>
> for log in logs:
>     l = log.getline()
>     print l
>
> This gives me three loglines. How do I get more? Other than while True:

I'm not 100% sure I understand your needs and intention; just have a try. Maybe what you actually want is rather:

    for log in logs:
        for line in log:
            print line

Meaning your log objects need to be iterable. To do this, you must have an __iter__ method that returns an iterator, whose next() method would surely simply call the object's getline (or maybe replace it altogether). Then when walking the log with for...in, Python will silently call getline until StopIteration is raised. This means getline must raise StopIteration when the log is "empty", and __iter__ must "reset" it.

Another solution may be to subtype "file", for a file is precisely an iterator over lines, and you really get your data from a file. Simply (sic), there must be some work done on this issue of time stamps (I haven't studied it in detail). Still, I guess this track may be worth a little study.

Once you get logs iterable, you may subtype list for your overall log collection and give it an __iter__ method like:

    def __iter__(self):
        for log in list.__iter__(self):
            for line in log:
                yield line

(The trick is not from me. Note the explicit list.__iter__ call: a plain "for log in self" would re-enter this same method.)

Then you can write:

    for line in my_log_collection:
        print line

* la vita e estrany *
http://spir.wikidot.com/
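The two suggestions in this reply (an iterable log object, plus a list subtype that walks every line of every log) can be sketched as follows. This is an illustration in Python 3 syntax under stated assumptions, not the poster's code: LogFile here wraps any iterable of lines (hypothetical in-memory lists stand in for the gzip files), and the name LogCollection is made up for the example.

```python
class LogFile:
    """Iterable log: wraps any iterable of lines (a list, an open file...)."""

    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        # A generator method: each call returns a fresh iterator, and
        # StopIteration is raised automatically when the lines run out.
        for line in self.lines:
            yield line.rstrip("\n")


class LogCollection(list):
    """A list of logs whose iteration yields every line of every log."""

    def __iter__(self):
        # list.__iter__ is called explicitly so that this method does
        # not call itself, which "for log in self" would do.
        for log in list.__iter__(self):
            for line in log:
                yield line


# Hypothetical in-memory data standing in for the real log files.
my_log_collection = LogCollection([
    LogFile(["first\n", "second\n"]),
    LogFile(["third\n"]),
])

for line in my_log_collection:
    print(line)
```

Note that this chains the logs one after another; it does not interleave them by timestamp.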
Re: [Tutor] Question re: hangman.py
thanks a lot for the clarification Alan and all.

--
Regards,
bibs M.

Host/Kernel/OS "cc02695" running Linux 2.6.31-5.slh.4-sidux-686 [sidux 2009-02 Αιθήρ - kde-full - (200907141427) ]
www.sidux.com

Alan Gauld wrote:
> "biboy mendz" wrote
>
>> chapter 8: hangman.py
>>
>> expression is: print(letter, end=' ')
>>
>> it explained:
>> end keyword argument in print() call makes the print() function put a space
>> character at the end of the string instead of a newline.
>>
>> however when run it gives error: SyntaxError: invalid syntax.
>>
>> What gives? This is the first time i saw such expression inside print
>> function. Is this version-specific of python? I'm running version 2.5.4.
>
> Your tutorial is using version 3. In 2.6 simply put a comma after the string:
>
>     print letter,
>
> to do the same thing.
Re: [Tutor] COM server: cannot assign property
On 11/12/2009 4:41 PM Yashwin Kanchan said...

> Hi Guys
>
> I am trying to create a simple test COM server, but am having trouble
> assigning any property value to it through Excel VBA.
>
> Please point out where I am going wrong.
>
> #COM server
> class test(object):
>
>     _reg_clsid_ = "{B5901450-F9A1-4F76-8FCF-2BFFA96ED210}"
>     _reg_progid_ = "Python.Test"
>     _public_methods_ = ["arg"]

My bet is that the problem is here. Do you need to expose t as well?

I wrote one of these six or seven years ago -- here are the critical bits (where EMSpecs is a class defined in the same module):

    class fenxUtilities:
        _public_methods_ = [ 'FirstPartInsp' ]
        _reg_progid_ = "fenxDCom.Util"
        _reg_clsid_ = "{3EAD7AB4-2978-4360-8F7D-33FB36E9E146}"

        def FirstPartInsp(self, nomDiam, numFlutes, nomOAL, nomLOC):
            return EMSpecs(nomDiam, numFlutes, nomOAL, nomLOC).retvals

    if __name__=='__main__':
        print "Registering COM server..."
        import win32com.server.register
        win32com.server.register.UseCommandLine(fenxUtilities)

HTH,

Emile

>     _public_attrs_ = ["t"]
>
>     def __init__(self):
>         self._t=0
>
>     def arg(self,s,r):
>         return (s,r)
>
>     def get_t(self):
>         return self._t
>
>     def set_t(self,value):
>         self._t = str(value)
>
>     t = property(get_t,set_t)
>
> if __name__=='__main__':
>     import win32com.server.register
>     win32com.server.register.UseCommandLine(test)
>     print "done"
>
> VBA Code:
>
> Sub Button1_Click()
>     Set test = CreateObject("Python.Test")
>     test.arg 2, 5
>     test.t = "hello"
>     MsgBox test.t
> End Sub
>
> Error: "Object doesnt support this property or method" at test.t = "hello"
>
> Thanks
> Yashwin Kanchan
[Tutor] Iterable Understanding
I think I'm having a major understanding failure.

So having discovered that my Unix sort breaks on the last day of the month, I've gone ahead and implemented a per log search, using heapq.

I've tested it with various data, and it produces a sorted logfile, per log.

So in essence this:

    logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
             LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
             LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]

Gives me a list of LogFiles - each of which has a getline() method, which returns a tuple.

I thought I could merge iterables using Kent's recipe, or just with heapq.merge()

But how do I get from a method that can produce a tuple, to some mergeable iterables?

    for log in logs:
        l = log.getline()
        print l

This gives me three loglines. How do I get more? Other than while True:

Of course tuples are iterables, but that doesn't help, as I want to sort on timestamp... so a list of tuples would be ok. But how do I construct that, bearing in mind I am trying not to use up too much memory?

I think there's a piece of the jigsaw I just don't get. Please help!

The code in full is here:

    import gzip, heapq, re

    class LogFile:
        def __init__(self, filename, date):
            self.logfile = gzip.open(filename, 'r')
            for logline in self.logfile:
                self.line = logline
                self.stamp = self.timestamp(self.line)
                if self.stamp.startswith(date):
                    break
            self.initialise_heap()

        def timestamp(self, line):
            stamp = re.search(r'\[(.*?)\]', line).group(1)
            return stamp

        def initialise_heap(self):
            initlist=[]
            self.heap=[]
            for x in xrange(10):
                self.line=self.logfile.readline()
                self.stamp=self.timestamp(self.line)
                initlist.append((self.stamp,self.line))
            heapq.heapify(initlist)
            self.heap=initlist

        def getline(self):
            self.line=self.logfile.readline()
            stamp=self.timestamp(self.line)
            heapq.heappush(self.heap, (stamp, self.line))
            pop = heapq.heappop(self.heap)
            return pop

    logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
             LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
             LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
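One way to get "mergeable iterables" without holding everything in memory is to wrap each log in a generator of (timestamp, line) tuples and hand those generators to heapq.merge(), which pulls items lazily. The sketch below is in Python 3 syntax and makes several assumptions that are not in the post: the bracketed timestamp is Apache-style (STAMP_FORMAT is a guess at it), each individual log is already in timestamp order, and the sample lines and the helper names stamped_lines and merge_logs are invented for illustration. Parsing the stamp into a datetime matters because the raw strings do not sort chronologically across a month boundary.

```python
import heapq
import re
from datetime import datetime

STAMP_RE = re.compile(r'\[(.*?)\]')
# Assumed timestamp layout, e.g. [04/Nov/2009:12:00:00 +0000].
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def stamped_lines(lines):
    """Lazily yield (datetime, line) pairs from an iterable of log lines."""
    for line in lines:
        match = STAMP_RE.search(line)
        if match is None:
            continue  # skip lines that carry no [timestamp]
        yield datetime.strptime(match.group(1), STAMP_FORMAT), line

def merge_logs(*logs):
    """Merge several individually sorted logs into one sorted stream."""
    return heapq.merge(*(stamped_lines(log) for log in logs))

# Hypothetical data; as plain strings, "01/Nov" would sort before "31/Oct".
log_a = ['host1 [31/Oct/2009:23:59:58 +0000] "GET /a"',
         'host1 [01/Nov/2009:00:00:02 +0000] "GET /c"']
log_b = ['host2 [31/Oct/2009:23:59:59 +0000] "GET /b"']

for stamp, line in merge_logs(log_a, log_b):
    print(line)
```

Any iterable of lines can be passed in place of the lists, so an open file object should work the same way, one line at a time.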
Re: [Tutor] Question re: hangman.py
"Hugo Arts" wrote

> print letter, ' ',

You don't need the space; Python automatically inserts a space instead of the newline when you use the comma.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/l2p/
Re: [Tutor] Question re: hangman.py
"biboy mendz" wrote

> chapter 8: hangman.py
>
> expression is: print(letter, end=' ')
>
> it explained:
> end keyword argument in print() call makes the print() function put a space
> character at the end of the string instead of a newline.
>
> however when run it gives error: SyntaxError: invalid syntax.
>
> What gives? This is the first time i saw such expression inside print
> function. Is this version-specific of python? I'm running version 2.5.4.

Your tutorial is using version 3. In 2.x simply put a comma after the string:

    print letter,

to do the same thing.
Re: [Tutor] How to call a method with a print statement?
List,

__repr__() is exactly what I was looking for :)

You guys rock! Thank you.
-Modulok-

On 11/12/09, Dave Angel wrote:
> Kent Johnson wrote:
>> On Thu, Nov 12, 2009 at 6:35 AM, Luke Paireepinart wrote:
>>> On Thu, Nov 12, 2009 at 5:29 AM, Jeff R. Allen wrote:
>>>> You are looking for the __str__ method. See
>>>> http://docs.python.org/reference/datamodel.html#object.__str__
>>>
>>> Can't you also implement __repr__?
>>
>> Yes, in fact if you are only going to implement one of __str__ and
>> __repr__, arguably __repr__ is a better choice. __repr__() is called
>> by the interactive interpreter when it displays an object. __str__ is
>> called by print, and if you don't define __str__ it will call
>> __repr__. So defining only __str__ will not give a custom
>> representation unless you print:
>>
>> In [1]: class Foo():
>>    ...:     def __str__(self):
>>    ...:         return "I'm a Foo"
>>
>> In [2]: f = Foo()
>>
>> In [3]: f
>> Out[3]: <__main__.Foo instance at 0x1433468>
>>
>> In [4]: print f
>> I'm a Foo
>>
>> Defining __repr__ will give the custom representation when you just
>> give the name of the object:
>>
>> In [5]: class Foo2():
>>    ...:     def __repr__(self):
>>    ...:         return "I'm a Foo2"
>>
>> In [6]: f2=Foo2()
>>
>> In [7]: f2
>> Out[7]: I'm a Foo2
>>
>> In [8]: print f2
>> I'm a Foo2
>>
>> Kent
>
> And one other important place that uses __repr__() is the printing of
> containers. So if you have a list of Foo2 objects, and you want to just say
>
>     print mylist
>
> it's better to have __repr__().
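The behaviour described in this thread (print falls back to __repr__ when __str__ is missing, and containers always show their items with repr) can be checked with a small script. This sketch uses Python 3 syntax, so print is called as a function; the class names StrOnly and ReprOnly are made up for the example.

```python
class StrOnly:
    """Defines only __str__."""
    def __str__(self):
        return "I'm a StrOnly"

class ReprOnly:
    """Defines only __repr__."""
    def __repr__(self):
        return "I'm a ReprOnly"

s, r = StrOnly(), ReprOnly()

print(s)       # print uses __str__
print(r)       # no __str__ defined, so print falls back to __repr__
print([s, r])  # a list shows each item with repr(), so only the
               # ReprOnly item gets its custom text here
```

Running it shows the default, address-based representation for the StrOnly item inside the list, which is the case Dave Angel's remark about printing containers warns about.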
Re: [Tutor] Question re: hangman.py
On Fri, Nov 13, 2009 at 2:17 PM, biboy mendz wrote:
> http://inventwithpython.com
>
> chapter 8: hangman.py
>
> expression is: print(letter, end=' ')
>
> it explained:
> end keyword argument in print() call makes the print() function put a space
> character at the end of the string instead of a newline.
>
> however when run it gives error: SyntaxError: invalid syntax.
>
> What gives? This is the first time i saw such expression inside print
> function. Is this version-specific of python? I'm running version 2.5.4.

Yes, the print function is a new feature of Python 3. The equivalent statement in Python 2.5 is probably something like this:

    print letter, ' ',

In Python before 3.0, print is a statement rather than a function. It takes a comma-separated list of things to print. The trailing comma prevents Python from appending a newline.

Hugo
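For comparison, here is a small sketch of the function form from the tutorial. It is written for Python 3; the __future__ import on the first line is what makes the same print() call available on Python 2.6 and 2.7, but it does not exist in 2.5, where only the statement form with a trailing comma works. The function name show_letters and its out parameter are made up for the example.

```python
from __future__ import print_function  # no effect on Python 3

def show_letters(letters, out=None):
    """Print each letter followed by a space, then end the line.

    out is any writable text stream; None means standard output.
    """
    for letter in letters:
        # end=' ' replaces the default newline with a single space.
        print(letter, end=' ', file=out)
    print(file=out)  # finish with one newline

show_letters("cat")
```

The end keyword only changes what is written after the arguments; the sep keyword controls what goes between several arguments in one call.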
[Tutor] Question re: hangman.py
http://inventwithpython.com

chapter 8: hangman.py

expression is: print(letter, end=' ')

it explained:
end keyword argument in print() call makes the print() function put a space character at the end of the string instead of a newline.

however when run it gives error: SyntaxError: invalid syntax.

What gives? This is the first time i saw such expression inside print function. Is this version-specific of python? I'm running version 2.5.4.

--
Regards,
bibs M.

Host/Kernel/OS "cc02695" running Linux 2.6.31-5.slh.4-sidux-686 [sidux 2009-02 Αιθήρ - kde-full - (200907141427) ]
www.sidux.com