Re: [Tutor] A somewhat easier way to parse XML

2005-01-19 Thread Danny Yoo


On Wed, 19 Jan 2005, Max Noel wrote:

> I've just spent the last few hours learning how to use the DOM XML
> API (to be more precise, the one that's in PyXML), instead of revising
> for my exams :p. My conclusion so far: it sucks (and so does SAX because
> I can't see a way to use it for OOP or recursive XML trees).

Hi Max,

You are not alone in this restless feeling.

In fact, Uche Ogbuji, one of the lead developers of 4Suite and Amara
(which Kent mentioned earlier), just wrote a blog entry about his
discontent with the DOM.  Here, these may interest you:

http://www.oreillynet.com/pub/wlg/6224
http://www.oreillynet.com/pub/wlg/6225


> In fact, I find it appalling that none of the standard XML parsers
> (DOM, SAX) provides an easy way to do that (yeah, I know that's more
> or less what the shelve module does, but I want a
> language-independent way).

For simple applications, the 'xmlrpclib' has two functions (dumps() and
loads()) that we can use:

http://www.python.org/doc/lib/node541.html


For example:

###
>>> import xmlrpclib
>>> s = xmlrpclib.dumps(({'hello': 'world'},))
>>> print s
<params>
<param>
<value><struct>
<member>
<name>hello</name>
<value><string>world</string></value>
</member>
</struct></value>
</param>
</params>

>>> xmlrpclib.loads(s)
(({'hello': 'world'},), None)
###

A little bit silly, but it does work.  The nice thing about this is that
XML-RPC is pretty much a platform-independent standard, so if we're dealing
with simple values like strings, integers, lists, and dictionaries, we're
all set.  It is a bit verbose, though.
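In Python 3 the same module is called `xmlrpc.client`, and `dumps()`/`loads()` still work as shown above. A minimal round-trip sketch, assuming Python 3 (the `record` contents are just illustrative data):

```python
import xmlrpc.client

# A small record built only from types XML-RPC can carry:
# strings, ints, lists, and dicts with string keys.
record = {"name": "Body", "rating": 6, "tags": ["physical", "core"]}

# dumps() takes a tuple of parameters and returns a <params> XML document.
xml_text = xmlrpc.client.dumps((record,))

# loads() returns (params_tuple, method_name); method_name is None here
# because no methodname= was passed to dumps().
params, method = xmlrpc.client.loads(xml_text)

print(params[0] == record)  # True
print(method)               # None
```

Unlike a hand-rolled XML-to-dict conversion, XML-RPC keeps the integer `6` an integer, because the type is recorded in the markup (`<int>`).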

Amara looks really interesting, especially because they have tools for
doing data-binding in a Python-friendly way.


Best of wishes to you!

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] A somewhat easier way to parse XML

2005-01-19 Thread Max Noel
On Jan 19, 2005, at 03:58, David Rock wrote:
> For me, it seems that the way you are supposed to interact with an XML
> DOM is to already know what you are looking for, and in theory, you
> _should_ know ;-)
	Indeed. The problem is, even if I know what I'm looking for, the  
problem remains that given the following document,

<foo>
<bar>baz</bar>
</foo>
	If I want to get baz, the command is (assuming a DOM object has been
created):

doc.documentElement.getElementsByTagName("bar")[0].childNodes[0].nodeValue

	Quoting from memory there, it may not be entirely correct. However,
the command has more characters than the document itself. Somehow I
feel it'd be a bit more elegant to use:

doc["bar"]
(or depending on the implementation, doc["foo"]["bar"])
Don't you think?
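For what it's worth, the chain Max quotes from memory does work with the standard library's `xml.dom.minidom` on this two-level document. The sketch below checks it, then shows how a small wrapper could provide the `doc["foo"]["bar"]` style he describes. `ElementView` is a hypothetical name invented for illustration, not part of any library; Python 3 is assumed.

```python
from xml.dom import minidom

dom = minidom.parseString("<foo><bar>baz</bar></foo>")

# The long-hand DOM access from the message above.
verbose = dom.documentElement.getElementsByTagName("bar")[0].childNodes[0].nodeValue
print(verbose)  # baz


class ElementView(object):
    """Hypothetical helper: view["tag"] returns the first child element named tag."""

    def __init__(self, node):
        self.node = node  # a minidom Document or Element

    def __getitem__(self, tag):
        for child in self.node.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == tag:
                return ElementView(child)
        raise KeyError(tag)

    def text(self):
        # Join the node's direct text children (nested elements are ignored).
        return "".join(c.data for c in self.node.childNodes
                       if c.nodeType == c.TEXT_NODE)


doc = ElementView(dom)
print(doc["foo"]["bar"].text())  # baz
```

Wrapping the Document rather than `documentElement` is what makes the first lookup `doc["foo"]`; wrapping `dom.documentElement` instead would give the shorter `doc["bar"]` form.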
> Still, I can't help wishing I had a simple way to create a dict from a
> DOM. From a Python perspective, that seems more Pythonic to me as
> well. I guess it's just a different way of looking at it.
	I can't help but think that from the perspective of any other  
language, that would feel more [language]-ic as well ;)

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
Look at you hacker... A pathetic creature of meat and bone, panting  
and sweating as you run through my corridors... How can you challenge a  
perfect, immortal machine?



Re: [Tutor] A somewhat easier way to parse XML

2005-01-19 Thread Kent Johnson
David Rock wrote:
> * Max Noel [EMAIL PROTECTED] [2005-01-19 11:48]:
>> On Jan 19, 2005, at 03:58, David Rock wrote:
>>
>>> For me, it seems that the way you are supposed to interact with an XML
>>> DOM is to already know what you are looking for, and in theory, you
>>> _should_ know ;-)
>> Indeed. The problem is, even if I know what I'm looking for, the
>> problem remains that given the following document,
>>
>> <foo>
>> <bar>baz</bar>
>> </foo>
>> If I want to get baz, the command is (assuming a DOM object has
>> been created):
>>
>> doc.documentElement.getElementsByTagName("bar")[0].childNodes[0].nodeValue
>>
>> Quoting from memory there, it may not be entirely correct. However,
>> the command has more characters than the document itself. Somehow I
>> feel it'd be a bit more elegant to use:
>>
>> doc["bar"]
>> (or depending on the implementation, doc["foo"]["bar"])
>> Don't you think?
>
> Absolutely. That is exactly what I was hoping for, too. ElementTree
> comes close, but even that can be a bit unwieldy because of the
> multi-dimensional array you end up with. Still, if you know the data,
> doc[0][0] is a lot easier than doc.documentElement...nodeValue

Use the XPath support in ElementTree. Something like
doc.find('foo/bar')

If I understand correctly Amara allows something like
doc.foo.bar

I'll try to find the time to write up a full example using ElementTree, Amara and dom4j. Meanwhile
see http://www.oreillynet.com/pub/wlg/6225 and http://www.oreillynet.com/pub/wlg/6239
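Kent's `doc.find('foo/bar')` is explicitly "something like". A sketch with the standard-library `xml.etree.ElementTree` (ElementTree later joined the standard library under that name in Python 2.5; Python 3 is assumed here) shows how `find()` and `findtext()` treat paths on Max's two-level document. Paths are evaluated relative to the element they are called on, so from the root `<foo>` the path is `'bar'`.

```python
import xml.etree.ElementTree as ET

# Max's two-level sample. ET.XML() returns the root element, <foo>.
root = ET.XML("<foo><bar>baz</bar></foo>")

bar = root.find("bar")            # the <bar> Element, or None if absent
print(bar.text)                   # baz
print(root.findtext("bar"))       # baz: shortcut for find(...).text
print(root.findtext("foo/bar"))   # None: <foo> has no <foo> child
print(root.findtext(".//bar"))    # baz: ".//" searches at any depth
```

A path like `'foo/bar'` only fits when there is an enclosing element above `<foo>`, which is why Kent's later three-toolkit example wraps the sample in `<doc>`.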

Kent




Re: [Tutor] A somewhat easier way to parse XML

2005-01-19 Thread Kent Johnson
Kent Johnson wrote:
>>> On Jan 19, 2005, at 03:58, David Rock wrote:
>>> Indeed. The problem is, even if I know what I'm looking for, the
>>> problem remains that given the following document,
>>>
>>> <foo>
>>> <bar>baz</bar>
>>> </foo>
>>> If I want to get baz, the command is ...
>
> I'll try to find the time to write up a full example using ElementTree,
> Amara and dom4j. Meanwhile see http://www.oreillynet.com/pub/wlg/6225
> and http://www.oreillynet.com/pub/wlg/6239

OK, here is code to print 'baz' from a simple XML string using three different XML toolkits. (I 
added another level to the XML to make it a little more challenging.)

This is pretty much a tie - it's three lines of code in each toolkit. The main difference is between 
the XPath access used by ElementTree and dom4j and the attribute access used by amara. Personally I 
find dom4j's full XPath support to be very handy - it essentially gives you a query engine built in 
to your data model. But it is a matter of taste, and amara has XPath support also.
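ElementTree implements only a limited subset of XPath, not the full query language Kent credits dom4j with, but that subset already covers simple filtering. A sketch with the standard-library `xml.etree.ElementTree` (Python 3 assumed; attribute predicates need Python 2.7 or later), reusing the character data from Max's first post:

```python
import xml.etree.ElementTree as ET

root = ET.XML("""<character>
  <attribute key="BOD"><name>Body</name><rating>6</rating></attribute>
  <attribute key="QCK"><name>Quickness</name><rating>9</rating></attribute>
</character>""")

# Attribute predicate: the <name> of the <attribute> whose key is QCK.
quick = root.findtext("attribute[@key='QCK']/name")
print(quick)  # Quickness

# Positional predicate (1-based, as in XPath): the first <attribute>.
first_key = root.find("attribute[1]").get("key")
print(first_key)  # BOD

# ".//" matches at any depth: every <rating> in the document.
ratings = [r.text for r in root.findall(".//rating")]
print(ratings)  # ['6', '9']
```

Anything beyond this subset (functions, arbitrary axes, numeric comparisons) needs a full XPath engine such as dom4j's on the Java side.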

I put each example inside a 'try: ... except ImportError' block so I could have them all in one file - it's not
something you would normally do. The ElementTree and amara examples are for CPython; the dom4j 
example is for Jython.

Of course you need the corresponding toolkit to be correctly installed...
Kent
docText = '''
<doc>
<foo>
<bar>baz</bar>
</foo>
</doc>
'''

# ElementTree
try:
    from elementtree import ElementTree
    doc = ElementTree.XML(docText)
    # Note: doc represents the top-level ('doc') element
    print 'ElementTree'
    print doc.findtext('foo/bar')
except ImportError:
    print 'No ElementTree'

print

# amara
try:
    from amara import binderytools
    root = binderytools.bind_string(docText)
    # root is the 'root' element - the parent of the 'doc' element
    print 'amara'
    print root.doc.foo.bar
except ImportError:
    print 'No amara'

print

# dom4j
try:
    import org.dom4j as dom
    root = dom.DocumentHelper.parseText(docText)
    print 'dom4j'
    print root.valueOf('doc/foo/bar')
except ImportError:
    print 'No dom4j'
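The amara line `root.doc.foo.bar` relies on amara's data binding. For readers without amara installed, the same attribute-style access can be imitated over the standard-library `xml.etree.ElementTree` with a small wrapper. This is a hypothetical sketch (the `Bound` class is invented here, and Python 3 is assumed), not amara's implementation:

```python
import xml.etree.ElementTree as ET

docText = '''
<doc>
<foo>
<bar>baz</bar>
</foo>
</doc>
'''


class Bound(object):
    """Hypothetical amara-style view: node.tag returns the first child <tag>."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, tag):
        # Called only when normal attribute lookup fails, i.e. for child names.
        if tag.startswith("_"):
            raise AttributeError(tag)
        child = self._element.find(tag)
        if child is None:
            raise AttributeError(tag)
        return Bound(child)

    def __str__(self):
        return self._element.text or ""


# Give the parsed <doc> a synthetic parent so that, as in the amara
# example, the first attribute hop is .doc.
parent = ET.Element("root")
parent.append(ET.XML(docText))
root = Bound(parent)

print(root.doc.foo.bar)  # baz
```

This sketch returns only the first matching child and ignores repeated tags, which keeps it short but makes it unsuitable for lists of siblings.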


[Tutor] A somewhat easier way to parse XML

2005-01-18 Thread Max Noel
Hi everyone,
	I've just spent the last few hours learning how to use the DOM XML API 
(to be more precise, the one that's in PyXML), instead of revising for 
my exams :p. My conclusion so far: it sucks (and so does SAX because I 
can't see a way to use it for OOP or recursive XML trees).
	I'm certain it can be used to do extremely powerful stuff, but as far 
as usability is concerned, it's ridiculously verbose and naming is 
inconsistent. I've had a look at Java DOM as well, and it's apparently 
the same.
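On the SAX point: SAX delivers only a flat stream of start, text, and end events, but a common pattern for recursive documents is to keep an explicit stack of open elements. A minimal sketch with the standard-library `xml.sax`, assuming Python 3; `StackTreeHandler` and its dict layout are illustrative choices, not part of the library:

```python
import xml.sax


class StackTreeHandler(xml.sax.ContentHandler):
    """Rebuild a nested tree of dicts from the flat stream of SAX events."""

    def __init__(self):
        super().__init__()
        # Sentinel parent; the document element becomes its only child.
        self.stack = [{"tag": None, "attrs": {}, "children": [], "text": ""}]

    def startElement(self, name, attrs):
        node = {"tag": name, "attrs": dict(attrs.items()), "children": [], "text": ""}
        self.stack[-1]["children"].append(node)  # attach to the open parent
        self.stack.append(node)                  # descend into the new element

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        self.stack[-1]["text"] += content

    def endElement(self, name):
        self.stack.pop()                         # climb back to the parent

    def tree(self):
        return self.stack[0]["children"][0]


handler = StackTreeHandler()
xml.sax.parseString(b'<foo><bar key="BOD">baz</bar></foo>', handler)
foo = handler.tree()
bar = foo["children"][0]
print(foo["tag"], bar["tag"], bar["attrs"], bar["text"])
# foo bar {'key': 'BOD'} baz
```

The stack depth always equals the current nesting level, so the handler works for trees of any depth without recursion.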

	This afternoon, I read a bit about YAML and its basic philosophy that 
everything can be represented as a mix of lists, dictionaries and 
scalars. Now, setting aside the fact that one look at YAML made me want 
to ditch XML for data storage purposes completely (which I can't do 
since there's no Java YAML parser that I know of so far), it came to my 
mind once again that this is the one thing I want to be able to do in 
XML. Chances are that's all 9 out of 10 programmers want to do
with XML.
	In fact, I find it appalling that none of the standard XML parsers
(DOM, SAX) provides an easy way to do that (yeah, I know that's more
or less what the shelve module does, but I want a
language-independent way).

	So, to wrap my head around DOM, I set out to write a little script 
that does just that. Introducing xmldict.py and the DataNode class.
	For example, given the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<character>
  <attribute key="BOD">
    <name>Body</name>
    <rating>6</rating>
  </attribute>
  <attribute key="QCK">
    <name>Quickness</name>
    <rating>9</rating>
  </attribute>
</character>
	...the DataNode class (yeah, I think I may have implemented that in a 
slightly bizarre fashion) will produce the following dictionary:

{u'attribute': [{u'@key': u'BOD', u'name': u'Body', u'rating': u'6'}, 
{u'@key': u'QCK', u'name': u'Quickness', u'rating': u'9'}]}

	As you can see, everything is represented in a mix of dictionaries, 
lists and unicode strings, and can now be used by a normal human being 
to write a program that uses this data.
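The xmldict.py attachment is not preserved in this archive. As a rough sketch of the same idea (not Max's DataNode code), the standard-library `xml.etree.ElementTree` can produce the dictionary shown above, assuming these rules inferred from his example: the root element's name is dropped, attributes become `'@'`-prefixed keys, text-only elements become strings, and a tag that repeats becomes a list. `element_to_data` is a hypothetical name; Python 3 is assumed.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<character>
<attribute key="BOD">
<name>Body</name>
<rating>6</rating>
</attribute>
<attribute key="QCK">
<name>Quickness</name>
<rating>9</rating>
</attribute>
</character>"""


def element_to_data(elem):
    """Hypothetical converter: leaf -> stripped text, otherwise a dict of children.

    Attributes become '@name' keys; a tag seen more than once becomes a list.
    Text inside elements that also have attributes or children is dropped.
    """
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    result = {"@" + k: v for k, v in elem.attrib.items()}
    for child in children:
        value = element_to_data(child)
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Second occurrence: promote the single value to a list.
            result[child.tag] = [result[child.tag], value]
    return result


data = element_to_data(ET.XML(XML_TEXT))
print(data)
# {'attribute': [{'@key': 'BOD', 'name': 'Body', 'rating': '6'}, {'@key': 'QCK', 'name': 'Quickness', 'rating': '9'}]}
```

One consequence of the "repeat becomes a list" rule is that a character with a single `<attribute>` would get a dict rather than a one-element list, so code consuming the result may need to normalize that case.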
	Comments, criticism, improvements, suggestions, [whatever]... Would be 
appreciated. Feel free to use it if you wish.

Thanks for your attention.



[attachment: xmldict.py]


-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
Look at you hacker... A pathetic creature of meat and bone, panting 
and sweating as you run through my corridors... How can you challenge a 
perfect, immortal machine?


Re: [Tutor] A somewhat easier way to parse XML

2005-01-18 Thread David Rock
* Max Noel [EMAIL PROTECTED] [2005-01-19 00:17]:
> Hi everyone,
>
> I've just spent the last few hours learning how to use the DOM XML
> API (to be more precise, the one that's in PyXML), instead of revising
> for my exams :p. My conclusion so far: it sucks (and so does SAX because I
> can't see a way to use it for OOP or recursive XML trees).
> I'm certain it can be used to do extremely powerful stuff, but as
> far as usability is concerned, it's ridiculously verbose and naming is
> inconsistent. I've had a look at Java DOM as well, and it's apparently
> the same.

I'm kind of in the same boat as you are and I have come to the
conclusion that XML is intended to answer specific questions with
discrete answers, not walk the DOM to create a dictionary. I _think_ the
idea behind this is that it would be redundant. You already have a
dictionary of sorts in the XML itself, why create a new one? 

For me, it seems that the way you are supposed to interact with an XML
DOM is to already know what you are looking for, and in theory, you
_should_ know ;-)

Still, I can't help wishing I had a simple way to create a dict from a
DOM. From a Python perspective, that seems more Pythonic to me as
well. I guess it's just a different way of looking at it.

-- 
David Rock
[EMAIL PROTECTED]




Re: [Tutor] A somewhat easier way to parse XML

2005-01-18 Thread Kent Johnson
Max Noel wrote:
> Hi everyone,
> I've just spent the last few hours learning how to use the DOM XML
> API (to be more precise, the one that's in PyXML), instead of revising
> for my exams :p. My conclusion so far: it sucks (and so does SAX because
> I can't see a way to use it for OOP or recursive XML trees).
> I'm certain it can be used to do extremely powerful stuff, but as
> far as usability is concerned, it's ridiculously verbose and naming is
> inconsistent. I've had a look at Java DOM as well, and it's apparently
> the same.
I share your opinion that DOM is a pita. It's the same in Java because it is a 'language-neutral' 
spec - i.e. it sucks equally in every language :-)

For Python, take a look at ElementTree, it is way easier to use. Amara looks 
interesting too.
http://effbot.org/zone/element-index.htm
http://uche.ogbuji.net/uche.ogbuji.net/tech/4Suite/amara/
For Java, try dom4j. http://www.dom4j.org
Many people have tried to make more Pythonic XML libraries, you might want to look around before you 
write your own.
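To illustrate Kent's point that ElementTree is easier to use than the DOM, here is a sketch that reads Max's character data with the standard-library `xml.etree.ElementTree` (Python 3 assumed). No `childNodes` or `nodeValue` bookkeeping is needed:

```python
import xml.etree.ElementTree as ET

root = ET.XML("""<character>
  <attribute key="BOD"><name>Body</name><rating>6</rating></attribute>
  <attribute key="QCK"><name>Quickness</name><rating>9</rating></attribute>
</character>""")

ratings = {}
for attr in root.findall("attribute"):       # direct <attribute> children
    key = attr.get("key")                     # XML attribute value
    name = attr.findtext("name")              # text of the <name> child
    rating = int(attr.findtext("rating"))     # convert text to an int
    ratings[key] = (name, rating)

print(ratings)
# {'BOD': ('Body', 6), 'QCK': ('Quickness', 9)}
```

Because the program already knows the document's shape, it can build exactly the structure it needs (here with integer ratings) instead of a generic dict of strings.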

Kent