Re: [Tutor] Python XML for newbie

2012-07-02 Thread Peter Otten
Sean Carolan wrote:

>> Thank you, this is helpful.  Minidom is confusing, even the
>> documentation confirms this:
>> "The name of the functions are perhaps misleading"
>
>>> But I'd start with the etree tutorial (of which
>>> there are many variations on the web):
>
> Ok, so I read through these tutorials and am at least able to print
> the XML output now.  I did this:
>
> doc = etree.parse('computer_books.xml')
>
> and then this:
>
> for elem in doc.iter():
>     print elem.tag, elem.text
>
> Here's the data I'm interested in:
>
> index 1
> field 11
> value 9780596526740
> datum
>
> How do you say, "If the field is 11, then print the next value"?  The
> raw XML looks like this:
>
> <datum>
> <index>1</index>
> <field>11</field>
> <value>9780470286975</value>
> </datum>
>
> Basically I just want to pull all these ISBN numbers from the file.

With http://lxml.de/ you can use xpath:

$ cat computer_books.xml
<foo>
<bar>
<datum>
<index>1</index>
<field>11</field>
<value>9780470286975</value>
</datum>
</bar>
</foo>
$ cat read_isbn.py
from lxml import etree

root = etree.parse("computer_books.xml")
print root.xpath("//datum[field=11]/value/text()")
$ python read_isbn.py
['9780470286975']
$
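
For readers without lxml installed, much the same query can probably be written with the standard library's xml.etree.ElementTree. The following is only a sketch under a few assumptions: it uses Python 3 syntax, it parses an inline copy of the sample file so that it runs on its own, and it relies on ElementTree's limited XPath subset, where the predicate has to be a string comparison ([field='11']) rather than the numeric field=11 that lxml accepts.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample file, so the sketch runs on its own.
SAMPLE = """\
<foo>
<bar>
<datum>
<index>1</index>
<field>11</field>
<value>9780470286975</value>
</datum>
</bar>
</foo>
"""

root = ET.fromstring(SAMPLE)

# ElementTree supports only a subset of XPath: [field='11'] selects datum
# elements that have a <field> child whose complete text is '11'.
isbns = [value.text for value in root.findall(".//datum[field='11']/value")]
print(isbns)
```

For a real file, ET.parse('computer_books.xml').getroot() would take the place of ET.fromstring(SAMPLE).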


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Python XML for newbie

2012-07-02 Thread Stefan Behnel
Peter Otten, 02.07.2012 09:57:
> Sean Carolan wrote:
>>> Thank you, this is helpful.  Minidom is confusing, even the
>>> documentation confirms this:
>>> "The name of the functions are perhaps misleading"

Yes, I personally think that (Mini)DOM should be locked away from beginners
as far as possible.


>> Ok, so I read through these tutorials and am at least able to print
>> the XML output now.  I did this:
>>
>> doc = etree.parse('computer_books.xml')
>>
>> and then this:
>>
>> for elem in doc.iter():
>>     print elem.tag, elem.text
>>
>> Here's the data I'm interested in:
>>
>> index 1
>> field 11
>> value 9780596526740
>> datum
>>
>> How do you say, "If the field is 11, then print the next value"?  The
>> raw XML looks like this:
>>
>> <datum>
>> <index>1</index>
>> <field>11</field>
>> <value>9780470286975</value>
>> </datum>
>>
>> Basically I just want to pull all these ISBN numbers from the file.
>
> With http://lxml.de/ you can use xpath:
>
> $ cat computer_books.xml
> <foo>
> <bar>
> <datum>
> <index>1</index>
> <field>11</field>
> <value>9780470286975</value>
> </datum>
> </bar>
> </foo>
> $ cat read_isbn.py
> from lxml import etree
>
> root = etree.parse("computer_books.xml")
> print root.xpath("//datum[field=11]/value/text()")
> $ python read_isbn.py
> ['9780470286975']
> $

And lxml.objectify is also a nice tool for this:

  $ cat example.xml
  <items>
   <item>
    <id>108</id>
    <data>
     <datum>
      <index>1</index>
      <field>2</field>
      <value>Essential System Administration</value>
     </datum>
    </data>
   </item>
  </items>

  $ python
  Python 2.7.3
  >>> from lxml import objectify
  >>> t = objectify.parse('example.xml')
  >>> for datum in t.iter('datum'):
  ...     if datum.field == 2:
  ...         print(datum.value)
  ...
  Essential System Administration
  >>>

It's not impossible that this is faster than the XPath version, but that
depends a lot on the data.

Stefan



Re: [Tutor] Python XML for newbie

2012-07-02 Thread Sean Carolan
> Yes, I personally think that (Mini)DOM should be locked away from beginners
> as far as possible.

Ok, I'm glad to hear that.  I'll continue to work with ElementTree and
lxml and see where it takes me.


[Tutor] Python XML for newbie

2012-07-01 Thread Sean Carolan
I'm trying to parse some XML data (Book titles, ISBN numbers and
descriptions) with Python.  Is there a *simple* way to import an XML
file into a dictionary, list, or other usable data structure?  I've
poked around with minidom, elementtree, and untangle but am not
really understanding how they are supposed to work.

Here's some sample data:

<xml>
<fields>
<field>
<name>Title</name>
<id>2</id>
<count>1</count>
<type>11</type>
<search>true</search>
<hasnumber>false</hasnumber>
</field>

...several more fields, then there are the items...

</fields>
<items>
<item>
<id>108</id>
<data>
<datum>
<index>1</index>
<field>2</field>
<value>Essential System Administration</value>
</datum>

For starters, I'd like to be able to just print out the list of titles
in the XML file, using the correct XML parser.  I don't mind doing
some research or reading on my own, but the official documentation
seems terribly confusing to me.

http://docs.python.org/library/xml.dom.minidom.html

Any pointers?


Re: [Tutor] Python XML for newbie

2012-07-01 Thread Alan Gauld

On 01/07/12 21:49, Sean Carolan wrote:

> ... Is there a *simple* way to import an XML
> file into a dictionary, list, or other usable data structure?


The simplest way using the standard library tools is (IMHO)
elementtree. minidom is a complex beast by comparison,
especially if you are not intimately familiar with
your XML structure.

However, there are some other add-in packages that are
allegedly much easier still.

But I'd start with the etree tutorial (of which
there are many variations on the web):

The original:
http://effbot.org/zone/element-index.htm

My preference:
http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/index.html

You may not need anything else...
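
A minimal sketch of the ElementTree route, in Python 3 syntax, might look like the following. It is an illustration under stated assumptions rather than a definitive recipe: the sample data from the original post is cut off, so the closing tags below are guesses, and the names field_names, books and titles are made up for this example. The idea is to read the field definitions into a dictionary first, then turn each item into a dictionary keyed by field name.

```python
import xml.etree.ElementTree as ET

# Completed version of the sample data. The closing tags after the first
# <datum> are assumptions; the original post is truncated at that point.
SAMPLE = """\
<xml>
<fields>
<field>
<name>Title</name>
<id>2</id>
</field>
</fields>
<items>
<item>
<id>108</id>
<data>
<datum>
<index>1</index>
<field>2</field>
<value>Essential System Administration</value>
</datum>
</data>
</item>
</items>
</xml>
"""

root = ET.fromstring(SAMPLE)

# Map each field id to its name, e.g. {'2': 'Title'}. The path is anchored
# at <fields> so the <field> elements inside <datum> are not picked up.
field_names = {f.findtext("id"): f.findtext("name")
               for f in root.findall("fields/field")}

# Build one dictionary per <item>, keyed by field name.
books = []
for item in root.findall("items/item"):
    book = {"id": item.findtext("id")}
    for datum in item.findall("data/datum"):
        name = field_names.get(datum.findtext("field"))
        if name is not None:
            book[name] = datum.findtext("value")
    books.append(book)

titles = [book["Title"] for book in books if "Title" in book]
print(titles)
```

With a real file, ET.parse(filename).getroot() would replace ET.fromstring(SAMPLE); the rest should carry over as long as the file follows the layout shown in the sample.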

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/





Re: [Tutor] Python XML for newbie

2012-07-01 Thread Sean Carolan
> The simplest way using the standard library tools is (IMHO)
> elementtree. minidom is a complex beast by comparison,
> especially if you are not intimately familiar with
> your XML structure.

Thank you, this is helpful.  Minidom is confusing, even the
documentation confirms this:
"The name of the functions are perhaps misleading"

> But I'd start with the etree tutorial (of which
> there are many variations on the web):
>
> The original:
> http://effbot.org/zone/element-index.htm
>
> My preference:
> http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/index.html

I'm going to work through those and see what I can come up with.


Re: [Tutor] Python XML for newbie

2012-07-01 Thread Sean Carolan
> Thank you, this is helpful.  Minidom is confusing, even the
> documentation confirms this:
> "The name of the functions are perhaps misleading"
>
>> But I'd start with the etree tutorial (of which
>> there are many variations on the web):

Ok, so I read through these tutorials and am at least able to print
the XML output now.  I did this:

doc = etree.parse('computer_books.xml')

and then this:

for elem in doc.iter():
    print elem.tag, elem.text

Here's the data I'm interested in:

index 1
field 11
value 9780596526740
datum

How do you say, "If the field is 11, then print the next value"?  The
raw XML looks like this:

<datum>
<index>1</index>
<field>11</field>
<value>9780470286975</value>
</datum>

Basically I just want to pull all these ISBN numbers from the file.


Re: [Tutor] Python XML for newbie

2012-07-01 Thread David Kidd
On Mon, Jul 2, 2012 at 12:31 PM, Sean Carolan scaro...@gmail.com wrote:

> How do you say, "If the field is 11, then print the next value"?  The
> raw XML looks like this:
>
> <datum>
> <index>1</index>
> <field>11</field>
> <value>9780470286975</value>
> </datum>


Instead of iterating over the whole tree, grab all the datum elements,
then for each one retrieve its field child and check that child's text.
If it is '11', pull the text of the value child.
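
That approach could be sketched roughly as follows with the standard library's ElementTree, in Python 3 syntax. The helper name values_for_field and the <foo> wrapper element are made up for this illustration; the two datum elements are the ones quoted earlier in the thread, parsed from an inline string so the sketch runs on its own.

```python
import xml.etree.ElementTree as ET

# The <foo> wrapper is an assumption; the datum elements come from the thread.
SAMPLE = """\
<foo>
<datum>
<index>1</index>
<field>11</field>
<value>9780470286975</value>
</datum>
<datum>
<index>1</index>
<field>2</field>
<value>Essential System Administration</value>
</datum>
</foo>
"""


def values_for_field(root, field_id):
    """Return the <value> text of every <datum> whose <field> text equals field_id."""
    found = []
    for datum in root.iter("datum"):              # grab all the datum elements
        if datum.findtext("field") == field_id:   # check the field child
            found.append(datum.findtext("value")) # pull the value child's text
    return found


root = ET.fromstring(SAMPLE)
isbns = values_for_field(root, "11")
print(isbns)
```

Note that the comparison is against the string "11", since element text is always a string in ElementTree.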