Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-21 Thread Kent Johnson

Danny Yoo wrote:
> 
> On Wed, 21 Dec 2005, ps python wrote:
> 
> 
>>Dear Drs. Yoo and Johnson, thank you very much for your help. I
>>successfully parsed my GO annotation from all 16,000 files.  Thanks again
>>for your kind help.
> 
> 
> I'm glad to hear that it's working for you now.  Just as a clarification:
> I'm not a doctor.  *grin*  But I do work with bioinformaticians, so I
> recognize the Gene Ontology annotations you are working with.

No doctor here either. But I'll take it as a compliment!

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-21 Thread Danny Yoo

On Wed, 21 Dec 2005, ps python wrote:

> Dear Drs. Yoo and Johnson, thank you very much for your help. I
> successfully parsed my GO annotation from all 16,000 files.  Thanks again
> for your kind help.

I'm glad to hear that it's working for you now.  Just as a clarification:
I'm not a doctor.  *grin*  But I do work with bioinformaticians, so I
recognize the Gene Ontology annotations you are working with.


Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-20 Thread ps python

Dear Drs. Yoo and Johnson,
Thank you very much for your help. I successfully
parsed my GO annotation from all 16,000 files.
Thanks again for your kind help.




Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-20 Thread Danny Yoo


> >>> for m in mydata.findall('//functions'):
>         print m.get('molecular_class').text
>
> >>> for m in mydata.findall('//functions'):
>         print m.find('molecular_class').text.strip()
>
> >>> for process in mydata.findall('//biological_process'):
>         print process.get('title').text


Hello,

I believe we're running into XML namespace issues.  If we look at all the
tag names in the XML, we can see this:

##
>>> from elementtree import ElementTree
>>> tree = ElementTree.parse(open('4.xml'))
>>> for element in tree.getroot()[0]: print element.tag
...
{org:hprd:dtd:hprdr2}title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}omim
{org:hprd:dtd:hprdr2}gene_symbol
{org:hprd:dtd:hprdr2}gene_map_locus
{org:hprd:dtd:hprdr2}seq_entry
{org:hprd:dtd:hprdr2}molecular_weight
{org:hprd:dtd:hprdr2}entry_sequence
{org:hprd:dtd:hprdr2}protein_domain_architecture
{org:hprd:dtd:hprdr2}expressions
{org:hprd:dtd:hprdr2}functions
{org:hprd:dtd:hprdr2}cellular_component
{org:hprd:dtd:hprdr2}interactions
{org:hprd:dtd:hprdr2}EXTERNAL_LINKS
{org:hprd:dtd:hprdr2}author
{org:hprd:dtd:hprdr2}last_updated
##

(I'm just doing a quick view of the toplevel elements in the tree.)

As we can see, each element's tag is being prefixed with the namespace URI
declared in the XML document.  If we look in our XML document and search
for the attribute 'xmlns', we'll see where this 'org:hprd:dtd:hprdr2'
thing comes from.
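
To make the "{namespace}localname" form of the tags concrete, here is a small sketch in current Python 3. The helper name split_tag is hypothetical (it is not part of ElementTree); it only illustrates how a tag such as the ones listed above breaks into its namespace and its local name.

```python
def split_tag(tag):
    """Split '{uri}local' into (uri, local); a tag without a namespace gives (None, tag)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


# One of the tags printed in the transcript above.
uri, local = split_tag("{org:hprd:dtd:hprdr2}gene_symbol")
print(uri, local)
```

A search for a bare name like 'gene_symbol' does not match such an element, because the full tag includes the braces and the namespace.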


So we may need to prepend the namespace to get the proper terms:

##
>>> for process in tree.find("//{org:hprd:dtd:hprdr2}biological_processes"):
...     print process.findtext("{org:hprd:dtd:hprdr2}title")
...
Metabolism
Energy pathways
##


To tell the truth, I don't quite understand how to work fluently with XML
namespaces, so perhaps there's an easier way to do what you want.  But the
examples above should help you get started parsing all your Gene Ontology
annotations.
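
One possibly easier route, sketched here for current Python 3 with the standard-library xml.etree.ElementTree (an assumption, since this thread uses the older standalone elementtree package): pass a prefix-to-URI mapping to findall() so the namespace is written only once. The sample document below is a hypothetical miniature, not the real HPRD file, and the prefix "h" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a namespaced record; the real files are much larger
# and the entry_set/entry wrapper names are guesses.
SAMPLE = """<entry_set xmlns="org:hprd:dtd:hprdr2">
  <entry>
    <biological_processes>
      <biological_process><title>Metabolism</title></biological_process>
      <biological_process><title>Energy pathways</title></biological_process>
    </biological_processes>
  </entry>
</entry_set>"""

# Map a short prefix of our own choosing to the namespace URI from the document.
NS = {"h": "org:hprd:dtd:hprdr2"}

root = ET.fromstring(SAMPLE)
titles = [
    title.text.strip()
    for title in root.findall(".//h:biological_process/h:title", NS)
]
print(titles)
```

The prefix in the path only has to match a key in the mapping; it does not need to appear in the XML itself.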



Good luck!


Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-20 Thread ps python

Thank you for your email Dr. Johnson. 


I need to print:

gene_symbol (from line ALDH3A1)
entry_cdna  (from line NM_000691.3)
molecular_class (from line Enzyme:Dehydrogenase)

title (from tags Catalytic activity)

title (from tags section Metabolism)

title (from tags section cytoplasm)


This is how I tried:

>>> from elementtree.ElementTree import ElementTree
>>> mydata = ElementTree(file='4.xml')
>>> for process in mydata.findall('//biological_process'):
    print process.get('title').text


>>> for m in mydata.findall('//functions'):
    print m.get('molecular_class').text


>>> for m in mydata.findall('//functions'):
    print m.find('molecular_class').text.strip()


>>> for process in mydata.findall('//biological_process'):
    print process.get('title').text


>>> for m in mydata.findall('//functions'):
    print m.get('molecular_class').text


>>> for m in mydata.findall('//functions'):
    print m.get('title').text.strip()


>>> for m in mydata.findall('//biological_processes'):
    print m.get('title').text.strip()


>>>


Result:
I get nothing.  No error.  I have no clue why it is
not giving me the result. 

I also tried this alternate way:

>>> strdata = """
  Enzyme:
Dehydrogenase
  
 
Catalytic activity
0003824
 
  
  
 
Metabolism
0008152
 
 
Energy pathways
0006091
 
  
"""

>>> from elementtree import ElementTree
>>> tree = ElementTree.fromstring(strdata)
>>> for m in tree.findall('//functions'):
    print m.find('molecular_class').text


Traceback (most recent call last):
  File "", line 1, in -toplevel-
    for m in tree.findall('//functions'):
  File "C:\Python23\Lib\site-packages\elementtree\ElementTree.py", line 352, in findall
    return ElementPath.findall(self, path)
  File "C:\Python23\Lib\site-packages\elementtree\ElementPath.py", line 195, in findall
    return _compile(path).findall(element)
  File "C:\Python23\Lib\site-packages\elementtree\ElementPath.py", line 173, in _compile
    p = Path(path)
  File "C:\Python23\Lib\site-packages\elementtree\ElementPath.py", line 74, in __init__
    raise SyntaxError("cannot use absolute path on element")
SyntaxError: cannot use absolute path on element
>>> for m in tree.findall('functions'):
    print m.find('molecular_class').text


>>> for m in tree.findall('functions'):
    print m.find('molecular_class').text.strip()


>>> for m in tree.findall('functions'):
    print m.get('molecular_class').text




Do you think it is a problem with the XML files
instead?
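
A note on the traceback and the silent loops above, as a sketch in current Python 3 (standard-library xml.etree.ElementTree, which is an assumption since the thread uses the older elementtree package). fromstring() returns an Element rather than an ElementTree, and an Element rejects paths that start with '/'. Also, findall() only searches below the element it is called on, so if the root of the string is itself the functions element (an assumption about the snippet, whose tags were lost), findall('functions') finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment whose root is <functions>; the real snippet has more children.
root = ET.fromstring(
    "<functions>"
    "<molecular_class>Enzyme: Dehydrogenase</molecular_class>"
    "</functions>"
)

# An absolute path is rejected when findall() is called on an Element.
try:
    root.findall("//molecular_class")
    absolute_path_error = None
except SyntaxError as exc:
    absolute_path_error = str(exc)

# findall() looks at descendants only, so the root never matches its own tag.
matches_for_root_tag = root.findall("functions")

# A relative path (or a plain child tag) reaches the child directly.
molecular_class = root.findtext("./molecular_class")

print(absolute_path_error)
print(matches_for_root_tag)
print(molecular_class)
```

Under those assumptions the XML is not at fault; the path just has to be relative to the element being searched.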

Thank you for valuable suggestions. 

kind regards, 
M





Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-20 Thread Kent Johnson

ps python wrote:
> Dear Drs. Johnson and Yoo , 
>  for the last 1 week I have been working on parsing
> the elements from a bunch of XML files following your
> suggestions. 
> 
> >>> from elementtree.ElementTree import ElementTree
> >>> mydata = ElementTree(file='4.xml')
> >>> for process in mydata.findall('//biological_process'):
>         print process.text

Looking at the data, neither <biological_process> nor <functions> elements
directly contain text; they have children that contain text. Try
   print process.get('title').text
to print the title.
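
A note on get() versus find(), sketched in current Python 3 with the standard-library xml.etree.ElementTree (an assumption; the thread uses the older elementtree package, where these methods behave the same way as far as I know). get() looks up an XML attribute and returns a string or None, while find() and findtext() look up a child element. The element below is hypothetical and carries both an attribute and a child named title so the difference is visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: a "title" attribute and a <title> child side by side.
process = ET.fromstring(
    '<biological_process title="attribute value">'
    "<title>Metabolism</title>"
    "</biological_process>"
)

attribute_title = process.get("title")       # attribute lookup: a str or None
child_title = process.find("title").text     # child element lookup, then its text
child_title_short = process.findtext("title")  # the same lookup in one call
missing_attribute = process.get("go_term")   # no such attribute: None, so .text would fail

print(attribute_title, child_title, child_title_short, missing_attribute)
```

So when the title is a child element, as the data in this thread suggests, find('title').text or findtext('title') is the call that returns it.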

> >>> for proc in mydata.findall('functions'):
>         print proc

I think you want findall('//functions') to find <functions> at any depth in the
tree.

If this doesn't work please show the results you get and tell us what you 
expect.

Kent


Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-20 Thread ps python

Dear Drs. Johnson and Yoo,
For the last week I have been working on parsing
the elements from a bunch of XML files following your
suggestions.

Until now I have been unsuccessful.  I have no clue why
I am failing.

I have ~16K XML files. This data was obtained from Johns
Hopkins University (of course these are public data
and may be used for teaching and non-commercial
purposes).


>>> from elementtree.ElementTree import ElementTree
>>> mydata = ElementTree(file='4.xml')
>>> for process in mydata.findall('//biological_process'):
    print process.text


>>> for proc in mydata.findall('functions'):
    print proc


>>>



I do not understand why I am unable to parse this
file. I wondered whether the file is not well formed,
but I believe it is properly structured,
and yet I cannot parse it.


Would you please help me understand what the problem
is?  Apologies if I am overlooking something.


PS: Attached is the XML file that I am using. 



  
   Aldehyde dehydrogenase 3

Aldehyde dehydrogenase family 3 subfamily A, member 1


ALDH3


Acetaldehyde dehydrogenase 3


ALDH, Stomach type


ALDHIII

100660 
ALDH3A1

  17p11.2
  
  7774944
  


  NM_000691.3
  NP_000682.3

50398


  ccaggagccc cagttaccgg gagaggctgt
gtcaaaggcg ccatgagcaa gatcagcgag
gccgtgaagc gcgcccgcgc cgccttcagc
tcgggcagga cccgtccgct gcagttccgg
atccagcagc tggaggcgct gcagcgcctg
atccaggagc aggagcagga gctggtgggc
gcgctggccg cagacctgca caagaatgaa
tggaacgcct actatgagga ggtggtgtac
gtcctagagg agatcgagta catgatccag
aagctccctg agtgggccgc ggatgagccc
gtggagaaga cgagac tcagcaggac
gagctctaca tccactcgga gccactgggc
gtggtcctcg tcattggcac ctggaactac
cccttcaacc tcaccatcca gcccatggtg
ggcgccatcg ctgcagggaa ctcagtggtc
ctcaagccct cggagctgag tgagaacatg
gcgagcctgc tggctaccat catcag
tacctggaca aggatctgta cccagtaatc
aatgtg tccctgagac cacggagctg
ctcaaggaga ggttcgacca tatcctgtac
acgggcagca cgtggg gaagatcatc
atgacggctg ctgccaagca cctgat
gtcacgctgg agctgggagg gaagagtccc
tgctacgtgg acaagaactg tgacctggac
gtggcctgcc gacgcatcgc ctgaaa
ttcatgaaca gtggccagac ctgcgtggcc
cctgactaca tcctctgtga tcgatc
cagaaccaaa ttgtggagaa gctcaagaag
tcactgaaag agttctacgg ggaagatgct
aagaaatccc gggactatgg aagaatcatt
agtgcccggc acttccagag ggtgatgggc
ctgattgagg gccagaaggt ggcttatggg
ggcacc atgccgccac tcgctacata
gcacca tcctcacgga cgtgga
cagtgg tgatgcaaga ggagatcttc
gggcctgtgc tgcccatcgt gtgcgtgcgc
agcctggagg aggccatcca gttcatcaac
cagcgtgaga agtggc cctctacatg
ttctccagca acgacaaggt gattaagaag
atgattgcag agacatccag tggttg
gcggccaacg atgtcatcgt ccacatcacc
ttgcactctc tgcccttcgg gggcgt
aacagcggca tgggatccta ccatggcaag
aagagcttcg agactttctc tcaccgccgc
tcttgcctgg tgaggcctct gatgaatgat
gaaggcctga aggtcagata ccgagc
ccggccaaga tgacccagca ctgaggaggg
gttgctccgc ctggcctggc catactgtgt
cccatcggag tgcggaccac cctcactggc
tctcctggcc ctgggagaat cgctcctgca
gagccc agactc ctctgctgac
ctgctgacct gtgcacaccc cactcccaca
tgggcccagg cctcaccatt ccaagtctcc
atttct agaccaataa agagacgaat
acaact aactcagcaa aa
aa aa aa
aa aa aa
aa aa

  
  
  
  mskiseavkr araafssgrt rplqfriqql
ealqrliqeq eqelvgalaa dlhknewnay
yeevvyvlee ieymiqklpe waadepvekt
pqtqqdelyi hseplgvvlv igtwnypfnl
tiqpmvgaia agnsvvlkps elsenmasll
atiipqyldk dlypvinggv pettellker
fdhilytgst gvgkiimtaa akhltpvtle
lggkspcyvd kncdldvacr riawgkfmns
gqtcvapdyi lcdpsiqnqi veklkkslke
fygedakksr dygriisarh fqrvmglieg
qkvayggtgd aatryiapti ltdvdpqspv
mqeeifgpvl pivcvrslee aiqfinqrek
plalymfssn dkvikkmiae tssggvaand
vivhitlhsl pfggvgnsgm gsyhgkksfe
tfshrrsclv rplmndeglk vryppspakm
tqh



  
CC

Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-09 Thread Kent Johnson

Srinivas Iyyer wrote:
> Hi group,
>   I just have another question about parsing XML files. I
> found it very easy to parse XML files with Kent and
> Danny's help.
>
> I realized that all my XML files have '\t' and '\n'
> and whitespace.  These extra characters make it very
> difficult to extract the text data from the XML files.
> I can make the XML parser work when I
> remove '\n' and '\t' from the XML files.
>
> Is there a way to get rid of '\n' and '\t' characters
> from XML files easily?

Did you see how I did this in my original example? I called strip() on 
the text part of the element. This removes leading and trailing 
whitespace. Is that what you need?
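
As a sketch of the two usual options in current Python 3 (standard-library xml.etree.ElementTree, an assumption since the thread uses the older elementtree package): strip() trims only the ends of the text, while splitting and re-joining also collapses tabs and newlines inside it. The helper name clean_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def clean_text(element):
    """Return the element's text with all runs of whitespace collapsed to one space."""
    # element.text is None for an empty element, so fall back to "".
    return " ".join((element.text or "").split())


# Hypothetical element whose text is wrapped and indented with tabs and newlines.
element = ET.fromstring("<title>\n\t  Energy\n\t  pathways\t\n</title>")

edges_trimmed = element.text.strip()  # leading/trailing whitespace removed only
collapsed = clean_text(element)       # internal tabs and newlines collapsed too

print(repr(edges_trimmed))
print(repr(collapsed))
```

Neither approach needs the XML files themselves to be edited; the cleanup happens on the text after parsing.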

Kent


Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-09 Thread Srinivas Iyyer

Hi group,
  I just have another question about parsing XML files. I
found it very easy to parse XML files with Kent and
Danny's help.

I realized that all my XML files have '\t' and '\n'
and whitespace.  These extra characters make it very
difficult to extract the text data from the XML files.
I can make the XML parser work when I
remove '\n' and '\t' from the XML files.

Is there a way to get rid of '\n' and '\t' characters
from XML files easily?
Thank you very much.
MDan
MDan


Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-08 Thread Kent Johnson

ps python wrote:
>  Kent and Danny,
> Thanks for your replies.
>
> Here fromstring() assumes that the input is a
> string.

Right, that is for the sake of a simple example.
>
> What should I do when I am reading files
> directly?
>
> I am using the following:
>
> from elementtree.ElementTree import ElementTree
> mydata = ElementTree(file='1.xml')
> iter = root.getiterator()
>
> Here the whole XML document is loaded as an element tree.
> How should I turn this iter into a form where I can
> apply the findall() method?

Call findall() directly on mydata, e.g.
for process in mydata.findall('//biological_process'):
   print process.text

The path //biological_process means find any biological_process element 
at any depth from the root element.
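
For readers on current Python 3, a minimal sketch of the same idea with the standard-library xml.etree.ElementTree (an assumption; the thread uses the older elementtree package). The temporary file and its contents are hypothetical stand-ins for a file such as '1.xml'. The path is written as './/biological_process', the relative form of the "any depth" search, and iter() is used where older code called getiterator().

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one of the XML files on disk.
xml_text = (
    "<entry><functions>"
    "<biological_process>Signal transduction</biological_process>"
    "</functions></entry>"
)

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(xml_text)
    path = handle.name

try:
    tree = ET.parse(path)  # an ElementTree wrapping the whole document
    # Search the whole document, at any depth, straight from the tree object.
    found = [p.text.strip() for p in tree.findall(".//biological_process")]
    # iter() walks every element in document order.
    all_tags = [element.tag for element in tree.iter()]
finally:
    os.remove(path)

print(found)
print(all_tags)
```

No intermediate iterator is needed before calling findall(); the parsed tree can be searched directly.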

Kent


Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-08 Thread ps python

Kent and Danny,
Thanks for your replies.

Here fromstring() assumes that the input is a
string.

What should I do when I am reading files
directly?

I am using the following:

from elementtree.ElementTree import ElementTree
mydata = ElementTree(file='1.xml')
iter = root.getiterator()

Here the whole XML document is loaded as an element tree.
How should I turn this iter into a form where I can
apply the findall() method?

thanks
mdan




Re: [Tutor] how to extract text by specifying an element using ElementTree

2005-12-08 Thread Kent Johnson

ps python wrote:
> Hi,
>
> Using ElementTree, how can I extract the text of a
> particular element or child node?
>
> For example:
>
> <biological_processes>
>    <biological_process>
>        Signal transduction
>    </biological_process>
>    <biological_process>
>        Energy process
>    </biological_process>
> </biological_processes>
>
> When I already know which element tags
> have the information that I need, how do
> I get that specific text?

Use find() to get the nodes of interest. The text attribute of the node 
contains the text. For example:

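data = '''
<biological_processes>
    <biological_process>
        Signal transduction
    </biological_process>
    <biological_process>
        Energy process
    </biological_process>
</biological_processes>
'''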

from elementtree import ElementTree

tree = ElementTree.fromstring(data)

for process in tree.findall('biological_process'):
   print process.text.strip()

prints
Signal transduction
Energy process

You will have to modify the path in the findall to match your actual 
data, assuming what you have shown is just a snippet.

I stripped whitespace from the text because otherwise it includes the 
newlines and indents exactly as in the original.
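
The same example as a sketch for current Python 3, using the standard-library xml.etree.ElementTree (an assumption; the message above uses the older elementtree package and the Python 2 print statement). The biological_processes root name is a guess, since only the biological_process tag is confirmed by the findall() call.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the message above; the root tag name is assumed.
DATA = """
<biological_processes>
    <biological_process>
        Signal transduction
    </biological_process>
    <biological_process>
        Energy process
    </biological_process>
</biological_processes>
"""

tree = ET.fromstring(DATA)

# .text keeps the newlines and indentation exactly as they appear in the source.
raw = [process.text for process in tree.findall("biological_process")]

# strip() removes that leading and trailing whitespace.
stripped = [process.text.strip() for process in tree.findall("biological_process")]

print(stripped)
```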

Kent


Re: [Tutor] how to extract text by specifying an element using ElementTree

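2005-12-08 Thread Danny Yoo



> For example:
>
> <biological_processes>
>    <biological_process>
>        Signal transduction
>    </biological_process>
>    <biological_process>
>        Energy process
>    </biological_process>
> </biological_processes>
>
> I looked at some tutorials (e.g. Ogbuji).  Those
> examples described how to extract all the text of nodes and
> child nodes.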

Hi Mdan,

The following might help:

http://article.gmane.org/gmane.comp.python.tutor/24986
http://mail.python.org/pipermail/tutor/2005-December/043817.html

The second post shows how we can use the findtext() method from an
ElementTree.

Here's another example that demonstrates how we can treat elements as
sequences of their subelements:

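##
from elementtree import ElementTree
from StringIO import StringIO

text = """
<people>
    <person>
        <lastName>skywalker</lastName>
        <firstName>luke</firstName>
    </person>
    <person>
        <lastName>valentine</lastName>
        <firstName>faye</firstName>
    </person>
    <person>
        <lastName>reynolds</lastName>
        <firstName>mal</firstName>
    </person>
</people>
"""

people = ElementTree.fromstring(text)
for person in people:
    print "here's a person:",
    print person.findtext("firstName"), person.findtext('lastName')
##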


Does this make sense?  The API allows us to treat an element as a sequence
that we can march across, and the loop above marches across every person
subelement in people.


Another way we could have written the loop above would be:

###
>>> for person in people.findall('person'):
...     print person.find('firstName').text,
...     print person.find('lastName').text
...
luke skywalker
faye valentine
mal reynolds
###


Or we might go a little funkier, and just get the first names anywhere in
people:

###
>>> for firstName in people.findall('.//firstName'):
...     print firstName.text
...
luke
faye
mal
###

where the subelement "tag" that we're giving findall is really an
XPath-style query (ElementTree supports a subset of XPath).  ".//firstName"
is a query that says "Give me all the firstName elements anywhere within
the current element."
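
For anyone trying this on current Python 3, here is a sketch of the same three approaches with the standard-library xml.etree.ElementTree (an assumption; the code above targets the older elementtree package and Python 2). The people root tag is a guess based on the variable name; the person, firstName and lastName tags come from the code above.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the example above; the <people> root name is assumed.
TEXT = """
<people>
  <person><lastName>skywalker</lastName><firstName>luke</firstName></person>
  <person><lastName>valentine</lastName><firstName>faye</firstName></person>
  <person><lastName>reynolds</lastName><firstName>mal</firstName></person>
</people>
"""

people = ET.fromstring(TEXT)

# Iterating over an element yields its direct children, here each <person>.
full_names = [
    f"{person.findtext('firstName')} {person.findtext('lastName')}"
    for person in people
]

# findall('person') selects the same children by tag name.
last_names = [person.find("lastName").text for person in people.findall("person")]

# './/firstName' matches firstName elements at any depth below the root.
first_names = [element.text for element in people.findall(".//firstName")]

print(full_names)
print(last_names)
print(first_names)
```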


The documentation in:

http://effbot.org/zone/element.htm#searching-for-subelements

should also be helpful.


If you have more questions, please feel free to ask.
