Re: Parsing an XML feed using ElementTree

2011-05-25 Thread Sithembewena Lloyd Dube
P.S. I am aware that I posted a non-Django question; I just took the chance
that someone here might have needed to do the same.

Thanks!




-- 
Regards,
Sithembewena Lloyd Dube

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Parsing an XML feed using ElementTree

2011-05-25 Thread Sithembewena Lloyd Dube
Hi Everyone,

Thanks for all your suggestions. I read up on gzip and urllib and also
learned in the process that I could use urllib2, as it's the latest form of
that library.

Herewith my solution: I don't know how elegant it is, but it works just
fine.

import gzip
import os
import StringIO
import urllib2
from xml.etree import ElementTree as ET

# MEDIA_ROOT and Contest are assumed to be defined elsewhere in the project.

def get_contests():
    url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id==TEST==Po'
    req = urllib2.Request(url)
    # Ask for gzip only, since that is the only encoding decoded below.
    req.add_header('Accept-Encoding', 'gzip')
    opener = urllib2.build_opener()
    response = opener.open(req)
    compressed_data = response.read()
    # Decompress the response body in memory.
    compressed_stream = StringIO.StringIO(compressed_data)
    gzipper = gzip.GzipFile(fileobj=compressed_stream)
    data = gzipper.read()
    # Write the XML to a temporary file and parse it from there.
    current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
    data_file = open(current_path, 'w')
    data_file.write(data)
    data_file.close()
    xml_data = ET.parse(current_path)
    contest_list = []
    fields = ('name', 'league', 'acro', 'time', 'home', 'away')
    for contest_parent_node in xml_data.getiterator('contest'):
        contest = Contest()
        for contest_child_node in contest_parent_node:
            # Copy each known, non-empty child element onto the Contest.
            if contest_child_node.tag in fields and contest_child_node.text:
                setattr(contest, contest_child_node.tag,
                        contest_child_node.text)
        contest_list.append(contest)
    # Clean up the temporary file; ignore it if it is already gone.
    try:
        os.remove(current_path)
    except OSError:
        pass
    return contest_list
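
A possible simplification, offered only as a sketch and not as the code used
above: the temporary file is probably not needed, since the decompressed bytes
can be handed to ElementTree directly. The sample feed, the FIELDS tuple and
the helper name parse_contests are assumptions made for illustration, and the
real feed layout may differ. The sketch targets Python 3, where io.BytesIO
takes the place of StringIO.StringIO, and it returns plain dicts instead of
Contest objects so that it runs on its own.

```python
import gzip
import io
from xml.etree import ElementTree as ET

# Hypothetical sample data standing in for the real feed.
SAMPLE_XML = b"""<feed>
  <contest><name>A v B</name><league>Test League</league><home>A</home><away>B</away></contest>
  <contest><name>C v D</name><league></league></contest>
</feed>"""

FIELDS = ("name", "league", "acro", "time", "home", "away")


def parse_contests(compressed_bytes):
    """Gunzip a byte string in memory and return one dict per <contest>."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes)) as gz:
        root = ET.fromstring(gz.read())
    contests = []
    for node in root.iter("contest"):
        contest = {}
        for field in FIELDS:
            # findtext() gives None for a missing child and "" for an empty one.
            text = node.findtext(field)
            if text:
                contest[field] = text
        contests.append(contest)
    return contests


# Compress the sample locally to imitate a gzip-encoded response body.
compressed = gzip.compress(SAMPLE_XML)
contests = parse_contests(compressed)
print(contests)
```

In real use, the argument would be the bytes read from the HTTP response in
place of the locally compressed sample.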

Many thanks!







-- 
Regards,
Sithembewena Lloyd Dube




Re: Parsing an XML feed using ElementTree

2011-05-24 Thread Brian Bouterse
We all have our opinions.  Either way this conversation is OT from Django.

On Tue, May 24, 2011 at 4:07 PM, Masklinn  wrote:

> On 2011-05-24, at 21:57 , Brian Bouterse wrote:
> > +1 for xpath
> >
> > I also like using xml.dom.minidom since it is so simple and
> > straightforward.
> >
> I'm sorry, but I whole-heartedly disagree with this. ElementTree is orders
> of magnitude simpler and more straightforward than the unending pain of
> working with the DOM interface.


-- 
Brian Bouterse
ITng Services




Re: Parsing an XML feed using ElementTree

2011-05-24 Thread Masklinn
On 2011-05-24, at 21:57 , Brian Bouterse wrote:
> +1 for xpath
> 
> I also like using
> xml.dom.minidomsince
> it is so simple and straightforward.
> 
I'm sorry, but I whole-heartedly disagree with this. ElementTree is orders of 
magnitude simpler and more straightforward than the unending pain of working 
with the DOM interface.
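
To give a rough idea of the difference being argued about, here is a small
side-by-side sketch that reads the same values through both interfaces. The
sample document is made up for illustration; which style counts as simpler is
left to the reader.

```python
from xml.dom import minidom
from xml.etree import ElementTree as ET

# Hypothetical sample document.
XML = ("<feed><contest><name>A v B</name></contest>"
       "<contest><name>C v D</name></contest></feed>")

# ElementTree: the text of a child element is available in one call.
et_names = [node.findtext("name") for node in ET.fromstring(XML).iter("contest")]

# minidom: text lives in child text nodes that have to be reached explicitly.
dom = minidom.parseString(XML)
dom_names = []
for contest in dom.getElementsByTagName("contest"):
    name_element = contest.getElementsByTagName("name")[0]
    dom_names.append(name_element.firstChild.data)

print(et_names)
print(dom_names)
```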




Re: Parsing an XML feed using ElementTree

2011-05-24 Thread Brian Bouterse
+1 for xpath

I also like using xml.dom.minidom since it is so simple and straightforward.

If your XML is poorly formed, go with Beautiful Soup.

Brian


2011/5/24 Тимур Зарипов 

> I'd really suggest you use the lxml library for
> XML parsing -- it has XPath in it.



-- 
Brian Bouterse
ITng Services




Re: Parsing an XML feed using ElementTree

2011-05-24 Thread Тимур Зарипов
I'd really suggest you use the lxml library for XML
parsing -- it has XPath in it.
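
For comparison, the standard library's ElementTree understands a limited
subset of XPath through find() and findall(), while lxml (a third-party
package) offers fuller XPath support through its xpath() method. The sketch
below sticks to the standard library so it runs without extra installs. The
sample document and its id attributes are invented for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample document.
XML = """<feed>
  <contest id="1"><name>A v B</name><league>Premier</league></contest>
  <contest id="2"><name>C v D</name><league>Championship</league></contest>
</feed>"""

root = ET.fromstring(XML)

# './/contest/name' selects every <name> directly under a <contest>,
# wherever that <contest> sits below the root.
names = [element.text for element in root.findall(".//contest/name")]

# Attribute predicates are part of the supported subset.
second = root.find(".//contest[@id='2']")
second_league = second.findtext("league")

print(names)
print(second_league)
```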




Re: Parsing an XML feed using ElementTree

2011-05-24 Thread Daniel Roseman


On Tuesday, May 24, 2011 11:13:31 AM UTC+1, Lloyd Dube wrote:
>
> Hi Everyone,
>
> I am trying to parse an XML feed and display the text of each child node 
> without any success. My code in the python shell is as follows:
>
> >>> import urllib
> >>> from xml.etree import ElementTree as ET
>
> >>> content = urllib.urlopen(
> ...     'http://xml.matchbook.com/xmlfeed/feed?sport-id==TEST==Po')
> >>> xml_content = ET.parse(content)
>
> I then check the xml_content object as follows:
>
> >>> xml_content
> 
>
> And now, to iterate through its child nodes and print out the text of each 
> node:
>
> >>> for node in xml_content.getiterator('contest'):
> ...     name = node.attrib.get('text')
> ...     print name
> ...
> >>>
>
> Nothing is printed, even though the document does have 'contest' tags with 
> text in them. If I try to count the contest tags and increment an integer 
> (to see that the document is traversed) I get the same result - the int 
> remains at 0.
>
> >>> i = 0
> >>> for node in xml_content.getiterator('contest'):
> ...     i += 1
> ...
> >>> i
> 0
>
> What am I getting wrong? Any hints would be appreciated.
>
> -- 
> Regards,
> Sithembewena Lloyd Dube


This isn't really a Django question...

Nevertheless, the issue is probably in the line "name =
node.attrib.get('text')". What this does is get the attribute of the current
node that has the name 'text', i.e. if your XML was like this:

    <contest text="foo" />

However, what you probably have is this:

    <contest>foo</contest>

in which case you just want to access the `text` property directly:

    name = node.text
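
To make the difference concrete, here is a small sketch using two made-up
elements, one carrying 'foo' as an attribute and one carrying it as text. The
element names and values are only for illustration. It uses iter() in place of
getiterator(), which recent Python versions no longer provide.

```python
from xml.etree import ElementTree as ET

# Two made-up elements: one carries 'foo' as an attribute, the other as text.
with_attribute = ET.fromstring('<contest text="foo" />')
with_text = ET.fromstring('<contest>foo</contest>')

# .attrib is a dict of the element's attributes; .text is its text content.
print(with_attribute.attrib.get('text'))  # foo
print(with_attribute.text)                # None
print(with_text.attrib.get('text'))       # None
print(with_text.text)                     # foo

# Iterating over matching nodes and reading the text of each one.
root = ET.fromstring('<feed><contest>foo</contest><contest>bar</contest></feed>')
names = [node.text for node in root.iter('contest')]
print(names)                              # ['foo', 'bar']
```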

--
DR.
