Re: Splitting SAX results

2007-06-20 Thread Stefan Behnel
Gabriel Genellina wrote:
> Forget about SAX. Use ElementTree instead
> ElementTree is infinitely more flexible and easier to use.
> See 

That's what I told him/her already :)

To rephrase a famous quote:

Being faced with an XML problem, you might think "Ok, I'll just use SAX". And
now you have two problems.

SAX is a great way to hide your real problems behind a wall of unreadable
code. If you want my opinion, lxml is currently the most straightforward way
to get XML work done in Python.

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Splitting SAX results

2007-06-12 Thread Gabriel Genellina
On Tue, 12 Jun 2007 16:16:45 -0300, IamIan <[EMAIL PROTECTED]> wrote:

> I do know how split works, but thank you for the response. The end
> result that I want is a dictionary made up of the title results coming
> through SAX, looking like {'Title1: Description',
> 'Title2:Description'}.
>
> The XML data looks like:
> 
> Title1:Description
> Link
> Desc
> Author
> Date
> 
> 
> Title2:Description
> Link
> Desc
> Author
> Date
> 
>
> I've tried different approaches, a couple of which I've added to the
> code below (only running one option at a time):

Forget about SAX. Use ElementTree instead:

py> import xml.etree.cElementTree as ET
py> f = open("x.xml","r")
py> tree = ET.parse(f)
py> for item in tree.getiterator('item'):
...   print item.findtext('title')
...
Title1:Description
Title2:Description

ElementTree is infinitely more flexible and easier to use.
See 
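
For anyone reading this on Python 3: newer releases no longer ship
xml.etree.cElementTree or Element.getiterator(), so the session above needs
small changes. Here is a minimal sketch of the same idea that also builds the
dictionary asked for. The sample document and every tag name other than
'item' and 'title' are assumptions made for illustration, not the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the feed; only <item> and <title> are known
# from the thread, the remaining tags are guesses.
SAMPLE = """<rss><channel>
<item><title>Title1:Description</title><link>Link</link></item>
<item><title>Title2:Description</title><link>Link</link></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE)

titles = {}
for item in root.iter('item'):          # iter() replaces getiterator()
    text = item.findtext('title', default='')
    # partition() always returns three parts, so a title without a
    # colon cannot raise an unpacking error.
    key, sep, value = text.partition(':')
    if sep:
        titles[key.strip()] = value.strip()

print(titles)  # {'Title1': 'Description', 'Title2': 'Description'}
```

For a real file, ET.parse(filename).getroot() would take the place of
ET.fromstring(SAMPLE).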

-- 
Gabriel Genellina



Re: Splitting SAX results

2007-06-12 Thread IamIan
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:

Title1:Description
Link
Desc
Author
Date


Title2:Description
Link
Desc
Author
Date


I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):

  def __init__(self):
    self.isReport = 0

  def startElement(self, name, attrs):
    if name == 'title':
      self.isReport = 1
      self.reportText = ''

  def characters(self, ch):
    if self.isReport:
      self.reportText += ch
      tracker.append(ch) # Option 1
      key, value = ch.split(':') # Option 2
      tracker[key] = value

  def endElement(self, name):
    if name == 'title':
      self.isReport = 0
      print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

print tracker


Option 1 returns a list with the markup included, looking like:
[u'Title1:', u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]

Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
key, value = ch.split(':')
ValueError: need more than 1 value to unpack

Thank you for the help!
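
A note on both results: a SAX parser is allowed to deliver the text of one
element in several characters() calls, which is what the Option 1 list shows
('Title1:', then '\n', then 'Description ', and so on). Option 2 splits each
chunk on its own, so a chunk such as u'\n' has no colon and the two-name
unpacking fails. One possible fix, sketched here for Python 3, is to collect
the chunks and split only once in endElement(). The class name, the sample
data, and the tag names other than 'title' are assumptions.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample; the line break inside <title> mimics the chunked
# text seen in the Option 1 output.
SAMPLE = b"""<rss><channel>
<item><title>Title1:
Description </title><link>Link</link></item>
<item><title>Title2:
Description </title><link>Link</link></item>
</channel></rss>"""


class TitleHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []
        self.titles = {}

    def startElement(self, name, attrs):
        if name == 'title':
            self.in_title = True
            self.chunks = []

    def characters(self, content):
        # May be called several times per element: just collect the pieces.
        if self.in_title:
            self.chunks.append(content)

    def endElement(self, name):
        if name == 'title':
            self.in_title = False
            text = ''.join(self.chunks)   # the complete title text
            key, sep, value = text.partition(':')
            if sep:                       # skip titles without a colon
                self.titles[key.strip()] = value.strip()


handler = TitleHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)  # {'Title1': 'Description', 'Title2': 'Description'}
```

Keeping the result on the handler instead of in a module-level tracker is
only a design preference; a global dictionary would work the same way.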



Re: Splitting SAX results

2007-06-08 Thread Jerry Hill
On 6/8/07, IamIan <[EMAIL PROTECTED]> wrote:
> Well SAX isn't the problem... maybe I should repost this with a
> different title. The SAX part works just as I want, but the results I
> get back need to be manipulated. No matter what I try I can't split a
> result like 'Title 1:Description' on the colon without getting an
> IndexError. Ideas anyone?

I don't think you've shown us any examples of the code you're having
trouble with.  I don't see anything in your original post that tries
to split strings. If you just want to know how split works, here's an
example:

>>> t = 'Title1:Description'
>>> key, value = t.split(':')
>>> print key
Title1
>>> print value
Description
>>>

If that doesn't help, show us a sample of some of the data you're
working with, what you've tried so far, and what the end result is
supposed to look like.
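
One caveat about the example above: unpacking into exactly two names only
works when the string contains exactly one colon. With no colon, or with more
than one, split(':') returns a list of a different length and the assignment
raises ValueError. A slightly more forgiving sketch in Python 3 syntax
follows; the helper name split_title is made up for illustration.

```python
def split_title(text):
    """Split 'key:value' on the first colon; return None if there is none."""
    key, sep, value = text.partition(':')
    if not sep:
        return None
    return key.strip(), value.strip()


print(split_title('Title1:Description'))       # ('Title1', 'Description')
print(split_title('Title 1:Re: Description'))  # ('Title 1', 'Re: Description')
print(split_title('\n'))                       # None
```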

-- 
Jerry


Re: Splitting SAX results

2007-06-08 Thread IamIan
Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?



Re: Splitting SAX results

2007-06-06 Thread Stefan Behnel
IamIan wrote:
> I have a very simple SAX script from which I get results like
> 'Title1:Description','Title2:Description'. I want to split each result
> on the colon, using the two resulting elements as key/value pairs in a
> dictionary. I've tried a couple different approaches with lists etc,
> but I keep getting an 'IndexError: list index out of range' when I go
> to split the results. Probably an easy fix but it's my first hack at
> SAX/XML. Thank you!

Sounds like a problem with the data to me rather than SAX.

However, SAX tends to make things much more complex than necessary, so you
lose sight of the real problems. Try a library like ElementTree or lxml
to make your life easier. You might especially like lxml.objectify.

http://effbot.org/zone/element.htm
http://effbot.org/zone/element-iterparse.htm

http://codespeak.net/lxml/dev/
http://codespeak.net/lxml/dev/objectify.html
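
To give a rough idea of the iterparse approach from the second link, here is
a sketch using the standard-library ElementTree on Python 3. It processes one
element at a time, like SAX, but hands you complete elements. The function
name, the sample data, and the tag names 'item' and 'title' are assumptions
about the feed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real feed.
SAMPLE = b"""<rss><channel>
<item><title>Title1:Description</title><link>Link</link></item>
<item><title>Title2:Description</title><link>Link</link></item>
</channel></rss>"""


def titles_by_key(source):
    """Map the text before the first colon of each item title to the rest."""
    result = {}
    # An 'end' event fires once an element has been read completely.
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'item':
            title = elem.findtext('title', default='')
            key, sep, value = title.partition(':')
            if sep:
                result[key.strip()] = value.strip()
            elem.clear()  # drop the item's children to keep memory low
    return result


print(titles_by_key(io.BytesIO(SAMPLE)))
# {'Title1': 'Description', 'Title2': 'Description'}
```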

Stefan


Splitting SAX results

2007-06-06 Thread IamIan
Hi list,

I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class reportHandler(ContentHandler):
  def __init__(self):
    self.isReport = 0

  def startElement(self, name, attrs):
    if name == 'title':
      self.isReport = 1
      self.reportText = ''

  def characters(self, ch):
    if self.isReport:
      self.reportText += ch

  def endElement(self, name):
    if name == 'title':
      self.isReport = 0
      print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')
