Re: [python-win32] Parse HTML String only not file

Randy Syring Thu, 17 Jun 2010 14:12:17 -0700

We have had great success with PyQuery for getting API access to XML data:


http://pypi.python.org/pypi/pyquery

--------------------------------------
Randy Syring
Intelicom
502-644-4776

"Whether, then, you eat or drink or whatever you do, do all to the glory of God." 1 Cor 10:31



Tim Roberts wrote:

On 6/17/2010 11:09 AM, Mauricio Martinez Garcia wrote:

Hi, how can I parse an HTML string?
I need to parse the following line:

'<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD><FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>'


That's not HTML.  It's XML.  You CAN parse this with SGMLParser
(since XML is derived from SGML), but you might consider whether you
would be better served using xmllib (deprecated since Python 2.0), or even xml.sax.
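As an illustration of the xml.sax route, here is a minimal sketch written for Python 3 (an assumption; the thread predates it). The names FieldHandler, parse_fields and the wrapping <ROOT> element are hypothetical. Because the string contains two top-level <FIELD> elements it is not a well-formed XML document on its own, so the sketch wraps it in a single root element before parsing. It collects both the flat list the question asks for and one dict per <FIELD>.

```python
import xml.sax

# The string from the question (two <FIELD> elements, no single root element).
DATA = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD>'
        '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>')


class FieldHandler(xml.sax.ContentHandler):
    """Collects every text value in document order, plus one dict per <FIELD>."""

    def __init__(self):
        super().__init__()
        self.values = []       # flat list of all values, in document order
        self.fields = []       # one dict per <FIELD>: {'NAME': ..., 'TYPE': ..., 'VALUE': ...}
        self._current = None   # dict for the <FIELD> currently being parsed
        self._text = []        # characters() may deliver one text node in several chunks

    def startElement(self, name, attrs):
        if name == 'FIELD':
            self._current = {}
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name == 'FIELD':
            self.fields.append(self._current)
            self._current = None
        elif self._current is not None:
            # A child of <FIELD> (NAME, TYPE or VALUE) just closed.
            text = ''.join(self._text)
            self._current[name] = text
            self.values.append(text)
        self._text = []


def parse_fields(fragment):
    handler = FieldHandler()
    # Hypothetical <ROOT> wrapper: XML requires exactly one root element.
    document = '<ROOT>' + fragment + '</ROOT>'
    xml.sax.parseString(document.encode('utf-8'), handler)
    return handler


if __name__ == '__main__':
    h = parse_fields(DATA)
    print(h.values)  # prints ['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0']
    print(h.fields)
```

The per-field dicts are one possible answer to the "dict() or []" part of the question; if only the flat list is needed, the fields bookkeeping can be dropped.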

The result of the program is:

bash-3.1$ ./pruebasDOM.py
['BSCS status']
['string']
['none']
['TopCre_life']
['integer']
['0']


I can't collect the data into a dict() or a list [].  I need all the
values together: ['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0']

What can I do?


Of course.  Just change your ParserHTML class to create a list in "def
__init__", then append the values that you get to the list instead of
printing them.  So, for example:

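from sgmllib import SGMLParser  # Python 2 only; sgmllib was removed in Python 3

class ParserHTML(SGMLParser):
    def __init__(self):
        SGMLParser.__init__(self)
        self.results = []  # collect the values here instead of printing them
    ...
    def handle_data(self, data):
        ...
        self.results.append(data)  # store each text chunk in the list
    ...
if __name__ == '__main__':
    ...
    p = ParserHTML()
    p.feed(node)
    print p.results  # Python 2 print statement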
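Since sgmllib (and with it SGMLParser) no longer exists in Python 3, here is a minimal sketch of the same "append instead of print" idea for Python 3. It swaps in the standard-library html.parser.HTMLParser, which is a different parser from SGMLParser. The parse_values helper and the whitespace filter in handle_data are hypothetical, because the body of the original handle_data was not shown.

```python
from html.parser import HTMLParser

# The string from the question, as one Python literal.
NODE = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD>'
        '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>')


class ParserHTML(HTMLParser):
    """Collects the text between tags into a list instead of printing it."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_data(self, data):
        # Hypothetical filter: skip whitespace-only text between tags.
        if data.strip():
            self.results.append(data)


def parse_values(node):
    p = ParserHTML()
    p.feed(node)
    p.close()  # flush any text the parser is still buffering
    return p.results


if __name__ == '__main__':
    print(parse_values(NODE))  # prints ['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0']
```

Keeping the list on the instance and reading it after feed() mirrors the suggestion above; the helper function only exists so the result can be returned rather than printed.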

_______________________________________________
python-win32 mailing list
http://mail.python.org/mailman/listinfo/python-win32
