Re: expat parsing error
On Jun 2, 03:47, John Machin sjmac...@lexicon.net wrote:
> On Jun 2, 1:57 am, kak...@gmail.com kak...@gmail.com wrote:
>> On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote:
>>> On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:
>>>> kak...@gmail.com kak...@gmail.com writes:
>>>>> On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
>>>>>> kak...@gmail.com, 01.06.2010 16:00:
>>>>>>> how can I fix it? How do I ignore the headers and parse only the XML?
>>>>>> Consider reading the answers you got in the last thread that you opened with exactly this question.
>>>>>> Stefan
>>>>> That's exactly what I did, but the solutions I had stopped working when I changed my implementation from plain Python sockets to the Twisted library! That's the reason I created a new post. Any ideas why this happened?
>>>> As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove the lines up to and including the empty line, and pass the data to any XML parser.
>>>> --
>>>> John Bokma j3b
>>>> Hacking & Hiking in Mexico - http://johnbokma.com/
>>>> http://castleamber.com/ - Perl & Python Development
>>> Thank you so much, I'll try it! Antonis
>> Dear John, can you provide me with a simple working solution? I don't seem to get it.
> You're not wrong. Try something like this:
> rubbish1, rubbish2, xml = your_guff.partition('\n\n')
Ok, thanks a lot! Antonis
-- http://mail.python.org/mailman/listinfo/python-list
Re: expat parsing error
kak...@gmail.com kak...@gmail.com writes:
> I got the following error
> --- exception caught here ---
>   File /usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/selectreactor.py, line 146, in _doReadOrWrite
>     why = getattr(selectable, method)()
>   File /usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py, line 460, in doRead
>     return self.protocol.dataReceived(data)
>   File stdiodemo.py, line 419, in dataReceived
>     p.Parse(line, 1)
> xml.parsers.expat.ExpatError: syntax error: line 1, column 0
> The XML Message is coming in the form of:
> POST /test/pcp/Listener HTTP/1.1
Does Expat get this line as well? If so, that's the reason why you get an error at line 1, column 0.
--
John Bokma j3b
Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
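A minimal sketch of the failure described above, assuming Python 3 and a made-up request (only the request line comes from the thread; the `Content-Type` header, the `<msg>` body and the `try_parse` helper are hypothetical). It shows where expat gives up when the HTTP request line is handed to it along with the XML.

```python
import xml.parsers.expat

# Hypothetical request modelled on the thread: headers, an empty line, then XML.
raw = (
    "POST /test/pcp/Listener HTTP/1.1\r\n"
    "Content-Type: text/xml\r\n"
    "\r\n"
    "<msg>hello</msg>"
)


def try_parse(text):
    """Return None on success, or (message, line, column) on failure."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as exc:
        return (xml.parsers.expat.ErrorString(exc.code), exc.lineno, exc.offset)
    return None


print(try_parse(raw))                 # the request line is rejected at line 1, column 0
print(try_parse("<msg>hello</msg>"))  # the XML body on its own parses cleanly
```

The error position points at the very first character, which is a hint that the problem is what precedes the XML rather than the XML itself.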
Re: expat parsing error
On Jun 1, 9:51 am, John Bokma j...@castleamber.com wrote:
> kak...@gmail.com kak...@gmail.com writes:
>> I got the following error
>> --- exception caught here ---
>>   File /usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/selectreactor.py, line 146, in _doReadOrWrite
>>     why = getattr(selectable, method)()
>>   File /usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py, line 460, in doRead
>>     return self.protocol.dataReceived(data)
>>   File stdiodemo.py, line 419, in dataReceived
>>     p.Parse(line, 1)
>> xml.parsers.expat.ExpatError: syntax error: line 1, column 0
>> The XML Message is coming in the form of:
>> POST /test/pcp/Listener HTTP/1.1
> Does Expat get this line as well? If so, that's the reason why you get an error at line 1, column 0.
> --
> John Bokma j3b
> Hacking & Hiking in Mexico - http://johnbokma.com/
> http://castleamber.com/ - Perl & Python Development
Yes, but how can I fix it? How do I ignore the headers and parse only the XML?
Thanks
Re: expat parsing error
kak...@gmail.com kak...@gmail.com writes:
> On Jun 1, 9:51 am, John Bokma j...@castleamber.com wrote:
>> kak...@gmail.com kak...@gmail.com writes:
>>> I got the following error
>>> --- exception caught here ---
>>>   File /usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/selectreactor.py, line 146, in _doReadOrWrite
>>>     why = getattr(selectable, method)()
>>>   File /usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py, line 460, in doRead
>>>     return self.protocol.dataReceived(data)
>>>   File stdiodemo.py, line 419, in dataReceived
>>>     p.Parse(line, 1)
>>> xml.parsers.expat.ExpatError: syntax error: line 1, column 0
>>> The XML Message is coming in the form of:
>>> POST /test/pcp/Listener HTTP/1.1
>> Does Expat get this line as well? If so, that's the reason why you get an error at line 1, column 0.
> Yes, but how can I fix it? How do I ignore the headers and parse only the XML?
The headers are followed by exactly one empty line, so you could simply remove the lines up to and including this empty line and then hand the rest of the data over to the parser.
--
John Bokma j3b
Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
Re: expat parsing error
kak...@gmail.com, 01.06.2010 16:00:
> how can I fix it? How do I ignore the headers and parse only the XML?
Consider reading the answers you got in the last thread that you opened with exactly this question.
Stefan
Re: expat parsing error
On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
> kak...@gmail.com, 01.06.2010 16:00:
>> how can I fix it? How do I ignore the headers and parse only the XML?
> Consider reading the answers you got in the last thread that you opened with exactly this question.
> Stefan
That's exactly what I did, but the solutions I had stopped working when I changed my implementation from plain Python sockets to the Twisted library! That's the reason I created a new post. Any ideas why this happened?
Thanks, Stefan
Re: expat parsing error
kak...@gmail.com kak...@gmail.com writes:
> On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
>> kak...@gmail.com, 01.06.2010 16:00:
>>> how can I fix it? How do I ignore the headers and parse only the XML?
>> Consider reading the answers you got in the last thread that you opened with exactly this question.
>> Stefan
> That's exactly what I did, but the solutions I had stopped working when I changed my implementation from plain Python sockets to the Twisted library! That's the reason I created a new post. Any ideas why this happened?
As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove the lines up to and including the empty line, and pass the data to any XML parser.
--
John Bokma j3b
Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
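One possible reason the earlier solution stopped working after the move to Twisted (an assumption, the thread does not confirm it) is that a `dataReceived`-style callback may be handed the request in arbitrary chunks, so the empty line can be split across calls. The sketch below uses no Twisted at all; `HeaderSkippingFeeder`, `data_received` and `finish` are hypothetical names. It buffers until the headers end and then streams the body to expat with `Parse(data, False)`.

```python
import xml.parsers.expat


class HeaderSkippingFeeder:
    """Hypothetical helper: skip the header block, stream the rest to expat."""

    def __init__(self, parser):
        self.parser = parser
        self.pending = b""    # bytes received before the headers were complete
        self.in_body = False

    def data_received(self, chunk):
        if self.in_body:
            self.parser.Parse(chunk, False)
            return
        self.pending += chunk
        head, sep, body = self.pending.partition(b"\r\n\r\n")
        if sep:               # empty line found: everything after it is XML
            self.in_body = True
            self.pending = b""
            if body:
                self.parser.Parse(body, False)

    def finish(self):
        # Tell expat that no more data will follow.
        self.parser.Parse(b"", True)


seen = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)

feeder = HeaderSkippingFeeder(parser)
chunks = (
    b"POST /test/pcp/Listener HTTP/1.1\r\nHost: exa",
    b"mple\r\n\r",
    b"\n<msg><it",
    b"em/></msg>",
)
for chunk in chunks:
    feeder.data_received(chunk)
feeder.finish()
print(seen)
```

The sketch assumes CRLF line endings, as HTTP uses, and a body that ends when the connection does; a real server would normally also honour `Content-Length`.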
Re: expat parsing error
On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:
> kak...@gmail.com kak...@gmail.com writes:
>> On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
>>> kak...@gmail.com, 01.06.2010 16:00:
>>>> how can I fix it? How do I ignore the headers and parse only the XML?
>>> Consider reading the answers you got in the last thread that you opened with exactly this question.
>>> Stefan
>> That's exactly what I did, but the solutions I had stopped working when I changed my implementation from plain Python sockets to the Twisted library! That's the reason I created a new post. Any ideas why this happened?
> As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove the lines up to and including the empty line, and pass the data to any XML parser.
> --
> John Bokma j3b
> Hacking & Hiking in Mexico - http://johnbokma.com/
> http://castleamber.com/ - Perl & Python Development
Thank you so much, I'll try it!
Antonis
Re: expat parsing error
On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote:
> On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:
>> kak...@gmail.com kak...@gmail.com writes:
>>> On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
>>>> kak...@gmail.com, 01.06.2010 16:00:
>>>>> how can I fix it? How do I ignore the headers and parse only the XML?
>>>> Consider reading the answers you got in the last thread that you opened with exactly this question.
>>>> Stefan
>>> That's exactly what I did, but the solutions I had stopped working when I changed my implementation from plain Python sockets to the Twisted library! That's the reason I created a new post. Any ideas why this happened?
>> As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove the lines up to and including the empty line, and pass the data to any XML parser.
>> --
>> John Bokma j3b
>> Hacking & Hiking in Mexico - http://johnbokma.com/
>> http://castleamber.com/ - Perl & Python Development
> Thank you so much, I'll try it! Antonis
Dear John, can you provide me with a simple working solution? I don't seem to get it.
Re: expat parsing error
On Jun 2, 1:57 am, kak...@gmail.com kak...@gmail.com wrote:
> On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote:
>> On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:
>>> kak...@gmail.com kak...@gmail.com writes:
>>>> On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
>>>>> kak...@gmail.com, 01.06.2010 16:00:
>>>>>> how can I fix it? How do I ignore the headers and parse only the XML?
>>>>> Consider reading the answers you got in the last thread that you opened with exactly this question.
>>>>> Stefan
>>>> That's exactly what I did, but the solutions I had stopped working when I changed my implementation from plain Python sockets to the Twisted library! That's the reason I created a new post. Any ideas why this happened?
>>> As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove the lines up to and including the empty line, and pass the data to any XML parser.
>>> --
>>> John Bokma j3b
>>> Hacking & Hiking in Mexico - http://johnbokma.com/
>>> http://castleamber.com/ - Perl & Python Development
>> Thank you so much, I'll try it! Antonis
> Dear John, can you provide me with a simple working solution? I don't seem to get it.
You're not wrong. Try something like this:
rubbish1, rubbish2, xml = your_guff.partition('\n\n')
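A short sketch of the `partition` idea, with one caveat that may matter for HTTP traffic: header lines normally end in CRLF, so the empty line arrives as `'\r\n\r\n'`, which does not contain `'\n\n'`. The `strip_headers` helper and the sample messages are hypothetical; the regular expression accepts either line ending.

```python
import re

# An empty line, with either LF or CRLF line endings.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def strip_headers(message):
    """Return everything after the first empty line of an HTTP-style message."""
    match = _BLANK_LINE.search(message)
    if match is None:
        raise ValueError("no empty line found; the headers may be incomplete")
    return message[match.end():]


crlf_message = b"POST /test/pcp/Listener HTTP/1.1\r\nHost: example\r\n\r\n<msg/>"
lf_message = b"POST /test/pcp/Listener HTTP/1.1\nHost: example\n\n<msg/>"

print(crlf_message.partition(b"\n\n")[2])  # b'': plain '\n\n' never matches CRLF data
print(strip_headers(crlf_message))         # b'<msg/>'
print(strip_headers(lf_message))           # b'<msg/>'
```

If the data really does use bare `'\n'` line endings, the one-line `partition('\n\n')` is enough.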
Re: expat having problems with entities (&amp;)
On Fri, Dec 11, 2009 at 13:23, nnguyen nguy...@gmail.com wrote:
> Any ideas on any expat tricks I'm missing out on? I'm also inclined to try another parser that can keep the string together when there are entities, or at least ampersands.
IIRC expat explicitly does not guarantee that character data will be handed to the CharacterDataHandler in complete blocks. If you're certain you want to stay at such a low level, I would just modify your char_data method to append character data to self.current_data rather than replacing it.
Personally, if I had the option (e.g. Python 2.5+) I'd use ElementTree...
--
Rami Chowdhury
Never assume malice when stupidity will suffice. -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
Re: expat having problems with entities (&amp;)
On Dec 11, 4:23 pm, nnguyen nguy...@gmail.com wrote:
> I need expat to parse this block of xml:
>
>   <datafield tag="991">
>     <subfield code="b">c-P&amp;P</subfield>
>     <subfield code="h">LOT 3677</subfield>
>     <subfield code="m">(F)</subfield>
>   </datafield>
>
> I need to parse the xml and return a dictionary that follows roughly the same layout as the xml. Currently the code for the class handling this is:
>
>   class XML2Map():
>
>       def __init__(self):
>           self.parser = expat.ParserCreate()
>           self.parser.StartElementHandler = self.start_element
>           self.parser.EndElementHandler = self.end_element
>           self.parser.CharacterDataHandler = self.char_data
>           self.map = [] #not a dictionary
>           self.current_tag = ''
>           self.current_subfields = []
>           self.current_sub = ''
>           self.current_data = ''
>
>       def parse_xml(self, xml_text):
>           self.parser.Parse(xml_text, 1)
>
>       def start_element(self, name, attrs):
>           if name == 'datafield':
>               self.current_tag = attrs['tag']
>           elif name == 'subfield':
>               self.current_sub = attrs['code']
>
>       def char_data(self, data):
>           self.current_data = data
>
>       def end_element(self, name):
>           if name == 'subfield':
>               self.current_subfields.append([self.current_sub, self.current_data])
>           elif name == 'datafield':
>               self.map.append({'tag': self.current_tag, 'subfields': self.current_subfields})
>               self.current_subfields = [] #resetting the values for next subfields
>
> Right now my problem is that when it's parsing the subfield element with the data c-P&amp;P, it's not taking the whole data, but instead it's breaking it into c-P, &, P. I'm not an expert with expat, and I couldn't find a lot of information on how it handles specific entities.
>
> In the resulting map, instead of:
>
>   {'tag': u'991', 'subfields': [[u'b', u'c-P&P'], [u'h', u'LOT 3677'], [u'm', u'(F)']], 'inds': [u' ', u' ']}
>
> I get this:
>
>   {'tag': u'991', 'subfields': [[u'b', u'P'], [u'h', u'LOT 3677'], [u'm', u'(F)']], 'inds': [u' ', u' ']}
>
> In the debugger, I can see that current_data gets assigned c-P, then &, and then P.
>
> Any ideas on any expat tricks I'm missing out on? I'm also inclined to try another parser that can keep the string together when there are entities, or at least ampersands.
I forgot: ignore the 'inds':... in the output above, it's just another part of the xml I had to parse that isn't important to this discussion.
Re: expat having problems with entities (&amp;)
On Dec 11, 4:39 pm, Rami Chowdhury rami.chowdh...@gmail.com wrote:
> On Fri, Dec 11, 2009 at 13:23, nnguyen nguy...@gmail.com wrote:
>> Any ideas on any expat tricks I'm missing out on? I'm also inclined to try another parser that can keep the string together when there are entities, or at least ampersands.
> IIRC expat explicitly does not guarantee that character data will be handed to the CharacterDataHandler in complete blocks. If you're certain you want to stay at such a low level, I would just modify your char_data method to append character data to self.current_data rather than replacing it.
> Personally, if I had the option (e.g. Python 2.5+) I'd use ElementTree...
Well, the appending trick worked. From some logging I figured out that it was reading through those bits of current_data before getting to the subfield ending element (which is kinda obvious when you think about it). So I just used a += and made sure to clear out current_data when it hits a subfield ending element.
Thanks!
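The appending approach described above can be sketched roughly as follows, written for Python 3. `SubfieldCollector` and its attribute names are hypothetical and the class is cut down to the subfield handling; it collects the pieces in a list and joins them at the end tag, which has the same effect as `+=` followed by a reset.

```python
import xml.parsers.expat


class SubfieldCollector:
    """Hypothetical cut-down parser: gathers (code, text) pairs for subfields."""

    def __init__(self):
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.char_data
        self.subfields = []
        self.current_code = None   # None while outside a subfield element
        self.chunks = []

    def start_element(self, name, attrs):
        if name == "subfield":
            self.current_code = attrs["code"]
            self.chunks = []       # start with an empty buffer for each subfield

    def char_data(self, data):
        # expat may call this several times per element, so accumulate.
        if self.current_code is not None:
            self.chunks.append(data)

    def end_element(self, name):
        if name == "subfield":
            self.subfields.append((self.current_code, "".join(self.chunks)))
            self.current_code = None

    def parse(self, xml_text):
        self.parser.Parse(xml_text, True)
        return self.subfields


SAMPLE = """<datafield tag="991">
  <subfield code="b">c-P&amp;P</subfield>
  <subfield code="h">LOT 3677</subfield>
  <subfield code="m">(F)</subfield>
</datafield>"""

print(SubfieldCollector().parse(SAMPLE))
```

Ignoring character data outside a subfield also keeps the whitespace between elements out of the result.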
Re: expat error, help to debug?
Aloha,

Andreas Lobinger wrote:
> Lawrence D'Oliveiro wrote:
>> In message [EMAIL PROTECTED], Andreas Lobinger wrote:
>>> Anyone any idea where the error is produced?

... to share my findings with you:

    def ex(self,context,baseid,n1,n2):
        print "x",context,n1,n2
        return 1

The registered Handler has to return a (integer) value. Would have been nice if this had been mentioned in the documentation.

Wishing a happy day,
LOBI
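For reference, a small sketch of an `ExternalEntityRefHandler` that returns an integer, written for Python 3. The document, the `refs` list and the helper names are made up for illustration; only the entity name and file name echo the thread. The handler here does not load `bookinfo.xml` at all, it just records the reference and returns 1 so that parsing continues.

```python
import xml.parsers.expat

# Hypothetical document with one external general entity, loosely modelled
# on the DocBook file from the thread.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>&bookinfo;<chapter id="technicalDescription"/></book>
"""

refs = []


def skip_external_entity(context, base, system_id, public_id):
    # Record the reference but do not load it; a non-zero integer
    # return value tells expat to keep going.
    refs.append(system_id)
    return 1


def element_names(handler):
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.ExternalEntityRefHandler = handler
    parser.Parse(DOC, True)
    return names


print(element_names(skip_external_entity))
print(refs)
```

A handler that falls off the end returns `None`, which is the situation the thread reports as `TypeError: an integer is required`.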
Re: expat error, help to debug?
Aloha,

Andreas Lobinger wrote:
> Andreas Lobinger wrote:
>> Lawrence D'Oliveiro wrote:
>>> In message [EMAIL PROTECTED], Andreas Lobinger wrote:
>>>> Anyone any idea where the error is produced?
> The registered Handler has to return a (integer) value. Would have been nice if this had been mentioned in the documentation.

Delete last line, it is mentioned in the documentation.
Re: expat error, help to debug?
Aloha,

Lawrence D'Oliveiro wrote:
> In message [EMAIL PROTECTED], Andreas Lobinger wrote:
>> Anyone any idea where the error is produced?
> Do you want to try adding an EndElementHandler as well, just to get more information on where the error might be happening?

I want. Adding an EndElement handler (left as an exercise to the user), the output looks like this:

    [42] scylla(scylla) python pbxml.py s3.xml
    s 7 book {}
    x bookinfo bookinfo.xml None
    s 9 chapter {u'id': u'technicalDescription'}
    s 9 title {}
    e title
    s 10 para {}
    e para
    e chapter
    e book
    Traceback (most recent call last):
      File pbxml.py, line 29, in ?
        fromxml(sys.argv[1])
      File pbxml.py, line 24, in fromxml
        p.ParseFile(file(fname))
    TypeError: an integer is required

which shows me that the error is caused after parsing the /book ... BUT still within p.ParseFile (expat internal), so I can't look into it.

The example here may be misleading. It was stripped down from a quite large docbook.xml, and there the error happened in the middle of the document, not at the end.

Wishing a happy day,
LOBI
Re: expat error, help to debug?
In message [EMAIL PROTECTED], Andreas Lobinger wrote:
> Anyone any idea where the error is produced?
Do you want to try adding an EndElementHandler as well, just to get more information on where the error might be happening?
Re: expat parser
Sebastian Bassi wrote:
> I have this code:
>
>   import xml.parsers.expat
>
>   def start_element(name, attrs):
>       print 'Start element:', name, attrs
>
>   def end_element(name):
>       print 'End element:', name
>
>   def char_data(data):
>       print 'Character data:', repr(data)
>
>   p = xml.parsers.expat.ParserCreate()
>   p.StartElementHandler = start_element
>   p.EndElementHandler = end_element
>   p.CharacterDataHandler = char_data
>   fh=open("/home/sbassi/bioinfo/smallUniprot.xml","r")
>   p.ParseFile(fh)
>
> And I get this on the output:
>
>   ...
>   Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length': u'393'}
>   Character data: u'\n'
>   Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRL'
>   Character data: u'\n'
>   Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH'
>   ...
>   End element: sequence
>   ...
>
> Is there a way to have the character data together in one string? I guess it should not be difficult, but I can't do it. Each time the parser reads a line, it returns a line, and I want to have it all in one variable.

Any reason you are using expat and not cElementTree's iterparse?

Stefan
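A rough sketch of the `iterparse` route, using `xml.etree.ElementTree` from the Python 3 standard library (the separate `cElementTree` module named above belongs to older Python versions). The sample document and the `sequences` helper are hypothetical, and the sketch assumes un-namespaced tags; a real UniProt file may well declare a namespace, in which case the tag comparison would need adjusting.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for the UniProt file in the thread.
SAMPLE = b"""<entry>
<sequence>
MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRL
EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH
</sequence>
</entry>"""


def sequences(source):
    """Yield the text of each <sequence> element as one string, newlines removed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "sequence":
            # By the "end" event, elem.text holds the complete character data.
            yield "".join((elem.text or "").split())
            elem.clear()   # release the element's content once it has been used


for seq in sequences(io.BytesIO(SAMPLE)):
    print(seq)
```

With this approach the joining of character data is done by ElementTree, so no handler has to keep its own buffer.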
Re: expat
Thanks to Frederik and Jarek!

Following your hints, I ran tests with a different encoding and with another OpenOffice option, 'Size optimization for XML format'. That went fine! Then I went back to my files from yesterday, and they converted properly as well... oops.

Anyway, it's running!

Katja
Re: expat
Katja Suess wrote:
> may I have a hint what the problem is in my situation? Is it a syntax error in sweetone.odt or in xml.parsers.expat?
>
>   xml.parsers.expat.ExpatError: syntax error: line 1, column 0

it's a problem with the file you're parsing (either because it's not a valid XML file, or because it's encoded in some non-standard way).

can you post the first few lines from that file ?

/F
Re: expat
Katja Suess wrote:
> may I have a hint what the problem is in my situation? Is it a syntax error in sweetone.odt or in xml.parsers.expat? Same problem with different file instead of sweetone.odt means that it's not the file that has a syntax error. xml.parsers.expat is a standard module that probably has no errors. So what could cause this error message??

A malformed XML document, perhaps. This may be anything that expat doesn't like (e.g. a wrong encoding, cp1252 declared as latin-1, a document declared as utf-8 but with a BOM, and so on).

--
Jarek Zgoda
http://jpa.berlios.de/
Re: Expat - how to UseForeignDTD
I needed to set Entity Parsing, such as:

    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
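A minimal sketch of how the two calls fit together, written for Python 3. The document, the `foreign_dtd_calls` helper and the handler name are hypothetical. The handler below does not load a DTD; it only records that expat asked for one, which seems to happen only when parameter entity parsing is switched on.

```python
import xml.parsers.expat as expat


def foreign_dtd_calls(enable_param_entities):
    """Parse a DOCTYPE-less document and report how expat asked for a foreign DTD."""
    calls = []

    def resolve_entity(context, base, system_id, public_id):
        # For a foreign DTD request the arguments carry no identifiers.
        calls.append((public_id, system_id))
        return 1   # non-zero: carry on parsing

    parser = expat.ParserCreate()
    parser.UseForeignDTD(True)
    if enable_param_entities:
        parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.ExternalEntityRefHandler = resolve_entity
    parser.Parse(b"<?xml version='1.0'?><element/>", True)
    return calls


print(foreign_dtd_calls(True))   # the handler is asked for a DTD
print(foreign_dtd_calls(False))  # without the setting it is not called
```

To actually supply a DTD, the handler would typically create a sub-parser with `ExternalEntityParserCreate(context)` and feed the DTD text to it before returning.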