Re: expat parsing error

2010-06-02 Thread kak...@gmail.com
On Jun 2, 03:47, John Machin sjmac...@lexicon.net wrote:
 On Jun 2, 1:57 am, kak...@gmail.com kak...@gmail.com wrote:
  On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote:

   On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:

kak...@gmail.com kak...@gmail.com writes:
 On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
 kak...@gmail.com, 01.06.2010 16:00:

  how can i fix it, how to ignore the headers and parse only
  the XML?

 Consider reading the answers you got in the last thread that you 
 opened
 with exactly this question.

 Stefan

 That's exactly, what i did but something seems to not working with the
 solutions i had, when i changed my implementation from pure Python's
 sockets to twisted library!
 That's the reason i have created a new post!
 Any ideas why this happened?

As I already explained: if you send your headers as well to any XML
parser it will choke on those, because the headers are /not/ valid /
well-formed XML. The solution is to remove the headers from your
data. As I explained before: headers are followed by one empty
line. Just remove lines up and until including the empty line, and pass
the data to any XML parser.

--
John Bokma                                                               j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development

   Thank you so much i'll try it!
   Antonis

  Dear John can you provide me a simple working solution?
  I don't seem to get it

 You're not wrong. Try something like this:

 rubbish1, rubbish2, xml = your_guff.partition('\n\n')

Ok thanks a lot!
Antonis
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: expat parsing error

2010-06-01 Thread John Bokma
kak...@gmail.com kak...@gmail.com writes:

 I got the following error
 --- exception caught here ---
   File "/usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/selectreactor.py", line 146, in _doReadOrWrite
     why = getattr(selectable, method)()
   File "/usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py", line 460, in doRead
     return self.protocol.dataReceived(data)
   File "stdiodemo.py", line 419, in dataReceived
     p.Parse(line, 1)
 xml.parsers.expat.ExpatError: syntax error: line 1, column 0


 The XML Message is coming in the form of:

 POST /test/pcp/Listener HTTP/1.1

Does Expat get this line as well? If so, that's the reason why you get
an error at line 1, column 0.

-- 
John Bokma   j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development


Re: expat parsing error

2010-06-01 Thread kak...@gmail.com
On Jun 1, 9:51 am, John Bokma j...@castleamber.com wrote:
 kak...@gmail.com kak...@gmail.com writes:
  I got the following error
  --- exception caught here ---
    File "/usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/selectreactor.py", line 146, in _doReadOrWrite
      why = getattr(selectable, method)()
    File "/usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py", line 460, in doRead
      return self.protocol.dataReceived(data)
    File "stdiodemo.py", line 419, in dataReceived
      p.Parse(line, 1)
  xml.parsers.expat.ExpatError: syntax error: line 1, column 0

  The XML Message is coming in the form of:

  POST /test/pcp/Listener HTTP/1.1

 Does Expat get this line as well? If so, that's the reason why you get
 an error at line 1, column 0.

 --
 John Bokma                                                               j3b

 Hacking & Hiking in Mexico -  http://johnbokma.com/
 http://castleamber.com/ - Perl & Python Development

Yes, but how can I fix it? How do I ignore the headers and parse only
the XML?
Thanks


Re: expat parsing error

2010-06-01 Thread John Bokma
kak...@gmail.com kak...@gmail.com writes:

 On Jun 1, 9:51 am, John Bokma j...@castleamber.com wrote:
 kak...@gmail.com kak...@gmail.com writes:
  I got the following error
  --- exception caught here ---
    File "/usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/selectreactor.py", line 146, in _doReadOrWrite
      why = getattr(selectable, method)()
    File "/usr/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py", line 460, in doRead
      return self.protocol.dataReceived(data)
    File "stdiodemo.py", line 419, in dataReceived
      p.Parse(line, 1)
  xml.parsers.expat.ExpatError: syntax error: line 1, column 0

  The XML Message is coming in the form of:

  POST /test/pcp/Listener HTTP/1.1

 Does Expat get this line as well? If so, that's the reason why you get
 an error at line 1, column 0.

 Yes but how can i fix it, how to ignore the headers and parse only
 the XML?

The headers are followed by exactly one empty line, so you could simply
remove the lines up to and including this empty line and then hand the
remaining data over to the parser.

-- 
John Bokma   j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
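
A minimal sketch of the header-stripping step described in this message, in Python 3. It assumes the whole request is already in memory as bytes. The function names split_http_message and element_names, the Content-Type header and the sample body are illustrative and do not come from the thread. HTTP normally ends the header block with CRLF CRLF, so the sketch looks for both that and a bare "\n\n" and splits at whichever comes first.

```python
import xml.parsers.expat


def split_http_message(raw):
    """Split raw request bytes into (headers, body) at the first blank line."""
    hits = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1:
            hits.append((pos, sep))
    if not hits:
        raise ValueError("no blank line found; the headers may be incomplete")
    pos, sep = min(hits)  # the earliest separator ends the header block
    return raw[:pos], raw[pos + len(sep):]


def element_names(xml_bytes):
    """Parse XML with expat and return the element names in document order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_bytes, True)
    return names


# Hypothetical request: only the request line is taken from the thread.
request = (
    b"POST /test/pcp/Listener HTTP/1.1\r\n"
    b"Content-Type: text/xml\r\n"
    b"\r\n"
    b"<message><id>1</id></message>"
)
headers, body = split_http_message(request)
print(element_names(body))
```

Only the body ever reaches expat here, so the parser no longer sees the request line that triggered the error at line 1, column 0.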


Re: expat parsing error

2010-06-01 Thread Stefan Behnel

kak...@gmail.com, 01.06.2010 16:00:

how can i fix it, how to ignore the headers and parse only
the XML?


Consider reading the answers you got in the last thread that you opened 
with exactly this question.


Stefan



Re: expat parsing error

2010-06-01 Thread kak...@gmail.com
On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
 kak...@gmail.com, 01.06.2010 16:00:

  how can i fix it, how to ignore the headers and parse only
  the XML?

 Consider reading the answers you got in the last thread that you opened
 with exactly this question.

 Stefan

That's exactly what I did, but the solutions I had stopped working
when I changed my implementation from plain Python sockets to the
Twisted library!
That's the reason I created a new post!
Any ideas why this happened?
Thanks Stefan



Re: expat parsing error

2010-06-01 Thread John Bokma
kak...@gmail.com kak...@gmail.com writes:

 On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
 kak...@gmail.com, 01.06.2010 16:00:

  how can i fix it, how to ignore the headers and parse only
  the XML?

 Consider reading the answers you got in the last thread that you opened
 with exactly this question.

 Stefan

 That's exactly, what i did but something seems to not working with the
 solutions i had, when i changed my implementation from pure Python's
 sockets to twisted library!
 That's the reason i have created a new post!
 Any ideas why this happened?

As I already explained: if you send your headers as well to any XML
parser it will choke on those, because the headers are /not/ valid /
well-formed XML. The solution is to remove the headers from your
data. As I explained before: headers are followed by one empty
line. Just remove lines up and until including the empty line, and pass
the data to any XML parser.

-- 
John Bokma   j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development


Re: expat parsing error

2010-06-01 Thread kak...@gmail.com
On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:
 kak...@gmail.com kak...@gmail.com writes:
  On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
  kak...@gmail.com, 01.06.2010 16:00:

   how can i fix it, how to ignore the headers and parse only
   the XML?

  Consider reading the answers you got in the last thread that you opened
  with exactly this question.

  Stefan

  That's exactly, what i did but something seems to not working with the
  solutions i had, when i changed my implementation from pure Python's
  sockets to twisted library!
  That's the reason i have created a new post!
  Any ideas why this happened?

 As I already explained: if you send your headers as well to any XML
 parser it will choke on those, because the headers are /not/ valid /
 well-formed XML. The solution is to remove the headers from your
 data. As I explained before: headers are followed by one empty
 line. Just remove lines up and until including the empty line, and pass
 the data to any XML parser.

 --
 John Bokma                                                               j3b

 Hacking & Hiking in Mexico -  http://johnbokma.com/
 http://castleamber.com/ - Perl & Python Development

Thank you so much i'll try it!
Antonis


Re: expat parsing error

2010-06-01 Thread kak...@gmail.com
On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote:
 On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:
  kak...@gmail.com kak...@gmail.com writes:
   On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
   kak...@gmail.com, 01.06.2010 16:00:

how can i fix it, how to ignore the headers and parse only
the XML?

   Consider reading the answers you got in the last thread that you opened
   with exactly this question.

   Stefan

   That's exactly, what i did but something seems to not working with the
   solutions i had, when i changed my implementation from pure Python's
   sockets to twisted library!
   That's the reason i have created a new post!
   Any ideas why this happened?

  As I already explained: if you send your headers as well to any XML
  parser it will choke on those, because the headers are /not/ valid /
  well-formed XML. The solution is to remove the headers from your
  data. As I explained before: headers are followed by one empty
  line. Just remove lines up and until including the empty line, and pass
  the data to any XML parser.

  --
  John Bokma                                                               j3b

  Hacking & Hiking in Mexico -  http://johnbokma.com/
  http://castleamber.com/ - Perl & Python Development

 Thank you so much i'll try it!
 Antonis

Dear John, can you provide me with a simple working solution?
I don't seem to get it.


Re: expat parsing error

2010-06-01 Thread John Machin
On Jun 2, 1:57 am, kak...@gmail.com kak...@gmail.com wrote:
 On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote:
  On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote:

   kak...@gmail.com kak...@gmail.com writes:
On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote:
kak...@gmail.com, 01.06.2010 16:00:

 how can i fix it, how to ignore the headers and parse only
 the XML?

Consider reading the answers you got in the last thread that you opened
with exactly this question.

Stefan

That's exactly, what i did but something seems to not working with the
solutions i had, when i changed my implementation from pure Python's
sockets to twisted library!
That's the reason i have created a new post!
Any ideas why this happened?

   As I already explained: if you send your headers as well to any XML
   parser it will choke on those, because the headers are /not/ valid /
   well-formed XML. The solution is to remove the headers from your
   data. As I explained before: headers are followed by one empty
   line. Just remove lines up and until including the empty line, and pass
   the data to any XML parser.

   --
   John Bokma                                                               j3b

   Hacking & Hiking in Mexico -  http://johnbokma.com/
   http://castleamber.com/ - Perl & Python Development

  Thank you so much i'll try it!
  Antonis

 Dear John can you provide me a simple working solution?
 I don't seem to get it

You're not wrong. Try something like this:

rubbish1, rubbish2, xml = your_guff.partition('\n\n')
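
One possible reason the same approach can behave differently under Twisted, offered as an assumption since the thread never confirms it: a stream transport hands dataReceived whatever bytes have arrived so far, so the headers and the XML body may be spread over several calls, and partition on a single chunk may not find the blank line at all. The sketch below buffers chunks until the headers and a full Content-Length worth of body are present. The class name RequestBuffer and the sample request are hypothetical, the code does not import Twisted, and it assumes CRLF line endings and a Content-Length header.

```python
class RequestBuffer:
    """Collect chunks until one complete HTTP request body is available."""

    def __init__(self):
        self._data = b""

    def feed(self, chunk):
        """Add a chunk; return the full body, or None if more data is needed."""
        self._data += chunk
        head, sep, body = self._data.partition(b"\r\n\r\n")
        if not sep:
            return None  # blank line not seen yet, headers incomplete
        length = None
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("this sketch assumes a Content-Length header")
        if len(body) < length:
            return None  # body still arriving
        return body[:length]


# Hypothetical request split across two chunks.
buf = RequestBuffer()
first = buf.feed(
    b"POST /test/pcp/Listener HTTP/1.1\r\nContent-Length: 13\r\n\r\n<msg>"
)
second = buf.feed(b"hi</msg>")
```

In a Twisted protocol, dataReceived would call feed with each chunk and pass the result to the XML parser only once it is not None.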


Re: expat having problems with entities (&amp;)

2009-12-11 Thread Rami Chowdhury
On Fri, Dec 11, 2009 at 13:23, nnguyen nguy...@gmail.com wrote:

 Any ideas on any expat tricks I'm missing out on? I'm also inclined to
 try another parser that can keep the string together when there are
 entities, or at least ampersands.

IIRC expat explicitly does not guarantee that character data will be
handed to the CharacterDataHandler in complete blocks. If you're
certain you want to stay at such a low level, I would just modify your
char_data method to append character data to self.current_data rather
than replacing it. Personally, if I had the option (e.g. Python 2.5+)
I'd use ElementTree...


-- 

Rami Chowdhury
Never assume malice when stupidity will suffice. -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
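
A short sketch of the ElementTree route mentioned in this reply, in Python 3. The sample XML is reconstructed from the question in this thread (the archive stripped the markup, so the quoting is assumed), and the function name datafield_to_map is hypothetical. ElementTree delivers each element's text as a single string, so the ampersand needs no special handling.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample; attribute quoting is assumed.
SAMPLE = """<datafield tag="991">
  <subfield code="b">c-P&amp;P</subfield>
  <subfield code="h">LOT 3677</subfield>
  <subfield code="m">(F)</subfield>
</datafield>"""


def datafield_to_map(xml_text):
    """Return one datafield as a dict with its tag and [code, text] pairs."""
    root = ET.fromstring(xml_text)
    return {
        "tag": root.get("tag"),
        "subfields": [
            [sub.get("code"), sub.text] for sub in root.findall("subfield")
        ],
    }


record = datafield_to_map(SAMPLE)
print(record)
```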


Re: expat having problems with entities (&amp;)

2009-12-11 Thread nnguyen
On Dec 11, 4:23 pm, nnguyen nguy...@gmail.com wrote:
 I need expat to parse this block of xml:

 <datafield tag="991">
   <subfield code="b">c-P&amp;P</subfield>
   <subfield code="h">LOT 3677</subfield>
   <subfield code="m">(F)</subfield>
 </datafield>

 I need to parse the xml and return a dictionary that follows roughly
 the same layout as the xml. Currently the code for the class handling
 this is:

 from xml.parsers import expat

 class XML2Map():

     def __init__(self):
         self.parser = expat.ParserCreate()

         self.parser.StartElementHandler = self.start_element
         self.parser.EndElementHandler = self.end_element
         self.parser.CharacterDataHandler = self.char_data

         self.map = []  # not a dictionary

         self.current_tag = ''
         self.current_subfields = []
         self.current_sub = ''
         self.current_data = ''

     def parse_xml(self, xml_text):
         self.parser.Parse(xml_text, 1)

     def start_element(self, name, attrs):
         if name == 'datafield':
             self.current_tag = attrs['tag']

         elif name == 'subfield':
             self.current_sub = attrs['code']

     def char_data(self, data):
         self.current_data = data

     def end_element(self, name):
         if name == 'subfield':
             self.current_subfields.append([self.current_sub,
                                            self.current_data])

         elif name == 'datafield':
             self.map.append({'tag': self.current_tag,
                              'subfields': self.current_subfields})
             # resetting the values for next subfields
             self.current_subfields = []

 Right now my problem is that when it's parsing the subfield element
 with the data c-P&amp;P, it's not taking the whole data, but instead
 it's breaking it into "c-P", "&", "P". I'm not an expert with expat,
 and I couldn't find a lot of information on how it handles specific
 entities.

 In the resulting map, instead of:

 {'tag': u'991', 'subfields': [[u'b', u'c-P&P'], [u'h', u'LOT 3677'],
 [u'm', u'(F)']], 'inds': [u' ', u' ']}

 I get this:

 {'tag': u'991', 'subfields': [[u'b', u'P'], [u'h', u'LOT 3677'],
 [u'm', u'(F)']], 'inds': [u' ', u' ']}

 In the debugger, I can see that current_data gets assigned "c-P", then
 "&", and then "P".

 Any ideas on any expat tricks I'm missing out on? I'm also inclined to
 try another parser that can keep the string together when there are
 entities, or at least ampersands.

I forgot, ignore the 'inds':... in the output above, it's just
another part of the xml I had to parse that isn't important to this
discussion.


Re: expat having problems with entities (&amp;)

2009-12-11 Thread nnguyen
On Dec 11, 4:39 pm, Rami Chowdhury rami.chowdh...@gmail.com wrote:
 On Fri, Dec 11, 2009 at 13:23, nnguyen nguy...@gmail.com wrote:

  Any ideas on any expat tricks I'm missing out on? I'm also inclined to
  try another parser that can keep the string together when there are
  entities, or at least ampersands.

 IIRC expat explicitly does not guarantee that character data will be
 handed to the CharacterDataHandler in complete blocks. If you're
 certain you want to stay at such a low level, I would just modify your
 char_data method to append character data to self.current_data rather
 than replacing it. Personally, if I had the option (e.g. Python 2.5+)
 I'd use ElementTree...


Well the appending trick worked. From some logging I figured out that
it was reading through those bits of current_data before getting to
the subfield ending element (which is kinda obvious when you think
about it). So I just used a += and made sure to clear out current_data
when it hits a subfield ending element.

Thanks!
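
The fix described in this message can be sketched as follows, in Python 3. This is not the poster's final code: the class name SubfieldCollector and its attribute names are hypothetical, and the sample XML is reconstructed from the question. Character data is collected in a list while a subfield is open and joined when the end tag arrives, which also resets the buffer for the next subfield.

```python
import xml.parsers.expat


class SubfieldCollector:
    """Collect (code, text) pairs, joining split character data chunks."""

    def __init__(self):
        self.subfields = []
        self._code = None   # code of the subfield currently open, if any
        self._chunks = []   # character data pieces seen for that subfield
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self._start
        self.parser.EndElementHandler = self._end
        self.parser.CharacterDataHandler = self._chars

    def _start(self, name, attrs):
        if name == "subfield":
            self._code = attrs["code"]
            self._chunks = []

    def _chars(self, data):
        # expat may call this several times for one text node.
        if self._code is not None:
            self._chunks.append(data)

    def _end(self, name):
        if name == "subfield":
            self.subfields.append((self._code, "".join(self._chunks)))
            self._code = None

    def parse(self, xml_text):
        self.parser.Parse(xml_text, True)
        return self.subfields


collector = SubfieldCollector()
pairs = collector.parse(
    '<datafield tag="991">'
    '<subfield code="b">c-P&amp;P</subfield>'
    '<subfield code="h">LOT 3677</subfield>'
    '</datafield>'
)
print(pairs)
```

The Python expat binding also has a buffer_text attribute that asks the parser to merge adjacent text before calling the handler where it can; accumulating in the handler works whether or not that is set.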


Re: expat error, help to debug?

2007-08-28 Thread Andreas Lobinger
Aloha,

Andreas Lobinger wrote:
 Lawrence D'Oliveiro wrote:
 In message [EMAIL PROTECTED], Andreas Lobinger wrote:
 Anyone any idea where the error is produced?

... to share my findings with you:

 def ex(self,context,baseid,n1,n2):
     print "x",context,n1,n2
     return 1

The registered handler has to return an (integer) value.
Would have been nice if this had been mentioned in the documentation.

Wishing a happy day,
LOBI


Re: expat error, help to debug?

2007-08-28 Thread Andreas Lobinger
Aloha,

Andreas Lobinger wrote:
 Andreas Lobinger wrote:
 Lawrence D'Oliveiro wrote:
 In message [EMAIL PROTECTED], Andreas Lobinger wrote:
 Anyone any idea where the error is produced?
 The registered Handler has to return a (integer) value.
 Would have been nice if this had been mentioned in the documentation.

Delete last line, it is mentioned in the documentation.
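
A small Python 3 sketch of the finding in this sub-thread: the external entity handler must return an integer, and a nonzero value lets parsing continue. The document below is a made-up stand-in for the docbook file, with only the entity name bookinfo and the system id bookinfo.xml taken from the posted output. The handler does not load the referenced file; it only records the call.

```python
import xml.parsers.expat

# Hypothetical document that references one external entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>&bookinfo;<chapter/></book>"""

seen = []


def external_entity(context, base, system_id, public_id):
    """Record the reference and tell expat to carry on."""
    seen.append((context, system_id, public_id))
    return 1  # an integer is required; 0 would make expat report an error


parser = xml.parsers.expat.ParserCreate()
parser.ExternalEntityRefHandler = external_entity
parser.Parse(DOC, True)
print(seen)
```

If the handler falls off the end and returns None, the failure surfaces inside Parse or ParseFile rather than in the handler itself, which matches the confusing traceback in the earlier message.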


Re: expat error, help to debug?

2007-08-27 Thread Andreas Lobinger
Aloha,

Lawrence D'Oliveiro wrote:
 In message [EMAIL PROTECTED], Andreas Lobinger wrote:
Anyone any idea where the error is produced?

 Do you want to try adding an EndElementHandler as well, just to get more
 information on where the error might be happening?

I do.

After adding an EndElementHandler (left as an exercise for the reader), the
output looks like this:
[42] scylla(scylla) python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
e title
s 10 para {}
e para
e chapter
e book
Traceback (most recent call last):
  File "pbxml.py", line 29, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 24, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

which shows me that the error occurs after parsing the </book> ...
BUT still within p.ParseFile (expat internals), so I can't look
into it.

The example here may be misleading. It was stripped down from
a quite large docbook.xml, and there the error happened in the
middle of the document, not at the end.

Wishing a happy day,
LOBI


Re: expat error, help to debug?

2007-08-26 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Andreas Lobinger wrote:

 Anyone any idea where the error is produced?

Do you want to try adding an EndElementHandler as well, just to get more
information on where the error might be happening?


Re: expat parser

2007-05-28 Thread Stefan Behnel
Sebastian Bassi wrote:
 I have this code:
 
 import xml.parsers.expat
 def start_element(name, attrs):
     print 'Start element:', name, attrs
 def end_element(name):
     print 'End element:', name
 def char_data(data):
     print 'Character data:', repr(data)
 p = xml.parsers.expat.ParserCreate()
 p.StartElementHandler = start_element
 p.EndElementHandler = end_element
 p.CharacterDataHandler = char_data
 fh = open("/home/sbassi/bioinfo/smallUniprot.xml", "r")
 p.ParseFile(fh)
 
 And I get this on the output:
 
 ...
 Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length':
 u'393'}
 Character data: u'\n'
 Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRL'
 Character data: u'\n'
 Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH'
 ...
 End element: sequence
 ...
 
 Is there a way to have the character data together in one string? I
 guess it should not be difficult, but I can't do it. Each time the
 parser reads a line, it returns a line, and I want to have it all in
 one variable.

Any reason you are using expat and not cElementTree's iterparse?

Stefan
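
A hedged sketch of the iterparse suggestion, in Python 3, where cElementTree's functionality lives in xml.etree.ElementTree. The sample data is a shortened, made-up stand-in for the UniProt file in the question, and the function name sequences is hypothetical. With iterparse, the text of an element is available as one string once its end event fires, so the line-by-line chunks never appear.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, illustrative sample; not the real file from the question.
SAMPLE = b"""<entry>
<sequence length="20">
MPKKKPTPIQ
EAFLTQKQKV
</sequence>
</entry>"""


def sequences(source):
    """Yield each sequence element's text as one string without whitespace."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Ignore a namespace prefix such as "{uri}sequence" if one is present.
        if elem.tag.rsplit("}", 1)[-1] == "sequence":
            yield "".join((elem.text or "").split())
            elem.clear()  # release the element's content on large files


result = list(sequences(io.BytesIO(SAMPLE)))
print(result)
```

Passing a file name or an open binary file instead of the BytesIO object works the same way.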


Re: expat

2006-02-28 Thread Katja Suess
Thanks to Fredrik and Jarek!
Following your hints, I ran tests with a different encoding and a different
setting for the OpenOffice option 'Size optimization for XML format'.
That went fine! Going back to my files from yesterday, they now convert
properly as well... oops.
Anyway, it's running!
Katja


Re: expat

2006-02-27 Thread Fredrik Lundh
Katja Suess wrote:

 may I have a hint what the problem is in my situation?
 Is it a syntax error in sweetone.odt or in xml.parsers.expat?

 xml.parsers.expat.ExpatError: syntax error: line 1, column 0

it's a problem with the file you're parsing (either because it's not a
valid XML file, or because it's encoded in some non-standard way).

can you post the first few lines from that file ?

/F


Re: expat

2006-02-27 Thread Jarek Zgoda
Katja Suess wrote:

 may I have a hint what the problem is in my situation?
 Is it a syntax error in sweetone.odt or in xml.parsers.expat?
 Same problem with different file instead of sweetone.odt means that it's
 not the file that has a syntax error.
 xml.parsers.expat is a standard module that probably has no errors.
 So what could cause this error message??

Malformed XML document, perhaps. This may be anything that expat
doesn't like (e.g. wrong encoding, cp1252 declared as latin-1, document
declared as utf-8 but with BOM, and so on).

-- 
Jarek Zgoda
http://jpa.berlios.de/
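
One further possibility, stated as an assumption because the thread does not show how sweetone.odt was fed to expat: an .odt file is a ZIP archive, and the document XML sits inside it as content.xml. Handing the archive bytes themselves to expat fails at the very start of the input. The sketch below, in Python 3, reads content.xml out of the archive first. The function name odt_element_names is hypothetical, and the in-memory archive is a tiny stand-in, not a real OpenDocument file.

```python
import io
import zipfile
import xml.parsers.expat


def odt_element_names(odt_file):
    """Return the element names found in content.xml inside an .odt archive."""
    with zipfile.ZipFile(odt_file) as archive:
        content = archive.read("content.xml")
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(content, True)
    return names


# Build a minimal stand-in archive in memory for the demonstration.
fake_odt = io.BytesIO()
with zipfile.ZipFile(fake_odt, "w") as archive:
    archive.writestr("content.xml", "<doc><p>hello</p></doc>")
fake_odt.seek(0)

names = odt_element_names(fake_odt)
print(names)
```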


Re: Expat - how to UseForeignDTD

2005-10-20 Thread B Mahoney
I needed to enable parameter entity parsing, for example:

parser.SetParamEntityParsing( expat.XML_PARAM_ENTITY_PARSING_ALWAYS )
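
A minimal Python 3 sketch of how that setting combines with UseForeignDTD. The handler name resolve_entity and the one-element document are illustrative. With both calls made before parsing, expat invokes the external entity handler even though the document has no DOCTYPE, passing None for the public and system ids, so an alternate DTD could be supplied at that point. The sketch only records the call and does not load a DTD.

```python
from xml.parsers import expat

calls = []


def resolve_entity(context, base, system_id, public_id):
    """Record the ids expat asks for; a real handler could load a DTD here."""
    calls.append((public_id, system_id))
    return 1  # nonzero tells expat to continue


parser = expat.ParserCreate()
parser.UseForeignDTD(True)  # must be called before Parse or ParseFile
parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
parser.ExternalEntityRefHandler = resolve_entity
parser.Parse(b"<?xml version='1.0'?><element/>", True)
print(calls)
```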
