Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-20 Thread Daniel Molina Wegener

Stefan Behnel wrote:

 Daniel Molina Wegener wrote:
   Sorry, it appears that I've misunderstood your question. By /unicode
 objects/ I mean /python unicode objects/ aka /python unicode strings/.
 
 Yes, that's exactly what I'm talking about. Maybe you should read up on
 what Unicode is.

  OK, it seems the better option is to return each type from a separate
function, so the user can choose whichever fits their development
needs.

 
 
 Most of them can be reencoded into /latin*/ strings and then /ascii/
 strings if that is what you want. But for most communications, such as
 Java systems, utf-8 encoding goes as default.
 
 Well, then do not output a Python unicode string, but a UTF-8 encoded byte
 string as the default. Except for a couple of cases, Python unicode
 strings are very inconvenient for serialised XML.

  OK, good point. I must take a look at the implementation and, as I've
said, I will implement both return types in separate functions so the user
can choose, and document the impact of using Python unicode strings.
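
As a rough illustration of the two-function idea above (this is not pyxser's
actual API), the sketch below uses the standard library's
xml.etree.ElementTree in present-day Python 3, where str plays the role of
the Python 2 unicode type. The names serialize_text and serialize_bytes are
hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize_text(element):
    """Hypothetical helper: return the XML as a text (unicode) string."""
    return ET.tostring(element, encoding="unicode")


def serialize_bytes(element, encoding="utf-8"):
    """Hypothetical helper: return the XML as an encoded byte string."""
    return ET.tostring(element, encoding=encoding)


person = ET.Element("person")
person.text = "Jos\u00e9"

as_text = serialize_text(person)    # str: no byte encoding applied yet
as_bytes = serialize_bytes(person)  # bytes: ready for a file or a socket

print(type(as_text).__name__, type(as_bytes).__name__)
```

Keeping the two return types in separate functions means a caller never has
to inspect the result to find out whether it still needs encoding.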

  Thanks for your feedback :D

 
 Stefan

Best regards,
-- 
.O.| Daniel Molina Wegener   | C/C++ Developer
..O| dmw [at] coder [dot] cl | FreeBSD  Linux
OOO| http://coder.cl/| Standards Basis

--
http://mail.python.org/mailman/listinfo/python-list


Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote:
 * Every serialization is made into unicode objects.

Hmm, does that mean that when I serialise, I get a unicode object back?
What about the XML declaration? How can a user create well-formed XML from
your output? Or is that not the intention?

Stefan


Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Daniel Molina Wegener

Stefan Behnel stefan...@behnel.de
on Sunday 19 April 2009 02:25
wrote in comp.lang.python:


 Daniel Molina Wegener wrote:
 * Every serialization is made into unicode objects.
 
 Hmm, does that mean that when I serialise, I get a unicode object back?
 What about the XML declaration? How can a user create well-formed XML from
 your output? Or is that not the intention?

  Yes, if you serialize an object you get an XML string as a
unicode object, since unicode objects support UTF-8 and
some other encodings. You can also deserialize the object,
that is, convert the XML back into a Python object tree. Take a
look at the serializer output:

  http://coder.cl/software/pyxser/#id_example

 
 Stefan


Best regards,
-- 
 .O. | Daniel Molina Wegener   | FreeBSD  Linux
 ..O | dmw [at] coder [dot] cl | Open Standards
 OOO | http://coder.cl/| FOSS Developer



Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote:
 Stefan Behnel stefan...@behnel.de
 on Sunday 19 April 2009 02:25
 wrote in comp.lang.python:
 
 
 Daniel Molina Wegener wrote:
 * Every serialization is made into unicode objects.
 Hmm, does that mean that when I serialise, I get a unicode object back?
 What about the XML declaration? How can a user create well-formed XML from
 your output? Or is that not the intention?
 
   Yes, if you serialize an object you get an XML string as a
 unicode object, since unicode objects support UTF-8 and
 some other encodings.

That's not what I meant. I was wondering why you chose to use a unicode
string instead of a byte string (which XML is defined for). If your only
intention is to deserialise the unicode string into a tree, that may be
acceptable. However, as soon as you start writing the data to a file or
through a network pipe, or pass it to an XML parser, you'd better make it
well-formed XML. So you either need to encode it as UTF-8 (for which you do
not need a declaration), or you will need to encode it in a different byte
encoding, and then prepend a declaration yourself. In any case, this is a
lot more overhead (and cumbersome for users) than writing out a correctly
serialised byte string directly.
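
The two options described above can be sketched as follows, assuming
present-day Python 3 (where str corresponds to the Python 2 unicode type).
The helper name encode_with_declaration and the sample document are
illustrative only and are not taken from pyxser.

```python
import xml.etree.ElementTree as ET

xml_text = "<name>Jos\u00e9</name>"  # serialised XML held as a text string

# Option 1: encode as UTF-8. A parser assumes UTF-8 when there is no
# declaration, so the bytes are well-formed XML as they stand.
utf8_bytes = xml_text.encode("utf-8")


# Option 2: any other byte encoding must be announced in an XML declaration.
def encode_with_declaration(text, encoding):
    """Illustrative helper: prepend a declaration, then encode the result."""
    declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
    return (declaration + text).encode(encoding)


latin1_bytes = encode_with_declaration(xml_text, "iso-8859-1")

# Both byte strings parse back to the same text content.
print(ET.fromstring(utf8_bytes).text == ET.fromstring(latin1_bytes).text)
```

Either way the caller has to perform an extra step that a serializer
returning bytes could have done internally.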

You seemed to be very interested in good performance, so I don't quite
understand why you want to require an additional step with a relatively
high performance impact that only makes it harder for users to use the tool
correctly.

Stefan


Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Daniel Molina Wegener

Stefan Behnel stefan...@behnel.de
on Sunday 19 April 2009 15:08
wrote in comp.lang.python:


 Daniel Molina Wegener wrote:
 Stefan Behnel stefan...@behnel.de
 on Sunday 19 April 2009 02:25
 wrote in comp.lang.python:
 
 
 Daniel Molina Wegener wrote:
 * Every serialization is made into unicode objects.
 Hmm, does that mean that when I serialise, I get a unicode object back?
 What about the XML declaration? How can a user create well-formed XML
 from your output? Or is that not the intention?
 
   Yes, if you serialize an object you get an XML string as a
 unicode object, since unicode objects support UTF-8 and
 some other encodings.
 
 That's not what I meant. I was wondering why you chose to use a unicode
 string instead of a byte string (which XML is defined for). If your only
 intention is to deserialise the unicode string into a tree, that may be
 acceptable.

  Since libxml2's default encoding is UTF-8, and most applications use
XML encoded in UTF-8, it makes sense to define it as the default encoding
for the generated XML. Also, if you take a little time to read the
documentation, you will see that you can use any encoding supported by
Python, such as latin1, aka iso-8859-1. UTF-8 is just the default encoding.

  The original intention was to have a C14N representation of Python objects,
and, according to the C14N specification, I can't use another encoding for
the C14N representation.

 However, as soon as you start writing the data to a file or 
 through a network pipe, or pass it to an XML parser, you'd better make it
 well-formed XML. So you either need to encode it as UTF-8 (for which you
 do not need a declaration),

  I repeat, it's just the default encoding. But do you know which exception
you get with byte strings and wrongly encoded strings (think of accents and
special characters)? Unicode objects in Python support most of the regular
encodings.
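
For reference, one concrete case of the exception alluded to above, shown in
present-day Python 3: decoding ISO-8859-1 bytes as if they were UTF-8 raises
UnicodeDecodeError. The sample word is arbitrary.

```python
# An accented character encoded as ISO-8859-1 is a single byte (0xE9).
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

try:
    # 0xE9 starts a multi-byte sequence in UTF-8, so this cannot succeed.
    latin1_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding that was actually used recovers the text.
recovered = latin1_bytes.decode("iso-8859-1")

print(type(error).__name__, recovered == "caf\u00e9")
```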

 or you will need to encode it in a different 
 byte encoding, and then prepend a declaration yourself. In any case, this
 is a lot more overhead (and cumbersome for users) than writing out a
 correctly serialised byte string directly.

  No, I'm just using the default encoding for libxml2, which can be converted
or reencoded to other character sets, and if you read the documentation, you
will see that you can use most of the encodings Python supports.

 
 You seemed to be very interested in good performance, so I don't quite
 understand why you want to require an additional step with a relatively
 high performance impact that only makes it harder for users to use the
 tool correctly.

  Using an encoding other than libxml2's default makes the work harder for
libxml2, since every #PCDATA section has to be reencoded to the desired
encoding. Comparing one string conversion in Python against many string
conversions inside libxml2, the program runs slower with a non-default
encoding. Also, since UTF-8 is the default encoding, building a Python
string by passing the UTF-8 string buffer and its size does not have a huge
impact on performance.
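
Claims like this are easier to settle by measuring than by arguing. The
sketch below is only an assumption-laden stand-in: it uses the standard
library's xml.etree.ElementTree rather than libxml2 or pyxser, so its timings
say nothing about either, but the same pattern can be pointed at any
serializer. The function names and the sample tree are made up for the
example.

```python
import timeit
import xml.etree.ElementTree as ET

# Build a small sample tree with non-ASCII text content.
root = ET.Element("items")
for i in range(200):
    ET.SubElement(root, "item").text = "caf\u00e9 %d" % i


def direct_utf8():
    """Let the serializer emit UTF-8 bytes itself."""
    return ET.tostring(root, encoding="utf-8")


def text_then_encode():
    """Serialize to a text string first, then encode in a second step."""
    return ET.tostring(root, encoding="unicode").encode("utf-8")


for func in (direct_utf8, text_then_encode):
    seconds = timeit.timeit(func, number=20)
    print("%s: %.4f s" % (func.__name__, seconds))
```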

 
 Stefan

Best regards,
-- 
 .O. | Daniel Molina Wegener   | FreeBSD  Linux
 ..O | dmw [at] coder [dot] cl | Open Standards
 OOO | http://coder.cl/| FOSS Developer



Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Daniel Molina Wegener

Stefan Behnel stefan...@behnel.de
on Sunday 19 April 2009 15:08
wrote in comp.lang.python:


 Daniel Molina Wegener wrote:
 Stefan Behnel stefan...@behnel.de
 on Sunday 19 April 2009 02:25
 wrote in comp.lang.python:
 
 
 Daniel Molina Wegener wrote:
 * Every serialization is made into unicode objects.
 Hmm, does that mean that when I serialise, I get a unicode object back?
 What about the XML declaration? How can a user create well-formed XML
 from your output? Or is that not the intention?
 
   Yes, if you serialize an object you get an XML string as a
 unicode object, since unicode objects support UTF-8 and
 some other encodings.
 
 That's not what I meant. I was wondering why you chose to use a unicode
 string instead of a byte string (which XML is defined for). If your only
 intention is to deserialise the unicode string into a tree, that may be
 acceptable. However, as soon as you start writing the data to a file or
 through a network pipe, or pass it to an XML parser, you'd better make it
 well-formed XML. So you either need to encode it as UTF-8 (for which you
 do not need a declaration), or you will need to encode it in a different
 byte encoding, and then prepend a declaration yourself. In any case, this
 is a lot more overhead (and cumbersome for users) than writing out a
 correctly serialised byte string directly.

  Sorry, it appears that I've misunderstood your question. By /unicode
objects/ I mean /python unicode objects/ aka /python unicode strings/.
Most of them can be reencoded into /latin*/ strings and then /ascii/
strings if that is what you want. But for most communications, such as
with Java systems, utf-8 is the default encoding. I've made pyxser to
provide interoperability between Python and other systems.

 
 You seemed to be very interested in good performance, so I don't quite
 understand why you want to require an additional step with a relatively
 high performance impact that only makes it harder for users to use the
 tool correctly.
 
 Stefan

Sincerely,
-- 
 .O. | Daniel Molina Wegener   | FreeBSD  Linux
 ..O | dmw [at] coder [dot] cl | Open Standards
 OOO | http://coder.cl/| FOSS Developer



Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote:
   Sorry, it appears that I've misunderstood your question. By /unicode
 objects/ I mean /python unicode objects/ aka /python unicode strings/.

Yes, that's exactly what I'm talking about. Maybe you should read up on
what Unicode is.


 Most of them can be reencoded into /latin*/ strings and then /ascii/
 strings if that is what you want. But for most communications, such as
 Java systems, utf-8 encoding goes as default.

Well, then do not output a Python unicode string, but a UTF-8 encoded byte
string as the default. Except for a couple of cases, Python unicode strings
are very inconvenient for serialised XML.

Stefan


Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote:
   Using an encoding other than libxml2's default makes the work harder for
 libxml2, since every #PCDATA section has to be reencoded to the desired
 encoding. Comparing one string conversion in Python against many string
 conversions inside libxml2, the program runs slower with a non-default
 encoding.

It's not that much slower, though.

http://codespeak.net/lxml/performance.html#parsing-and-serialising

Stefan