[Python-Dev] Only one week left for PyCon proposals!
There is only one week left for PyCon tutorial & scheduled talk proposals. If you've been thinking about making a proposal, now's the time!

Tutorial details and instructions here:
http://us.pycon.org/2008/tutorials/proposals/

Scheduled talk details and instructions here:
http://us.pycon.org/2008/conference/proposals/

The deadline is Friday, November 16. Don't put it off any longer!

PyCon 2008: http://us.pycon.org

--
David Goodger
PyCon 2008 Chair

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Declaring setters with getters
D'oh. I forgot to point to the patch. It's here: http://bugs.python.org/issue1416

On Nov 9, 2007 10:00 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> To follow up, I now have a patch. It's pretty straightforward.
>
> This implements the kind of syntax that I believe won over most folks
> in the end:
>
>   @property
>   def foo(self): ...
>
>   @foo.setter
>   def foo(self, value=None): ...
>
> There are also .getter and .deleter descriptors. This includes the hack
> that if you specify a setter but no deleter, the setter is called
> without a value argument when attempting to delete something. If the
> setter isn't ready for this, a TypeError will be raised, pretty much
> just as if no deleter was provided (just with a somewhat worse error
> message :-).
>
> I intend to check this into 2.6 and 3.0 unless there is a huge cry of
> dismay. Docs will be left to volunteers as always.
>
> --Guido
>
> On Oct 31, 2007 9:08 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > [...]

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Declaring setters with getters
To follow up, I now have a patch. It's pretty straightforward.

This implements the kind of syntax that I believe won over most folks in the end:

  @property
  def foo(self): ...

  @foo.setter
  def foo(self, value=None): ...

There are also .getter and .deleter descriptors. This includes the hack that if you specify a setter but no deleter, the setter is called without a value argument when attempting to delete something. If the setter isn't ready for this, a TypeError will be raised, pretty much just as if no deleter was provided (just with a somewhat worse error message :-).

I intend to check this into 2.6 and 3.0 unless there is a huge cry of dismay. Docs will be left to volunteers as always.

--Guido

On Oct 31, 2007 9:08 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> I've come up with a relatively unobtrusive pattern for defining
> setters. Given the following definition:
>
>   def propset(prop):
>       assert isinstance(prop, property)
>       def helper(func):
>           return property(prop.__get__, func, func, prop.__doc__)
>       return helper
>
> we can declare getters and setters as follows:
>
>   class C(object):
>
>       _encoding = None
>
>       @property
>       def encoding(self):
>           return self._encoding
>
>       @propset(encoding)
>       def encoding(self, value=None):
>           if value is not None:
>               unicode("0", value)  # Test it
>           self._encoding = value
>
>   c = C()
>   print(c.encoding)
>   c.encoding = "ascii"
>   print(c.encoding)
>   try:
>       c.encoding = "invalid"  # Fails
>   except:
>       pass
>   print(c.encoding)
>
> I'd like to make this a standard built-in, in the hope of ending the
> debate on how to declare settable properties.
>
> I'd also like to change property so that the doc string defaults to
> the doc string of the getter.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
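A minimal sketch of how the decorator syntax described above reads in a complete class, assuming the .setter and .deleter decorators as they exist on the built-in property in current Python. The class C, the use of codecs.lookup() for validation, and the explicit deleter are illustrative choices and are not taken from the patch; the sketch does not rely on the setter-without-value hack.

```python
import codecs


class C:
    """Hypothetical example class, modelled on the one quoted above."""

    def __init__(self):
        self._encoding = None

    @property
    def encoding(self):
        """Name of the text encoding, or None if unset."""
        return self._encoding

    @encoding.setter
    def encoding(self, value):
        codecs.lookup(value)  # raises LookupError for an unknown encoding
        self._encoding = value

    @encoding.deleter
    def encoding(self):
        self._encoding = None


c = C()
c.encoding = "ascii"
try:
    c.encoding = "invalid"  # rejected; the old value is kept
except LookupError:
    pass
```

Because the getter, setter and deleter all share the name encoding, only one attribute ends up in the class namespace, and the property's doc string is taken from the getter.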
Re: [Python-Dev] XML codec?
Martin v. Löwis wrote:
>> Not really, but the codec has more control over what happens to
>> the stream, ie. it's easier to implement look-ahead in the codec
>> than to do the detection and then try to push the bytes back onto
>> the stream (which may or may not be possible depending on the
>> nature of the stream).
>
> YAGNI.

A non-seekable stream is not all that uncommon in network processing.

I usually end up either reading the complete data into memory or doing the needed buffering by hand.

Regards,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 10 2007)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free !

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
Re: [Python-Dev] XML codec?
"Martin v. Löwis" writes: > > It's clear to me that detecting an encoding is actually the simplest > > part of all this (so long as there's an API to do it!) Putting it > > inside a codec seems like the wrong subdivision of responsibility. > > In case it isn't clear - this is exactly my view also. But is there an API to do it? As MAL points out that API would have to return not an encoding, but a pair of an encoding and the rewound stream. For non-seekable, non-peekable streams (if any), what you'd need would be a stream that consisted of a concatenation of the buffered data used for detection and the continuation of the stream. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] XML codec?
> Not really, but the codec has more control over what happens to
> the stream, ie. it's easier to implement look-ahead in the codec
> than to do the detection and then try to push the bytes back onto
> the stream (which may or may not be possible depending on the
> nature of the stream).

YAGNI.

Regards,
Martin
Re: [Python-Dev] XML codec?
On Nov 9, 2007 3:59 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> >> It makes working with XML data a lot easier: you simply don't have to
> >> bother with the encoding of the XML data anymore and can just let the
> >> codec figure out the details. The XML parser can then work directly
> >> on the Unicode data.
> >
> > Having the functionality indeed makes things easier. However, I don't
> > find
> >
> >   s.decode(xml.detect_encoding(s))
> >
> > particularly more difficult than
> >
> >   s.decode("xml-auto-detection")
>
> Not really, but the codec has more control over what happens to
> the stream, ie. it's easier to implement look-ahead in the codec
> than to do the detection and then try to push the bytes back onto
> the stream (which may or may not be possible depending on the
> nature of the stream).

io.BufferedReader() standardizes a .peek() API, making it trivial. I don't see why we couldn't require it.

(As an aside, .peek() will fail to do what detect_encodings() needs if BufferedReader's buffer size is too small. I do wonder if that limitation is appropriate.)

--
Adam Olsen, aka Rhamphoryncus
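A small sketch of the .peek() approach, assuming io.BufferedReader as it exists in current Python. The function name sniff_encoding and the BOM-only detection are simplifications for illustration; a real detector would also have to read the XML declaration.

```python
import codecs
import io

# UTF-32 is checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF8, "utf-8-sig"),
)


def sniff_encoding(reader, default="utf-8"):
    """Guess a codec name from the BOM without consuming any input."""
    head = reader.peek(4)[:4]  # peek() may return more or fewer bytes than asked
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


reader = io.BufferedReader(io.BytesIO("<doc/>".encode("utf-16")))
encoding = sniff_encoding(reader)
text = io.TextIOWrapper(reader, encoding=encoding).read()
```

Since peek() leaves the read position untouched, the decoder still sees the BOM, which is why the sketch returns the BOM-consuming codec names ("utf-16", "utf-32", "utf-8-sig").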
Re: [Python-Dev] XML codec?
Martin v. Löwis wrote:
>> It makes working with XML data a lot easier: you simply don't have to
>> bother with the encoding of the XML data anymore and can just let the
>> codec figure out the details. The XML parser can then work directly
>> on the Unicode data.
>
> Having the functionality indeed makes things easier. However, I don't
> find
>
>   s.decode(xml.detect_encoding(s))
>
> particularly more difficult than
>
>   s.decode("xml-auto-detection")

Not really, but the codec has more control over what happens to the stream, ie. it's easier to implement look-ahead in the codec than to do the detection and then try to push the bytes back onto the stream (which may or may not be possible depending on the nature of the stream).

>> Whether it needs to be in C or not is another question (I would have
>> done this in Python since performance is not really an issue), but since
>> the code is already written, why not use it ?
>
> It's a maintenance issue.

I'm sure Walter will do a great job in maintaining the code :-)

Regards,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 09 2007)
Re: [Python-Dev] XML codec?
> It's clear to me that detecting an encoding is actually the simplest
> part of all this (so long as there's an API to do it!) Putting it
> inside a codec seems like the wrong subdivision of responsibility.

In case it isn't clear - this is exactly my view also.

Regards,
Martin
Re: [Python-Dev] XML codec?
> In fact, we already have such a codec. The utf-16 decoder looks at the
> first two bytes and then decides to forward the rest to either a
> utf-16-be or a utf-16-le decoder.

That's different. UTF-16 is a proper encoding that is just specified to use the BOM. "xml-auto-detection" is not an encoding.

Regards,
Martin
Re: [Python-Dev] XML codec?
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Having the functionality indeed makes things easier. However, I don't find

  s.decode(xml.detect_encoding(s))

particularly more difficult than

  s.decode("xml-auto-detection")

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but since
> the code is already written, why not use it ?

It's a maintenance issue.

Regards,
Martin
Re: [Python-Dev] XML codec?
On Nov 9, 2007 6:10 AM, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>
> Martin v. Löwis wrote:
> >>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> >>> codecs to do the encoding. There's no need to create a magical
> >>> mystery codec to pick out which though.
> >> So the code is good, if it is inside an XML parser, and it's bad if it
> >> is inside a codec?
> >
> > Exactly so. This functionality just *isn't* a codec - there is no
> > encoding. Instead, it is an algorithm for *detecting* an encoding.
>
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

It seems to me that parsing XML requires 3 steps:
1) determine encoding
2) decode byte stream
3) parse XML (including handling of character references)

All an xml codec does is make the first part a side-effect of the second part. Rather than this:

  encoding = detect_encoding(raw_data)
  decoded_data = raw_data.decode(encoding)
  tree = parse_xml(decoded_data, encoding)  # Verifies encoding

You'd have this:

  e = codecs.getincrementaldecoder("xml-auto-detect")()
  decoded_data = e.decode(raw_data, True)
  tree = parse_xml(decoded_data, e.encoding)  # Verifies encoding

It's clear to me that detecting an encoding is actually the simplest part of all this (so long as there's an API to do it!) Putting it inside a codec seems like the wrong subdivision of responsibility.

(An example using streams would end up closer, but it still seems wrong to me. Encoding detection is always one way, while codecs are always two way (even if lossy.))

--
Adam Olsen, aka Rhamphoryncus
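The three steps listed above can be sketched with the standard library. Here detect_encoding() is a deliberately naive, hypothetical stand-in (it only recognises a UTF-16 BOM), and xml.etree.ElementTree plays the role of parse_xml(); neither choice comes from the thread.

```python
import codecs
import xml.etree.ElementTree as ET


def detect_encoding(raw_data):
    """Naive stand-in: look for a UTF-16 BOM, otherwise assume UTF-8."""
    if raw_data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "utf-8"


raw_data = "<doc>caf&#233;</doc>".encode("utf-16")

encoding = detect_encoding(raw_data)      # 1) determine encoding
decoded_data = raw_data.decode(encoding)  # 2) decode byte stream
tree = ET.fromstring(decoded_data)        # 3) parse XML, resolving &#233;
```

Each step is an ordinary function call on the result of the previous one, which is the modular arrangement argued for in this message.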
Re: [Python-Dev] XML codec?
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

Because it is the XML parser that does the decoding, not the application. Also, it is better to provide functionality in a modular manner (i.e. encoding detection separately from encodings), and to leave integration of modules to the application, in particular if the integration is trivial.

Regards,
Martin
Re: [Python-Dev] XML codec?
>> So what if the unicode string doesn't start with an XML declaration?
>> Will it add one?
>
> No.

Ok. So the XML document would be ill-formed then unless the encoding is UTF-8, right?

> The point of this code is not just to return whether the string starts
> with "<?xml [...]
> * The string does start with "<?xml [...]
> * The string starts with a prefix of "<?xml [...] decide if it starts with "<?xml [...]
> * The string definitely doesn't start with "<?xml [...]

>> What bit fiddling are you referring to specifically that you think
>> is better done in C than in Python?
>
> The code that checks the byte signature, i.e. the first part of
> detect_xml_encoding_str().

I can't see any *bit* fiddling there, except for the bit mask of candidates. For the candidate list, I cannot quite understand why you need a bit mask at all, since the candidates are rarely overlapping.

I think there could be a much simpler routine to have the same effect.
- if it's less than 4 bytes, answer "need more data".
- otherwise, implement annex F "literally". Make a dictionary of all prefixes that are exactly 4 bytes, i.e.

  prefixes4 = {"\x00\x00\xFE\xFF":"utf-32be", ...
               ..., "\0\x3c\0\x3f":"utf-16be"}
  try:
      return prefixes4[s[:4]]
  except KeyError:
      pass
  if s.startswith(codecs.BOM_UTF16_BE):return "utf-16be"
  ...
  if s.startswith("<[...]
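The dictionary-based routine outlined above might be filled in roughly as follows. This is a sketch under assumptions: the table covers only some of the signatures in Appendix F of the XML 1.0 specification, the function name and the use of None for "need more data" are invented for illustration, and reading the encoding out of the XML declaration itself is left out.

```python
import codecs

# Partial table of four-byte signatures from XML 1.0, Appendix F.
PREFIXES4 = {
    b"\x00\x00\xfe\xff": "utf-32-be",  # BOM
    b"\xff\xfe\x00\x00": "utf-32-le",  # BOM
    b"\x00\x00\x00\x3c": "utf-32-be",  # "<" without BOM
    b"\x3c\x00\x00\x00": "utf-32-le",
    b"\x00\x3c\x00\x3f": "utf-16-be",  # "<?" without BOM
    b"\x3c\x00\x3f\x00": "utf-16-le",
}


def detect_xml_encoding(s):
    """Return a codec name, or None if fewer than 4 bytes are available."""
    if len(s) < 4:
        return None  # need more data
    try:
        return PREFIXES4[s[:4]]
    except KeyError:
        pass
    if s.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if s.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # With or without a UTF-8 BOM, fall back to UTF-8; a fuller version
    # would go on to read the declaration's encoding attribute here.
    return "utf-8"
```

The four-byte lookup has to come before the two-byte BOM checks, because the UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.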
[Python-Dev] Summary of Tracker Issues
ACTIVITY SUMMARY (11/02/07 - 11/09/07)
Tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message.

 1322 open (+23) / 11575 closed (+18) / 12897 total (+41)

Open issues with patches:   418

Average duration of open issues: 686 days.
Median duration of open issues: 788 days.

Open Issues Breakdown
   open  1317 (+23)
pending     5 ( +0)

Issues Created Or Reopened (41)
_______________________________

IDLE - minor FormatParagraph bug fix                             11/02/07
       http://bugs.python.org/issue1374    created  taleinat
       patch

hotshot IndexError when loading stats                            11/02/07
       http://bugs.python.org/issue1375    created  ratsberg

uu module catches a wrong exception type                         11/02/07
CLOSED http://bugs.python.org/issue1376    created  billiejoex

test_import breaks on Linux                                      11/09/07
       http://bugs.python.org/issue1377    reopened gvanrossum
       py3k

fromfd() and dup() for _socket on WIndows                        11/03/07
       http://bugs.python.org/issue1378    created  roudkerk
       patch

reloading imported modules sometimes fail with 'parent not in sy 11/03/07
CLOSED http://bugs.python.org/issue1379    created  _doublep
       py3k, patch

fix for test_asynchat and test_asyncore on pep3137 branch        11/03/07
CLOSED http://bugs.python.org/issue1380    created  hupp
       py3k, patch

cmath is numerically unsound                                     11/03/07
       http://bugs.python.org/issue1381    created  inducer

py3k-pep3137: patch for test_ctypes                              11/04/07
CLOSED http://bugs.python.org/issue1382    created  amaury.forgeotdarc
       py3k, patch

Backport abcoll to 2.6                                           11/04/07
       http://bugs.python.org/issue1383    created  baranguren
       patch

Windows fix for inspect tests                                    11/04/07
CLOSED http://bugs.python.org/issue1384    created  tiran
       py3k, patch

hmac module violates RFC for some hash functions, e.g. sha512    11/04/07
CLOSED http://bugs.python.org/issue1385    created  jowagner
       py3k

py3k-pep3137: patch to ensure that all codecs return bytes       11/04/07
CLOSED http://bugs.python.org/issue1386    created  amaury.forgeotdarc
       py3k, patch

py3k-pep3137: patch for hashlib on Windows                       11/04/07
CLOSED http://bugs.python.org/issue1387    created  amaury.forgeotdarc
       py3k, patch

py3k-pep3137: possible ref leak in ctypes                        11/05/07
CLOSED http://bugs.python.org/issue1388    created  tiran
       py3k

py3k-pep3137: struct module is leaking references                11/05/07
CLOSED http://bugs.python.org/issue1389    created  tiran
       py3k

toxml generates output that is not well formed                   11/05/07
       http://bugs.python.org/issue1390    created  drtomc

Adds the .compact() method to bsddb db.DB objects                11/05/07
       http://bugs.python.org/issue1391    created  gregory.p.smith
       patch, rfe

py3k-pep3137: issue warnings / errors on str(bytes()) and simila 11/05/07
CLOSED http://bugs.python.org/issue1392    created  tiran
       py3k, patch

function comparing lacks NotImplemented error                    11/05/07
Re: [Python-Dev] Bug tracker: meaning of resolution keywords
Christian Heimes wrote:
> (*) It's missing from the list of resolutions but I like to have it
> added. http://psf.upfronthosting.co.za/roundup/meta/issue167

Update: Georg Brandl pointed out that it makes more sense to add confirmed to status.

Christian
[Python-Dev] Bug tracker: meaning of resolution keywords
Hello!

Guido granted me committer privileges to svn.python.org and bugs.python.org about a week ago. So I'm new, and new people tend to make mistakes until they've learned the specific rules of a project.

Today I learned that the resolution keyword "accepted" doesn't mean the bug report is accepted. It only means a patch for the bug is accepted. In the past I've used "accepted" in the meaning of "bug is confirmed" in my own projects. In my ignorance I used it in the same way to mark bugs as confirmed when I was able to reproduce the bug myself.

The tracker doc at http://wiki.python.org/moin/TrackerDocs/ doesn't have a formal definition of the various keywords. I'd like to add a definition to the wiki to prevent others from making the same mistake. But first I'd like to discuss my view of the keywords.

Resolutions
***********

accepted      - patch accepted
confirmed (*) - the problem is confirmed
duplicate     - the bug is a duplicate of another bug
fixed         - the bug is fixed / patch is applied
invalid       - catch all for invalid reports
later         - the problem is going to be addressed later in the release cycle
out of date   - the bug was already fixed in svn
postponed     - the problem is going to be fixed in the next minor version
rejected      - the patch or feature request is rejected
remind        - remind me to finish the task (docs, unit tests)
wont fix      - it's not a bug, it's a feature
works for me  - unable to reproduce the problem

(*) It's missing from the list of resolutions but I'd like to have it added. http://psf.upfronthosting.co.za/roundup/meta/issue167

Priority
********

immediate - the bug must be fixed *NOW* (only used for important security related problems)
urgent    - the problem must be fixed ASAP because it's crucial for future development
high      - the problem should be fixed soonish and must be fixed for the next release
normal    - the problem should be fixed for the next release
low       - nice to have features and fixes

Christian
Re: [Python-Dev] XML codec?
On Nov 9, 2007, at 8:22 AM, M.-A. Lemburg wrote:
> FWIW: I'm +1 on adding such a codec.

I'm undecided, and really don't feel strongly either way.

> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Which is fine if you want to write a new parser. I've no interest in that myself.

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but
> since the code is already written, why not use it ?

The reason not to use C is the usual one: The implementation is more cross-implementation if it's written in Python. This makes it more useful with Jython, IronPython, and PyPy. That seems a pretty good reason to me.

-Fred

--
Fred Drake
Re: [Python-Dev] XML codec?
M.-A. Lemburg wrote:
> On 2007-11-09 14:10, Walter Dörwald wrote:
>> Martin v. Löwis wrote:
>>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>>> codecs to do the encoding. There's no need to create a magical
>>>>> mystery codec to pick out which though.
>>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>>> is inside a codec?
>>> Exactly so. This functionality just *isn't* a codec - there is no
>>> encoding. Instead, it is an algorithm for *detecting* an encoding.
>> And what do you do once you've detected the encoding? You decode the
>> input, so why not combine both into an XML decoder?
>
> FWIW: I'm +1 on adding such a codec.
>
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Exactly. I have a version of sgmlop lying around that does that.

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but since
> the code is already written, why not use it ?

Servus,
Walter
Re: [Python-Dev] XML codec?
Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>> codecs to do the encoding. There's no need to create a magical
>>>> mystery codec to pick out which though.
>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>> is inside a codec?
>> Exactly so. This functionality just *isn't* a codec - there is no
>> encoding. Instead, it is an algorithm for *detecting* an encoding.
>
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

In fact, we already have such a codec. The utf-16 decoder looks at the first two bytes and then decides to forward the rest to either a utf-16-be or a utf-16-le decoder.

Servus,
Walter
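The behaviour of the utf-16 decoder described above can be observed with a short sketch using only the standard codecs module; the sample text and variable names are arbitrary.

```python
import codecs

text = "<doc/>"
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" decoder consumes the BOM and picks the byte order.
decoded_le = le.decode("utf-16")
decoded_be = be.decode("utf-16")

# The incremental decoder buffers input until it has seen both BOM bytes.
dec = codecs.getincrementaldecoder("utf-16")()
first = dec.decode(be[:1])              # one byte: nothing decided yet
rest = dec.decode(be[1:], final=True)   # BOM complete, the text follows
```

Both byte orders decode to the same text, and the incremental decoder returns an empty string until the byte order is known.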
Re: [Python-Dev] XML codec?
On 2007-11-09 14:10, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>> codecs to do the encoding. There's no need to create a magical
>>>> mystery codec to pick out which though.
>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>> is inside a codec?
>> Exactly so. This functionality just *isn't* a codec - there is no
>> encoding. Instead, it is an algorithm for *detecting* an encoding.
>
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

FWIW: I'm +1 on adding such a codec.

It makes working with XML data a lot easier: you simply don't have to bother with the encoding of the XML data anymore and can just let the codec figure out the details. The XML parser can then work directly on the Unicode data.

Whether it needs to be in C or not is another question (I would have done this in Python since performance is not really an issue), but since the code is already written, why not use it ?

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 09 2007)
Re: [Python-Dev] XML codec?
Martin v. Löwis wrote:
>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>> codecs to do the encoding. There's no need to create a magical
>>> mystery codec to pick out which though.
>> So the code is good, if it is inside an XML parser, and it's bad if it
>> is inside a codec?
>
> Exactly so. This functionality just *isn't* a codec - there is no
> encoding. Instead, it is an algorithm for *detecting* an encoding.

And what do you do once you've detected the encoding? You decode the input, so why not combine both into an XML decoder?

Servus,
Walter
Re: [Python-Dev] XML codec?
Martin v. Löwis wrote:
>> Because you can force the encoder to use a specified encoding. If you do
>> this and the unicode string starts with an XML declaration
>
> So what if the unicode string doesn't start with an XML declaration?
> Will it add one?

No.

> If so, what version number will it use?

If we added this we could add an extra argument version to the encoder constructor defaulting to '1.0'.

>>>> OK, so should I put the C code into a _xml module?
>>> I don't see the need for C code at all.
>> Doing the bit fiddling for
>> Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
>> right thing to do.
>
> Hmm. I don't think a sequence like
>
> +    if (strlen>0)
> +    {
> +        if (*str++ != '<')
> +            return 1;
> +        if (strlen>1)
> +        {
> +            if (*str++ != '?')
> +                return 1;
> +            if (strlen>2)
> +            {
> +                if (*str++ != 'x')
> +                    return 1;
> +                if (strlen>3)
> +                {
> +                    if (*str++ != 'm')
> +                        return 1;
> +                    if (strlen>4)
> +                    {
> +                        if (*str++ != 'l')
> +                            return 1;
> +                        if (strlen>5)
> +                        {
> +                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
> +                                return 1;
>
> is well-maintainable C. I feel it is much better writing
>
>   if not s.startswith("<?xml"):
>       return 1

The point of this code is not just to return whether the string starts with "<?xml [...]

> What bit fiddling are you referring to specifically that you think
> is better done in C than in Python?

The code that checks the byte signature, i.e. the first part of detect_xml_encoding_str().

Servus,
Walter
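The objection that the check is more than a yes/no test can still be met in a few lines of Python. The sketch below distinguishes three outcomes: the data starts with the marker, the data is too short to decide, or the data definitely does not start with it. The function name and the use of None for "undecided" are assumptions for illustration, and unlike the C code it does not check for whitespace after the marker.

```python
XML_DECL = b"<?xml"


def starts_with_xml_decl(data):
    """Return True, False, or None when more input is needed to decide."""
    if data.startswith(XML_DECL):
        return True
    if XML_DECL.startswith(data):
        return None  # data is a proper prefix of the marker (or empty)
    return False
```

An incremental decoder could keep buffering while the result is None and commit to a decision as soon as it becomes True or False.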
Re: [Python-Dev] XML codec?
> Because you can force the encoder to use a specified encoding. If you do
> this and the unicode string starts with an XML declaration

So what if the unicode string doesn't start with an XML declaration? Will it add one? If so, what version number will it use?

>>> OK, so should I put the C code into a _xml module?
>> I don't see the need for C code at all.
>
> Doing the bit fiddling for
> Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
> right thing to do.

Hmm. I don't think a sequence like

+    if (strlen>0)
+    {
+        if (*str++ != '<')
+            return 1;
+        if (strlen>1)
+        {
+            if (*str++ != '?')
+                return 1;
+            if (strlen>2)
+            {
+                if (*str++ != 'x')
+                    return 1;
+                if (strlen>3)
+                {
+                    if (*str++ != 'm')
+                        return 1;
+                    if (strlen>4)
+                    {
+                        if (*str++ != 'l')
+                            return 1;
+                        if (strlen>5)
+                        {
+                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
+                                return 1;

is well-maintainable C. I feel it is much better writing

  if not s.startswith("<?xml"):
      return 1

What bit fiddling are you referring to specifically that you think is better done in C than in Python?

Regards,
Martin
Re: [Python-Dev] XML codec?
>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>> codecs to do the encoding. There's no need to create a magical
>> mystery codec to pick out which though.
>
> So the code is good, if it is inside an XML parser, and it's bad if it
> is inside a codec?

Exactly so. This functionality just *isn't* a codec - there is no encoding. Instead, it is an algorithm for *detecting* an encoding.

Regards,
Martin
Re: [Python-Dev] XML codec?
Martin v. Löwis wrote:
>> ci = codecs.lookup("xml-auto-detect")
>> p = expat.ParserCreate()
>> e = "utf-32"
>> s = (u"[...]" % e).encode(e)
>> s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0]
>> p.Parse(s, True)
>
> So how come the document being parsed is recognized as UTF-8?

Because you can force the encoder to use a specified encoding. If you do this and the unicode string starts with an XML declaration, the encoder will put the specified encoding into the declaration:

  import codecs
  e = codecs.getencoder("xml-auto-detect")
  print e(u"[...]", encoding="utf-8")[0]

This prints:

  [...]

>> OK, so should I put the C code into a _xml module?
>
> I don't see the need for C code at all.

Doing the bit fiddling for Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the right thing to do.

Servus,
Walter
Re: [Python-Dev] XML codec?
Adam Olsen wrote:
> On 11/8/07, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>> [...]
>>>> Furthermore encoding-detection might be part of the responsibility of
>>>> the XML parser, but this decoding phase is totally distinct from the
>>>> parsing phase, so why not put the decoding into a common library?
>>> I would not object to that - just to expose it as a codec. Adding it
>>> to the XML library is fine, IMO.
>> But it does make sense as a codec. The decoding phase of an XML parser
>> has to turn a byte stream into a unicode stream. That's the job of a codec.
>
> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> codecs to do the encoding. There's no need to create a magical
> mystery codec to pick out which though.

So the code is good, if it is inside an XML parser, and it's bad if it is inside a codec?

> It's not even sufficient for
> XML:
>
> 1) round-tripping a file should be done in the original encoding.
> Containing the auto-detected encoding within a codec doesn't let you
> see what it picked.

The chosen encoding is available from the incremental encoder:

  import codecs
  e = codecs.getincrementalencoder("xml-auto-detect")()
  e.encode(u"[...]", True)
  print e.encoding

This prints utf-32.

> 2) the encoding may be specified externally from the file/stream[1].
> The xml parser needs to handle these out-of-band encodings anyway.

It does. You can pass an encoding to the stateless decoder, the incremental decoder and the streamreader. It will then use this encoding instead of the one detected from the byte stream. It even will put the correct encoding into the XML declaration (if there is one):

  import codecs
  d = codecs.getdecoder("xml-auto-detect")
  print d("[...]", encoding="utf-8")[0]

This prints:

  [...]

Servus,
Walter