[Python-Dev] Only one week left for PyCon proposals!

2007-11-09 Thread David Goodger
There is only one week left for PyCon tutorial & scheduled talk proposals.  If
you've been thinking about making a proposal, now's the time!

Tutorial details and instructions here:
http://us.pycon.org/2008/tutorials/proposals/

Scheduled talk details and instructions here:
http://us.pycon.org/2008/conference/proposals/

The deadline is Friday, November 16.  Don't put it off any longer!

PyCon 2008: http://us.pycon.org

-- 
David Goodger
PyCon 2008 Chair



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Declaring setters with getters

2007-11-09 Thread Guido van Rossum
D'oh. I forgot to point to the patch. It's here:
http://bugs.python.org/issue1416

On Nov 9, 2007 10:00 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> To follow up, I now have a patch. It's pretty straightforward.
>
> This implements the kind of syntax that I believe won over most folks
> in the end:
>
>   @property
>   def foo(self): ...
>
>   @foo.setter
>   def foo(self, value=None): ...
>
> There are also .getter and .deleter descriptors.  This includes the hack
> that if you specify a setter but no deleter, the setter is called
> without a value argument when attempting to delete something.  If the
> setter isn't ready for this, a TypeError will be raised, pretty much
> just as if no deleter was provided (just with a somewhat worse error
> message :-).
>
> I intend to check this into 2.6 and 3.0 unless there is a huge cry of
> dismay.  Docs will be left to volunteers as always.
>
> --Guido
>
>
> On Oct 31, 2007 9:08 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > I've come up with a relatively unobtrusive pattern for defining
> > setters. Given the following definition:
> >
> > def propset(prop):
> >     assert isinstance(prop, property)
> >     def helper(func):
> >         return property(prop.__get__, func, func, prop.__doc__)
> >     return helper
> >
> > we can declare getters and setters as follows:
> >
> > class C(object):
> >
> >     _encoding = None
> >
> >     @property
> >     def encoding(self):
> >         return self._encoding
> >
> >     @propset(encoding)
> >     def encoding(self, value=None):
> >         if value is not None:
> >             unicode("0", value)  # Test it
> >         self._encoding = value
> >
> > c = C()
> > print(c.encoding)
> > c.encoding = "ascii"
> > print(c.encoding)
> > try:
> >     c.encoding = "invalid"  # Fails
> > except:
> >     pass
> > print(c.encoding)
> >
> > I'd like to make this a standard built-in, in the hope of ending the
> > debate on how to declare settable properties.
> >
> > I'd also like to change property so that the doc string defaults to
> > the doc string of the getter.
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
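
A minimal sketch of the decorator syntax described in the quoted message, assuming the `property` built-in as it ships in released Python versions (2.6 and later). The class `C` and the attribute name `foo` are illustrative only. The sketch defines an explicit deleter rather than relying on the setter-as-deleter fallback mentioned above, which is a detail of the patch under discussion and should not be assumed to exist elsewhere.

```python
class C:
    def __init__(self):
        self._foo = None

    @property
    def foo(self):
        """The foo attribute."""
        return self._foo

    @foo.setter
    def foo(self, value):
        # Called for "c.foo = value".
        self._foo = value

    @foo.deleter
    def foo(self):
        # Called for "del c.foo"; here it simply resets the value.
        self._foo = None


c = C()
c.foo = 42       # goes through the setter
del c.foo        # goes through the deleter
```

In released Python versions, removing the `@foo.deleter` block makes `del c.foo` raise `AttributeError`. The property's docstring is taken from the getter.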


Re: [Python-Dev] Declaring setters with getters

2007-11-09 Thread Guido van Rossum
To follow up, I now have a patch. It's pretty straightforward.

This implements the kind of syntax that I believe won over most folks
in the end:

  @property
  def foo(self): ...

  @foo.setter
  def foo(self, value=None): ...

There are also .getter and .deleter descriptors.  This includes the hack
that if you specify a setter but no deleter, the setter is called
without a value argument when attempting to delete something.  If the
setter isn't ready for this, a TypeError will be raised, pretty much
just as if no deleter was provided (just with a somewhat worse error
message :-).

I intend to check this into 2.6 and 3.0 unless there is a huge cry of
dismay.  Docs will be left to volunteers as always.

--Guido

On Oct 31, 2007 9:08 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> I've come up with a relatively unobtrusive pattern for defining
> setters. Given the following definition:
>
> def propset(prop):
>     assert isinstance(prop, property)
>     def helper(func):
>         return property(prop.__get__, func, func, prop.__doc__)
>     return helper
>
> we can declare getters and setters as follows:
>
> class C(object):
>
>     _encoding = None
>
>     @property
>     def encoding(self):
>         return self._encoding
>
>     @propset(encoding)
>     def encoding(self, value=None):
>         if value is not None:
>             unicode("0", value)  # Test it
>         self._encoding = value
>
> c = C()
> print(c.encoding)
> c.encoding = "ascii"
> print(c.encoding)
> try:
>     c.encoding = "invalid"  # Fails
> except:
>     pass
> print(c.encoding)
>
> I'd like to make this a standard built-in, in the hope of ending the
> debate on how to declare settable properties.
>
> I'd also like to change property so that the doc string defaults to
> the doc string of the getter.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
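
The quoted `propset` recipe can be tried on Python 3 with two small substitutions. Both are assumptions of this sketch rather than part of the original post: `codecs.lookup()` stands in for the `unicode("0", value)` probe, since `unicode` no longer exists, and the bare `assert` becomes an explicit `TypeError`.

```python
import codecs


def propset(prop):
    """Return a decorator that attaches a setter (and deleter) to `prop`."""
    if not isinstance(prop, property):
        raise TypeError("propset() expects a property")

    def helper(func):
        # As in the quoted recipe, func doubles as setter and deleter;
        # on deletion it is called without a value argument.
        return property(prop.__get__, func, func, prop.__doc__)

    return helper


class C:
    _encoding = None

    @property
    def encoding(self):
        return self._encoding

    @propset(encoding)
    def encoding(self, value=None):
        if value is not None:
            codecs.lookup(value)  # raises LookupError for unknown names
        self._encoding = value
```

Assigning `c.encoding = "invalid"` raises `LookupError` and leaves the previous value in place; `del c.encoding` resets it to `None`.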


Re: [Python-Dev] XML codec?

2007-11-09 Thread M.-A. Lemburg
Martin v. Löwis wrote:
>> Not really, but the codec has more control over what happens to
>> the stream, ie. it's easier to implement look-ahead in the codec
>> than to do the detection and then try to push the bytes back onto
>> the stream (which may or may not be possible depending on the
>> nature of the stream).
> 
> YAGNI.

A non-seekable stream is not all that uncommon in network processing.
I usually end up either reading the complete data into memory
or doing the needed buffering by hand.

Regards,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 10 2007)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


 Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611


Re: [Python-Dev] XML codec?

2007-11-09 Thread Stephen J. Turnbull
"Martin v. Löwis" writes:

 > > It's clear to me that detecting an encoding is actually the simplest
 > > part of all this (so long as there's an API to do it!)  Putting it
 > > inside a codec seems like the wrong subdivision of responsibility.
 > 
 > In case it isn't clear - this is exactly my view also.

But is there an API to do it?  As MAL points out that API would have
to return not an encoding, but a pair of an encoding and the rewound
stream.  For non-seekable, non-peekable streams (if any), what you'd
need would be a stream that consisted of a concatenation of the
buffered data used for detection and the continuation of the stream.
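
One way to picture the concatenated stream described above is a small wrapper that replays the bytes already consumed for detection and then continues with the underlying stream. This is only a sketch: `PrefixedStream` is a hypothetical name, not an existing API, and `io.BytesIO` stands in for a non-seekable source.

```python
import io


class PrefixedStream(io.RawIOBase):
    """Read-only stream that replays `prefix`, then continues with `rest`."""

    def __init__(self, prefix, rest):
        self._prefix = io.BytesIO(prefix)
        self._rest = rest

    def readable(self):
        return True

    def readinto(self, buffer):
        # Serve the buffered detection bytes first, then the live stream.
        chunk = self._prefix.read(len(buffer))
        if not chunk:
            chunk = self._rest.read(len(buffer)) or b""
        buffer[:len(chunk)] = chunk
        return len(chunk)


source = io.BytesIO(b"\xff\xfe<\x00a\x00/\x00>\x00")
head = source.read(4)                  # consumed while detecting
stream = PrefixedStream(head, source)  # looks like the untouched stream
```

A detection API built this way would return the pair `(encoding, stream)` rather than the encoding alone.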



Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
> Not really, but the codec has more control over what happens to
> the stream, ie. it's easier to implement look-ahead in the codec
> than to do the detection and then try to push the bytes back onto
> the stream (which may or may not be possible depending on the
> nature of the stream).

YAGNI.

Regards,
Martin



Re: [Python-Dev] XML codec?

2007-11-09 Thread Adam Olsen
On Nov 9, 2007 3:59 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> >> It makes working with XML data a lot easier: you simply don't have to
> >> bother with the encoding of the XML data anymore and can just let the
> >> codec figure out the details. The XML parser can then work directly
> >> on the Unicode data.
> >
> > Having the functionality indeed makes things easier. However, I don't
> > find
> >
> >   s.decode(xml.detect_encoding(s))
> >
> > particularly more difficult than
> >
> >   s.decode("xml-auto-detection")
>
> Not really, but the codec has more control over what happens to
> the stream, ie. it's easier to implement look-ahead in the codec
> than to do the detection and then try to push the bytes back onto
> the stream (which may or may not be possible depending on the
> nature of the stream).

io.BufferedReader() standardizes a .peek() API, making it trivial.  I
don't see why we couldn't require it.

(As an aside, .peek() will fail to do what detect_encodings() needs if
BufferedReader's buffer size is too small.  I do wonder if that
limitation is appropriate.)


-- 
Adam Olsen, aka Rhamphoryncus
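
A rough sketch of the `.peek()` approach, assuming the stream is wrapped in `io.BufferedReader`. The helper `sniff_bom` and its BOM table are hypothetical names for illustration; only byte order marks are checked here, not the XML declaration. As noted in the aside above, `peek()` may return fewer bytes than requested, so real code would have to handle a short result.

```python
import codecs
import io

# UTF-32 marks are listed first because the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]


def sniff_bom(reader, default="utf-8"):
    """Guess an encoding from the BOM without consuming any input."""
    head = reader.peek(4)[:4]
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return default


raw = io.BytesIO(codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le"))
reader = io.BufferedReader(raw)
encoding = sniff_bom(reader)
text = reader.read().decode(encoding)  # the BOM is still in the stream
```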


Re: [Python-Dev] XML codec?

2007-11-09 Thread M.-A. Lemburg
Martin v. Löwis wrote:
>> It makes working with XML data a lot easier: you simply don't have to
>> bother with the encoding of the XML data anymore and can just let the
>> codec figure out the details. The XML parser can then work directly
>> on the Unicode data.
> 
> Having the functionality indeed makes things easier. However, I don't
> find
> 
>   s.decode(xml.detect_encoding(s))
> 
> particularly more difficult than
> 
>   s.decode("xml-auto-detection")

Not really, but the codec has more control over what happens to
the stream, ie. it's easier to implement look-ahead in the codec
than to do the detection and then try to push the bytes back onto
the stream (which may or may not be possible depending on the
nature of the stream).

>> Whether it needs to be in C or not is another question (I would have
>> done this in Python since performance is not really an issue), but since
>> the code is already written, why not use it ?
> 
> It's a maintenance issue.

I'm sure Walter will do a great job in maintaining the code :-)

Regards,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
> It's clear to me that detecting an encoding is actually the simplest
> part of all this (so long as there's an API to do it!)  Putting it
> inside a codec seems like the wrong subdivision of responsibility.

In case it isn't clear - this is exactly my view also.

Regards,
Martin


Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
> In fact, we already have such a codec. The utf-16 decoder looks at the
> first two bytes and then decides to forward the rest to either a
> utf-16-be or a utf-16-le decoder.

That's different. UTF-16 is a proper encoding that is just specified
to use the BOM. "xml-auto-detection" is not an encoding.

Regards,
Martin


Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Having the functionality indeed makes things easier. However, I don't
find

  s.decode(xml.detect_encoding(s))

particularly more difficult than

  s.decode("xml-auto-detection")

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but since
> the code is already written, why not use it ?

It's a maintenance issue.

Regards,
Martin



Re: [Python-Dev] XML codec?

2007-11-09 Thread Adam Olsen
On Nov 9, 2007 6:10 AM, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>
> Martin v. Löwis wrote:
> >>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> >>> codecs to do the encoding.  There's no need to create a magical
> >>> mystery codec to pick out which though.
> >> So the code is good, if it is inside an XML parser, and it's bad if it
> >> is inside a codec?
> >
> > Exactly so. This functionality just *isn't* a codec - there is no
> > encoding. Instead, it is an algorithm for *detecting* an encoding.
>
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

It seems to me that parsing XML requires 3 steps:
1) determine encoding
2) decode byte stream
3) parse XML (including handling of character references)

All an xml codec does is make the first part a side-effect of the
second part.  Rather than this:

encoding = detect_encoding(raw_data)
decoded_data = raw_data.decode(encoding)
tree = parse_xml(decoded_data, encoding)  # Verifies encoding

You'd have this:

e = codecs.getincrementaldecoder("xml-auto-detect")()
decoded_data = e.decode(raw_data, True)
tree = parse_xml(decoded_data, e.encoding)  # Verifies encoding

It's clear to me that detecting an encoding is actually the simplest
part of all this (so long as there's an API to do it!)  Putting it
inside a codec seems like the wrong subdivision of responsibility.

(An example using streams would end up closer, but it still seems
wrong to me.  Encoding detection is always one way, while codecs are
always two way (even if lossy.))

-- 
Adam Olsen, aka Rhamphoryncus
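
To make the "detection as a side effect of decoding" idea concrete, here is a hypothetical incremental decoder that buffers input until it can pick an encoding and then delegates to a standard decoder. `AutoDetectDecoder` is an illustrative name, not the codec proposed in this thread, and it looks only at byte order marks, falling back to UTF-8 otherwise.

```python
import codecs


class AutoDetectDecoder:
    """Hypothetical decoder that chooses its encoding from the BOM."""

    _BOMS = [
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_BE, "utf-16"),
        (codecs.BOM_UTF16_LE, "utf-16"),
    ]

    def __init__(self, default="utf-8"):
        self.encoding = None   # filled in once detection has happened
        self._default = default
        self._pending = b""
        self._decoder = None

    def decode(self, data, final=False):
        if self._decoder is None:
            self._pending += data
            if len(self._pending) < 4 and not final:
                return ""      # not enough bytes to decide yet
            self.encoding = self._default
            for bom, name in self._BOMS:
                if self._pending.startswith(bom):
                    self.encoding = name
                    break
            self._decoder = codecs.getincrementaldecoder(self.encoding)()
            data, self._pending = self._pending, b""
        return self._decoder.decode(data, final)
```

The caller still has to read `decoder.encoding` afterwards to pass it on to the parser, which is the coupling the message above objects to.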


Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

Because it is the XML parser that does the decoding, not the
application. Also, it is better to provide functionality in
a modular manner (i.e. encoding detection separately from
encodings), and leaving integration of modules to the application,
in particular if the integration is trivial.

Regards,
Martin



Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
>> So what if the unicode string doesn't start with an XML declaration?
>> Will it add one?
> 
> No.

Ok. So the XML document would be ill-formed then unless the encoding is
UTF-8, right?

> The point of this code is not just to return whether the string starts
> with "<?xml" [...]
>   * The string does start with "<?xml"
>   * The string starts with a prefix of "<?xml" [...] decide if it starts
>     with "<?xml"
>   * The string definitely doesn't start with "<?xml"
>
>> What bit fiddling are you referring to specifically that you think
>> is better done in C than in Python?
> 
> The code that checks the byte signature, i.e. the first part of
> detect_xml_encoding_str().

I can't see any *bit* fiddling there, except for the bit mask of
candidates. For the candidate list, I cannot quite understand why
you need a bit mask at all, since the candidates are rarely
overlapping.

I think there could be a much simpler routine to have the same
effect.
- if it's less than 4 bytes, answer "need more data".
- otherwise, implement annex F "literally". Make a dictionary
  of all prefixes that are exactly 4 bytes, i.e.

  prefixes4 = {"\x00\x00\xFE\xFF":"utf-32be", ...
  ...,  "\0\x3c\0\x3f":"utf-16be"}

  try: return prefixes4[s[:4]]
  except KeyError: pass
  if s.startswith(codecs.BOM_UTF16_BE):return "utf-16be"
  ...
  if s.startswith("<[...]
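
A compact Python 3 sketch of the prefix-table routine outlined above, using the byte signatures from Appendix F of the XML 1.0 specification. The function name `detect_xml_encoding` and the table name `PREFIXES4` are assumptions for illustration. The sketch skips the EBCDIC and unusual byte-order cases and does not read the `encoding` pseudo-attribute, so "utf-8" here only means "ASCII-compatible; read the declaration to refine".

```python
import codecs

# Exact four-byte signatures (XML 1.0, Appendix F).
PREFIXES4 = {
    b"\x00\x00\xfe\xff": "utf-32-be",  # BOM
    b"\xff\xfe\x00\x00": "utf-32-le",  # BOM
    b"\x00\x00\x00\x3c": "utf-32-be",  # "<" without a BOM
    b"\x3c\x00\x00\x00": "utf-32-le",
    b"\x00\x3c\x00\x3f": "utf-16-be",  # "<?" without a BOM
    b"\x3c\x00\x3f\x00": "utf-16-le",
}


def detect_xml_encoding(data):
    """Return an encoding name, or None if more data is needed."""
    if len(data) < 4:
        return None
    head = data[:4]
    if head in PREFIXES4:
        return PREFIXES4[head]
    # Shorter BOMs are checked after the exact four-byte matches.
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "utf-8"
```

Checking the four-byte table first matters: the UTF-32-LE BOM starts with the same two bytes as the UTF-16-LE BOM.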


[Python-Dev] Summary of Tracker Issues

2007-11-09 Thread Tracker

ACTIVITY SUMMARY (11/02/07 - 11/09/07)
Tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 1322 open (+23) / 11575 closed (+18) / 12897 total (+41)

Open issues with patches:   418

Average duration of open issues: 686 days.
Median duration of open issues: 788 days.

Open Issues Breakdown
   open  1317 (+23)
pending     5 ( +0)

Issues Created Or Reopened (41)
___

IDLE - minor FormatParagraph bug fix                             11/02/07
       http://bugs.python.org/issue1374    created  taleinat
       patch

hotshot IndexError when loading stats                            11/02/07
       http://bugs.python.org/issue1375    created  ratsberg

uu module catches a wrong exception type                         11/02/07
CLOSED http://bugs.python.org/issue1376    created  billiejoex

test_import breaks on Linux                                      11/09/07
       http://bugs.python.org/issue1377    reopened gvanrossum
       py3k

fromfd() and dup() for _socket on WIndows                        11/03/07
       http://bugs.python.org/issue1378    created  roudkerk
       patch

reloading imported modules sometimes fail with 'parent not in sy 11/03/07
CLOSED http://bugs.python.org/issue1379    created  _doublep
       py3k, patch

fix for test_asynchat and test_asyncore on pep3137 branch        11/03/07
CLOSED http://bugs.python.org/issue1380    created  hupp
       py3k, patch

cmath is numerically unsound                                     11/03/07
       http://bugs.python.org/issue1381    created  inducer

py3k-pep3137: patch for test_ctypes                              11/04/07
CLOSED http://bugs.python.org/issue1382    created  amaury.forgeotdarc
       py3k, patch

Backport abcoll to 2.6                                           11/04/07
       http://bugs.python.org/issue1383    created  baranguren
       patch

Windows fix for inspect tests                                    11/04/07
CLOSED http://bugs.python.org/issue1384    created  tiran
       py3k, patch

hmac module violates RFC for some hash functions, e.g. sha512    11/04/07
CLOSED http://bugs.python.org/issue1385    created  jowagner
       py3k

py3k-pep3137: patch to ensure that all codecs return bytes       11/04/07
CLOSED http://bugs.python.org/issue1386    created  amaury.forgeotdarc
       py3k, patch

py3k-pep3137: patch for hashlib on Windows                       11/04/07
CLOSED http://bugs.python.org/issue1387    created  amaury.forgeotdarc
       py3k, patch

py3k-pep3137: possible ref leak in ctypes                        11/05/07
CLOSED http://bugs.python.org/issue1388    created  tiran
       py3k

py3k-pep3137: struct module is leaking references                11/05/07
CLOSED http://bugs.python.org/issue1389    created  tiran
       py3k

toxml generates output that is not well formed                   11/05/07
       http://bugs.python.org/issue1390    created  drtomc

Adds the .compact() method to bsddb db.DB objects                11/05/07
       http://bugs.python.org/issue1391    created  gregory.p.smith
       patch, rfe

py3k-pep3137: issue warnings / errors on str(bytes()) and simila 11/05/07
CLOSED http://bugs.python.org/issue1392    created  tiran
       py3k, patch

function comparing lacks NotImplemented error                    11/05/07

Re: [Python-Dev] Bug tracker: meaning of resolution keywords

2007-11-09 Thread Christian Heimes
Christian Heimes wrote:
> (*) It's missing from the list of resolutions but I like to have it
> added. http://psf.upfronthosting.co.za/roundup/meta/issue167

Update:
Georg Brandl pointed out that it makes more sense to add confirmed to
status.

Christian


[Python-Dev] Bug tracker: meaning of resolution keywords

2007-11-09 Thread Christian Heimes
Hello!

Guido has granted me committer privileges to svn.python.org and
bugs.python.org about a week ago. So I'm new and new people tend to make
mistakes until they've learned the specific rules of a project.

Today I've learned that the resolution keyword "accepted" doesn't mean
the bug report is accepted. It only means a patch for the bug is
accepted. In the past I've used "accepted" in the meaning of "bug is
confirmed" in my own projects. In my ignorance I've used it in the same
way to mark bugs as confirmed when I was able to reproduce the bug myself.

The tracker doc at http://wiki.python.org/moin/TrackerDocs/ doesn't have
a formal definition of the various keywords. I like to add a definition
to the wiki to prevent others from making the same mistake. But first I
like to discuss my view of the keywords

Resolutions
***

accepted - patch accepted
confirmed (*) - the problem is confirmed
duplicate - the bug is a duplicated of another bug
fixed - the bug is fixed / patch is applied
invalid - catch all for invalid reports
later - the problem is going to be addressed later in the release cycle
out of date - the bug was already fixed in svn
postponed - the problem is going to be fixed in the next minor version
rejected - the patch or feature request is rejected
remind - remind me to finish the task (docs, unit tests)
wont fix - it's not a bug, it's a feature
works for me - unable to reproduce the problem

(*) It's missing from the list of resolutions but I like to have it
added. http://psf.upfronthosting.co.za/roundup/meta/issue167

Priority
***
immediate - the bug must be fixed *NOW* (only used for important
security related problems)
urgent - the problem must be fixed ASAP because it's crucial for future
development
high - the problem should be fixed soonish and must be fixed for the
next release
normal - the problem should be fixed for the next release
low - nice to have features and fixes

Christian



Re: [Python-Dev] XML codec?

2007-11-09 Thread Fred Drake
On Nov 9, 2007, at 8:22 AM, M.-A. Lemburg wrote:
> FWIW: I'm +1 on adding such a codec.

I'm undecided, and really don't feel strongly either way.

> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Which is fine if you want to write a new parser.  I've no interest in  
that myself.

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but  
> since
> the code is already written, why not use it ?

The reason not to use C is the usual one:  The implementation is more  
cross-implementation if it's written in Python.  This makes it more  
useful with Jython, IronPython, and PyPy.

That seems a pretty good reason to me.


   -Fred

-- 
Fred Drake   






Re: [Python-Dev] XML codec?

2007-11-09 Thread Walter Dörwald
M.-A. Lemburg wrote:

> On 2007-11-09 14:10, Walter Dörwald wrote:
>> Martin v. Löwis wrote:
>>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>>> codecs to do the encoding.  There's no need to create a magical
>>>>> mystery codec to pick out which though.
>>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>>> is inside a codec?
>>> Exactly so. This functionality just *isn't* a codec - there is no
>>> encoding. Instead, it is an algorithm for *detecting* an encoding.
>> And what do you do once you've detected the encoding? You decode the
>> input, so why not combine both into an XML decoder?
> 
> FWIW: I'm +1 on adding such a codec.
> 
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Exactly. I have a version of sgmlop lying around that does that.

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but since
> the code is already written, why not use it ?

Servus,
   Walter


Re: [Python-Dev] XML codec?

2007-11-09 Thread Walter Dörwald
Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>> codecs to do the encoding.  There's no need to create a magical
>>>> mystery codec to pick out which though.
>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>> is inside a codec?
>> Exactly so. This functionality just *isn't* a codec - there is no
>> encoding. Instead, it is an algorithm for *detecting* an encoding.
> 
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

In fact, we already have such a codec. The utf-16 decoder looks at the
first two bytes and then decides to forward the rest to either a
utf-16-be or a utf-16-le decoder.

Servus,
   Walter
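
The BOM dispatch described above can be observed directly with the standard `utf-16` codec. This is a small demonstration, with the sample text and variable names chosen only for illustration.

```python
import codecs

text = "<a/>"
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" codec consumes the BOM and picks the byte order.
from_little = little.decode("utf-16")
from_big = big.decode("utf-16")

# The incremental decoder waits until the whole BOM has arrived
# before committing to a byte order.
decoder = codecs.getincrementaldecoder("utf-16")()
first = decoder.decode(big[:1])            # only half of the BOM so far
rest = decoder.decode(big[1:], final=True)
```

Both `from_little` and `from_big` equal the original text, and the incremental decoder returns an empty string for the first, incomplete chunk.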


Re: [Python-Dev] XML codec?

2007-11-09 Thread M.-A. Lemburg
On 2007-11-09 14:10, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>> codecs to do the encoding.  There's no need to create a magical
>>>> mystery codec to pick out which though.
>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>> is inside a codec?
>> Exactly so. This functionality just *isn't* a codec - there is no
>> encoding. Instead, it is an algorithm for *detecting* an encoding.
> 
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

FWIW: I'm +1 on adding such a codec.

It makes working with XML data a lot easier: you simply don't have to
bother with the encoding of the XML data anymore and can just let the
codec figure out the details. The XML parser can then work directly
on the Unicode data.

Whether it needs to be in C or not is another question (I would have
done this in Python since performance is not really an issue), but since
the code is already written, why not use it ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] XML codec?

2007-11-09 Thread Walter Dörwald
Martin v. Löwis wrote:
>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>> codecs to do the encoding.  There's no need to create a magical
>>> mystery codec to pick out which though.
>> So the code is good, if it is inside an XML parser, and it's bad if it
>> is inside a codec?
> 
> Exactly so. This functionality just *isn't* a codec - there is no
> encoding. Instead, it is an algorithm for *detecting* an encoding.

And what do you do once you've detected the encoding? You decode the
input, so why not combine both into an XML decoder?

Servus,
   Walter



Re: [Python-Dev] XML codec?

2007-11-09 Thread Walter Dörwald
Martin v. Löwis wrote:

>> Because you can force the encoder to use a specified encoding. If you do
>> this and the unicode string starts with an XML declaration
> 
> So what if the unicode string doesn't start with an XML declaration?
> Will it add one?

No.

> If so, what version number will it use?

If we added this we could add an extra argument version to the encoder
constructor defaulting to '1.0'.

>>>> OK, so should I put the C code into a _xml module?
>>> I don't see the need for C code at all.
>> Doing the bit fiddling for
>> Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
>> right thing to do.
> 
> Hmm. I don't think a sequence like
> 
> +    if (strlen>0)
> +    {
> +        if (*str++ != '<')
> +            return 1;
> +        if (strlen>1)
> +        {
> +            if (*str++ != '?')
> +                return 1;
> +            if (strlen>2)
> +            {
> +                if (*str++ != 'x')
> +                    return 1;
> +                if (strlen>3)
> +                {
> +                    if (*str++ != 'm')
> +                        return 1;
> +                    if (strlen>4)
> +                    {
> +                        if (*str++ != 'l')
> +                            return 1;
> +                        if (strlen>5)
> +                        {
> +                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
> +                                return 1;
> 
> is well-maintainable C. I feel it is much better writing
> 
>   if not s.startswith("<?xml"):
>       return 1

The point of this code is not just to return whether the string starts
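with "<?xml" [...]
  * The string does start with "<?xml"
  * The string starts with a prefix of "<?xml" [...] decide if it starts
    with "<?xml"
  * The string definitely doesn't start with "<?xml"

> What bit fiddling are you referring to specifically that you think
> is better done in C than in Python?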

The code that checks the byte signature, i.e. the first part of
detect_xml_encoding_str().

Servus,
   Walter
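
The three possible outcomes of the prefix check (declaration present, more input needed, definitely no declaration) can be expressed in a few lines of Python. This is a sketch only: the function name `starts_with_xml_decl` and the `True`/`None`/`False` return convention are assumptions, and it works on bytes in an ASCII-compatible encoding rather than reproducing the C routine from the patch.

```python
def starts_with_xml_decl(data):
    """Return True, False, or None when more input is needed to decide."""
    marker = b"<?xml"
    if data.startswith(marker):
        if len(data) == len(marker):
            return None                     # the next byte decides
        # "<?xml" must be followed by whitespace, as in the C check.
        return data[5:6] in b" \t\r\n"
    if marker.startswith(data):
        return None                         # a proper prefix such as b"<?x"
    return False
```

For example, `b"<?x"` gives `None`, `b"<root/>"` gives `False`, and `b"<?xml version='1.0'?>"` gives `True`.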






Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
> Because you can force the encoder to use a specified encoding. If you do
> this and the unicode string starts with an XML declaration

So what if the unicode string doesn't start with an XML declaration?
Will it add one? If so, what version number will it use?

>>> OK, so should I put the C code into a _xml module?
>> I don't see the need for C code at all.
> 
> Doing the bit fiddling for
> Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
> right thing to do.

Hmm. I don't think a sequence like

+    if (strlen>0)
+    {
+        if (*str++ != '<')
+            return 1;
+        if (strlen>1)
+        {
+            if (*str++ != '?')
+                return 1;
+            if (strlen>2)
+            {
+                if (*str++ != 'x')
+                    return 1;
+                if (strlen>3)
+                {
+                    if (*str++ != 'm')
+                        return 1;
+                    if (strlen>4)
+                    {
+                        if (*str++ != 'l')
+                            return 1;
+                        if (strlen>5)
+                        {
+                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
+                                return 1;

is well-maintainable C. I feel it is much better writing

  if not s.startswith("<?xml"):
      return 1

What bit fiddling are you referring to specifically that you think
is better done in C than in Python?
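
For comparison, the nested checks above could be sketched in Python roughly as follows (Python 3 syntax). This is only an illustration under assumptions: the excerpt shows just the "return 1" mismatch branches, so the three-way result used here (True, False, or None for "too short to tell") and the name starts_with_xml_decl are hypothetical, not taken from the patch.

```python
XML_PREFIX = "<?xml"
XML_SPACE = " \t\r\n"


def starts_with_xml_decl(s):
    """Return True or False once decided, None if s is too short to tell."""
    n = min(len(s), len(XML_PREFIX))
    # Compare only as many characters as are available so far.
    if s[:n] != XML_PREFIX[:n]:
        return False
    # The prefix matches, but the character after "<?xml" is not there yet.
    if len(s) <= len(XML_PREFIX):
        return None
    # "<?xml" must be followed by whitespace to be a declaration.
    return s[len(XML_PREFIX)] in XML_SPACE
```

A plain s.startswith("<?xml") answers only the yes/no question; the extra None case matters when the input arrives in pieces.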

Regards,
Martin


Re: [Python-Dev] XML codec?

2007-11-09 Thread Martin v. Löwis
>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>> codecs to do the encoding.  There's no need to create a magical
>> mystery codec to pick out which though.
> 
> So the code is good, if it is inside an XML parser, and it's bad if it
> is inside a codec?

Exactly so. This functionality just *isn't* a codec - there is no
encoding. Instead, it is an algorithm for *detecting* an encoding.
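
As an illustration of detection as a plain function rather than a codec, here is a minimal Python 3 sketch. The name detect_xml_encoding, the signature table and the fallback to UTF-8 are assumptions made for this example; it loosely follows the byte-signature idea (byte order mark first, then the layout of the leading "<?", then the encoding named in the declaration) and is not the code under discussion.

```python
import codecs
import re

# Byte signatures, checked in order. The UTF-32 BOMs come first because the
# UTF-32-LE BOM starts with the same two bytes as the UTF-16-LE BOM.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    # No BOM: guess from how the leading "<" or "<?" is laid out.
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"\x00<\x00?", "utf-16-be"),
]

# Encoding pseudo-attribute of a declaration in an ASCII-compatible encoding.
_DECL_ENCODING = re.compile(
    br"""^<\?xml\s[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data, default="utf-8"):
    """Guess the encoding of an XML byte string without decoding it."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    match = _DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

A caller would then decode with an ordinary codec, for example data.decode(detect_xml_encoding(data)).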

Regards,
Martin


Re: [Python-Dev] XML codec?

2007-11-09 Thread Walter Dörwald
Martin v. Löwis wrote:

>> ci = codecs.lookup("xml-auto-detect")
>> p = expat.ParserCreate()
>> e = "utf-32"
>> s = (u"<?xml version='1.0' encoding=%r?><foo/>" % e).encode(e)
>> s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0]
>> p.Parse(s, True)
> 
> So how come the document being parsed is recognized as UTF-8?

Because you can force the encoder to use a specified encoding. If you do
this and the unicode string starts with an XML declaration, the encoder
will put the specified encoding into the declaration:

import codecs

e = codecs.getencoder("xml-auto-detect")
print e(u"<?xml version='1.0' encoding='iso-8859-1'?><foo/>",
encoding="utf-8")[0]

This prints:

<?xml version='1.0' encoding='utf-8'?><foo/>

>> OK, so should I put the C code into a _xml module?
> 
> I don't see the need for C code at all.

Doing the bit fiddling for
Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
right thing to do.

Servus,
   Walter



Re: [Python-Dev] XML codec?

2007-11-09 Thread Walter Dörwald
Adam Olsen wrote:

> On 11/8/07, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>> [...]
>>>> Furthermore encoding-detection might be part of the responsibility of
>>>> the XML parser, but this decoding phase is totally distinct from the
>>>> parsing phase, so why not put the decoding into a common library?
>>> I would not object to that - just to expose it as a codec. Adding it
>>> to the XML library is fine, IMO.
>> But it does make sense as a codec. The decoding phase of an XML parser
>> has to turn a byte stream into a unicode stream. That's the job of a codec.
> 
> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> codecs to do the encoding.  There's no need to create a magical
> mystery codec to pick out which though.

So the code is good, if it is inside an XML parser, and it's bad if it
is inside a codec?

> It's not even sufficient for
> XML:
> 
> 1) round-tripping a file should be done in the original encoding.
> Containing the auto-detected encoding within a codec doesn't let you
> see what it picked.

The chosen encoding is available from the incremental encoder:

import codecs

e = codecs.getincrementalencoder("xml-auto-detect")()
e.encode(u"<?xml version='1.0' encoding='utf-32'?><foo/>", True)
print e.encoding

This prints utf-32.
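
How an encoder object can remember its choice can be sketched as follows (Python 3 syntax). DeclaredEncodingEncoder is a hypothetical stand-in, not the "xml-auto-detect" codec from the patch, and it makes a simplifying assumption: the first chunk passed to encode() already contains the complete XML declaration, so no buffering is done.

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration at the start of the text.
_ENC_ATTR = re.compile(
    r"""^<\?xml\s[^>]*?\sencoding\s*=\s*["']([^"']+)["']"""
)


class DeclaredEncodingEncoder:
    """Encode text with the encoding named in its own XML declaration."""

    def __init__(self, default="utf-8"):
        self.default = default
        self.encoding = None  # unknown until the first chunk has been seen
        self._encoder = None

    def encode(self, text, final=False):
        if self._encoder is None:
            # Decide once, on the first chunk, and remember the choice.
            match = _ENC_ATTR.match(text)
            self.encoding = match.group(1).lower() if match else self.default
            self._encoder = codecs.getincrementalencoder(self.encoding)()
        return self._encoder.encode(text, final)
```

After the first call, the encoding attribute tells the caller which encoding was picked, which is what round-tripping a file in its original encoding needs.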

> 2) the encoding may be specified externally from the file/stream[1].
> The xml parser needs to handle these out-of-band encodings anyway.

It does. You can pass an encoding to the stateless decoder, the
incremental decoder and the streamreader. It will then use this encoding
instead of the one detected from the byte stream. It will even put the
correct encoding into the XML declaration (if there is one):

import codecs

d = codecs.getdecoder("xml-auto-detect")
print d("<?xml version='1.0' encoding='iso-8859-1'?><foo/>",
encoding="utf-8")[0]

This prints:

<?xml version='1.0' encoding='utf-8'?><foo/>

Servus,
   Walter