Andrew McNamara wrote:

There's a bunch of jobs we (CSV module maintainers) have been putting
off - attached is a list (in no particular order):
* unicode support (this will probably uglify the code considerably).

Martin v. Löwis wrote:

Can you please elaborate on that? What needs to be done, and how is
that going to be done? It might be possible to avoid considerable
uglification.


I'm not altogether sure there. The parsing state machine is all written in
C, and deals with signed chars - I expect we'll need two versions of that
(or one version that's compiled twice using pre-processor macros). Quite
a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:

Indeed. The trick is to convert to Unicode early and to use Unicode
literals instead of string literals in the code.


Yes, although it would be nice to retain the 8-bit versions as well.

You can do so by using latin-1 as default encoding. Works great !
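As a minimal sketch of why latin-1 works for this (written against present-day Python 3, not the csv module code under discussion): latin-1 maps each byte value 0..255 to the Unicode code point with the same number, so arbitrary 8-bit data survives a decode/encode round trip unchanged. The variable names are illustrative only.

```python
# Assumption: Python 3, where bytes and str are distinct types.
raw = bytes(range(256))            # every possible 8-bit value

# latin-1 decodes byte N to code point N, so nothing is lost or rejected.
text = raw.decode("latin-1")
code_points = [ord(ch) for ch in text]

# Encoding back gives the original bytes, byte for byte.
round_tripped = text.encode("latin-1")
```

A parser that works only on Unicode internally could therefore still serve 8-bit callers by decoding input as latin-1 and encoding output the same way.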

Note that the only real-life Unicode format in use is UTF-16
(with BOM) written by Excel. There's no standard
for specifying the encoding in CSV files, so this is also the only
feasible format.

Yes - that's part of the problem I hadn't really thought about yet - the
csv module currently interacts directly with files as iterators, but it's clear that we'll need to decode as we go.
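To illustrate what "decode as we go" could look like, here is a hedged sketch using present-day Python 3 standard library calls (`io.TextIOWrapper`, `csv.reader`); it is not how the 2005 csv module worked. The helper name `iter_csv_rows` and the sample data are hypothetical.

```python
import csv
import io

def iter_csv_rows(binary_stream, encoding="utf-16"):
    """Yield CSV rows while decoding the byte stream incrementally."""
    # newline="" leaves line endings untouched so csv can handle them itself.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    for row in csv.reader(text_stream):
        yield row

# The "utf-16" codec writes a BOM on encode and consumes it on decode.
stream = io.BytesIO("a,b\r\n1,2\r\n".encode("utf-16"))
rows = list(iter_csv_rows(stream))
```

The reader still sees an iterator of text lines, so the file-as-iterator interface is preserved while decoding happens underneath it.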

Depends on your needs: CSV files tend to be small enough to do the decoding in one call in memory.
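A minimal sketch of the one-call approach, assuming present-day Python 3 and a file small enough to hold in memory; the function name `read_csv_bytes` and the sample content are hypothetical.

```python
import csv
import io

def read_csv_bytes(data, encoding="utf-16"):
    """Decode the whole byte string at once, then parse it as CSV."""
    text = data.decode(encoding)   # single decode call; the BOM is consumed
    return list(csv.reader(io.StringIO(text, newline="")))

# Sample input: UTF-16 with BOM, as produced by the "utf-16" codec.
sample = "name,city\r\nLöwis,Berlin\r\n".encode("utf-16")
parsed = read_csv_bytes(sample)
```

The trade-off against incremental decoding is memory: the decoded text and the parsed rows are both held at once.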

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 05 2005)