On Wed, Jan 18, 2012 at 10:02:31AM +0000, Colin Watson wrote:
> On Wed, Jan 18, 2012 at 12:56:03AM +0000, Colin Watson wrote:
> > python-debian's test suite also tests that it's possible to parse old
> > Sources files in *mixed* encodings. This is going to be harder because
> > it basically means having apt_pkg.TagSection return bytes, which I don't
> > think is desirable in general. Maybe this could be optional somehow?
>
> Thinking about it, this seems a reasonable thing to make switchable in
> TagFile's constructor. After all:
>
>   >>> with open("test", encoding="iso-8859-1") as test:
>   ...     print(test.read().__class__)
>   ...
>   <class 'str'>
>   >>> with open("test", mode="rb") as test:
>   ...     print(test.read().__class__)
>   ...
>   <class 'bytes'>
>
> So there's clear precedent in the language for the same method returning
> str or bytes depending on how the class was constructed. Maybe a bytes=
> keyword argument?

You'd also need to take care of TagSection if that is done; it should then
work in bytes mode when passed a bytes string. Basically, both TagSection
and TagFile would need to store whether to use bytes or unicode, and TagFile
would pass the value of that flag on to each TagSection it creates.

Then create a function PyObject *TagFile_ToString(char *s, size_t n) or
similar that uses the PyString_* functions or the PyBytes_* functions
depending on the context (where PyString is mapped to unicode in Python 3,
and to str in Python 2). Then use that function everywhere we currently
create strings in the TagFile.

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
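
For illustration only, here is a rough pure-Python model of the flag-passing
design described above. It is not the apt_pkg implementation (that lives in
C++), and everything in it is an assumption made for the sketch: the
use_bytes keyword stands in for the proposed bytes= argument, _to_string()
plays the role of the suggested TagFile_ToString() helper, field names are
assumed to be ASCII and always returned as str, and only single-line fields
are handled.

```python
class TagSection:
    """Hypothetical stand-in for apt_pkg.TagSection (single-line fields only)."""

    def __init__(self, raw, use_bytes=False, encoding="utf-8"):
        # The section remembers the mode it was created in.
        self._use_bytes = use_bytes
        self._encoding = encoding
        self._fields = {}
        for line in raw.splitlines():
            name, sep, value = line.partition(b":")
            if not sep:
                continue  # not a "Name: value" line; ignored in this sketch
            # Assumption: field names are ASCII, so they are always str.
            self._fields[name.decode("ascii")] = self._to_string(value.strip())

    def _to_string(self, data):
        # Single place where values are created: bytes or str, per the flag.
        return data if self._use_bytes else data.decode(self._encoding)

    def __getitem__(self, name):
        return self._fields[name]


class TagFile:
    """Hypothetical stand-in for apt_pkg.TagFile, built from in-memory bytes."""

    def __init__(self, data, use_bytes=False):
        self._data = data
        self._use_bytes = use_bytes

    def __iter__(self):
        # Stanzas are separated by blank lines; the flag is handed down
        # from the TagFile to every TagSection it creates.
        for chunk in self._data.split(b"\n\n"):
            if chunk.strip():
                yield TagSection(chunk, use_bytes=self._use_bytes)


# Two stanzas in *mixed* encodings: ISO-8859-1 first, UTF-8 second.
SAMPLE = (
    b"Package: foo\nMaintainer: J\xe9r\xf4me\n\n"
    b"Package: bar\nMaintainer: J\xc3\xa9r\xc3\xb4me\n"
)

for section in TagFile(SAMPLE, use_bytes=True):
    # In bytes mode the caller decides how to decode each value.
    print(section["Package"], section["Maintainer"])
```

In this model, iterating over TagFile(SAMPLE) without use_bytes=True fails
with UnicodeDecodeError on the first stanza, which is the mixed-encoding
problem from the python-debian test suite; in bytes mode both stanzas come
back untouched.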