Re: [Python-ideas] Chaining coders

2018-01-19 Thread Rob Speer
I see how this is another way to get what I was asking for: a way to decode
some unfortunately common text encodings, ones that Web browsers use, in
Python without having to import additional modules.

I appreciate other ideas about how to solve this problem, but the
generality here seems pretty unnecessary. The world isn't making any
_novel_ legacy encodings. There are 8 legacy encodings that Python has
missed, and there's no reason to expect there to be any more of them.

It's worrisome to support arbitrary compositions of encodings. Most of
these possible hybrid encodings haven't been used before, and using them
would be a bad idea because there would be no reason to expect any other
software in existence to be compatible with them.

Some of these legacy encodings (like the webbish version of windows-1255)
are not the composition of two encodings that already exist in Python. So
you'd have to define new encodings anyway.

On Fri, 19 Jan 2018 at 17:09 Soni L. wrote:

> windows-1252 is based on iso-8859-1. Thus, I'd like to be able to chain
> coders as follows:
>
> bytes.decode("windows-1252-ext", else=lambda r: r.decode("iso-8859-1"))
>
> The "else" argument is a lambda. It gets passed an object whose decode
> method is identical to the bytes decode method, except that it doesn't
> affect already-decoded characters. In this case, "windows-1252-ext" only
> includes things in the \x80-\x9F range, leaving it up to "iso-8859-1" to
> handle the rest.
>
> A similar process would happen for encoding: encode with
> "windows-1252-ext", else = "iso-8859-1".
>
> (Technically, "windows-1252-ext" isn't needed - you can use the existing
> "windows-1252" and combine it with the "iso-8859-1" to get
> "windows-1252-c1".)
>
> This would be a novel way to think of encodings as not just flat
> translation tables but highly composable translation tables. I have a
> thing for composition.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Chaining coders

2018-01-19 Thread Soni L.
windows-1252 is based on iso-8859-1. Thus, I'd like to be able to chain 
coders as follows:


bytes.decode("windows-1252-ext", else=lambda r: r.decode("iso-8859-1"))

The "else" argument is a lambda. It gets passed an object whose decode
method is identical to the bytes decode method, except that it doesn't
affect already-decoded characters. In this case, "windows-1252-ext" only
includes things in the \x80-\x9F range, leaving it up to "iso-8859-1" to
handle the rest.
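
For comparison, something close to this fallback behaviour can already be
approximated with a custom error handler, without new syntax ("else" is a
reserved word in Python, so the call above would not parse as written). The
following is only a rough sketch of that idea, not the proposed API: the
handler name "latin1-fallback" and the function latin1_fallback are made up
for illustration, and it relies on Python's existing "windows-1252" codec
leaving a few bytes such as 0x81 undefined.

```python
import codecs


def latin1_fallback(exc):
    """Decode bytes the primary codec rejects as iso-8859-1 instead."""
    if isinstance(exc, UnicodeDecodeError):
        rejected = exc.object[exc.start:exc.end]
        # iso-8859-1 maps each byte 0x00-0xFF to the code point of the
        # same value, so this step cannot fail.
        return rejected.decode("iso-8859-1"), exc.end
    raise exc


# "latin1-fallback" is a hypothetical name, not something Python ships.
codecs.register_error("latin1-fallback", latin1_fallback)

# 0xE9, 0x93 and 0x94 are defined in windows-1252; 0x81 is not.
data = b"caf\xe9 \x93hi\x94 \x81"
text = data.decode("windows-1252", errors="latin1-fallback")
```

Bytes that windows-1252 defines are decoded as usual; only the bytes it
rejects reach the handler and come out as the matching C1 control character.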


A similar process would happen for encoding: encode with 
"windows-1252-ext", else = "iso-8859-1".

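The encoding direction can be sketched the same way with today's codec
machinery. This is an illustration under assumptions rather than the
proposal itself: the handler name "latin1-encode-fallback" and the function
latin1_encode_fallback are hypothetical, and the existing "windows-1252"
codec stands in for the primary encoding.

```python
import codecs


def latin1_encode_fallback(exc):
    """Encode characters the primary codec rejects as iso-8859-1 instead."""
    if isinstance(exc, UnicodeEncodeError):
        rejected = exc.object[exc.start:exc.end]
        try:
            # An error handler may return bytes, which the encoder copies
            # to the output unchanged.
            return rejected.encode("iso-8859-1"), exc.end
        except UnicodeEncodeError:
            # Neither encoding can represent this text: report the
            # original failure.
            raise exc from None
    raise exc


# "latin1-encode-fallback" is a hypothetical name, not a built-in handler.
codecs.register_error("latin1-encode-fallback", latin1_encode_fallback)

# U+20AC is byte 0x80 in windows-1252; U+0081 has no windows-1252 byte.
encoded = "\u20ac \u0081".encode(
    "windows-1252", errors="latin1-encode-fallback"
)
```

Characters outside both encodings still raise UnicodeEncodeError, so the
fallback only widens the result to the union of the two tables.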

(Technically, "windows-1252-ext" isn't needed - you can use the existing 
"windows-1252" and combine it with the "iso-8859-1" to get 
"windows-1252-c1".)


This would be a novel way to think of encodings as not just flat 
translation tables but highly composable translation tables. I have a 
thing for composition.
