Re: [Python-ideas] Consider adding clip or clamp function to math

2016-08-12 Thread MRAB

On 2016-08-13 00:48, David Mertz wrote:

On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner wrote:


[snip]


Also, what is the calling syntax? Are the arguments strictly positional,
or do they have keywords? What are those default values if the arguments
are not specified for either or both of min_val/max_val?  E.g., is this OK:

clamp(5, min_val=0)

I would've thought that the obvious default would be None, meaning 
"missing".
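
A minimal sketch of that idea, assuming keyword-capable arguments that default to None for "no bound". The name clamp and the parameters min_val/max_val come from the question above; the exact signature and the ValueError check are assumptions, not an agreed design:

```python
def clamp(value, min_val=None, max_val=None):
    """Return value limited to [min_val, max_val]; None means "no bound"."""
    if min_val is not None and max_val is not None and min_val > max_val:
        raise ValueError("min_val must not be greater than max_val")
    if min_val is not None and value < min_val:
        return min_val
    if max_val is not None and value > max_val:
        return max_val
    return value


print(clamp(5, min_val=0))    # only a lower bound is given
print(clamp(-3, min_val=0))   # value below the lower bound
print(clamp(12, max_val=10))  # value above the upper bound
```

With None as the default, clamp(5, min_val=0) is valid and simply leaves the upper side unbounded.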


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Victor Stinner
Hello,

I'm on holiday and I'm writing on a phone, so sorry in advance for the
short answer.

In short: we should drop support for the bytes API. Just use Unicode on all
platforms, especially for filenames.

Sorry, but most of these changes look like very bad ideas. Or maybe I
misunderstood something. The Windows bytes APIs are broken in different
ways; in short, your proposal is to put another layer on top of them to try
to work around the issues.

Unicode is complex. Unicode issues are hard to debug. Adding a new layer
makes debugging even harder. Is the bug in the input data? In the layer? In
the final Windows function?

In my experience on UNIX, the most important part is the interoperability
with other applications. I understand that Python 2 will speak ANSI code
page but Python 3 will speak UTF-8. I don't understand how it can work.
Almost all Windows applications speak the ANSI code page (I'm talking about
stdin, stdout, pipes, ...).

Do you propose to first try to decode from UTF-8 and then fall back on
decoding from the ANSI code page? What about encoding? Always encode to UTF-8?
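
To make the question concrete, here is a rough sketch of what a "UTF-8 first, then code page" decoder could look like. The helper name decode_with_fallback is hypothetical, and cp1252 is only a stand-in for whatever the ANSI code page is on a given machine:

```python
def decode_with_fallback(data, fallback="cp1252"):
    """Try UTF-8 first; on failure, decode with the assumed ANSI code page."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: cp1252 stands in for the user's active code page.
        return data.decode(fallback)


# Bytes written by a UTF-8 producer decode as expected.
print(decode_with_fallback("é".encode("utf-8")))

# Bytes written by a code-page producer are not valid UTF-8, so the
# fallback is used.
print(decode_with_fallback("é".encode("cp1252")))

# Ambiguous case: a code-page producer wrote the two characters "Ã©",
# but the bytes happen to be valid UTF-8, so they come back as "é".
print(decode_with_fallback("Ã©".encode("cp1252")))
```

The last call shows the debugging problem: the guess can succeed and still be wrong, and nothing in the data says which side made the mistake.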

About BOMs: I hate them. Many applications don't understand them. Again,
think about Python 2. I vaguely recall that the Unicode standard suggests
not using a BOM (I have to check).

I recall a bug in gettext. The tool doesn't understand BOM. When I opened
the file in vim, the BOM was invisible (hidden). I had to use hexdump to
understand the issue!

BOMs introduce issues that are very difficult to debug :-/ I also think
that they go in the wrong direction in terms of interoperability.
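
A small illustration of why the BOM is easy to miss, using only the standard codecs. The sample text "msgid" is just an illustrative payload:

```python
import codecs

# A UTF-8 file that starts with a BOM, as many Windows editors write it.
data = codecs.BOM_UTF8 + "msgid".encode("utf-8")

print(data[:3])  # the three BOM bytes: b'\xef\xbb\xbf'

# The plain "utf-8" codec keeps the BOM as an invisible U+FEFF character.
plain = data.decode("utf-8")
print(repr(plain))  # '\ufeffmsgid'

# The "utf-8-sig" codec strips a leading BOM if one is present.
stripped = data.decode("utf-8-sig")
print(repr(stripped))  # 'msgid'
```

A tool that reads the file as plain UTF-8 sees a first token that does not compare equal to "msgid", even though an editor displays the two identically.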

For the Windows console: I played with all Windows functions, tried all
fonts and many code pages. I also read technical blog articles of Microsoft
employees. I gave up on this issue. It doesn't seem possible to fully
support Unicode in the Windows console (at least the last time I checked). By
the way, it seems like Windows functions have bugs, and the code page 65001
fixes a few issues but introduces new issues...

Victor

On 10 August 2016 at 20:16, "Steve Dower" wrote:

> I suspect there's a lot of discussion to be had around this topic, so I
> want to get it started. There are some fairly drastic ideas here and I need
> help figuring out whether the impact outweighs the value.
>
> Some background: within the Windows API, the preferred encoding is UTF-16.
> This is a 16-bit format that is typed as wchar_t in the APIs that use it.
> These APIs are generally referred to as the *W APIs (because they have a W
> suffix).
>
> There are also (broadly deprecated) APIs that use an 8-bit format (char),
> where the encoding is assumed to be "the user's active code page". These
> are *A APIs. AFAIK, there are no cases where a *A API should be preferred
> over a *W API, and many newer APIs are *W only.
>
> In general, Python passes byte strings into the *A APIs and text strings
> into the *W APIs.
>
> Right now, sys.getfilesystemencoding() on Windows returns "mbcs", which
> translates to "the system's active code page". As this encoding generally
> cannot represent all paths on Windows, it is deprecated and Unicode strings
> are recommended instead. This, however, means you need to write
> significantly different code between POSIX (use bytes) and Windows (use
> text).
>
> ISTM that changing sys.getfilesystemencoding() on Windows to "utf-8" and
> updating path_converter() (Python/posixmodule.c; likely similar code in
> other places) to decode incoming byte strings would allow us to undeprecate
> byte strings and add the requirement that they *must* be encoded with
> sys.getfilesystemencoding(). I assume that this would allow cross-platform
> code to handle paths similarly by encoding to whatever the sys module says
> they should and using bytes consistently (starting this thread is meant to
> validate/refute my assumption).
>
> (Yes, I know that people on POSIX should just change to using Unicode and
> surrogateescape. Unfortunately, rather than doing that they complain about
> Windows and drop support for the platform. If you want to keep hitting them
> with the stick, go ahead, but I'm inclined to think the carrot is more
> valuable here.)
>
> Similarly, locale.getpreferredencoding() on Windows returns a legacy value
> - the user's active code page - which should generally not be used for any
> reason. The one exception is as a default encoding for opening files when
> no other information is available (e.g. a Unicode BOM or explicit encoding
> argument). BOMs are very common on Windows, since the default assumption is
> nearly always a bad idea.
>
> Making open()'s default encoding detect a BOM before falling back to
> locale.getpreferredencoding() would resolve many issues, but I'm also
> inclined towards making the fallback utf-8, leaving
> locale.getpreferredencoding() solely as a way to get the active system
> codepage (with suitable warnings about it only being useful for
> back-compat). This would match the behavior that the .NET Framework has
> 
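
The claim in the quoted text that a code page "generally cannot represent all paths" can be checked with a short sketch. cp1252 is used here only as a stand-in for an active code page (the real "mbcs" codec exists only on Windows), and the helper name can_encode is hypothetical:

```python
def can_encode(path, encoding):
    """Return True if `path` survives a round trip through `encoding`."""
    try:
        return path.encode(encoding).decode(encoding) == path
    except UnicodeEncodeError:
        return False


path = "файл.txt"  # a file name with Cyrillic letters

print(can_encode(path, "utf-8"))   # UTF-8 can represent any str without lone surrogates
print(can_encode(path, "cp1252"))  # a Western European code page cannot
```

This is the gap the proposal targets: with UTF-8 as the bytes encoding, every path that the *W APIs can return has a bytes form.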

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Adam Bartoš
On Fri Aug 12 11:33:35 EDT 2016, Random832 wrote:
> On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote:
>> That's the hope, though that module approaches the solution differently
>> and may still uses. An alternative way for us to fix this whole thing
>> would be to bring win_unicode_console into the standard library and use
>> it by default (or probably whenever PYTHONIOENCODING is not specified).
>
> I have concerns about win_unicode_console:
> - For the "text_transcoded" streams, stdout.encoding is utf-8. For the
> "text" streams, it is utf-16.

UTF-16 is the "native" encoding since it corresponds to the wide chars used
by Read/WriteConsoleW. UTF-8 is used just as a signal for the consumers
of PyOS_Readline.

> - There is no object, as far as I can find, which can be used as an
> unbuffered unicode I/O object.

There is no buffer just on those wrapping streams because the bytes I have
are not in UTF-8. Adding one would mean a fake buffer that just decodes and
writes to the text stream. AFAIK there is no guarantee that sys.std*
objects have a buffer attribute, and any code relying on that is incorrect.
But I understand that there may be such code and we may want to be
compatible.
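
For code that wants to write bytes without assuming a buffer attribute, a defensive pattern could look like the following. The helper name write_bytes is hypothetical and the UTF-8 default is an assumption:

```python
import io


def write_bytes(stream, data, encoding="utf-8"):
    """Write bytes to a text stream, using .buffer only when it exists."""
    buffer = getattr(stream, "buffer", None)
    if buffer is not None:
        stream.flush()  # keep ordering with any pending text output
        buffer.write(data)
    else:
        # No binary layer: decode and write through the text interface.
        stream.write(data.decode(encoding))


# io.StringIO is a text stream with no .buffer attribute.
text_only = io.StringIO()
write_bytes(text_only, b"abc")
print(text_only.getvalue())

# io.TextIOWrapper exposes the underlying binary stream as .buffer.
raw = io.BytesIO()
wrapped = io.TextIOWrapper(raw, encoding="utf-8")
write_bytes(wrapped, b"abc")
print(raw.getvalue())
```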


> - raw output streams silently drop the last byte if an odd number of
> bytes are written.

That's not true: it never writes an odd number of bytes, but it returns the
correct number of bytes written. If only one byte is given, it raises a
ValueError.
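
The behaviour described here can be modelled with a toy class. This is not the win_unicode_console code; the class name EvenBytesWriter is hypothetical, and it collects bytes in a list instead of calling WriteConsoleW:

```python
class EvenBytesWriter:
    """Toy raw writer for UTF-16 data: accepts only whole 2-byte code units."""

    def __init__(self):
        self.chunks = []  # stand-in for the console

    def write(self, data):
        data = bytes(data)
        if len(data) == 1:
            raise ValueError("cannot write a single byte of UTF-16 data")
        usable = len(data) - (len(data) % 2)  # round down to an even length
        self.chunks.append(data[:usable])
        return usable  # the caller sees how many bytes were consumed


writer = EvenBytesWriter()
payload = "hi".encode("utf-16-le") + b"x"  # 5 bytes: two code units plus one stray byte
print(writer.write(payload))  # 4
```

Because the return value is smaller than len(payload), a caller that follows the raw I/O contract knows the last byte is still pending and can retry it with more data.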


> - The sys.stdout obtained via streams.enable does not support .buffer /
> .buffer.raw / .detach
> - All of these objects provide a fileno() interface.

Is this wrong? If I remember correctly, I provide it because of some check
(maybe in input()) that decides whether the object is viewed as a stdio stream.


> - When using os.read/write for data that represents text, the data still
> should be encoded in the console encoding and not in utf-8 or utf-16.

I don't know what to do with this. Generally I wouldn't use bytes to
communicate textual data.


Regards,
Adam Bartoš

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Steve Dower
I was thinking we would end up using the console API for input but stick with
the standard handles for output, mostly to minimize the amount of magic
switching we have to do. But since we can just switch the entire stream object
in __std*__ once at startup if nothing is redirected, it probably isn't that
much of a simplification.
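
A rough sketch of the "switch once at startup" idea. The helper pick_stream and the make_console_stream factory are hypothetical, and isatty() is only an approximation of "nothing is redirected":

```python
import io


def pick_stream(standard, make_console_stream):
    """Return a console-backed stream only when `standard` looks interactive."""
    try:
        interactive = standard.isatty()
    except (AttributeError, ValueError):
        # No isatty(), or the stream is already closed: treat as redirected.
        interactive = False
    return make_console_stream() if interactive else standard


# A redirected stream (here an in-memory one) is left untouched.
redirected = io.StringIO()
print(pick_stream(redirected, lambda: "console stream") is redirected)
```

At startup this would be applied once to each of stdin, stdout and stderr, so no per-call switching is needed afterwards.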

I have some airport/aeroplane time today where I can experiment.

Top-posted from my Windows Phone

-----Original Message-----
From: "eryk sun"
Sent: 8/12/2016 5:40
To: "python-ideas"
Subject: Re: [Python-ideas] Fix default encodings on Windows

On Thu, Aug 11, 2016 at 9:07 AM, Paul Moore  wrote:
> set codepage to UTF-8
> ...
> set codepage back
> spawn subprocess X, but don't wait for it
> set codepage to UTF-8
> ...
> ... At this point what codepage does Python see? What codepage does
> process X see? (Note that they are both sharing the same console).

The input and output codepages are global data in conhost.exe. They
aren't tracked for each attached process (unlike input history and
aliases). That's how chcp.com works in the first place. Otherwise its
calls to SetConsoleCP and SetConsoleOutputCP would be pointless.

But IMHO all talk of using codepage 65001 is a waste of time. I think
the trailing garbage output with this codepage in Windows 7 is
unacceptable. And getting EOF for non-ASCII input is a show stopper.
The problem occurs in conhost. All you get is the EOF result from
ReadFile/ReadConsoleA, so it can't be worked around. This kills the
REPL and raises EOFError for input(). ISTM the only people who think
codepage 65001 actually works are those using Windows 8+ who
occasionally need to print non-OEM text and never enter (or paste)
anything but ASCII text.

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread eryk sun
On Thu, Aug 11, 2016 at 9:07 AM, Paul Moore  wrote:
> set codepage to UTF-8
> ...
> set codepage back
> spawn subprocess X, but don't wait for it
> set codepage to UTF-8
> ...
> ... At this point what codepage does Python see? What codepage does
> process X see? (Note that they are both sharing the same console).

The input and output codepages are global data in conhost.exe. They
aren't tracked for each attached process (unlike input history and
aliases). That's how chcp.com works in the first place. Otherwise its
calls to SetConsoleCP and SetConsoleOutputCP would be pointless.
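
The current values can be inspected from Python with ctypes. GetConsoleCP and GetConsoleOutputCP are the real kernel32 functions that pair with the setters named above; the wrapper name console_codepages is hypothetical, and the function returns None on other platforms:

```python
import sys


def console_codepages():
    """Return (input_cp, output_cp) for the attached console, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; ctypes.windll exists only on Windows

    kernel32 = ctypes.windll.kernel32
    # Both calls return 0 if the process has no console attached.
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


print(console_codepages())
```

Since the values live in conhost.exe, a subprocess sharing the console would see whatever was set last, regardless of which process set it.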

But IMHO all talk of using codepage 65001 is a waste of time. I think
the trailing garbage output with this codepage in Windows 7 is
unacceptable. And getting EOF for non-ASCII input is a show stopper.
The problem occurs in conhost. All you get is the EOF result from
ReadFile/ReadConsoleA, so it can't be worked around. This kills the
REPL and raises EOFError for input(). ISTM the only people who think
codepage 65001 actually works are those using Windows 8+ who
occasionally need to print non-OEM text and never enter (or paste)
anything but ASCII text.