Re: D8337: pycompat: change argv conversion semantics

2020-03-31 Thread Yuya Nishihara
>   > On Windows, my assumption was os.fsencode() == .encode("mbcs") if
>   > sys._enablelegacywindowsfsencoding(). So this looks good to me.
>   > Perhaps, the "ignore" error mode would match the legacy Windows behavior.
>   
>   Does this mean it could be a problem running from source on Windows?  For 
> example, `hg version` (as opposed to `hg.exe version`) seems to be equivalent 
> to `python hg`, which obviously doesn't have the proper environment variable 
> or C API option to enable legacy mode.  Should there be code early on that 
> detects this and warns/aborts?

It's hacked up by pycompat.py so `python hg` should be fine.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D8337: pycompat: change argv conversion semantics

2020-03-29 Thread Yuya Nishihara
> -# Since Python 3 converts argv to wchar_t type by Py_DecodeLocale() on 
> Unix,
> -# we can use os.fsencode() to get back bytes argv.
> -#
> -# https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55
> -#
> -# On Windows, the native argv is unicode and is converted to MBCS bytes
> -# since we do enable the legacy filesystem encoding.
>  if getattr(sys, 'argv', None) is not None:
> -sysargv = list(map(os.fsencode, sys.argv))
> +# On POSIX, the char** argv array is converted to Python str using
> +# Py_DecodeLocale(). The inverse of this is Py_EncodeLocale(), which 
> isn't
> +# directly callable from Python code. So, we need to emulate it.
> +# Py_DecodeLocale() calls mbstowcs() and falls back to mbrtowc() with
> +# surrogateescape error handling on failure. These functions take the
> +# current system locale into account. So, the inverse operation is to
> +# .encode() using the system locale's encoding and using the
> +# surrogateescape error handler. The only tricky part here is getting
> +# the system encoding correct, since `locale.getlocale()` can return
> +# None. We fall back to the filesystem encoding if lookups via 
> `locale`
> +# fail, as this seems like a reasonable thing to do.
> +#
> +# On Windows, the wchar_t **argv is passed into the interpreter 
> as-is.
> +# Like POSIX, we need to emulate what Py_EncodeLocale() would do. But
> +# there's an additional wrinkle. What we really want to access is the
> +# ANSI codepage representation of the arguments, as this is what
> +# `int main()` would receive if Python 3 didn't define `int wmain()`
> +# (this is how Python 2 worked). To get that, we encode with the mbcs
> +# encoding, which will pass CP_ACP to the underlying Windows API to
> +# produce bytes.
> +if os.name == r'nt':
> +sysargv = [a.encode("mbcs", "ignore") for a in sys.argv]

On Windows, my assumption was os.fsencode() == .encode("mbcs") if
sys._enablelegacywindowsfsencoding(). So this looks good to me.
Perhaps, the "ignore" error mode would match the legacy Windows behavior.

> +else:
> +encoding = (
> +locale.getlocale()[1]
> +or locale.getdefaultlocale()[1]
> +or sys.getfilesystemencoding()
> +)
> +sysargv = [a.encode(encoding, "surrogateescape") for a in 
> sys.argv]

I'm not pretty sure if the locale encoding is the encoding Py_DecodeLocale()
would use. There are many ifdefs for `__APPLE__`. The doc says use
`os.fsencode()`, but that's no longer valid (or wrong from the start)?

https://docs.python.org/3/library/sys.html#sys.argv

Something might be changed around 3.7 or 3.8. Since bytes argv handling
has been moved from `int main()` to `preconfig.c`, things could become
more dynamic. But I don't know. Just my guess.

Overall, the new code looks good, but I have no idea if that's more correct.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel