Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-13 Thread Jessica Smith
On Fri, Nov 11, 2022 at 8:16 PM Eryk Sun  wrote:
> If sys.std* are console files, then in Python 3.6+, sys.std*.buffer.raw will 
> be _io._WindowsConsoleIO
> io.TextIOWrapper uses locale.getpreferredencoding(False) as the default 
> encoding

Thank you for your replies. Checking the sys.stdout.buffer.raw value
is what finally helped me understand. It turns out the Windows agent
redirects the output of all Python commands to a file, so sys.stdout
is a file that uses the locale encoding (cp1252) instead of a console
stream that uses UTF-8. I wrote up a gist with my findings to
hopefully help out some other poor soul in the future:
https://gist.github.com/NodeJSmith/e7e37f2d3f162456869f015f842bcf15
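To reproduce this outside the agent, here is a minimal sketch, assuming a local CPython 3.7+. It starts a child interpreter with stdout redirected to a temporary file (mirroring what the agent appears to do) and has the child report on its own sys.stdout. `probe_redirected_stdout` and `PROBE` are illustrative names, not part of any library.

```python
import json
import subprocess
import sys
import tempfile

# Code executed by the child interpreter: describe its own stdout as JSON.
PROBE = (
    "import json, sys; "
    "buf = sys.stdout.buffer; "
    # With -u / PYTHONUNBUFFERED the buffer may already be the raw FileIO.
    "raw = getattr(buf, 'raw', buf); "
    "print(json.dumps({'encoding': sys.stdout.encoding, "
    "'raw_type': type(raw).__name__, "
    "'isatty': sys.stdout.isatty()}))"
)


def probe_redirected_stdout() -> dict:
    """Run a child Python whose stdout is redirected to a file and return
    what the child reports about its own sys.stdout."""
    with tempfile.TemporaryFile() as out:
        subprocess.run([sys.executable, "-c", PROBE], stdout=out, check=True)
        out.seek(0)
        # The report is ASCII-only JSON, so the file's encoding doesn't matter.
        return json.loads(out.read().decode("ascii"))


if __name__ == "__main__":
    print(probe_redirected_stdout())
```

With stdout redirected to a file, the raw type should be FileIO rather than _WindowsConsoleIO, and on Windows the encoding would be expected to follow the ANSI code page (e.g. cp1252) unless UTF-8 mode is enabled.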
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-11 Thread Inada Naoki
On Sat, Nov 12, 2022 at 11:53 AM Inada Naoki  wrote:
>
> On Sat, Nov 12, 2022 at 10:21 AM 12Jessicasmith34
> <12jessicasmit...@gmail.com> wrote:
> >
> >
> > Two questions: any idea why this would be happening in this situation? 
> > AFAIK, stdout *is* a console when these images are running the python 
> > process. Second - is there a way I can check the locale and code page 
> > values that you mentioned? I assume I could call GetACP using ctypes, but 
> > maybe there is a simpler way?
> >
>
> Maybe Python doesn't write to the console in this case.
>
>   python -(pipe)-> PowerShell -> Console
>
> In this case, Python uses the ACP when writing to the pipe,
> and PowerShell uses OutputEncoding when reading from the pipe.
>
> If you want to use UTF-8 with PowerShell on Windows:
>
>  * Set PYTHONUTF8=1 (so Python uses UTF-8 when writing to the pipe).
>  * Set `$OutputEncoding = [System.Text.Encoding]::GetEncoding('utf-8')`
>    in your PowerShell profile.
>

I forgot about [Console]::OutputEncoding. This is what PowerShell uses
when reading from a pipe, so the PowerShell profile should be:

  $OutputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::UTF8

-- 
Inada Naoki  


Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-11 Thread Inada Naoki
On Sat, Nov 12, 2022 at 10:21 AM 12Jessicasmith34
<12jessicasmit...@gmail.com> wrote:
>
>
> Two questions: any idea why this would be happening in this situation? AFAIK, 
> stdout *is* a console when these images are running the python process. 
> Second - is there a way I can check the locale and code page values that you 
> mentioned? I assume I could call GetACP using ctypes, but maybe there is a 
> simpler way?
>

Maybe Python doesn't write to the console in this case.

  python -(pipe)-> PowerShell -> Console

In this case, Python uses the ACP when writing to the pipe,
and PowerShell uses OutputEncoding when reading from the pipe.

If you want to use UTF-8 with PowerShell on Windows:

 * Set PYTHONUTF8=1 (so Python uses UTF-8 when writing to the pipe).
 * Set `$OutputEncoding = [System.Text.Encoding]::GetEncoding('utf-8')`
   in your PowerShell profile.
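A small simulation of the two ends of that pipe may help. It is a sketch, not PowerShell's actual decoding logic: the writer encodes with one encoding, the reader decodes with another, and errors are replaced rather than raised so a mismatch shows up in the output. `round_trip` is a hypothetical helper.

```python
def round_trip(text: str, writer_encoding: str, reader_encoding: str) -> str:
    """Simulate a writer and a reader on either end of a pipe.

    The writer encodes text to bytes and the reader decodes those bytes.
    errors="replace" makes a mismatch visible instead of raising.
    """
    data = text.encode(writer_encoding, errors="replace")
    return data.decode(reader_encoding, errors="replace")


if __name__ == "__main__":
    sample = "└── ID:"
    # Both ends agree on UTF-8: the text survives unchanged.
    print(round_trip(sample, "utf-8", "utf-8"))
    # Writer uses cp1252 (the ACP here): box-drawing characters become "?".
    print(round_trip(sample, "cp1252", "utf-8"))
    # Writer uses UTF-8 but the reader assumes cp1252: mojibake.
    print(round_trip(sample, "utf-8", "cp1252"))
```

Real Python output uses the strict error handler, so the cp1252 writer case raises UnicodeEncodeError instead of producing "?"; the replacement is only there to keep the simulation printable.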

Regards,

-- 
Inada Naoki  


Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-11 Thread Eryk Sun
On 11/11/22, 12Jessicasmith34 <12jessicasmit...@gmail.com> wrote:
>
> any idea why this would be happening in this situation? AFAIK, stdout
> *is* a console when these images are running the python process.

If sys.std* are console files, then in Python 3.6+,
sys.std*.buffer.raw will be _io._WindowsConsoleIO. The latter presents
itself to Python code as a UTF-8 file stream, but internally it uses
UTF-16LE with the wide-character API functions ReadConsoleW() and
WriteConsoleW().
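A minimal sketch of that check, using only the standard library: look through sys.stdout to its raw layer and report the class name. `stdout_raw_type` is a hypothetical helper; the `getattr` fallbacks cover a replaced stdout and the `-u` case, where the buffer is already the raw FileIO.

```python
import sys


def stdout_raw_type(stream=None) -> str:
    """Return the class name of the raw stream behind a text stream.

    On Windows with Python 3.6+, a console should give "_WindowsConsoleIO",
    while a file or pipe gives "FileIO".
    """
    stream = sys.stdout if stream is None else stream
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # Not a TextIOWrapper (e.g. a replaced or captured stdout).
        return type(stream).__name__
    # With -u / PYTHONUNBUFFERED the buffer may already be the raw layer.
    raw = getattr(buffer, "raw", buffer)
    return type(raw).__name__


if __name__ == "__main__":
    print(stdout_raw_type())
```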

> is there a way I can check the locale and code page values that you
> mentioned? I assume I could call GetACP using ctypes, but maybe
> there is a simpler way?

io.TextIOWrapper uses locale.getpreferredencoding(False) as the
default encoding. Actually, in 3.11+ it uses locale.getencoding()
unless UTF-8 mode is enabled, which is effectively the same as
locale.getpreferredencoding(False). On Windows this calls GetACP() and
formats the result as "cp%u" (e.g. "cp1252").
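Putting those together, here is a sketch that prints the relevant values. The ctypes calls (GetACP, GetConsoleCP, GetConsoleOutputCP from kernel32) are Windows-only and skipped elsewhere. `encoding_report` is a hypothetical name; the console code pages are included because "chcp" reports a console setting rather than the ANSI code page.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings and code pages relevant to sys.stdout."""
    report = {
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "utf8_mode": sys.flags.utf8_mode,
        "getpreferredencoding(False)": locale.getpreferredencoding(False),
    }
    # locale.getencoding() was added in Python 3.11.
    if hasattr(locale, "getencoding"):
        report["getencoding()"] = locale.getencoding()
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.WinDLL("kernel32")
        # Process ANSI code page, e.g. 1252; Python formats it as "cp1252".
        report["GetACP()"] = kernel32.GetACP()
        # Console input/output code pages; 0 when there is no console.
        report["GetConsoleCP()"] = kernel32.GetConsoleCP()
        report["GetConsoleOutputCP()"] = kernel32.GetConsoleOutputCP()
    return report


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```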


Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-11 Thread 12Jessicasmith34
> If stdout isn't a console (e.g. a pipe), it defaults to using the
> process code page (i.e. CP_ACP), such as legacy code page 1252
> (extended Latin-1).

First off, really helpful information, thank you. That was the exact background
I was missing.

Two questions: any idea why this would be happening in this situation? AFAIK,
stdout *is* a console when these images are running the Python process. Second,
is there a way I can check the locale and code page values that you
mentioned? I assume I could call GetACP using ctypes, but maybe there is a
simpler way?


Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-11 Thread Eryk Sun
On 11/10/22, Jessica Smith <12jessicasmit...@gmail.com> wrote:
>
> Weird issue I've found on Windows images in Azure Devops Pipelines and
> Github actions. Printing Unicode characters fails on these images because,
> for some reason, the encoding is mapped to cp1252. What is particularly
> weird about the code page being set to 1252 is that if you execute "chcp"
> it shows that the code page is 65001.

If stdout isn't a console (e.g. a pipe), it defaults to using the
process code page (i.e. CP_ACP), such as legacy code page 1252
(extended Latin-1). You can override just sys.std* to UTF-8 by setting
the environment variable `PYTHONIOENCODING=UTF-8`. You can override
all I/O to use UTF-8 by setting `PYTHONUTF8=1`, or by passing the
command-line option `-X utf8`.
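The three overrides can be compared with a sketch like this one, assuming `sys.executable` is the interpreter being tested. It runs a child with stdout connected to a pipe (so it is not a console) and reports the encoding the child chose. `child_stdout_encoding` is a hypothetical helper; inherited PYTHONIOENCODING and PYTHONUTF8 values are removed first so they don't mask the result.

```python
import codecs
import os
import subprocess
import sys

# Child code: print the encoding chosen for sys.stdout.
PROBE = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(extra_env=None, extra_args=()) -> str:
    """Run a child interpreter whose stdout is a pipe and return the
    normalized name of the encoding it picked for sys.stdout."""
    env = dict(os.environ)
    # Drop inherited overrides so each run starts from the defaults.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env,
        stdout=subprocess.PIPE,  # a pipe, not a console
        check=True,
    )
    name = result.stdout.decode("ascii").strip()
    # codecs.lookup() maps aliases such as "UTF-8" to a canonical "utf-8".
    return codecs.lookup(name).name


if __name__ == "__main__":
    print("default:         ", child_stdout_encoding())
    print("PYTHONIOENCODING:", child_stdout_encoding({"PYTHONIOENCODING": "UTF-8"}))
    print("PYTHONUTF8=1:    ", child_stdout_encoding({"PYTHONUTF8": "1"}))
    print("-X utf8:         ", child_stdout_encoding(extra_args=("-X", "utf8")))
```

On a Windows machine whose ANSI code page is 1252, the default case would be expected to report cp1252, while each override should report utf-8.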

Background

The locale system in Windows supports a common system locale, plus a
separate locale for each user. By default the process code page is
based on the system locale, and the thread code page (i.e.
CP_THREAD_ACP) is based on the user locale. The default locale of the
Universal C runtime combines the user locale with the process code
page. (This combination may be inconsistent.)

In Windows 10 and later, the default process and thread code pages can
be configured to use CP_UTF8 (65001). Applications can also override
them to UTF-8 in their manifest via the "ActiveCodePage" setting. In
either case, if the process code page is UTF-8, the C runtime will use
UTF-8 for its default locale encoding (e.g. "en_uk.utf8").

Unlike some frameworks, Python has never used the console input code
page or output code page as a locale encoding. Personally, I wouldn't
want Python to default to that old MS-DOS behavior. However, I'd be in
favor of supporting a "console" encoding that's based on the console
input code page that's returned by GetConsoleCP(). If the process
doesn't have a console session, the "console" encoding would fall back
on the process code page from GetACP().
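To make the proposal concrete, here is a sketch of that fallback rule. No such "console" encoding exists in Python; `console_encoding` is a hypothetical function, and the non-Windows branch simply returning the locale encoding is an assumption added for portability.

```python
import locale
import sys


def console_encoding() -> str:
    """Hypothetical "console" encoding as proposed above (not a Python API).

    Use the console input code page from GetConsoleCP(); if the process has
    no console (the call returns 0), fall back on the process code page
    from GetACP(). Off Windows, fall back on the locale encoding.
    """
    if sys.platform != "win32":
        return locale.getpreferredencoding(False)
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32")
    code_page = kernel32.GetConsoleCP() or kernel32.GetACP()
    return f"cp{code_page}"


if __name__ == "__main__":
    print(console_encoding())
```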


Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

2022-11-11 Thread Jessica Smith
Hello,

Weird issue I've found on Windows images in Azure DevOps Pipelines and
GitHub Actions. Printing Unicode characters fails on these images because,
for some reason, the encoding is mapped to cp1252. What is particularly
weird about the code page being set to 1252 is that if you execute "chcp"
it shows that the code page is 65001.

At the end of this email are the cleaned-up logs from GitHub Actions. The
actions are very simple: print out Unicode characters using echo to prove
the characters can be printed to the console. The rest of the commands are
in Python, and they print the encoding attribute of sys.stdout, as well as
sys.flags and sys.getfilesystemencoding(). Then they print the same Unicode
character using print, which causes a UnicodeEncodeError because the
character isn't in the cp1252 charmap.
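The failure can be reproduced without a CI runner by printing into an in-memory text stream that uses cp1252 with the default strict error handler, which is roughly what a redirected sys.stdout looks like on these images. This is a sketch; `encoded_output` and `can_print` are hypothetical helpers.

```python
import io


def encoded_output(text: str, encoding: str) -> bytes:
    """Return the bytes print() writes to a redirected text stream that
    uses `encoding` with strict errors, like a cp1252 sys.stdout."""
    buffer = io.BytesIO()
    # newline="\n" turns off newline translation so output is OS-independent.
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="\n")
    print(text, file=stream)  # raises UnicodeEncodeError if unmappable
    stream.flush()
    return buffer.getvalue()


def can_print(text: str, encoding: str) -> bool:
    """True if print() to such a stream succeeds, False on UnicodeEncodeError."""
    try:
        encoded_output(text, encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "└── ID:"
    print("cp1252:", can_print(sample, "cp1252"))
    print("utf-8: ", can_print(sample, "utf-8"))
```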

I've also uploaded the logs to pastebin here: https://pastebin.com/ExzGRHav
I also uploaded a screenshot to imgur, since the logs are not the easiest
to read. https://imgur.com/a/dhvLWOJ

I'm trying to determine why this issue only happens on these images. I can
replicate it on multiple versions of Python (at least 3.7 through 3.9; I
haven't tried others), but I can't replicate it on my own machines.

There are a few GitHub issues about this, but they tend to stay open
because they are hard to replicate. Here are the ones I have stumbled
upon while researching this:

https://github.com/databrickslabs/dbx/issues/455
https://github.com/PrefectHQ/prefect/issues/5754
https://github.com/pallets/click/issues/2121

Any insight or ideas on how to test and validate the cause would be great.
I'm pulling my hair out trying to find the root cause of this - not because
it really matters to any of my processes but because it is weird and broken.

Thanks for any help,

Jessica

Begin Logs:

2022-11-10T23:54:51.7272453Z Requested labels: windows-latest
2022-11-10T23:54:51.7272494Z Job defined at: NodeJSmith/wsl_home/.github/workflows/blank.yml@refs/heads/main
2022-11-10T23:54:51.7272514Z Waiting for a runner to pick up this job...
2022-11-10T23:54:52.3387510Z Job is waiting for a hosted runner to come online.
2022-11-10T23:55:04.8574435Z Job is about to start running on the hosted runner: Hosted Agent (hosted)
2022-11-10T23:55:15.8332600Z Current runner version: '2.298.2'

2022-11-10T23:55:15.8366947Z ##[group]Operating System
2022-11-10T23:55:15.8367650Z Microsoft Windows Server 2022
2022-11-10T23:55:15.8367954Z 10.0.20348
2022-11-10T23:55:15.8368389Z Datacenter
2022-11-10T23:55:15.8368696Z ##[endgroup]

2022-11-10T23:55:15.8369023Z ##[group]Runner Image
2022-11-10T23:55:15.8369654Z Image: windows-2022
2022-11-10T23:55:15.8369931Z Version: 20221027.1
2022-11-10T23:55:15.8370539Z Included Software: https://github.com/actions/runner-images/blob/win22/20221027.1/images/win/Windows2022-Readme.md
2022-11-10T23:55:15.8371174Z Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20221027.1
2022-11-10T23:55:15.8371622Z ##[endgroup]

2022-11-10T23:55:15.8371955Z ##[group]Runner Image Provisioner
2022-11-10T23:55:15.8372277Z 2.0.91.1
2022-11-10T23:55:15.8372514Z ##[endgroup]

2022-11-10T23:55:16.3619998Z ##[group]Run echo  " └── ID:"
2022-11-10T23:55:16.3620626Z echo  " └── ID:"
2022-11-10T23:55:16.3927292Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:16.3927894Z ##[endgroup]
2022-11-10T23:55:32.9958751Z  └── ID:

2022-11-10T23:55:34.0835652Z ##[group]Run chcp
2022-11-10T23:55:34.0836104Z chcp
2022-11-10T23:55:34.0878901Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:34.0879350Z ##[endgroup]
2022-11-10T23:55:34.4878247Z Active code page: 65001

2022-11-10T23:55:34.7917219Z ##[group]Run python -c "import sys; print('sys.stdout.encoding', sys.stdout.encoding); print('sys.flags',sys.flags);print('sys.getfilesystemencoding',sys.getfilesystemencoding())"
2022-11-10T23:55:34.7918148Z python -c "import sys; print('sys.stdout.encoding', sys.stdout.encoding); print('sys.flags',sys.flags);print('sys.getfilesystemencoding',sys.getfilesystemencoding())"
2022-11-10T23:55:34.7960873Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:34.7961202Z ##[endgroup]
2022-11-10T23:55:36.2324642Z sys.stdout.encoding cp1252
2022-11-10T23:55:36.2325910Z sys.flags sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0, dev_mode=False, utf8_mode=0)
2022-11-10T23:55:36.2327055Z sys.getfilesystemencoding utf-8

2022-11-10T23:55:36.4553957Z ##[group]Run python -c "print('└── ID:')"
2022-11-10T23:55:36.4554395Z python -c "print('└── ID:')"
2022-11-10T23:55:36.4595413Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:36.4595740Z ##[endgroup]
2022-11-10T23:55:36.8739309Z Traceback (most recent call last):
2022-11-10T23:55:37.1316425Z