Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 8 18:45, Takashi Yano via Cygwin-patches wrote: > Hi Corinna, > > On Tue, 8 Sep 2020 10:40:34 +0200 > Corinna Vinschen wrote: > > On Sep 7 13:45, Takashi Yano via Cygwin-patches wrote: > > > On Mon, 7 Sep 2020 01:04:13 +0900 > > > > > Chages: > > > > > - If global locale is set, it takes precedence. > > > > > > > > Changes: > > > > - Use __get_current_locale() instead of __get_global_locale(). > > > > - Fix a bug for ISO-8859-* charset. > > > > > > Changes: > > > - Use envblock if it is passed to CreateProcess in spawn.cc. > > > > For the time being and to make at least *some* progress and with my > > upcoming "away from keyboard"-time , I pushed the gist of my patch, > > replacing the locale evaluating code in fhandler_tty with the function > > __eval_codepage_from_internal_charset in its most simple form. > > I didn't touch anything else, given that this discussion is still > > ongoing. > > Your patch pushed does the magic! > > Even cygterm works even though the code does not check environment. > > The point is here. > > @@ -1977,9 +1807,6 @@ fhandler_pty_slave::fixup_after_exec () >if (!close_on_exec ()) > fixup_after_fork (NULL); /* No parent handle required. */ > > - /* Set locale */ > - setup_locale (); > - >/* Hook Console API */ > #define DO_HOOK(module, name) \ >if (!name##_Orig) \ > > Without this deletion, term_code_page is determined when > cygwin shell is executed. Since it is in fixup_after_exec(), > setlocale() does not called yet in the shell. As a result, > term_code_page cannot be determined correctly. > > In your new patch, term_code_page is determined when the first > non-cygwin program is execed in the shell. The shell process > already calls setlocale(), so term_code_page can be determined > using global locale. > > Thanks for the excellent idea! > > Only the problem I noticed is that cygterm does not work if the > shell does not call setlocale(). This happens if the shell is > non-cygwin program, for example, cmd.exe, however, it is unusual > case. This is unexpected, but I'm glad this could be much simplfied. Thanks, Corinna
Re: [PATCH v2 2/3] Cygwin: path_conv::check: handle error from fhandler_process::exists
On Sep 8 15:05, Ken Brown via Cygwin-patches wrote: > On 9/8/2020 3:02 PM, Ken Brown via Cygwin-patches wrote: > > fhandler_process::exists is called when we are checking a path > > starting with "/proc//fd". If it returns virt_none and sets an > > errno, there is no need for further checking. Just set 'error' and > > return. > > --- > > winsup/cygwin/path.cc | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/winsup/cygwin/path.cc b/winsup/cygwin/path.cc > > index 95faf8ca7..1d0c38a20 100644 > > --- a/winsup/cygwin/path.cc > > +++ b/winsup/cygwin/path.cc > > @@ -809,6 +809,15 @@ path_conv::check (const char *src, unsigned opt, > > delete fh; > > goto retry_fs_via_processfd; > > } > > + else if (file_type == virt_none && dev == FH_PROCESSFD) > > + { > > + error = get_errno (); > > + if (error) > > + { > > + delete fh; > > + return; > > + } > > + } > > delete fh; > > } > > switch (file_type) > > > > The subject should say "2/2", not "2/3". I have a local third patch > documenting the bug fix, which I didn't bother to send. ACk to both patches, 2 and 3 ;) Corinna
Re: [PATCH 1/2] Cygwin: format_process_fd: add directory check
Hi Ken, On Sep 8 12:50, Ken Brown via Cygwin-patches wrote: > The incoming path is allowed to have the form "$PID/fd/[0-9]*/.*" > provided the descriptor symlink points to a directory. Check that > this is indeed the case. > --- > winsup/cygwin/fhandler_process.cc | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/winsup/cygwin/fhandler_process.cc > b/winsup/cygwin/fhandler_process.cc > index a6c358217..888604e3d 100644 > --- a/winsup/cygwin/fhandler_process.cc > +++ b/winsup/cygwin/fhandler_process.cc > @@ -398,6 +398,21 @@ format_process_fd (void *data, char *&destbuf) > *((process_fd_t *) data)->fd_type = virt_fdsymlink; >else /* trailing path */ > { > + /* Does the descriptor link point to a directory? */ > + bool dir; > + if (*destbuf != '/') /* e.g., "pipe:[" or "socket:[" */ > + dir = false; > + else > + { > + path_conv pc (destbuf); > + dir = pc.isdir (); > + } > + if (!dir) > + { > + set_errno (ENOTDIR); > + cfree (destbuf); > + return -1; > + } > char *newbuf = (char *) cmalloc_abort (HEAP_STR, strlen (destbuf) > + strlen (e) + 1); > stpcpy (stpcpy (newbuf, destbuf), e); > -- > 2.28.0 Huh, I'd never realized this check is missing. I was just puzzeling over your patch, searching for the directory check, but yeah, there is none. Please push. Thanks, Corinna
Re: [PATCH v2 2/3] Cygwin: path_conv::check: handle error from fhandler_process::exists
On 9/8/2020 3:02 PM, Ken Brown via Cygwin-patches wrote: fhandler_process::exists is called when we are checking a path starting with "/proc//fd". If it returns virt_none and sets an errno, there is no need for further checking. Just set 'error' and return. --- winsup/cygwin/path.cc | 9 + 1 file changed, 9 insertions(+) diff --git a/winsup/cygwin/path.cc b/winsup/cygwin/path.cc index 95faf8ca7..1d0c38a20 100644 --- a/winsup/cygwin/path.cc +++ b/winsup/cygwin/path.cc @@ -809,6 +809,15 @@ path_conv::check (const char *src, unsigned opt, delete fh; goto retry_fs_via_processfd; } + else if (file_type == virt_none && dev == FH_PROCESSFD) + { + error = get_errno (); + if (error) + { + delete fh; + return; + } + } delete fh; } switch (file_type) The subject should say "2/2", not "2/3". I have a local third patch documenting the bug fix, which I didn't bother to send. Ken
[PATCH v2 2/3] Cygwin: path_conv::check: handle error from fhandler_process::exists
fhandler_process::exists is called when we are checking a path starting with "/proc//fd". If it returns virt_none and sets an errno, there is no need for further checking. Just set 'error' and return. --- winsup/cygwin/path.cc | 9 + 1 file changed, 9 insertions(+) diff --git a/winsup/cygwin/path.cc b/winsup/cygwin/path.cc index 95faf8ca7..1d0c38a20 100644 --- a/winsup/cygwin/path.cc +++ b/winsup/cygwin/path.cc @@ -809,6 +809,15 @@ path_conv::check (const char *src, unsigned opt, delete fh; goto retry_fs_via_processfd; } + else if (file_type == virt_none && dev == FH_PROCESSFD) + { + error = get_errno (); + if (error) + { + delete fh; + return; + } + } delete fh; } switch (file_type) -- 2.28.0
Re: [PATCH 2/2] Cygwin: path_conv::check: handle error from fhandler_process::exists
On 9/8/2020 12:50 PM, Ken Brown via Cygwin-patches wrote: fhandler_process::exists is called when we are checking a path starting with "/proc//fd". If it returns virt_none and sets an errno, there is no need for further checking. Just set 'error' and return. --- winsup/cygwin/path.cc | 6 ++ 1 file changed, 6 insertions(+) diff --git a/winsup/cygwin/path.cc b/winsup/cygwin/path.cc index 95faf8ca7..092261939 100644 --- a/winsup/cygwin/path.cc +++ b/winsup/cygwin/path.cc @@ -809,6 +809,12 @@ path_conv::check (const char *src, unsigned opt, delete fh; goto retry_fs_via_processfd; } + else if (file_type == virt_none && dev == FH_PROCESSFD) + { + error = get_errno (); + if (error) + return; + } delete fh; } switch (file_type) There's a missing 'delete fh' above. I'll send v2 shortly. Ken
Re: [PATCH] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
Hi Takashi, On Sep 8 18:57, Takashi Yano via Cygwin-patches wrote: > - If the non-cygwin apps is executed under pseudo console disabled, > multibyte input for the apps are garbled. This patch fixes the > issue. > --- > winsup/cygwin/fhandler_tty.cc | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > index 6de591d9b..afaa4546e 100644 > --- a/winsup/cygwin/fhandler_tty.cc > +++ b/winsup/cygwin/fhandler_tty.cc > @@ -271,8 +271,17 @@ fhandler_pty_master::accept_input () >bytes_left = eat_readahead (-1); > >HANDLE write_to = get_output_handle (); > + char *buf = NULL; >if (to_be_read_from_pcon ()) > -write_to = to_slave; > +{ > + write_to = to_slave; > + size_t nlen; > + buf = convert_mb_str (GetConsoleCP (), &nlen, > + get_ttyp ()->term_code_page, > + (const char *) p, bytes_left); > + p = buf; > + bytes_left = nlen; > +} How big are chances that the string in p is larger than 32767 chars? I'd like to see convert_mb_str use a tmp_pathbuf buffer instead of calling HeapAlloc/HeapFree every time. That also drops the mb_str_free entirely. Isn't there a problem anyway with calling convert_mb_str? Consider a write call which stops in the middle of a multibyte char, the second half only sent with the next write call. convert_mb_str only allows to convert complete multibyte chars, and the caller does not keep something like an mbstate_t around, which would allow continuation of split multibyte chars. Corinna
[PATCH 2/2] Cygwin: path_conv::check: handle error from fhandler_process::exists
fhandler_process::exists is called when we are checking a path starting with "/proc//fd". If it returns virt_none and sets an errno, there is no need for further checking. Just set 'error' and return. --- winsup/cygwin/path.cc | 6 ++ 1 file changed, 6 insertions(+) diff --git a/winsup/cygwin/path.cc b/winsup/cygwin/path.cc index 95faf8ca7..092261939 100644 --- a/winsup/cygwin/path.cc +++ b/winsup/cygwin/path.cc @@ -809,6 +809,12 @@ path_conv::check (const char *src, unsigned opt, delete fh; goto retry_fs_via_processfd; } + else if (file_type == virt_none && dev == FH_PROCESSFD) + { + error = get_errno (); + if (error) + return; + } delete fh; } switch (file_type) -- 2.28.0
[PATCH 0/2] Fix problems with /proc//fd checking
This fixes the assertion failure reported here: https://sourceware.org/pipermail/cygwin/2020-September/246160.html with a simple test case here: https://sourceware.org/pipermail/cygwin/2020-September/246188.html That test case now fails as follows, as on Linux: $ ./proc_bug.exe open: Not a directory I'm not completely sure about the second patch. The path_conv::check code is so complicated that I could easily have missed something. But I think it's correct. Ken Brown (2): Cygwin: format_process_fd: add directory check Cygwin: path_conv::check: handle error from fhandler_process::exists winsup/cygwin/fhandler_process.cc | 15 +++ winsup/cygwin/path.cc | 6 ++ 2 files changed, 21 insertions(+) -- 2.28.0
[PATCH 1/2] Cygwin: format_process_fd: add directory check
The incoming path is allowed to have the form "$PID/fd/[0-9]*/.*" provided the descriptor symlink points to a directory. Check that this is indeed the case. --- winsup/cygwin/fhandler_process.cc | 15 +++ 1 file changed, 15 insertions(+) diff --git a/winsup/cygwin/fhandler_process.cc b/winsup/cygwin/fhandler_process.cc index a6c358217..888604e3d 100644 --- a/winsup/cygwin/fhandler_process.cc +++ b/winsup/cygwin/fhandler_process.cc @@ -398,6 +398,21 @@ format_process_fd (void *data, char *&destbuf) *((process_fd_t *) data)->fd_type = virt_fdsymlink; else /* trailing path */ { + /* Does the descriptor link point to a directory? */ + bool dir; + if (*destbuf != '/') /* e.g., "pipe:[" or "socket:[" */ + dir = false; + else + { + path_conv pc (destbuf); + dir = pc.isdir (); + } + if (!dir) + { + set_errno (ENOTDIR); + cfree (destbuf); + return -1; + } char *newbuf = (char *) cmalloc_abort (HEAP_STR, strlen (destbuf) + strlen (e) + 1); stpcpy (stpcpy (newbuf, destbuf), e); -- 2.28.0
[PATCH] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
- If the non-cygwin apps is executed under pseudo console disabled, multibyte input for the apps are garbled. This patch fixes the issue. --- winsup/cygwin/fhandler_tty.cc | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 6de591d9b..afaa4546e 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -271,8 +271,17 @@ fhandler_pty_master::accept_input () bytes_left = eat_readahead (-1); HANDLE write_to = get_output_handle (); + char *buf = NULL; if (to_be_read_from_pcon ()) -write_to = to_slave; +{ + write_to = to_slave; + size_t nlen; + buf = convert_mb_str (GetConsoleCP (), &nlen, + get_ttyp ()->term_code_page, + (const char *) p, bytes_left); + p = buf; + bytes_left = nlen; +} if (!bytes_left) { @@ -305,6 +314,8 @@ fhandler_pty_master::accept_input () } } } + if (buf) +mb_str_free (buf); SetEvent (input_available_event); ReleaseMutex (input_mutex); -- 2.28.0
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Corinna, On Tue, 8 Sep 2020 10:40:34 +0200 Corinna Vinschen wrote: > On Sep 7 13:45, Takashi Yano via Cygwin-patches wrote: > > On Mon, 7 Sep 2020 01:04:13 +0900 > > > > Chages: > > > > - If global locale is set, it takes precedence. > > > > > > Changes: > > > - Use __get_current_locale() instead of __get_global_locale(). > > > - Fix a bug for ISO-8859-* charset. > > > > Changes: > > - Use envblock if it is passed to CreateProcess in spawn.cc. > > For the time being and to make at least *some* progress and with my > upcoming "away from keyboard"-time , I pushed the gist of my patch, > replacing the locale evaluating code in fhandler_tty with the function > __eval_codepage_from_internal_charset in its most simple form. > I didn't touch anything else, given that this discussion is still > ongoing. Your patch pushed does the magic! Even cygterm works even though the code does not check environment. The point is here. @@ -1977,9 +1807,6 @@ fhandler_pty_slave::fixup_after_exec () if (!close_on_exec ()) fixup_after_fork (NULL); /* No parent handle required. */ - /* Set locale */ - setup_locale (); - /* Hook Console API */ #define DO_HOOK(module, name) \ if (!name##_Orig) \ Without this deletion, term_code_page is determined when cygwin shell is executed. Since it is in fixup_after_exec(), setlocale() does not called yet in the shell. As a result, term_code_page cannot be determined correctly. In your new patch, term_code_page is determined when the first non-cygwin program is execed in the shell. The shell process already calls setlocale(), so term_code_page can be determined using global locale. Thanks for the excellent idea! Only the problem I noticed is that cygterm does not work if the shell does not call setlocale(). This happens if the shell is non-cygwin program, for example, cmd.exe, however, it is unusual case. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 7 13:45, Takashi Yano via Cygwin-patches wrote: > On Mon, 7 Sep 2020 01:04:13 +0900 > > > Chages: > > > - If global locale is set, it takes precedence. > > > > Changes: > > - Use __get_current_locale() instead of __get_global_locale(). > > - Fix a bug for ISO-8859-* charset. > > Changes: > - Use envblock if it is passed to CreateProcess in spawn.cc. For the time being and to make at least *some* progress and with my upcoming "away from keyboard"-time , I pushed the gist of my patch, replacing the locale evaluating code in fhandler_tty with the function __eval_codepage_from_internal_charset in its most simple form. I didn't touch anything else, given that this discussion is still ongoing. Corinna
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Mon, 7 Sep 2020 23:17:36 +0200 (CEST) Johannes Schindelin wrote: > Hi Takashi, > > On Sat, 5 Sep 2020, Takashi Yano wrote: > > > On Fri, 4 Sep 2020 08:23:42 +0200 (CEST) > > Johannes Schindelin wrote: > > > > > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote: > > > > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST) > > > > Johannes Schindelin wrote: > > > > > > > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > > > > correct, but after that (at least if Pseudo Console support is > > > > > enabled), > > > > > we try to find the default code page for that `LCID`, which is ASCII > > > > > (437). Subsequently, we set the Console output code page to that > > > > > value, > > > > > completely ignoring that we wanted to use UTF-8. > > > > > > > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > > > > > > > While at it, let's also set the Console output code page even if > > > > > Pseudo > > > > > Console support is disabled; contrary to the behavior of v3.0.7, the > > > > > Console output code page is not ignored in that case. > > > > > > > > > > The most common symptom would be that console applications which do > > > > > not > > > > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > > > > seem to be broken with v3.1.x when they worked plenty fine with > > > > > v3.0.x. > > > > > > > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > > > > https://github.com/msys2/MSYS2-packages/issues/2012, > > > > > https://github.com/rust-lang/cargo/issues/8369, > > > > > https://github.com/git-for-windows/git/issues/2734, > > > > > https://github.com/git-for-windows/git/issues/2793, > > > > > https://github.com/git-for-windows/git/issues/2792, and possibly > > > > > quite a > > > > > few others. > > > > > > > > > > Signed-off-by: Johannes Schindelin > > > > > --- > > > > > winsup/cygwin/fhandler_tty.cc | 9 + > > > > > 1 file changed, 9 insertions(+) > > > > > > > > > > diff --git a/winsup/cygwin/fhandler_tty.cc > > > > > b/winsup/cygwin/fhandler_tty.cc > > > > > index 06789a500..414c26992 100644 > > > > > --- a/winsup/cygwin/fhandler_tty.cc > > > > > +++ b/winsup/cygwin/fhandler_tty.cc > > > > > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void) > > > > >char charset[ENCODING_LEN + 1] = "ASCII"; > > > > >LCID lcid = get_langinfo (locale, charset); > > > > > > > > > > + /* Special-case the UTF-8 character set */ > > > > > + if (strcasecmp (charset, "UTF-8") == 0) > > > > > +{ > > > > > + get_ttyp ()->term_code_page = CP_UTF8; > > > > > + SetConsoleCP (CP_UTF8); > > > > > + SetConsoleOutputCP (CP_UTF8); > > > > > + return; > > > > > +} > > > > > + > > > > >/* Set console code page from locale */ > > > > >if (get_pseudo_console ()) > > > > > { > > > > > -- > > > > > 2.27.0 > > > > > > > > I would like to propose a counter patch attached. > > > > What do you think of this patch? > > > > > > > > This patch does not treat UTF-8 as special. > > > > > > Sure, but it only fixes the issue in `disable_pcon` mode in the current > > > tip commit. That's because a new Pseudo Console is created for every > > > spawned non-Cygwin console application, and that new Pseudo Console does > > > _not_ have the code page set by your patch. > > > > You are right. However, if pseudo console is enabled, the app > > which works correclty in command prompt should work as well in > > pseudo console. Therefore, there is nothing to be fixed. > > I am coming to the conclusion that your definition what is correct differs > from my definition of what is correct. > > For me, it matters what users see. And what users actually see is the > output of UTF-8 encoded text that is now interpreted via the default code > page of their LCID, i.e. it is incorrect. > > Sure, you can argue all you want that those console applications are _all > wrong_. _All of them_. > > In practice, that matters very little, as many users have > `LANG=en_US.UTF-8` (meaning your patches force their console applications' > output to be interpreted with code page 437) and therefore for those > users, things looked fine before, and now they don't. > > Note that I am not talking about developers who develop said console > applications. I am talking about users who use those console applications. > In other words, I am talking about a vastly larger group of affected > people. > > All of those people (or at least a substantial majority) will now have to > be told to please disable Pseudo Console support in v3.2.0 because they > would have to patch and rebuild those console applications that don't call > `SetConsoleOutputCP()`, and that is certainly unreasonable to expect of > the majority of users. Not even the `cmd /c chcp 65001` work-around (that > helps with v3.1.7) will work with v3.2.0 when Pseudo Console support is > enabled. In the case where pseudo console is disabled, the patch I propos
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 7 22:40, Takashi Yano via Cygwin-patches wrote: > Here is a summary of my points: > > [Senario 1] > 1) Start mintty (UTF-8). > 2) Start another mintty by > mintty -o charset=SJIS >from the first mintty. > > [Senario 2] > int pm = getpt(); > if (fork()) { > [do the master operations] > } else { > setsid(); > ps = open(ptsname(pm), O_RDWR); > close(pm); > dup2(ps, 0); > dup2(ps, 1); > dup2(ps, 2); > close(ps); > setenv("LANG", "ja_JP.SJIS", 1); > [exec shell] > } > > > Q1) cygheap or tty shared memory? > > Consider senario 1. If the term_code_page is in cygheap, > it is inherited to child process. Therefore, the second > mintty cannot update term_code_page while locale is changed. > > Consider senario 2. Since only the child process knows the > locale, master (parent process) cannot get term_code_page > if it is in cygheap. > > Q2) Is checking environment necessary? > > As for senario 2, setlocale() is not called. So it is > necessary to check environment to know locale. > > Q3) Where and when should term_code_page be set? > > In senario 2, LANG is set just before exec() in the CHILD > process. Therefore, term_code_page should be determined > in the child_info_spawn:worker or in the execed process. What bugs me here is that in scenario 1, the codeset of the master side is the defining factor, while in case 2 the slave side is the defining factor. Actually, the only defining factor is the codeset of the master side of the pty. If the master side interprets all input as utf-8, then the slave side should either send utf-8, or live with the consequences. It's the task of the *user* on the slave side, to set the env setting matching to the master side. I tend to agree with Johannes. We should not enforce a codepage setting inside Cygwin. The bytes should go to the master side and the master side interprets them. If native apps produce garbage, well, I'm not overly sympathetic. Especially given the fact that even Microsoft is now doing a lot to switch to UTF-8 as much as possible. It's the only sane option anyway. Corinna