Re: Can not stat file with utf char U+F020
On 2023-04-19 03:10, L A Walsh wrote:
> I'm a bit confused as to what char you are trying to access/use, as
> U+F020 is in the Private Use Area (PUA).
>
> Since it's in the PUA, it seems its meaning could differ by
> application/OS/user, no? I.e. it has no set definition.
>
> I mean, you can use it in Cygwin to represent some character not
> usually permitted in a DOS/Win filename (like :/\, etc.), but it
> wouldn't have the same meaning in Windows, though?
>
> Isn't the Private Use Area application specific, so that an application
> can create and use its own symbol set -- even though it wouldn't be
> portable to another application?

The issue is with any client/application (even Cygwin) creating a
filename ending with a dot (or other chars) which is replaced with
U+F020. If this file is later renamed by adding some other character
*after* the replaced dot, it becomes unreadable by Cygwin.

Something like this:
- a user creates a file named "project.", forgetting the extension, on
  a Windows share;
- the client replaces the dot with U+F020;
- at this point all is good: the file can be read by the client,
  Windows and Cygwin;
- the user notices the missing extension and renames the file to
  "project.txt";
- Cygwin now does *not* translate U+F020 back to a dot and is unable
  to read the file.

> I think characters in the PUA range are used to allow Cygwin filenames
> to contain colons, slashes and quotes -- so one wouldn't want Windows
> to understand the Cygwin intent, or it would defeat the purpose of
> using custom characters to represent filenames that are legal under
> POSIX but not under Windows.

True, but dots and spaces are somewhat different from the other reserved
chars. While backslashes, colons, etc. are rejected by NTFS itself (or
by a lower-layer API), trailing dots and spaces are ignored/stripped by
Win32. This means that Linux clients accessing an SMB share *can*
successfully create such filenames without any issue and without
replacing them with PUA chars.

For example, I created a file called "zzz." from a Linux+Mate client.
Cygwin correctly sees the filename as:

    $ ls "zzz." | od -x --endian=big
    000 7a7a 7a2e 0a00

True, Windows can not access this file, but this is fine, because such a
filename should never be understood by Windows. Not being able to open
the file from Windows, its users will find and correct the issue
themselves by renaming the file.

As things are now, we have the opposite issue: should (for whichever
reason) a file exist with a name such as "zzz[U+F020]txt", Cygwin will
not be able to access it. This means that anyone using Cygwin+rsync to
back up a Windows server will now have a file that is inaccessible and
impossible to back up.

Thinking about that: how do you feel about having an option to exclude
trailing dots and spaces from PUA translations (effectively reverting
them to the status of "normal" characters)?

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
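The od dump above shows a plain ASCII dot (0x2e) at the end of the name. As a rough way to tell a literal dot or space from a PUA replacement, one can compare the UTF-8 bytes of the two forms. The snippet below is only a sketch in Python; the helper name `utf8_hex` is made up for this illustration and is not part of any Cygwin tool.

```python
def utf8_hex(name: str) -> str:
    """Return the UTF-8 encoding of a file name as space-separated hex bytes."""
    return name.encode("utf-8").hex(" ")

# A literal trailing dot is the single byte 2e.
print(utf8_hex("zzz."))          # 7a 7a 7a 2e

# U+F020 and U+F02E each take three bytes in UTF-8 (ef 80 a0 / ef 80 ae).
print(utf8_hex("zzz\uf020txt"))  # 7a 7a 7a ef 80 a0 74 78 74
print(utf8_hex("zzz\uf02e"))     # 7a 7a 7a ef 80 ae
```

Seeing `ef 80 a0` or `ef 80 ae` in a dump of the raw on-disk name would therefore point at a PUA character rather than a real space or dot.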
Re: Can not stat file with utf char U+F020
I'm a bit confused as to what char you are trying to access/use, as
U+F020 is in the Private Use Area (PUA).

Since it's in the PUA, it seems its meaning could differ by
application/OS/user, no? I.e. it has no set definition.

I mean, you can use it in Cygwin to represent some character not usually
permitted in a DOS/Win filename (like :/\, etc.), but it wouldn't have
the same meaning in Windows, though?

Isn't the Private Use Area application specific, so that an application
can create and use its own symbol set -- even though it wouldn't be
portable to another application?

So if you create a character in Cygwin that maps to that area -- how
would you expect Windows to know what the character is and how to treat
it?

I think characters in the PUA range are used to allow Cygwin filenames
to contain colons, slashes and quotes -- so one wouldn't want Windows to
understand the Cygwin intent, or it would defeat the purpose of using
custom characters to represent filenames that are legal under POSIX but
not under Windows.
Re: Can not stat file with utf char U+F020
On 2023-04-17 15:46, Gionatan Danti via Cygwin wrote:
> First, I use the "dos" mount option to always trigger conversion of
> space and dot at filename end into U+F0xx chars.
>
> Now I am able to create such a strange-looking file (in Explorer)
> within Cygwin itself. For example, touch "zzs " now results in
> "zzs+strangechar" in Explorer. Both Cygwin and Windows are able to
> read/write such a file.
>
> But if I edit the filename via Explorer, adding an extension (i.e. from
> "zzs+strangechar" to "zzs+strangechar.txt"), Cygwin is suddenly
> unable to read/write the file. It seems to me that the appended chars
> prevent Cygwin from translating F0xx back to 00xx (as the PUA char is
> not at the end of the filename anymore).
>
> So, two paths should be available:
> - always translate F0xx back to 00xx, even if not at the end of the
>   filename;
> - otherwise, if it is too invasive to do it unconditionally, add an
>   option such as "always_translate_pua" (default: off) to enable such
>   behavior based on user needs.
>
> I would (naively?) think that option 1 (always translate back PUA)
> should be the preferred approach, as Cygwin is at the moment
> effectively unable to access some files.

Hi all,
any thoughts on the matter? Am I missing something?

Thanks.

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
On 2023-04-17 11:05, Corinna Vinschen wrote:
> It's actually not the "dos" mount option but specific filesystems
> which trigger the conversion from U+0020 to U+F020.

OK.

> However, the conversion back is handled in a piece of code which has
> no information about the underlying filesystem, so the F0xx -> 00xx
> conversion is done all the time. Adding filesystem info in this
> place is really tricky.

Ah, I missed it, thanks! With this new information, I made some
progress.

First, I use the "dos" mount option to always trigger conversion of
space and dot at filename end into U+F0xx chars.

Now I am able to create such a strange-looking file (in Explorer) within
Cygwin itself. For example, touch "zzs " now results in
"zzs+strangechar" in Explorer. Both Cygwin and Windows are able to
read/write such a file.

But if I edit the filename via Explorer, adding an extension (i.e. from
"zzs+strangechar" to "zzs+strangechar.txt"), Cygwin is suddenly unable
to read/write the file. It seems to me that the appended chars prevent
Cygwin from translating F0xx back to 00xx (as the PUA char is not at the
end of the filename anymore).

So, two paths should be available:
- always translate F0xx back to 00xx, even if not at the end of the
  filename;
- otherwise, if it is too invasive to do it unconditionally, add an
  option such as "always_translate_pua" (default: off) to enable such
  behavior based on user needs.

I would (naively?) think that option 1 (always translate back PUA)
should be the preferred approach, as Cygwin is at the moment effectively
unable to access some files.

Regards.

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
Greetings, Corinna Vinschen via Cygwin!

> On Apr 17 07:36, Gionatan Danti via Cygwin wrote:
>> On 2023-04-14 23:01, Gionatan Danti via Cygwin wrote:
>> > On 2023-04-14 22:25, Corinna Vinschen via Cygwin wrote:
>> > > We do that. You're just stumbling over the fact that U+F020 is also
>> > > used as outlined in
>> > > https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
>> > > and https://cygwin.com/pipermail/cygwin/2023-April/253478.html
>> >
>> > Ah, so spaces and dots are replaced respectively by U+F020 and U+F02E
>> > even without the "dos" mount option?
>> > Because I can not see it in my case of an NTFS filesystem with the
>> > following mount options: binary,posix=0,user,noumount,auto
>>
>> Hi all,
>> it's not clear to me why even without the "dos" mount option both space and
>> dot are replaced by U+F020 and U+F02E, preventing U+F020 passthrough.
>>
>> Am I missing something?

> It's actually not the "dos" mount option but specific filesystems
> which trigger the conversion from U+0020 to U+F020.

> However, the conversion back is handled in a piece of code which has
> no information about the underlying filesystem, so the F0xx -> 00xx
> conversion is done all the time. Adding filesystem info in this
> place is really tricky.

My understanding is that on Windows, a regular file name can't start or
end with a space, and can't end with a dot. There are ways to game this
rule, but in simple cases this is how it works for the most part.
If a similar rule can be crafted for the filesystems under discussion,
that could simplify the problem.

--
With best regards,
Andrey Repin
Monday, April 17, 2023 13:53:57

Sorry for my terrible english...
Re: Can not stat file with utf char U+F020
On Apr 14 23:10, Brian Inglis via Cygwin wrote:
> On 2023-04-14 14:17, Gionatan Danti via Cygwin wrote:
> > On 2023-04-14 21:00, Corinna Vinschen wrote:
> > > There's no (good) solution from inside Cygwin.
> >
> > Yeah, I can only imagine how difficult it is to be compatible with
> > posix, win32 and the like.
> >
> > > Any chance you can just rename the files?
> >
> > I renamed the files, in fact.
> > However, it seems that users working with (older?) Office for Mac use
> > U+F020 more frequently than I expected, maybe because of that [1]:
> > "Microsoft's defunct Services For Macintosh feature used U+F001 through
> > U+F029 as replacements for special characters allowed in HFS but
> > forbidden in NTFS, and U+F02A for the Apple logo."
> > Any chance to enable a "bypass" for these characters (excluding the
> > ones you reserved for compatibility, as explained in detail in
> > "Forbidden characters in filenames")? Maybe hidden behind a
> > configurable option (even disabled by default), so as not to
> > interfere with the current behavior?
> >
> > [1] https://en.wikipedia.org/wiki/Private_Use_Areas#Vendor_use
>
> Now if MS SfM and Cygwin had both registered with U/CSUR, they would not be
> fighting over Unicode code points, although it looks like there is a lot of
> competition for the code points! ;^>
>
> Would it make more sense to add custom file name character filters into some
> utility, such as unix2dos/mac2unix, cygpath, or some other, and add
> (Cyg)win, or create such a utility, so those could be added to processes?

Adding this to some utility would make more sense than adding another
complication into the Cygwin codebase to support really old stuff.

Corinna
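A stand-alone filter of the kind suggested above might look roughly like the following. This is only a sketch under assumptions, not an existing utility: the names `strip_pua` and `plan_renames`, the underscore replacement, and the choice of the U+F000..U+F0FF range are all invented for illustration. It also assumes the names are obtained from a tool that sees them untranslated (for instance a native, non-Cygwin Python).

```python
# Hypothetical filter; not part of Cygwin, cygpath, or unix2dos.
PUA_START, PUA_END = 0xF000, 0xF0FF  # range chosen for this sketch only


def strip_pua(name: str, replacement: str = "_") -> str:
    """Replace every character in U+F000..U+F0FF with `replacement`."""
    return "".join(
        replacement if PUA_START <= ord(ch) <= PUA_END else ch
        for ch in name
    )


def plan_renames(names):
    """Return (old, new) pairs for names that would change.

    Nothing is renamed here; the caller decides what to do with the plan.
    """
    return [(n, strip_pua(n)) for n in names if strip_pua(n) != n]


for old, new in plan_renames(["notes.txt", "zzz\uf020txt", "draft\uf02e"]):
    print(f"{old!r} -> {new!r}")
```

Producing a rename plan first, instead of renaming directly, leaves room to review collisions (two names mapping to the same result) before touching any file.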
Re: Can not stat file with utf char U+F020
On Apr 17 07:36, Gionatan Danti via Cygwin wrote:
> On 2023-04-14 23:01, Gionatan Danti via Cygwin wrote:
> > On 2023-04-14 22:25, Corinna Vinschen via Cygwin wrote:
> > > We do that. You're just stumbling over the fact that U+F020 is also
> > > used as outlined in
> > > https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
> > > and https://cygwin.com/pipermail/cygwin/2023-April/253478.html
> >
> > Ah, so spaces and dots are replaced respectively by U+F020 and U+F02E
> > even without the "dos" mount option?
> > Because I can not see it in my case of an NTFS filesystem with the
> > following mount options: binary,posix=0,user,noumount,auto
>
> Hi all,
> it's not clear to me why even without the "dos" mount option both space and
> dot are replaced by U+F020 and U+F02E, preventing U+F020 passthrough.
>
> Am I missing something?

It's actually not the "dos" mount option but specific filesystems
which trigger the conversion from U+0020 to U+F020.

However, the conversion back is handled in a piece of code which has
no information about the underlying filesystem, so the F0xx -> 00xx
conversion is done all the time. Adding filesystem info in this
place is really tricky.

Corinna
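The asymmetry described above (forward conversion only in some cases, reverse conversion always) can be illustrated with a toy model. This is not Cygwin's code: the function names, the `fs_needs_dos_names` flag standing in for "specific filesystems", and the restriction to trailing spaces and dots are all assumptions made for the sketch.

```python
# Toy model only; models just the space and dot replacements.
TRAILING = {" ": "\uf020", ".": "\uf02e"}
REVERSE = {pua: ascii_ch for ascii_ch, pua in TRAILING.items()}


def to_disk(posix_name: str, fs_needs_dos_names: bool) -> str:
    """Forward direction: replace only *trailing* spaces and dots,
    and only when the (assumed) filesystem flag asks for it."""
    if not fs_needs_dos_names:
        return posix_name
    stem = posix_name.rstrip(" .")
    tail = posix_name[len(stem):]
    return stem + "".join(TRAILING[ch] for ch in tail)


def from_disk(disk_name: str) -> str:
    """Reverse direction: applied at every position, with no
    knowledge of the filesystem."""
    return "".join(REVERSE.get(ch, ch) for ch in disk_name)


# A name with a trailing space survives the round trip...
assert to_disk(from_disk("zzs\uf020"), True) == "zzs\uf020"

# ...but once an extension is appended, U+F020 is no longer trailing.
on_disk = "zzs\uf020.txt"
seen = from_disk(on_disk)        # listed as "zzs .txt"
looked_up = to_disk(seen, True)  # stays "zzs .txt"
print(looked_up == on_disk)      # False: the lookup misses the real file
```

In this model the file is listed under a name that maps back to a different on-disk name, which would match the "no such file or directory" symptom reported in the thread.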
Re: Can not stat file with utf char U+F020
On 2023-04-14 23:01, Gionatan Danti via Cygwin wrote:
> On 2023-04-14 22:25, Corinna Vinschen via Cygwin wrote:
> > We do that. You're just stumbling over the fact that U+F020 is also
> > used as outlined in
> > https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
> > and https://cygwin.com/pipermail/cygwin/2023-April/253478.html
>
> Ah, so spaces and dots are replaced respectively by U+F020 and U+F02E
> even without the "dos" mount option?
> Because I can not see it in my case of an NTFS filesystem with the
> following mount options: binary,posix=0,user,noumount,auto

Hi all,
it's not clear to me why even without the "dos" mount option both space
and dot are replaced by U+F020 and U+F02E, preventing U+F020
passthrough.

Am I missing something?
Thanks.

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
On 2023-04-14 14:17, Gionatan Danti via Cygwin wrote:
> On 2023-04-14 21:00, Corinna Vinschen wrote:
> > There's no (good) solution from inside Cygwin.
>
> Yeah, I can only imagine how difficult it is to be compatible with
> posix, win32 and the like.
>
> > Any chance you can just rename the files?
>
> I renamed the files, in fact.
> However, it seems that users working with (older?) Office for Mac use
> U+F020 more frequently than I expected, maybe because of that [1]:
> "Microsoft's defunct Services For Macintosh feature used U+F001 through
> U+F029 as replacements for special characters allowed in HFS but
> forbidden in NTFS, and U+F02A for the Apple logo."
> Any chance to enable a "bypass" for these characters (excluding the
> ones you reserved for compatibility, as explained in detail in
> "Forbidden characters in filenames")? Maybe hidden behind a
> configurable option (even disabled by default), so as not to interfere
> with the current behavior?
>
> [1] https://en.wikipedia.org/wiki/Private_Use_Areas#Vendor_use

Now if MS SfM and Cygwin had both registered with U/CSUR, they would not
be fighting over Unicode code points, although it looks like there is a
lot of competition for the code points! ;^>

Would it make more sense to add custom file name character filters into
some utility, such as unix2dos/mac2unix, cygpath, or some other, and add
(Cyg)win, or create such a utility, so those could be added to
processes?

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
Re: Can not stat file with utf char U+F020
On 2023-04-14 22:25, Corinna Vinschen via Cygwin wrote:
> We do that. You're just stumbling over the fact that U+F020 is also
> used as outlined in
> https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
> and https://cygwin.com/pipermail/cygwin/2023-April/253478.html

Ah, so spaces and dots are replaced respectively by U+F020 and U+F02E
even without the "dos" mount option?
Because I can not see it in my case of an NTFS filesystem with the
following mount options: binary,posix=0,user,noumount,auto

Thanks.

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
On 2023-04-14 22:40, Corinna Vinschen wrote:
> This is really tricky. A new mount point flag could be used to
> override this behaviour on a per path basis. One problem is, the
> unicode -> multibyte conversion when evaluating a symlink is done
> before it's clear where the symlink target is. Only the string is
> converted and it might be a relative path, so the code doesn't know
> where the target ends up. And that's probably not all.

To tell the truth, it is such a corner (and unfortunate) case that I
would not care if the workaround does not work for symlinks.

> Is it really worth adding code to support a long deprecated Windows
> service?

Yeah, I understand your point. I am not in a position to evaluate
whether it would be worth it. Maybe a special case for only U+F020 (the
most common "strange" char I see) can be considered?

Thanks.

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
On Apr 14 22:17, Gionatan Danti via Cygwin wrote:
> On 2023-04-14 21:00, Corinna Vinschen wrote:
> > There's no (good) solution from inside Cygwin.
> > [snip]
>
> Yeah, I can only imagine how difficult it is to be compatible with
> posix, win32 and the like.
>
> > Any chance you can just rename the files?
>
> I renamed the files, in fact.
>
> However, it seems that users working with (older?) Office for Mac use U+F020
> more frequently than I expected, maybe because of that [1]:
>
> "Microsoft's defunct Services For Macintosh feature used U+F001 through
> U+F029 as replacements for special characters allowed in HFS but forbidden
> in NTFS, and U+F02A for the Apple logo."

Drat. This is kind of sick. At the same time, Interix used the U+F0xx
area as we do. That's why I chose this area, to be filename compatible
with Interix.

> Any chance to enable a "bypass" for these characters (excluding the ones
> you reserved for compatibility, as explained in detail in "Forbidden
> characters in filenames")? Maybe hidden behind a configurable option (even
> disabled by default), so as not to interfere with the current behavior?

This is really tricky. A new mount point flag could be used to override
this behaviour on a per path basis. One problem is, the unicode ->
multibyte conversion when evaluating a symlink is done before it's clear
where the symlink target is. Only the string is converted and it might
be a relative path, so the code doesn't know where the target ends up.
And that's probably not all.

Is it really worth adding code to support a long deprecated Windows
service?

Corinna
Re: Can not stat file with utf char U+F020
On Apr 14 22:21, Gionatan Danti via Cygwin wrote:
> On 2023-04-14 21:54, Brian Inglis wrote:
> > UCSUR Under-ConScript Unicode Registry and its predecessor ConScript
> > Unicode Registry CSUR
> >
> > https://www.kreativekorp.com/ucsur/
> >
> > http://www.evertype.com/standards/csur/
> >
> > unofficially register Unicode PUA glyphs for academic, artificial,
> > constructed, historical, invented, and minority language scripts, some
> > of which have made it into Unicode e.g.
> >
> > Script          CSUR             Unicode
> > PHAISTOS DISC   U+E6D0-U+E6FF    U+101D0-U+101FF
> > SHAVIAN         U+E700-U+E72F    U+10450-U+1047F
> > DESERET         U+E830-U+E88F    U+10400-U+1044F
> >
> > and maintain their own Unidata e.g.
> >
> > https://www.kreativekorp.com/ucsur/UNIDATA/Blocks.txt
> >
> > and some Unicode fonts have -CSUR addition files (like -Italic etc.)
> > that support BMP and SMP PUA glyphs.
>
> So they are actively using PUA? I did not know that, thanks.
>
> > For Cygwin purposes:
> >
> > F000−F7FF unassigned Reserved for hacks and corporate use
> >
> > so Cygwin's special Windows file name characters mappings are clear:
> >
> > F022  "
> > F02A  *
> > F03A  :
> > F03C  <
> > F03E  >
> > F03F  ?
> > F07C  |
>
> Would it be possible to "bypass" the chars in the range F000−F7FF that are
> not used/reserved by cygwin?

We do that. You're just stumbling over the fact that U+F020 is also
used as outlined in
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
and https://cygwin.com/pipermail/cygwin/2023-April/253478.html

Corinna
Re: Can not stat file with utf char U+F020
On 2023-04-14 21:54, Brian Inglis wrote:
> UCSUR Under-ConScript Unicode Registry and its predecessor ConScript
> Unicode Registry CSUR
>
> https://www.kreativekorp.com/ucsur/
>
> http://www.evertype.com/standards/csur/
>
> unofficially register Unicode PUA glyphs for academic, artificial,
> constructed, historical, invented, and minority language scripts, some
> of which have made it into Unicode e.g.
>
> Script          CSUR             Unicode
> PHAISTOS DISC   U+E6D0-U+E6FF    U+101D0-U+101FF
> SHAVIAN         U+E700-U+E72F    U+10450-U+1047F
> DESERET         U+E830-U+E88F    U+10400-U+1044F
>
> and maintain their own Unidata e.g.
>
> https://www.kreativekorp.com/ucsur/UNIDATA/Blocks.txt
>
> and some Unicode fonts have -CSUR addition files (like -Italic etc.)
> that support BMP and SMP PUA glyphs.

So they are actively using PUA? I did not know that, thanks.

> For Cygwin purposes:
>
> F000−F7FF unassigned Reserved for hacks and corporate use
>
> so Cygwin's special Windows file name characters mappings are clear:
>
> F022  "
> F02A  *
> F03A  :
> F03C  <
> F03E  >
> F03F  ?
> F07C  |

Would it be possible to "bypass" the chars in the range F000−F7FF that
are not used/reserved by Cygwin?

Regards.

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
On Apr 14 13:54, Brian Inglis via Cygwin wrote:
> On 2023-04-14 13:00, Corinna Vinschen via Cygwin wrote:
> > On Apr 14 19:53, Gionatan Danti via Cygwin wrote:
> > > [1] https://sourceware.org/legacy-ml/cygwin/2009-11/msg00043.html
> >
> > While this patch would have fixed your problem, a later followup patch
> > broke your usage of U+F020 (space replacement) and, FWIW, of U+F02E
> > (dot replacement) again:
> > https://cygwin.com/cgit/newlib-cygwin/commit/?id=8802178fddfd
> > This was done to accommodate filesystems implementing the idiotic
> > approach to support only DOS filenames, i.e. not allowing leading or
> > trailing spaces and not allowing trailing dots. These are Netapp and
> > Novell Netware filesystems. See the last paragraph of
> > https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
> > Any chance you can just rename the files?
>
> UCSUR Under-ConScript Unicode Registry and its predecessor ConScript Unicode
> Registry CSUR
>
> https://www.kreativekorp.com/ucsur/
>
> http://www.evertype.com/standards/csur/
>
> unofficially register Unicode PUA glyphs for academic, artificial,
> constructed, historical, invented, and minority language scripts, some of
> which have made it into Unicode e.g.
>
> Script          CSUR             Unicode
> PHAISTOS DISC   U+E6D0-U+E6FF    U+101D0-U+101FF
> SHAVIAN         U+E700-U+E72F    U+10450-U+1047F
> DESERET         U+E830-U+E88F    U+10400-U+1044F
>
> and maintain their own Unidata e.g.
>
> https://www.kreativekorp.com/ucsur/UNIDATA/Blocks.txt
>
> and some Unicode fonts have -CSUR addition files (like -Italic etc.) that
> support BMP and SMP PUA glyphs.
>
> For Cygwin purposes:
>
> F000−F7FF unassigned Reserved for hacks and corporate use
>
> so Cygwin's special Windows file name characters mappings are clear:
>

For completeness' sake, starting with commit 8802178fddfd:

  F020  (space)
> F022  "
> F02A  *
  F02E  .
> F03A  :
> F03C  <
> F03E  >
> F03F  ?
> F07C  |

Corinna
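Every entry in the list above is the ASCII code point of the character plus 0xF000. A minimal sketch of that relationship in Python follows; the names `SPECIAL_CHARS` and `TO_PUA` are made up for illustration and do not come from the Cygwin sources.

```python
# Characters taken from the list in the message above (space and dot included).
SPECIAL_CHARS = ' "*.:<>?|'

# Each replacement code point is the ASCII code point offset by 0xF000.
TO_PUA = {ch: chr(0xF000 + ord(ch)) for ch in SPECIAL_CHARS}

for ch, pua in TO_PUA.items():
    print(f"U+{ord(pua):04X} <- {ch!r}")
```

The first line printed is `U+F020 <- ' '` and the last is `U+F07C <- '|'`, matching the list.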
Re: Can not stat file with utf char U+F020
On 2023-04-14 21:00, Corinna Vinschen wrote:
> There's no (good) solution from inside Cygwin.
> [snip]

Yeah, I can only imagine how difficult it is to be compatible with
posix, win32 and the like.

> Any chance you can just rename the files?

I renamed the files, in fact.

However, it seems that users working with (older?) Office for Mac use
U+F020 more frequently than I expected, maybe because of that [1]:

"Microsoft's defunct Services For Macintosh feature used U+F001 through
U+F029 as replacements for special characters allowed in HFS but
forbidden in NTFS, and U+F02A for the Apple logo."

Any chance to enable a "bypass" for these characters (excluding the ones
you reserved for compatibility, as explained in detail in "Forbidden
characters in filenames")? Maybe hidden behind a configurable option
(even disabled by default), so as not to interfere with the current
behavior?

Thanks.

[1] https://en.wikipedia.org/wiki/Private_Use_Areas#Vendor_use

--
Danti Gionatan
Re: Can not stat file with utf char U+F020
On 2023-04-14 13:00, Corinna Vinschen via Cygwin wrote:
> On Apr 14 19:53, Gionatan Danti via Cygwin wrote:
> > I have an issue with unreadable files which contain utf char U+F020
> > (which appears as "middle dot with some space after") in their name.
> >
> > stat on such a file results in "no such file or directory"
> >
> > From here [1] it seems that a patch was contemplated many years ago,
> > but I don't know its status now.
> >
> > Any ideas or workaround?
>
> There's no (good) solution from inside Cygwin.
>
> Keep in mind that the Unicode area from U+E000 up to U+F8FF is called
> "Private Use Area". So none of the chars are mapped into any
> singlebyte, doublebyte, or multibyte charset.
>
> Typically we don't expect that filenames contain any of these chars,
> and we're only using a very small subset of them for our own, dubious
> purposes anyway:
> https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
>
> > [1] https://sourceware.org/legacy-ml/cygwin/2009-11/msg00043.html
>
> While this patch would have fixed your problem, a later followup patch
> broke your usage of U+F020 (space replacement) and, FWIW, of U+F02E
> (dot replacement) again:
> https://cygwin.com/cgit/newlib-cygwin/commit/?id=8802178fddfd
> This was done to accommodate filesystems implementing the idiotic
> approach to support only DOS filenames, i.e. not allowing leading or
> trailing spaces and not allowing trailing dots. These are Netapp and
> Novell Netware filesystems. See the last paragraph of
> https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
> Any chance you can just rename the files?

UCSUR Under-ConScript Unicode Registry and its predecessor ConScript
Unicode Registry CSUR

https://www.kreativekorp.com/ucsur/

http://www.evertype.com/standards/csur/

unofficially register Unicode PUA glyphs for academic, artificial,
constructed, historical, invented, and minority language scripts, some
of which have made it into Unicode e.g.

Script          CSUR             Unicode
PHAISTOS DISC   U+E6D0-U+E6FF    U+101D0-U+101FF
SHAVIAN         U+E700-U+E72F    U+10450-U+1047F
DESERET         U+E830-U+E88F    U+10400-U+1044F

and maintain their own Unidata e.g.

https://www.kreativekorp.com/ucsur/UNIDATA/Blocks.txt

and some Unicode fonts have -CSUR addition files (like -Italic etc.)
that support BMP and SMP PUA glyphs.

For Cygwin purposes:

F000−F7FF unassigned Reserved for hacks and corporate use

so Cygwin's special Windows file name characters mappings are clear:

F022  "
F02A  *
F03A  :
F03C  <
F03E  >
F03F  ?
F07C  |

--
Take care. Thanks, Brian Inglis
Calgary, Alberta, Canada
Re: Can not stat file with utf char U+F020
On Apr 14 19:53, Gionatan Danti via Cygwin wrote:
> Dear list,
> I have an issue with unreadable files which contain utf char U+F020 (which
> appears as "middle dot with some space after") in their name.
>
> stat on such a file results in "no such file or directory"
>
> From here [1] it seems that a patch was contemplated many years ago, but I
> don't know its status now.
>
> Any ideas or workaround?

There's no (good) solution from inside Cygwin.

Keep in mind that the Unicode area from U+E000 up to U+F8FF is called
"Private Use Area". So none of the chars are mapped into any
singlebyte, doublebyte, or multibyte charset.

Typically we don't expect that filenames contain any of these chars,
and we're only using a very small subset of them for our own, dubious
purposes anyway:
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars

> [1] https://sourceware.org/legacy-ml/cygwin/2009-11/msg00043.html

While this patch would have fixed your problem, a later followup patch
broke your usage of U+F020 (space replacement) and, FWIW, of U+F02E
(dot replacement) again:
https://cygwin.com/cgit/newlib-cygwin/commit/?id=8802178fddfd

This was done to accommodate filesystems implementing the idiotic
approach to support only DOS filenames, i.e. not allowing leading or
trailing spaces and not allowing trailing dots. These are Netapp and
Novell Netware filesystems. See the last paragraph of
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars

Any chance you can just rename the files?

Corinna
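Before renaming, it may help to list which names actually contain characters from the U+E000..U+F8FF range mentioned above. The following is a sketch in Python; `pua_chars` and `describe` are hypothetical helper names, and the check assumes the names come from a tool that reports them untranslated.

```python
def pua_chars(name: str):
    """Return the characters of `name` that lie in U+E000..U+F8FF."""
    return [ch for ch in name if 0xE000 <= ord(ch) <= 0xF8FF]


def describe(name: str) -> str:
    """Summarise the PUA code points found in `name`."""
    found = pua_chars(name)
    if not found:
        return "no PUA characters"
    return ", ".join(f"U+{ord(ch):04X}" for ch in found)


print(describe("report\uf020"))  # U+F020
print(describe("plain.txt"))     # no PUA characters
```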
Can not stat file with utf char U+F020
Dear list,
I have an issue with unreadable files which contain utf char U+F020
(which appears as "middle dot with some space after") in their name.

stat on such a file results in "no such file or directory"

From here [1] it seems that a patch was contemplated many years ago, but
I don't know its status now.

Any ideas or workaround?
Thanks.

[1] https://sourceware.org/legacy-ml/cygwin/2009-11/msg00043.html

--
Danti Gionatan