On 2023-04-14 13:00, Corinna Vinschen via Cygwin wrote:
On Apr 14 19:53, Gionatan Danti via Cygwin wrote:
I have an issue with unreadable files which contain the UTF char U+F020 (which
appears as "middle dot with some space after") in their names.
stat on such a file results in "no such file or directory".
From here [1] it seems that a patch was contemplated many years ago, but I
don't know its status now.
Any ideas or workarounds?

There's no (good) solution from inside Cygwin.
Keep in mind that the Unicode area from U+E000 up to U+F8FF is called
"Private Use Area".  So none of the chars are mapped into any
singlebyte, doublebyte, or multibyte charset.  Typically we don't expect
that filenames contain any of these chars, and we're only using a very
small subset of them for our own, dubious purposes anyway:
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
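
To see which names are affected, a minimal sketch like the following could flag file names that contain characters in the U+E000 to U+F8FF range quoted above. The function name pua_chars and the sample file name are made up for illustration; nothing here is part of Cygwin.

```python
# Private Use Area in the Basic Multilingual Plane, as quoted above.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def pua_chars(name):
    """Return (index, 'U+XXXX') for each BMP Private Use Area char in name."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(name)
        if PUA_FIRST <= ord(ch) <= PUA_LAST
    ]


if __name__ == "__main__":
    # Hypothetical file name with U+F020 right after "report".
    print(pua_chars("report\uf020.txt"))
```

An empty list means the name contains no BMP Private Use Area characters.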

[1] https://sourceware.org/legacy-ml/cygwin/2009-11/msg00043.html

While this patch would have fixed your problem, a later followup patch
broke your usage of U+F020 (space replacement) and, FWIW, of U+F02E
(dot replacement) again:
        https://cygwin.com/cgit/newlib-cygwin/commit/?id=8802178fddfd
This was done to accommodate filesystems implementing the idiotic
approach to support only DOS filenames, i.e. not allowing leading or
trailing spaces and not allowing trailing dots. These are Netapp and
Novell Netware filesystems. See the last paragraph of
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
Any chance you can just rename the files?
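
If renaming is an option, a rough sketch of a bulk rename might look like this. It assumes the script is run with a native Windows Python rather than from inside Cygwin (where these names are reported as unreadable), and that "_" is an acceptable replacement; both are assumptions, and the names cleaned_name and rename_tree are made up for illustration. It defaults to a dry run that only lists what would change.

```python
import os

# Code points named in this thread; the "_" replacement is an arbitrary choice.
REPLACEMENTS = {"\uf020": "_", "\uf02e": "_"}


def cleaned_name(name):
    """Return name with U+F020 and U+F02E replaced."""
    for pua, plain in REPLACEMENTS.items():
        name = name.replace(pua, plain)
    return name


def rename_tree(root, dry_run=True):
    """Rename affected entries under root; return (old, new) path pairs."""
    changes = []
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new = cleaned_name(name)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.exists(new_path):
                continue  # never overwrite an existing entry
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes


if __name__ == "__main__":
    # Dry run on the current directory: print planned renames only.
    for old, new in rename_tree(".", dry_run=True):
        print(ascii(old), "->", ascii(new))
```

Calling rename_tree(root, dry_run=False) performs the renames. Entries whose cleaned name already exists are skipped rather than overwritten.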

The Under-ConScript Unicode Registry (UCSUR) and its predecessor, the ConScript Unicode Registry (CSUR),

        https://www.kreativekorp.com/ucsur/

        http://www.evertype.com/standards/csur/

unofficially register Unicode PUA glyphs for academic, artificial, constructed, historical, invented, and minority language scripts, some of which have made it into Unicode, e.g.

        Script          CSUR            Unicode
        PHAISTOS DISC   U+E6D0-U+E6FF   U+101D0-U+101FF
        SHAVIAN         U+E700-U+E72F   U+10450-U+1047F
        DESERET         U+E830-U+E88F   U+10400-U+1044F

and maintain their own Unidata e.g.

        https://www.kreativekorp.com/ucsur/UNIDATA/Blocks.txt

and some Unicode fonts ship additional -CSUR font files (alongside -Italic etc.) that support BMP and SMP PUA glyphs.

For Cygwin purposes:

        F000-F7FF       unassigned      Reserved for hacks and corporate use

so Cygwin's mappings for special Windows file name characters are clear:

        F022    "
        F02A    *
        F03A    :
        F03C    <
        F03E    >
        F03F    ?
        F07C    |
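
Each entry in the table is the ASCII code of the character plus 0xF000. A minimal sketch of that arithmetic, covering only the seven characters listed and not Cygwin's actual path conversion code (the function names are made up for illustration):

```python
# The seven characters from the table above.
SPECIAL_CHARS = '"*:<>?|'
PUA_OFFSET = 0xF000


def to_windows_name(posix_name):
    """Replace each table character with its U+F0xx counterpart."""
    return "".join(
        chr(PUA_OFFSET + ord(ch)) if ch in SPECIAL_CHARS else ch
        for ch in posix_name
    )


def to_posix_name(windows_name):
    """Reverse the mapping for the same seven characters."""
    pua_to_plain = {chr(PUA_OFFSET + ord(ch)): ch for ch in SPECIAL_CHARS}
    return "".join(pua_to_plain.get(ch, ch) for ch in windows_name)


if __name__ == "__main__":
    mapped = to_windows_name("what?.txt")
    # Print the code point of every character that changed.
    print([f"U+{ord(ch):04X}" for ch in mapped if ch not in "what.txt"])
```

The space (U+F020) and dot (U+F02E) replacements discussed earlier in the thread are left out here, since they apply only in particular positions in a name.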

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
