On 2023-04-14 13:00, Corinna Vinschen via Cygwin wrote:
On Apr 14 19:53, Gionatan Danti via Cygwin wrote:
I have an issue with unreadable files which contain the Unicode char U+F020
(which appears as a "middle dot with some space after") in their name.
stat on such a file results in "no such file or directory".
From here [1] it seems that a patch was contemplated many years ago, but I
don't know its status now.
Any ideas or workaround?
There's no (good) solution from inside Cygwin.
Keep in mind that the Unicode area from U+E000 up to U+F8FF is called
"Private Use Area". So none of the chars are mapped into any
singlebyte, doublebyte, or multibyte charset. Typically we don't expect
that filenames contain any of these chars, and we're only using a very
small subset of them for our own, dubious purposes anyway:
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
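A minimal Python sketch of the point above. The helper names are hypothetical, not part of Cygwin. It shows that U+F020 falls inside U+E000..U+F8FF, that Python's `unicodedata` reports it as category `Co` (private use), and that a singlebyte codec such as cp1252 has no encoding for it while UTF-8 does:

```python
import unicodedata

# BMP Private Use Area bounds, as stated above.
PUA_START, PUA_END = 0xE000, 0xF8FF

def in_bmp_private_use_area(ch: str) -> bool:
    """Return True if the single character ch lies in U+E000..U+F8FF."""
    return PUA_START <= ord(ch) <= PUA_END

def encodes_in(ch: str, charset: str) -> bool:
    """Return True if ch can be encoded with the given Python codec."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

ch = "\uf020"
print(in_bmp_private_use_area(ch))   # True
print(unicodedata.category(ch))      # Co (private use)
print(encodes_in(ch, "cp1252"))      # False: no singlebyte mapping
print(encodes_in(ch, "utf-8"))       # True: UTF-8 encodes any code point
```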
[1] https://sourceware.org/legacy-ml/cygwin/2009-11/msg00043.html
While this patch would have fixed your problem, a later followup patch
broke your usage of U+F020 (space replacement) and, FWIW, of U+F02E
(dot replacement) again:
https://cygwin.com/cgit/newlib-cygwin/commit/?id=8802178fddfd
This was done to accommodate filesystems implementing the idiotic
approach of supporting only DOS filenames, i.e. not allowing leading or
trailing spaces and not allowing trailing dots. These are Netapp and
Novell Netware filesystems. See the last paragraph of
https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
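A simplified model of the space/dot replacement described above, assuming the rule is exactly "leading or trailing spaces become U+F020, trailing dots become U+F02E". This is an illustrative assumption, not Cygwin's code; the exact conditions are in the commit and the user guide linked above, and `encode_edges` is a hypothetical name:

```python
SPACE_REPL = "\uf020"  # replacement for a space, per the message
DOT_REPL = "\uf02e"    # replacement for a dot, per the message

def encode_edges(name: str) -> str:
    """Simplified model: replace leading spaces, and trailing spaces/dots."""
    # Leading spaces are not allowed; dots are only a problem at the end.
    lead = len(name) - len(name.lstrip(" "))
    body = name[lead:]
    # Walk backwards over trailing spaces and dots, replacing each one.
    tail = []
    while body and body[-1] in " .":
        tail.append(SPACE_REPL if body[-1] == " " else DOT_REPL)
        body = body[:-1]
    return SPACE_REPL * lead + body + "".join(reversed(tail))

print(ascii(encode_edges(" a. ")))                  # '\uf020a\uf02e\uf020'
print(encode_edges("x ") == encode_edges("x\uf020"))  # True
```

Under this model the POSIX names "x " and "x\uf020" both map to the Windows name "x\uf020", so the reverse direction is ambiguous; that is one way to see why a literal U+F020 in a real filename clashes with the space replacement.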
Any chance you can just rename the files?
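If renaming is an option, here is a minimal Python sketch. It assumes it runs in an environment that can actually see the names (e.g. a native Windows Python rather than a Cygwin one, which is an assumption here) and that "_" is an acceptable substitute. `sanitize` and `rename_pua_files` are hypothetical helpers; run with `dry_run=True` first:

```python
import os

# BMP Private Use Area, U+E000..U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF

def sanitize(name: str, repl: str = "_") -> str:
    """Replace every BMP private-use character in name with repl."""
    return "".join(repl if PUA_START <= ord(c) <= PUA_END else c for c in name)

def rename_pua_files(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename entries under root whose names contain PUA characters.

    Walks bottom-up, so a directory is only renamed after everything
    inside it has been handled. Returns the (old, new) pairs; with
    dry_run=True nothing on disk is changed.
    """
    renames: list[tuple[str, str]] = []
    planned: set[str] = set()
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = sanitize(name)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            # Never clobber an existing or already-planned target;
            # skip it so it can be resolved by hand.
            if os.path.lexists(new_path) or new_path in planned:
                continue
            if not dry_run:
                os.rename(old_path, new_path)
            planned.add(new_path)
            renames.append((old_path, new_path))
    return renames
```

For example, `rename_pua_files(r"C:\path\to\share")` (a placeholder path) returns the planned renames without touching anything; pass `dry_run=False` to apply them.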
The Under-ConScript Unicode Registry (UCSUR) and its predecessor, the
ConScript Unicode Registry (CSUR),
https://www.kreativekorp.com/ucsur/
http://www.evertype.com/standards/csur/
unofficially register Unicode PUA glyphs for academic, artificial, constructed,
historical, invented, and minority-language scripts, some of which have since
been encoded in Unicode proper, e.g.:
Script          CSUR             Unicode
PHAISTOS DISC   U+E6D0-U+E6FF    U+101D0-U+101FF
SHAVIAN         U+E700-U+E72F    U+10450-U+1047F
DESERET         U+E830-U+E88F    U+10400-U+1044F
They also maintain their own Unicode data files, e.g.
https://www.kreativekorp.com/ucsur/UNIDATA/Blocks.txt
and some Unicode fonts ship separate -CSUR companion files (named like the
-Italic etc. variants) that add support for the BMP and supplementary-plane
PUA glyphs.
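As a quick cross-check of the CSUR/Unicode table above: Python's `unicodedata` module knows nothing about CSUR, so it sees the CSUR code points as plain private-use characters, while the Unicode counterparts carry script names. The exact names depend on the Unicode version bundled with Python, so this sketch only relies on the name prefix:

```python
import unicodedata

# (script, first CSUR code point, first Unicode code point), from the table
PAIRS = [
    ("PHAISTOS DISC", 0xE6D0, 0x101D0),
    ("SHAVIAN", 0xE700, 0x10450),
    ("DESERET", 0xE830, 0x10400),
]

for script, csur, uni in PAIRS:
    pua_cat = unicodedata.category(chr(csur))        # 'Co' = private use
    name = unicodedata.name(chr(uni), "<unnamed>")   # e.g. 'SHAVIAN LETTER ...'
    print(f"{script:14} U+{csur:04X} {pua_cat}   U+{uni:05X} {name}")
```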
For Cygwin purposes, the relevant range is:
F000-F7FF   unassigned   Reserved for hacks and corporate use
so Cygwin's mappings for characters that are special in Windows filenames
do not clash with any registered script:
F022 "
F02A *
F03A :
F03C <
F03E >
F03F ?
F07C |
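Each entry in the table above is U+F000 plus the ASCII code of the character it stands for. A minimal Python sketch of that mapping and its inverse; the function names are hypothetical, and Cygwin's real conversion is part of its C sources and covers more than these seven characters (e.g. the space and dot replacements discussed earlier), so treat this as an illustration only:

```python
# The seven characters from the table; each maps to U+F000 + its ASCII code.
SPECIAL = '"*:<>?|'
TO_PUA = {ord(c): 0xF000 + ord(c) for c in SPECIAL}
FROM_PUA = {pua: ascii_cp for ascii_cp, pua in TO_PUA.items()}

def to_windows_name(posix_name: str) -> str:
    """Replace characters invalid in Windows filenames with PUA stand-ins."""
    return posix_name.translate(TO_PUA)

def to_posix_name(win_name: str) -> str:
    """Undo to_windows_name."""
    return win_name.translate(FROM_PUA)

# Reproduce the table.
for c in SPECIAL:
    print(f"U+{0xF000 + ord(c):04X} {c}")

print(ascii(to_windows_name("a:b?")))   # 'a\uf03ab\uf03f'
```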
--
Take care. Thanks, Brian Inglis
Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
-- Antoine de Saint-Exupéry