macos - composed vs decomposed unicode filenames

I just checked again on my current Mac (Catalina), and that problem has been
fixed; all is OK.

They must have fixed it in the last release or so.

This lets me remove a Jd kludge.

On Tue, Mar 22, 2022 at 11:22 PM bill lam <[email protected]> wrote:

> This is a bug in J. J should follow the macOS file system's name
> normalization rules. I'll take a look.
>
>
> On Wed, 23 Mar 2022 at 11:01 AM Eric Iverson <[email protected]> wrote:
>
> > Not sure. And not sure I want to know.
> >
> > But to continue the example:
> >    fread c NB. fails on macos as the system has the decomposed form as the name it looks for
> >    fread d
> > abc
> >
> >
> > On Tue, Mar 22, 2022 at 9:39 PM Elijah Stone <[email protected]> wrote:
> >
> > > I wonder what happens when you create two files with distinct names,
> > > and then Unicode changes such that they are the same when
> > > normalised/casefolded/...
> > >
> > > Probably nothing good.
> > >
> > > On Tue, 22 Mar 2022, Eric Iverson wrote:
> > >
> > > > My favorite unicode story is from macos filenames.
> > > >
> > > > They decompose filenames and only track the decomposed form (letter
> > > > separate from the overstrike).
> > > >
> > > > The following accented chars look the same, but have different values.
> > > >
> > > >   c=: 195 164{a. NB. composed
> > > >   c
> > > > ä
> > > >   d=: 97 204 136{a. NB. decomposed
> > > >   d
> > > > ä
> > > >   c-:d
> > > > 0
> > > >   'abc'fwrite c
> > > >   fread c NB. fails on macos as the system has the decomposed form as the name it looks for
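The difference between c and d can be made visible by decoding the bytes into code points. A minimal J sketch (not part of the original messages), assuming a J session with the standard library loaded so that echo is available:

```j
NB. c and d as defined above: one visible character, two UTF-8 byte sequences
c=: 195 164{a.      NB. composed: U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
d=: 97 204 136{a.   NB. decomposed: 'a' followed by U+0308 COMBINING DIAERESIS
echo #c             NB. 2 (bytes)
echo #d             NB. 3 (bytes)
NB. 7 u: decodes UTF-8 bytes into unicode characters; 3 u: lists their code points
echo 3 u: 7 u: c    NB. 228
echo 3 u: 7 u: d    NB. 97 776
```

So a byte-for-byte comparison (c-:d) can never match, even though both spell the same letter.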
> > > >
> > > > Torvalds has a wonderful rant about this that is a fun read.
> > > >
> > > > On Tue, Mar 22, 2022 at 7:02 PM Raul Miller <[email protected]> wrote:
> > > >
> > > >> I ran into a situation today (dealing with files) where most of the
> > > >> files were UTF-8 encoded, but some represented the Latin-1 "code
> > > >> plane" with 8-bit characters.
> > > >>
> > > >> To cope with this issue, I coded up a mechanism to test whether the
> > > >> file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
> > > >> for the files which failed this test.
> > > >>
> > > >> In other words:
> > > >>
> > > >> cclass=: (i.9) (48+i.9)} 256#9
> > > >> cstates=: 0 10#:10* ".;._2{{)n
> > > >>   0    7.3  2    3    4    5    6    7.3  7.3  7.1 NB. 0: start char sequence
> > > >>   0    7.3  2    3    4    5    6    7.3  7.3  7.1 NB. 1: finish char sequence, start next
> > > >>   7.3  1    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 2: need one more character
> > > >>   7.3  2    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 3: need two more characters
> > > >>   7.3  3    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 4: need three more characters
> > > >>   7.3  4    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 5: need four more characters
> > > >>   7.3  5    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 6: need five more characters
> > > >>   7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.2 NB. 7: end
> > > >> }}
> > > >>
> > > >> utf8lenb=: <:2#.>1 #each~1+i.8
> > > >> utf8ok=: {{
> > > >>   try.
> > > >>     (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
> > > >>     1
> > > >>   catch.
> > > >>     0
> > > >>   end.
> > > >> }}
> > > >>
> > > >> NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
> > > >> latin2utf8=: {{
> > > >>   if.utf8ok y do. y else. ":10 u: y end.
> > > >> }}
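Before the state machine runs, utf8ok turns each byte into a class digit with utf8lenb I., and the machine then maps characters to input columns with cclass. The J sketch below (not part of the original message) recomputes those two tables from the definitions above and prints a few intermediate values; it assumes the standard library is loaded for each and echo, and reuses the composed/decomposed byte strings from earlier in the thread.

```j
NB. Recomputes two tables from Raul's message to show intermediate values
utf8lenb=: <:2#.>1 #each~1+i.8
cclass=: (i.9) (48+i.9)} 256#9
echo utf8lenb                          NB. 127 191 223 239 247 251 253 254
NB. utf8lenb I. gives a byte class: 0 ASCII, 1 continuation byte,
NB. 2-6 lead byte of a 2- to 6-byte sequence (5 and 6 are the obsolete
NB. pre-RFC 3629 forms), 7-8 the bytes 254 and 255
echo utf8lenb I. 3 u: 195 164{a.       NB. 2 1   (composed a-umlaut)
echo utf8lenb I. 3 u: 97 204 136{a.    NB. 0 2 1 (decomposed a-umlaut)
echo utf8lenb I. 3 u: 128 254 255{a.   NB. 1 7 8
NB. cclass sends the digits '0'..'8' to machine classes 0..8 and every
NB. other character, including the '.' end marker, to class 9
echo cclass {~ a. i. '012345678.'      NB. 0 1 2 3 4 5 6 7 8 9
```

These class digits, followed by '.', are the string that utf8ok feeds to the cstates machine.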
> > > >>
> > > >> I don't know if this approach would be useful to anyone else here,
> > > >> but... just in case...
> > > >>
> > > >> FYI,
> > > >>
> > > >> --
> > > >> Raul
> > > >>
> > > >> ----------------------------------------------------------------------
> > > >> For information about J forums see http://www.jsoftware.com/forums.htm