I don’t think this is a bug in J, and I don’t think it is an easy fix. There are probably better things to work on.
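For readers who want to reproduce the composed/decomposed mismatch from the quoted thread outside J, here is a minimal sketch in Python rather than J, since Python's standard library ships Unicode normalization. The byte values are the ones from the thread; `same_name` is a hypothetical helper, and it uses standard NFD, which may differ in detail from whatever the macOS file system actually applies.

```python
import unicodedata

# Byte sequences taken from the thread: both render as "a with diaeresis".
composed = bytes([195, 164]).decode("utf-8")        # single code point U+00E4
decomposed = bytes([97, 204, 136]).decode("utf-8")  # 'a' followed by U+0308


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(composed == decomposed)           # raw comparison, like c-:d in the thread
print(same_name(composed, decomposed))  # comparison after normalization
```

Under these assumptions the raw comparison fails while the normalized one succeeds, which is why a fix would need to normalize names on every path that touches the file system, not just in one place.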

On Tuesday, March 22, 2022, bill lam <[email protected]> wrote:

> This is a bug in J. J should follow MacOS file system name normalization rule. I'll take a look.
>
> On Wed, 23 Mar 2022 at 11:01 AM Eric Iverson <[email protected]> wrote:
>
> > Not sure. And not sure I want to know.
> >
> > But to continue the example:
> >    fread c NB. fails on macos as the system has the decomposed form as the name it looks for
> >    fread d
> > abc
> >
> > On Tue, Mar 22, 2022 at 9:39 PM Elijah Stone <[email protected]> wrote:
> >
> > > I wonder what happens when you create two files with distinct names, and
> > > then unicode changes such that they are the same when
> > > normalised/casefolded/..
> > >
> > > Probably nothing good.
> > >
> > > On Tue, 22 Mar 2022, Eric Iverson wrote:
> > >
> > > > My favorite unicode story is from macos filenames.
> > > >
> > > > They decompose filenames and only track the decomposed form (letter
> > > > separate from the overstrike).
> > > >
> > > > The following accented chars look the same, but have different values.
> > > >
> > > >    c=: 195 164{a. NB. composed
> > > >    c
> > > > ä
> > > >    d=: 97 204 136{a. NB. decomposed
> > > >    d
> > > > ä
> > > >    c-:d
> > > > 0
> > > >    'abc'fwrite c
> > > >    fread c NB. fails on macos as the system has the decomposed form as the name it looks for
> > > >
> > > > Torvalds has a wonderful rant about this that is a fun read.
> > > >
> > > > On Tue, Mar 22, 2022 at 7:02 PM Raul Miller <[email protected]> wrote:
> > > >
> > > >> I ran into a situation, today (dealing with files), where most of the
> > > >> files were utf-8 encoded but some represented the latin-1 "code plane"
> > > >> with 8 bit characters.
> > > >>
> > > >> To cope with this issue, I coded up a mechanism to test whether the
> > > >> file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
> > > >> for the files which failed this test.
> > > >>
> > > >> In other words:
> > > >>
> > > >> cclass=: (i.9) (48+i.9)} 256#9
> > > >> cstates=: 0 10#:10* ".;._2{{)n
> > > >>  0 7.3 2 3 4 5 6 7.3 7.3 7.1 NB. 0: start char sequence
> > > >>  0 7.3 2 3 4 5 6 7.3 7.3 7.1 NB. 1: finish char sequence, start next
> > > >>  7.3 1 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 2: need one more character
> > > >>  7.3 2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 3: need two more characters
> > > >>  7.3 3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 4: need three more characters
> > > >>  7.3 4 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 5: need four more characters
> > > >>  7.3 5 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 6: need five more characters
> > > >>  7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.2 NB. 7: end
> > > >> }}
> > > >>
> > > >> utf8lenb=: <:2#.>1 #each~1+i.8
> > > >> utf8ok=: {{
> > > >>   try.
> > > >>     (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
> > > >>     1
> > > >>   catch.
> > > >>     0
> > > >>   end.
> > > >> }}
> > > >>
> > > >> NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
> > > >> latin2utf8=: {{
> > > >>   if.utf8ok y do. y else. ":10 u: y end.
> > > >> }}
> > > >>
> > > >> I don't know if this approach would be useful to anyone else here,
> > > >> but... just in case...
> > > >>
> > > >> FYI,
> > > >>
> > > >> --
> > > >> Raul
> > > >> ----------------------------------------------------------------------
> > > >> For information about J forums see http://www.jsoftware.com/forums.htm
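
The "valid UTF-8, else treat as Latin-1" idea in Raul's message at the bottom of the thread can also be sketched without a hand-written state machine. This is a rough Python equivalent of the intent, not a translation of the J code: `latin1_to_utf8` is a hypothetical name, and Python's strict decoder rejects some sequences (5- and 6-byte forms, for example) that the state table above appears to accept.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Return data unchanged if it is valid UTF-8, else re-encode it from Latin-1.

    Hypothetical helper; assumes any non-UTF-8 input is ASCII plus Latin-1.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Every byte value is a valid Latin-1 character, so this cannot fail.
        return data.decode("latin-1").encode("utf-8")


print(latin1_to_utf8(b"caf\xe9"))      # Latin-1 input gets re-encoded
print(latin1_to_utf8(b"caf\xc3\xa9"))  # valid UTF-8 passes through
```

As with the J version, the weak point is the assumption itself: a Latin-1 file whose bytes happen to form valid UTF-8 sequences would be passed through untouched.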
