Would be nice if Mac has finally fixed it.

On Wednesday, March 23, 2022, Eric Iverson wrote:
> I don’t think this is a bug in J and don’t think it is an easy fix.
> Probably better things to work on.
>
> On Tuesday, March 22, 2022, bill lam wrote:
>
>> This is a bug in J. J should follow MacOS file system name normalization
>> rule. I'll take a look.
>>
>> On Wed, 23 Mar 2022 at 11:01 AM Eric Iverson wrote:
>>
>> > Not sure. And not sure I want to know.
>> >
>> > But to continue the example:
>> >    fread c  NB. fails on macos as the system has the decomposed form as the name it looks for
>> >    fread d
>> > abc
>> >
>> > On Tue, Mar 22, 2022 at 9:39 PM Elijah Stone wrote:
>> >
>> > > I wonder what happens when you create two files with distinct names, and
>> > > then unicode changes such that they are the same when
>> > > normalised/casefolded/..
>> > >
>> > > Probably nothing good.
>> > >
>> > > On Tue, 22 Mar 2022, Eric Iverson wrote:
>> > >
>> > > > My favorite unicode story is from macos filenames.
>> > > >
>> > > > They decompose filenames and only track the decomposed form (letter
>> > > > separate from the overstrike).
>> > > >
>> > > > The following accented chars look the same, but have different values.
>> > > >
>> > > >    c=: 195 164{a.  NB. composed
>> > > >    c
>> > > > ä
>> > > >    d=: 97 204 136{a.  NB. decomposed
>> > > >    d
>> > > > ä
>> > > >    c-:d
>> > > > 0
>> > > >    'abc'fwrite c
>> > > >    fread c  NB. fails on macos as the system has the decomposed form as the name it looks for
>> > > >
>> > > > Torvalds has a wonderful rant about this that is a fun read.
>> > > >
>> > > > On Tue, Mar 22, 2022 at 7:02 PM Raul Miller wrote:
>> > > >
>> > > >> I ran into a situation, today (dealing with files), where most of the
>> > > >> files were utf-8 encoded but some represented the latin-1 "code plane"
>> > > >> with 8 bit characters.
>> > > >>
>> > > >> To cope with this issue, I coded up a mechanism to test whether the
>> > > >> file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
>> > > >> for the files which failed this test.
>> > > >>
>> > > >> In other words:
>> > > >>
>> > > >> cclass=: (i.9) (48+i.9)} 256#9
>> > > >> cstates=: 0 10#:10* ".;._2{{)n
>> > > >>     0 7.3   2   3   4   5   6 7.3 7.3 7.1 NB. 0: start char sequence
>> > > >>     0 7.3   2   3   4   5   6 7.3 7.3 7.1 NB. 1: finish char sequence, start next
>> > > >>   7.3   1 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 2: need one more character
>> > > >>   7.3   2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 3: need two more characters
>> > > >>   7.3   3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 4: need three more characters
>> > > >>   7.3   4 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 5: need four more characters
>> > > >>   7.3   5 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 6: need five more characters
>> > > >>   7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.2 NB. 7: end
>> > > >> }}
>> > > >>
>> > > >> utf8lenb=: <:2#.>1 #each~1+i.8
>> > > >> utf8ok=: {{
>> > > >>   try.
>> > > >>     (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
>> > > >>     1
>> > > >>   catch.
>> > > >>     0
>> > > >>   end.
>> > > >> }}
>> > > >>
>> > > >> NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
>> > > >> latin2utf8=: {{
>> > > >>   if. utf8ok y do. y else. ":10 u: y end.
>> > > >> }}
>> > > >>
>> > > >> I don't know if this approach would be useful to anyone else here,
>> > > >> but... just in case...
>> > > >>
>> > > >> FYI,
>> > > >>
>> > > >> --
>> > > >> Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
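
For anyone who wants to try the two ideas in this thread outside of J, here is a rough Python sketch. It is not a port of Raul's state machine: it leans on Python's strict UTF-8 decoder, which also rejects overlong forms and 5- or 6-byte sequences, so it may flag input that the J table accepts. The function names latin1_to_utf8 and same_filename are made up for illustration, and the NFD comparison only approximates what the macOS file system does, since it uses Python's standard normalization tables.

```python
import unicodedata


def latin1_to_utf8(data: bytes) -> bytes:
    """Return data unchanged if it is valid UTF-8, else re-encode it from Latin-1."""
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
        return data
    except UnicodeDecodeError:
        # Assumption taken from the thread: non-UTF-8 input is ASCII + Latin-1.
        return data.decode("latin-1").encode("utf-8")


def same_filename(a: str, b: str) -> bool:
    """Compare two names after normalising both to decomposed form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The same byte values as c and d in Eric's example.
composed = bytes([195, 164]).decode("utf-8")        # U+00E4
decomposed = bytes([97, 204, 136]).decode("utf-8")  # 'a' followed by U+0308

print(composed == decomposed)               # False, like c-:d giving 0
print(same_filename(composed, decomposed))  # True once both are normalised
print(latin1_to_utf8(b"\xe4"))              # Latin-1 byte 228 becomes b'\xc3\xa4'
print(latin1_to_utf8(b"\xc3\xa4"))          # already UTF-8, returned unchanged
```

The fallback shares the weakness of the original approach: a Latin-1 file whose bytes happen to form valid UTF-8 sequences is left untouched.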
