Wait. I tried this on a Mac and got the following. Perhaps the newest macOS
has changed its behaviour.
   c=: 195 164{a.
   d=: 97 204 136{a.
   c-:d
0
   fread c
abc
   'abc'fwrite c
3
   fread c
abc
   fread d
abc

   'def'fwrite d
3
   fread c
def
   fread d
def
   1!:0 <c
+-+------------------+-+---+------+----------+
|ä|2022 3 23 11 46 37|3|rw-|------|-rw-r--r--|
+-+------------------+-+---+------+----------+
   1!:0 <d
   a.i.>{.{.1!:0 <c
195 164
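
For reference, the two names differ only in Unicode normalization form: c is
the precomposed character U+00E4, while d is the letter a followed by the
combining diaeresis U+0308. A minimal check in the same session, assuming
7 u: decodes UTF-8 and 3 u: returns code points as documented (this was not
part of the run above):

   3 u: 7 u: c
228
   3 u: 7 u: d
97 776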

On Wed, Mar 23, 2022 at 11:22 AM bill lam <[email protected]> wrote:

> This is a bug in J.  J should follow MacOS file system name normalization
> rule. I'll take a look.
>
>
> On Wed, 23 Mar 2022 at 11:01 AM Eric Iverson <[email protected]>
> wrote:
>
>> Not sure. And not sure I want to know.
>>
>> But to continue the example:
>>    fread c NB. fails on macos as the system has the decomposed form as the name it looks for
>>    fread d
>> abc
>>
>>
>> On Tue, Mar 22, 2022 at 9:39 PM Elijah Stone <[email protected]> wrote:
>>
>> > I wonder what happens when you create two files with distinct names, and
>> > then unicode changes such that they are the same when
>> > normalised/casefolded/..
>> >
>> > Probably nothing good.
>> >
>> > On Tue, 22 Mar 2022, Eric Iverson wrote:
>> >
>> > > My favorite unicode story is from macos filenames.
>> > >
>> > > They decompose filenames and only track the decomposed form (letter
>> > > separate from the overstrike).
>> > >
>> > > The following accented chars look the same, but have different values.
>> > >
>> > >   c=: 195 164{a. NB. composed
>> > >   c
>> > > ä
>> > >   d=: 97 204 136{a. NB. decomposed
>> > >   d
>> > > ä
>> > >   c-:d
>> > > 0
>> > >   'abc'fwrite c
>> > >   fread c NB. fails on macos as the system has the decomposed form as the name it looks for
>> > >
>> > > Torvalds has a wonderful rant about this that is a fun read.
>> > >
>> > > On Tue, Mar 22, 2022 at 7:02 PM Raul Miller <[email protected]> wrote:
>> > >
>> > >> I ran into a situation, today (dealing with files), where most of the
>> > >> files were utf-8 encoded but some represented the latin-1 "code plane"
>> > >> with 8 bit characters.
>> > >>
>> > >> To cope with this issue, I coded up a mechanism to test whether the
>> > >> file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
>> > >> for the files which failed this test.
>> > >>
>> > >> In other words:
>> > >>
>> > >> cclass=: (i.9) (48+i.9)} 256#9
>> > >> cstates=: 0 10#:10* ".;._2{{)n
>> > >>   0    7.3  2    3    4    5    6    7.3  7.3  7.1 NB. 0: start char sequence
>> > >>   0    7.3  2    3    4    5    6    7.3  7.3  7.1 NB. 1: finish char sequence, start next
>> > >>   7.3  1    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 2: need one more character
>> > >>   7.3  2    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 3: need two more characters
>> > >>   7.3  3    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 4: need three more characters
>> > >>   7.3  4    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 5: need four more characters
>> > >>   7.3  5    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 6: need five more characters
>> > >>   7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.2 NB. 7: end
>> > >> }}
>> > >>
>> > >> utf8lenb=: <:2#.>1 #each~1+i.8
>> > >> utf8ok=: {{
>> > >>   try.
>> > >>     (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
>> > >>     1
>> > >>   catch.
>> > >>     0
>> > >>   end.
>> > >> }}
>> > >>
>> > >> NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
>> > >> latin2utf8=: {{
>> > >>   if.utf8ok y do. y else. ":10 u: y end.
>> > >> }}
>> > >>
>> > >> I don't know if this approach would be useful to anyone else here,
>> > >> but... just in case...
>> > >>
>> > >> FYI,
>> > >>
>> > >> --
>> > >> Raul
>> > >>
>> > >> ----------------------------------------------------------------------
>> > >> For information about J forums see http://www.jsoftware.com/forums.htm
>> > >>