Nice! I wish I had thought of that.

Thanks,

--
Raul

On Tue, Mar 22, 2022 at 7:10 PM Elijah Stone <[email protected]> wrote:
>
> FWIW here is a one-liner hack which accomplishes the same thing:
>
> latin2utf8=: (9&u: ] ]) :: (8 u: 10 u: ])
>
>  -E
>
> On Tue, 22 Mar 2022, Raul Miller wrote:
>
> > I ran into a situation, today (dealing with files), where most of the
> > files were utf-8 encoded but some represented the latin-1 "code plane"
> > with 8 bit characters.
> >
> > To cope with this issue, I coded up a mechanism to test whether the
> > file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
> > for the files which failed this test.
> >
> > In other words:
> >
> > cclass=: (i.9) (48+i.9)} 256#9
> > cstates=: 0 10#:10* ".;._2{{)n
> >     0 7.3   2   3   4   5   6 7.3 7.3 7.1  NB. 0: start char sequence
> >     0 7.3   2   3   4   5   6 7.3 7.3 7.1  NB. 1: finish char sequence, start next
> >   7.3   1 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3  NB. 2: need one more character
> >   7.3   2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3  NB. 3: need two more characters
> >   7.3   3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3  NB. 4: need three more characters
> >   7.3   4 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3  NB. 5: need four more characters
> >   7.3   5 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3  NB. 6: need five more characters
> >   7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.2  NB. 7: end
> > }}
> >
> > utf8lenb=: <:2#.>1 #each~1+i.8
> > utf8ok=: {{
> >   try.
> >     (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
> >     1
> >   catch.
> >     0
> >   end.
> > }}
> >
> > NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
> > latin2utf8=: {{
> >   if.utf8ok y do. y else. ":10 u: y end.
> > }}
> >
> > I don't know if this approach would be useful to anyone else here,
> > but... just in case...
> >
> > FYI,
> >
> > --
> > Raul
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
