Seems like a good motivation to support quad equal (⌸). See the Key operator in Dyalog: http://help.dyalog.com/15.0/Content/Language/Primitive%20Operators/Key.htm
On the other hand, pattern matching A[n]←x for in-place operation seems a good way to go. Not sure if it’s possible in GNU APL.

> On Sep 9, 2016, at 10:27 PM, Christian Robert <christian.rob...@polymtl.ca> wrote:
>
> I got to maybe 2% of the work with this:
>
> alpha_only←{(⍵∊'abcdefghijklmnopqrstuvwxyz ')/⍵←tolower ⍵}
> remove_blank_lines←{(∊0≠⍴¨⍵)/⍵}
> tolower←{('abcdefghijklmnopqrstuvwxyz',⎕av)[('ABCDEFGHIJKLMNOPQRSTUVWXYZ',⎕av)⍳⍵]}
>
> )sic
> )erase readfile_fast
> ∇z←readfile_fast name;fd;lines;⎕io
> ⎕io←1 ⍝ Bring a file into a vector of strings, utf8 aware for both name and contents.
> →(0≠"r" ⎕fio[31] 18 ⎕cr name)/Error ⍝ Can not read file ? → Error
> z←⎕fio[26] 18 ⎕cr name ⍝ First pass, read the whole file
> lines←⍳+/((↑"\n")=z) ⍝ Compute the iota for each line
> z←(⍴lines)⍴⍬ ⍝ Preallocate "z" to the right size
> fd←⎕fio[3] 18 ⎕cr name ⍝ Open the file
> ⊣ {⊣z[⍵]←⊂19 ⎕cr ⎕ucs ¯1↓⎕fio[8] fd} ⍤0 lines ⍝ Put each line in the preallocated "z"
> ⊣ ⎕fio[4] fd ⋄ →0 ⍝ Close the file and return
> Error: ⎕ES ∊'Error on file "',name,'": ',⎕fio[2] | ⎕fio[1] ''
> ∇
>
> alpha_only←{(⍵∊'abcdefghijklmnopqrstuvwxyz ')/⍵←tolower ⍵}
> remove_blank_lines←{(∊0≠⍴¨⍵)/⍵}
> tolower←{('abcdefghijklmnopqrstuvwxyz',⎕av)[('ABCDEFGHIJKLMNOPQRSTUVWXYZ',⎕av)⍳⍵]}
> vertical←{,[⍳0]⍵}
> words_only←{(⍵∊'abcdefghijklmnopqrstuvwxyz ')/⍵←tolower ⍵}
>
> ⍝ then ...
>
> z←remove_blank_lines alpha_only ¨ tolower ¨ readfile_fast 'big.txt'
>
> ⍴ z
> 103561
> ⍝ here you have 103,561 lines, no empty ones, clean of special characters (but may have several blanks between each word).
>
> ⌊/⍴¨z ⍝ minimum line length, probably "I"
> 1
>
> ⌈/⍴¨z ⍝ maximum line length, may contain 400 to 600 words on each line of 2488 characters.
> 2488
>
> ⍝ at this point you have to iterate (rank operator?) over those 103,561 lines
> ⍝ to extract all the words in each line, saving them (unique) and counting the occurrences of
> ⍝ each word.
>
> ⍝ since APL can't do things like count['abc'] = 0 or count['abc'] += 1 (indexing a vector with a string)
> ⍝ it's a near no-end issue (i.e. very difficult to do, but not impossible)
>
> ⍝ you will NEVER win a race against a language like "awk", which has string indexing as *part* of the basic language.
>
> my 2 cents,
>
> Xtian.
>
> On 2016-09-09 17:39, Ala'a Mohammad wrote:
>> Hi,
>>
>> I'm trying to create a simple spell corrector (Norvig at
>> http://norvig.com/spell-correct.html) in APL.
>> I tried but stumbled upon the frequency/count stage and could not move further. The stopper was either WS Full, or the apl process being killed. I'm assuming the main issue is 'lack of experience with APL', and thus the inefficient coding.
>>
>> ftxt ← { ⎕FIO[26] ⍵ }
>> a ← 'abcdefghijklmnopqrstuvwxyz'
>> A ← 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>> downcase ← { (a,⎕AV)[(A,⎕AV)⍳⍵] }
>> nl ← ⎕UCS 13
>> cr ← ⎕UCS 10
>> tab ← ⎕UCS 9
>> nonalpha ← nl, cr, tab, ' 0123456789()[]!?%$,.:;/+*=<>-_#"`~@&'
>> alphamask ← { ~ ⍵ ∊ nonalpha }
>> hist ← { (⍪∪⍵),+/∨/¨(∪⍵)∘.⍷⍵ }
>> fhist ← { hist (alphamask txt) ⊂ downcase txt ← ftxt ⍵ }
>> ⍝ file ← '/misc/small.txt' ~ 28K
>> ⍝ file ← '/misc/xaa' ~ 1.3M
>> file ← '/misc/big.txt' ⍝ ~ 6.2M
>> ⍝ following 2 lines for debugging
>> ⎕ ← ⍴w ← (alphamask txt) ⊂ downcase txt ← ftxt file
>> ⎕ ← ⍴u ← ∪w
>> fhist file
>>
>> The errors happened inside the 'hist' function, and I presume they are mostly due to the jot-dot find (if I understand correctly, it operates on a matrix of size unique-length * words-length).
>>
>> Is there any way to fix the issue, and then proceed to complete the solution?
>>
>> Also, is this the way to create a simple spell corrector in APL (that is, one which capitalizes on APL's strength as an array language)?
>>
>> I'm using
>> LinuxMint 17.1 (kernel 3.13.0-37-generic #64-Ubuntu)
>> GNU APL 1.6 (794)
>> Zsh 5.0.2
>> Emacs 25.1.50.1
>>
>> Best,
>>
>> Ala'a
>>
>> P.S.: I hoped that I could create the solution in APL and then get some whacks on the head from fellow experienced APL programmers before submitting it as 'another solution in X language', but the hope stopped short before even getting to the probability stage.
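
A possible way around the WS FULL in hist, offered only as a sketch: the outer product (∪⍵)∘.⍷⍵ builds one cell per (unique word, word) pair, and because ⍷ finds substrings it also appears to count a word wherever it occurs inside a longer word. Counting can instead be done with index-of and grade, which only ever hold vectors as long as the word list. The sample z and the names t, w, u, i, s, b, p, c below are hypothetical, chosen for illustration; the code assumes ⎕IO←1 and the same dyadic ⊂ (partition) and ∪ already used in the quoted code. It has not been timed on big.txt, and u⍳w may still be slow on a large vocabulary.

```apl
⎕IO ← 1
z ← 'the cat' 'the  dog' 'cat the'   ⍝ hypothetical stand-in for the cleaned lines
t ← ∊z,¨' '                          ⍝ join all lines, one blank after each
w ← (t≠' ')⊂t                        ⍝ partition into words, runs of blanks dropped
u ← ∪w                               ⍝ unique words, in order of first appearance
i ← u⍳w                              ⍝ position of every word in u
s ← i[⍋i]                            ⍝ sorted positions: equal words are now adjacent
b ← 1,(1↓s)≠¯1↓s                     ⍝ 1 marks the start of each run
p ← b/⍳⍴s                            ⍝ start index of each run
c ← (1↓p,1+⍴s)-p                     ⍝ run lengths, i.e. one count per word in u
(⍪u),c                               ⍝ two-column table: word, count
```

For the three sample lines, c comes out as 3 2 1 (the, cat, dog). Every temporary here has the length of w or of u, so memory grows linearly with the text rather than with unique-length * words-length.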