I have amended http://www.jsoftware.com/papers/APLHashingModel.htm to include both APL and J hashing models.
On Thu, Jul 31, 2014 at 11:55 AM, Roger Hui <rogerhui.can...@gmail.com> wrote: > Please see http://www.jsoftware.com/papers/APLHashingModel.htm > > > > > > On Thu, Jul 31, 2014 at 11:18 AM, Raul Miller <rauldmil...@gmail.com> > wrote: > >> I don't suppose you could post that somewhere on the web where the apl >> characters being no longer a problem could shine through? >> >> (Every special character displays as the same character for me, right >> now.) >> >> Thanks, >> >> -- >> Raul >> >> >> On Thu, Jul 31, 2014 at 11:37 AM, Roger Hui <rogerhui.can...@gmail.com> >> wrote: >> >> > There is no point to hashing or small-range or ... if you only have one >> > item in the right argument. Benchmarks in the paper (as stated) were >> done >> > on arguments with 1e6 items. >> > >> > I have the following model of hashing in Dyalog APL. Its translation >> into >> > J (including dealing with APL chars which I am told is NO LONGER A >> PROBLEM) >> > is left as an exercise for the reader. :-). Another (easy) exercise is >> to >> > find x and y for which the verbose model is faster than the one-liner. >> > >> > z←x xiy y;⎕io;h;hf;i;j;m;n;q >> > >> > ⍝ model of x⍳y using hashing; written to be easily translated into C >> > >> > >> > >> > ⎕io←0 >> > >> > hf←{123457×⍵} ⍝ hash function >> > >> > n←≢y >> > >> > m←≢x >> > >> > q←2*⌈2⍟m ⍝ size of hash table >> > >> > h←q⍴m ⍝ hash table; m means "free" >> > >> > z←n⍴m ⍝ initialize to "not found" >> > >> > >> > >> > :For i :In ⍳m ⍝ index for each x >> > >> > j←q|hf x[i] ⍝ index into hash table >> > >> > :While m>h[j] ⋄ :AndIf x[h[j]]≠x[i] ⍝ i.e. stop on finding m or an >> > equal entry >> > j←q|1+j ⍝ the next hash table entry >> > >> > :End >> > >> > :If m=h[j] ⋄ h[j]←i ⋄ :End ⍝ new hash entry >> > >> > :End >> > >> > >> > >> > :For i :In ⍳n ⍝ index for each y >> > >> > j←q|hf y[i] ⍝ where to start looking in >> hash >> > table >> > :While m>h[j] ⋄ :AndIf x[h[j]]≠y[i] ⍝ i.e. stop on finding m or an >> > equal entry >> > j←q|1+j ⍝ the next hash table entry >> > >> > :End >> > >> > z[i]←h[j] ⍝ here, either m=h[j] or >> > x[h[j]]=y[i] >> > :End >> > >> > >> > >> > On Thu, Jul 31, 2014 at 6:41 AM, Joe Bogner <joebog...@gmail.com> >> wrote: >> > >> > > I enjoyed your paper and particularly enjoyed playing with the native >> > > J implementation: >> > > >> > > xiy =: 13 : '+/*./\x~:/y' >> > > >> > > (i. 1e6) xiy 1e5 >> > > 100000 >> > > >> > > I was surprised that the tacit J version performed reasonably close to >> > > the special code C version: >> > > >> > > big=: 1e6 >> > > >> > > 100 timespacex 'big xiy 1e5' >> > > 1.01128e_5 2816 >> > > >> > > 100 timespacex 'big i. 1e5' >> > > 1.23157e_6 1664 >> > > >> > > The timing starts to diverge on significantly boxed arrays it seems at >> > > first glance: >> > > >> > > big=: (1e6 # <'a') >> > > >> > > 100 timespacex 'big i. <''a''' >> > > 3.50044e_6 1920 >> > > 100 timespacex 'big xiy <''a''' >> > > 0.054772 2.09933e6 >> > > >> > > That seems to be hitting an optimization where it stops on first >> > > find. Compare to: >> > > >> > > 100 timespacex 'big i. <''z''' >> > > 0.0486771 1920 >> > > >> > > And it runs similarly to the native J version, which was 0.054772 >> > > >> > > At first I was also puzzled by why / was required. >> > > >> > > xiy1 =: 13 : '+/*./\x~:y' >> > > (i. 1e5) (xiy -: xiy1) 10 >> > > 1 >> > > >> > > Then I realized this: >> > > >> > > (i. 1e5) xiy1 (10,2) >> > > |length error: xiy1 >> > > | (i.100000) xiy1(10,2) >> > > >> > > However: >> > > (i. 1e5) xiy (10,2) >> > > 10 2 >> > > >> > > It was neat to tinker with xiy to better understand how it works. >> > > >> > > I now need to spend some time better understanding the hashing. I >> > > understand at a surface level yet want to play with the examples too. >> > > I may try to create a J implementation of the algorithm using amend >> > > and loops. I realize it will be slow, but it will be easier to play >> > > with than C. If someone else wants to do it and share I'd be glad to >> > > use that instead. Otherwise, if I get around to it I will post it. >> > > >> > > >> > > >> > > >> > > >> > > >> > > On Wed, Jul 30, 2014 at 2:44 AM, Roger Hui <rogerhui.can...@gmail.com >> > >> > > wrote: >> > > > My J Conference 2014 presentation can be found at: >> > > > >> > > > Slides only: http://www.jsoftware.com/papers/indexof/ >> > > > Slides and script: >> > > http://www.jsoftware.com/papers/indexof/indexofscript.htm >> > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm