Raul, That is an elegant answer and one I'll add to my notes to reference in the future.
It takes approximately 50 seconds to process 240 million 9 character infixes using 50 million character blocks. As a comparison, I wrote an integration to TCC which enables generating and calling C code from J https://github.com/joebo/lang-lab/blob/master/j/tcc.ijs It could use some cleanup and it's not tested on linux or win32 It includes everything to build the stub dll and then some examples. Here's an example of creating infixes on a character array: testInfix_code=: 0 : 0 #include <stdio.h> void free2(long ptr) { free(ptr); } long infix(long pmem, long len, int infixLen, long long *out) { long long allocLen = len*sizeof(char)*infixLen; char *newMem = (char*)malloc(allocLen); char *mem = (char*)pmem; printf("len: %d\n", len); fflush(stdout); long long offset = 0; long long idx = 0; for(long long i=0;i<len && ((len-offset) >= infixLen);i++) { for(long q=0;q<infixLen;q++) { newMem[idx++] = mem[q+offset]; } offset++; } *out = idx; return newMem; } ) To run this: infix=: 3 : 0 txt=.y addr=. mema 4*(#txt) txt memw addr,0,(#txt) infixSize=.9 tcc=:init_tcc_'' compile_tcc_ < testInfix_code ret=.execIntOutInt_tcc_ 'infix';addr;(#txt);infixSize smoutput ret 'memPtr size'=: ret output =: memr memPtr,0,size execInt_tcc_ 'free';memPtr;0 free_tcc_'' output ) testInfix =: 3 : 0 txt=. 'abcdefghijklmnopqrstuvwxyz' infix txt ) _9[\ testInfix'' len: 26 +--------+---+ |15900816|162| +--------+---+ abcdefghi bcdefghij cdefghijk defghijkl efghijklm fghijklmn ghijklmno hijklmnop ijklmnopq jklmnopqr klmnopqrs lmnopqrst mnopqrstu nopqrstuv opqrstuvw pqrstuvwx qrstuvwxy rstuvwxyz I ended up running into problems with sizes greater than 75 million, probably due to a long size issue. Funny enough, it's actually slower than J's version by a factor of 4. 6!:2 'b1=:_9[\(infix 50e6{. txt)' 1.82504 6!:2 'b2=:9[\ 50e6{. txt' 0.431845 b1-:b2 1 This could be a combination of TCC's ability to optimize code or it could be J's algorithm is implemented more efficiently than mine (probably both) The actual JIT operation only takes about 5/100 of a second I'm sure I could profile it and speed it up, but I've already spent considerably more time than desired on this "challenge". On the upside, we now have a prototype for generating C code from J On Tue, Feb 3, 2015 at 8:05 PM, Raul Miller <[email protected]> wrote: > Here's how I'd implement the overlapped block thing. > > overlapblock=:2 :0 > : > 'size overlap init'=. x > bs=. size*i.>.(#y)%{.size > bl=. bs-~(#y)<.bs+size+overlap > accum=. init > for_sl. bs,.bl do. > accum=. accum u v ((+i.)/sl){y > end. > ) > > #(10;9;i.0 0) ~.@, overlapblock (~.@(9&(]\))) 30$'abcdefghij' > 10 > > Of course change the value for x and y to match what you want to do... > And maybe recode to swap u and v, if that looks better. > > If that's too slow, it might be better to gather intermediate results > from each block and then post-process them in a single pass? > > Thanks, > > -- > Raul > > On Tue, Feb 3, 2015 at 6:24 PM, Joe Bogner <[email protected]> wrote: >> On Tue, Feb 3, 2015 at 6:17 PM, Raul Miller <[email protected]> wrote: >>> So you are working with non-overlapping infixes? >>> >>> _9[\'abcdefghijklmnopqrstuvwxyz' >>> abcdefghi >>> jklmnopqr >>> stuvwxyz >>> >> >> >> Sorry, no that was a typo. I can use non-overlapping to split up into >> blocks, but I need overlapping for the string. >> >> It is overlapping infixes as I am calculating the unique 9 character >> substrings >> >> 9[\abc >> abcdefghi >> bcdefghij >> cdefghijk >> defghijkl >> efghijklm >> fghijklmn >> ghijklmno >> hijklmnop >> ijklmnopq >> jklmnopqr >> klmnopqrs >> lmnopqrst >> mnopqrstu >> nopqrstuv >> opqrstuvw >> pqrstuvwx >> qrstuvwxy >> rstuvwxyz >> >> # ~. 9[\abc >> 18 >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
