You should probably delete the {. from the line declaring the bs value,
so that it reads bs=. size*i.>.(#y)%size
That's a [hopefully] irrelevant artifact from an earlier version of the code.
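
For anyone following the arithmetic, here is a minimal sketch of what the bs and bl lines in overlapblock compute once the {. is gone. It is written in C only because the other example in this thread is C. The function names (block_count, block_start, block_length) are made up for illustration and are not part of any library, and the sketch assumes size is positive.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks needed to cover n items in steps of `size`
   (ceiling division), i.e. the count produced by  >.(#y)%size  */
size_t block_count(size_t n, size_t size) {
    assert(size > 0);               /* assumption: block size is positive */
    return (n + size - 1) / size;
}

/* Start offset of block k, mirroring  bs=. size*i.>.(#y)%size  */
size_t block_start(size_t k, size_t size) {
    return k * size;
}

/* Length of block k: `size` items plus `overlap` extra items,
   clipped at the end of the data. Mirrors  bl=. bs-~(#y)<.bs+size+overlap
   Assumes k < block_count(n, size), so the start never exceeds n. */
size_t block_length(size_t k, size_t n, size_t size, size_t overlap) {
    size_t start = block_start(k, size);
    size_t end = start + size + overlap;
    if (end > n) {
        end = n;                    /* last blocks may be short */
    }
    return end - start;
}
```

With the 30-character test string and a left argument of 10;9, this gives three blocks starting at 0, 10 and 20 with lengths 19, 19 and 10.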
Thanks,
--
Raul
On Wed, Feb 4, 2015 at 1:43 PM, Joe Bogner <[email protected]> wrote:
> Raul,
>
> That is an elegant answer and one I'll add to my notes to reference in
> the future.
>
> It takes approximately 50 seconds to process 240 million 9-character
> infixes using 50-million-character blocks.
>
> As a comparison, I wrote an integration with TCC which enables
> generating and calling C code from J:
>
> https://github.com/joebo/lang-lab/blob/master/j/tcc.ijs
>
> It could use some cleanup, and it's not tested on Linux or Win32.
>
> It includes everything to build the stub dll and then some examples.
>
> Here's an example of creating infixes on a character array:
>
> testInfix_code=: 0 : 0
> #include <stdio.h>
> #include <stdlib.h>  /* malloc, free */
>
> void free2(long ptr) {
>   free((void*)ptr);
> }
>
> long infix(long pmem, long len, int infixLen, long long *out) {
>   /* one infixLen-sized row per input character (an upper bound) */
>   long long allocLen = len*sizeof(char)*infixLen;
>   char *newMem = (char*)malloc(allocLen);
>   char *mem = (char*)pmem;
>
>   printf("len: %ld\n", len);
>   fflush(stdout);
>   long long offset = 0;
>   long long idx = 0;
>
>   /* copy every overlapping infix, stopping when fewer than infixLen chars remain */
>   for(long long i=0;i<len && ((len-offset) >= infixLen);i++) {
>     for(long q=0;q<infixLen;q++) {
>       newMem[idx++] = mem[q+offset];
>     }
>     offset++;
>   }
>
>   *out = idx;  /* number of bytes written */
>   return (long)newMem;
> }
> )
>
> To run this:
>
> infix=: 3 : 0
> txt=.y
> addr=. mema 4*(#txt)
> txt memw addr,0,(#txt)
> infixSize=.9
> tcc=:init_tcc_''
> compile_tcc_ < testInfix_code
> ret=.execIntOutInt_tcc_ 'infix';addr;(#txt);infixSize
> smoutput ret
> 'memPtr size'=: ret
> output =: memr memPtr,0,size
> execInt_tcc_ 'free';memPtr;0
> free_tcc_''
> output
> )
>
> testInfix =: 3 : 0
> txt=. 'abcdefghijklmnopqrstuvwxyz'
> infix txt
> )
>
> _9[\ testInfix''
> len: 26
> +--------+---+
> |15900816|162|
> +--------+---+
> abcdefghi
> bcdefghij
> cdefghijk
> defghijkl
> efghijklm
> fghijklmn
> ghijklmno
> hijklmnop
> ijklmnopq
> jklmnopqr
> klmnopqrs
> lmnopqrst
> mnopqrstu
> nopqrstuv
> opqrstuvw
> pqrstuvwx
> qrstuvwxy
> rstuvwxyz
>
>
>
> I ended up running into problems with sizes greater than 75 million,
> probably because of the width of the C long type.
>
> Funnily enough, it's actually slower than J's version by a factor of about 4.
>
> 6!:2 'b1=:_9[\(infix 50e6{. txt)'
> 1.82504
> 6!:2 'b2=:9[\ 50e6{. txt'
> 0.431845
> b1-:b2
> 1
>
>
>
> This could be because TCC does little optimization of the code it
> generates, or because J's algorithm is implemented more efficiently
> than mine (probably both).
>
> The actual JIT operation only takes about 5/100 of a second
>
> I'm sure I could profile it and speed it up, but I've already spent
> considerably more time than desired on this "challenge".
>
> On the upside, we now have a prototype for generating C code from J
>
>
>
> On Tue, Feb 3, 2015 at 8:05 PM, Raul Miller <[email protected]> wrote:
>> Here's how I'd implement the overlapped block thing.
>>
>> overlapblock=:2 :0
>> :
>> 'size overlap init'=. x
>> bs=. size*i.>.(#y)%{.size
>> bl=. bs-~(#y)<.bs+size+overlap
>> accum=. init
>> for_sl. bs,.bl do.
>> accum=. accum u v ((+i.)/sl){y
>> end.
>> )
>>
>> #(10;9;i.0 0) ~.@, overlapblock (~.@(9&(]\))) 30$'abcdefghij'
>> 10
>>
>> Of course change the value for x and y to match what you want to do...
>> And maybe recode to swap u and v, if that looks better.
>>
>> If that's too slow, it might be better to gather intermediate results
>> from each block and then post-process them in a single pass?
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Tue, Feb 3, 2015 at 6:24 PM, Joe Bogner <[email protected]> wrote:
>>> On Tue, Feb 3, 2015 at 6:17 PM, Raul Miller <[email protected]> wrote:
>>>> So you are working with non-overlapping infixes?
>>>>
>>>> _9[\'abcdefghijklmnopqrstuvwxyz'
>>>> abcdefghi
>>>> jklmnopqr
>>>> stuvwxyz
>>>>
>>>
>>>
>>> Sorry, no that was a typo. I can use non-overlapping to split up into
>>> blocks, but I need overlapping for the string.
>>>
>>> It is overlapping infixes as I am calculating the unique 9 character
>>> substrings
>>>
>>> 9[\abc
>>> abcdefghi
>>> bcdefghij
>>> cdefghijk
>>> defghijkl
>>> efghijklm
>>> fghijklmn
>>> ghijklmno
>>> hijklmnop
>>> ijklmnopq
>>> jklmnopqr
>>> klmnopqrs
>>> lmnopqrst
>>> mnopqrstu
>>> nopqrstuv
>>> opqrstuvw
>>> pqrstuvwx
>>> qrstuvwxy
>>> rstuvwxyz
>>>
>>> # ~. 9[\abc
>>> 18
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm