Re: [Jprogramming] Insert zeroes into large data file

Dan Bron Tue, 29 Jul 2008 16:19:01 -0700

I wrote:
>  If you're really interested in a tacit solution, then the next avenue 
>  to explore is replacing   \  with  ;. 
>
>  Left as an exercise to the reader.


I thought someone else would post a solution pretty quickly, but no one did.  
So I've written my own.  As others have said, there's nothing wrong with 
explicit looping when you need to, and doing so is probably the only truly 
scalable solution.  Nonetheless, here is a completely tacit solution.

It follows the design outlined in my previous message.  Fundamentally, the 
solution is   #!.'0'~ 1 j. ',,' E. ]  but since that fails on very large 
inputs, I've modified it to partition the input, process each chunk, and 
reassemble the results.  This is similar to the  \  solution, but I took my own 
advice and used  ;.  instead, to avoid the bug whereby a partition could fall 
between two successive commas.
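
To make the boundary problem concrete, here is a small sketch that is not part 
of the original solution.  It cuts the 19-character sample with  ;.2  at 
hand-picked chunk ends.  The names  bad_mask  and  good_mask  and the end 
positions are assumptions chosen for illustration; the real solution computes 
its mask from the chunk size.

```j
NB. Sketch only: the masks and end positions are illustrative assumptions.
text      =: '0,,34567,,abcd,,efg'
null2zero =: #!.'0'~ 1 j. ',,' E. ]

NB. A 1 in the mask marks the last character of a chunk (the form ;.2 expects).
bad_mask  =: 1 (1 8 18)} 0 $~ # text   NB. chunks end on the first comma of a pair
good_mask =: 1 (2 9 18)} 0 $~ # text   NB. ends moved one place right, pairs stay whole

NB. Box each processed chunk, then raze the boxes back into one string.
; bad_mask  <@:null2zero;.2 text       NB. 0,,34567,,abcd,0,efg   (two pairs missed)
; good_mask <@:null2zero;.2 text       NB. 0,0,34567,0,abcd,0,efg
```

In the first cut the pairs at indices 1-2 and 8-9 straddle a chunk boundary, so 
no single chunk contains them and they are left unchanged.  Moving each chunk 
end one place to the right keeps every pair inside a single chunk.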

I haven't tested it, and I do not guarantee it is free of bugs; it is merely 
meant to prove the concept that an entirely tacit solution is possible.  On an 
input of  134125010 $ '0,,34567,,abcd,,efg'  it completes in about 25 seconds 
on my laptop.   Here it is:

           text          =.  '0,,34567,,abcd,,efg'
           big_text      =.  134125010 $ text
           
           chunk_idx     =.  (i.@:<.&.(%&chunk_size =: 10000))@:#
           chunkify_mask =.  (($@:[ $ 0"_) (1"_)`]`[} _1 , ] + ',,' -:"1 ({~ (,. >:))) chunk_idx
           
           null2zero     =.  #!.'0'~ 1 j.',,' E. ]
           
             (;@:(<@:null2zero;.2)~ chunkify_mask)     text
        0,0,34567,0,abcd,0,efg
           
           $ (;@:(<@:null2zero;.2)~ chunkify_mask) big_text
        155302643
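
For readers less used to complex arguments to  # , the following sketch takes 
null2zero  apart on the small sample and then checks the length reported for 
the large input.  The names  hits  and  reps  are illustrative and do not 
appear in the solution.  The length check assumes that each ',,' pair adds 
exactly one character and that the sample holds three pairs per 19-character 
period.

```j
NB. Sketch: null2zero decomposed (hits and reps are hypothetical names).
text =: '0,,34567,,abcd,,efg'
hits =: ',,' E. text     NB. boolean list, 1 where a ',,' pair starts (indices 1, 8, 14)
reps =: 1 j. hits        NB. complex counts: 1j1 at each hit, 1 elsewhere
reps #!.'0' text         NB. copy each character once, plus one '0' fill after each hit

NB. Expected length for the large input: original length plus one '0' per pair.
134125010 + 3 * <. 134125010 % # text
```

The last expression evaluates to 155302643, which agrees with the shape shown 
above: 7059211 full periods contribute 21177633 inserted zeroes, and the single 
leftover character adds no further pair.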
           
Correcting bugs and optimizing performance (perhaps leveraging special code) is 
again left as an exercise for the reader.

-Dan

PS to Roger:  Originally, I had  0:  and  1:  in place of  0"_  and  1"_  
              respectively, but this resulted in limit errors.  

              The problem is that  0:y  results in an integer, whereas  0"_y  
              results in a boolean.  This is consistent with the other 
              constant primitives (e.g.  7:  ) but inefficient and 
              problematic in many cases.  

              Most users won't know J well enough to try  0"_  when  0:  
              fails.  Further,  0:  and  1:  are special [1] and it is 
              justifiable to treat them specially (especially given that 
              a lot of J primitives have special code for boolean inputs,
              so idioms that leverage  0: and  1:  would be more efficient,
              automatically).  

              What is the likelihood of changing  0:  and  1:  to produce
              boolean outputs?
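
The type difference described in this PS can be inspected with the foreign 
3!:0 , which returns an internal type code (1 for boolean, 4 for integer).  
This is only a sketch; the result for  0:  is the one reported in the message 
and may not hold in other J releases.

```j
NB. Sketch: compare the internal types produced by the two constant verbs.
NB. 3!:0 returns a type code: 1 is boolean, 4 is integer.
3!:0 (0"_) 'abc'    NB. boolean, per the message
3!:0 (0:) 'abc'     NB. integer at the time of the message; may differ elsewhere
```

A boolean array uses one byte per atom, so for a mask as long as the 
134125010-character input the choice of constant verb has a large effect on 
memory use.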

[1] http://www.jsoftware.com/pipermail/programming/2005-November/000049.html

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
