I wrote:
> If you're really interested in a tacit solution, then the next avenue
> to explore is replacing \ with ;.
>
> Left as an exercise to the reader.
I thought someone else would post a solution pretty quickly, but no one did.
So I've written my own. As others have said, there's nothing wrong with
explicit looping when you need it, and it is probably the only truly
scalable solution. Nonetheless, here is a completely tacit solution.
It follows the design outlined in my previous message. Fundamentally, the
solution is #!.'0'~ 1 j. ',,' E. ] but since that fails on very large
inputs, I've modified it to partition the input, process each chunk, and
reassemble the results. This is similar to the \ solution, but I took my own
advice and used ;. instead, to avoid the bug whereby a partition could fall
between two successive commas.
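To make the building blocks concrete, here is a small session sketch on a
short, made-up string (s is an illustrative name, not part of the solution).
It shows the match mask from E., the complex copy counts from 1 j., and the
fill performed by #!.'0'. The last line illustrates the boundary problem: if a
chunk break lands between the two commas, neither piece contains ',,' and the
empty field is left unfilled.

```j
   s =. 'a,,b,,c'                      NB. illustrative input
   ',,' E. s                           NB. 1 where a ',,' begins
0 1 0 0 1 0 0
   1 j. ',,' E. s                      NB. 1j1 means one copy followed by one fill
1 1j1 1 1 1j1 1 1
   (1 j. ',,' E. s) #!.'0' s           NB. copy, using '0' as the fill
a,0,b,0,c
   null2zero =. #!.'0'~ 1 j. ',,' E. ]
   (null2zero 'a,') , null2zero ',b'   NB. input split between the commas
a,,b
```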
I haven't tested it, and I do not guarantee it is free of bugs; it is merely
meant to prove the concept that an entirely tacit solution is possible. On an
input of 134125010 $ '0,,34567,,abcd,,efg' it completes in about 25 seconds
on my laptop. Here it is:
   text =. '0,,34567,,abcd,,efg'
   big_text =. 134125010 $ text
   chunk_idx =. (i.@:<.&.(%&chunk_size =: 10000))@:#
   chunkify_mask =. (($@:[ $ 0"_) (1"_)`]`[} _1 , ] + ',,' -:"1 ({~ (,. >:))) chunk_idx
   null2zero =. #!.'0'~ 1 j.',,' E. ]
   (;@:(<@:null2zero;.2)~ chunkify_mask) text
0,0,34567,0,abcd,0,efg
   $ (;@:(<@:null2zero;.2)~ chunkify_mask) big_text
155302643
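For readers who haven't used ;.2 with a left argument: the 1s in the boolean
mask mark the last item of each chunk, and the verb is applied to each chunk
in turn. A minimal sketch, using |. (reverse) purely as a stand-in for
null2zero so that the chunk boundaries are visible in the result:

```j
   0 0 1 0 1 ;@:(<@:|.;.2) 'abcde'     NB. chunks 'abc' and 'de', each reversed
cbaed
```

Boxing each result with < lets the chunks come back with different lengths
before ; razes them into a single string.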
Correcting bugs and optimizing performance (perhaps leveraging special code) is
again left as an exercise for the reader.
-Dan
PS to Roger: Originally, I had 0: and 1: in place of 0"_ and 1"_
respectively, but this resulted in limit errors.
The problem is that 0: y results in an integer, whereas 0"_ y
results in a boolean. This is consistent with the other
constant primitives (e.g. 7: ) but inefficient and
problematic in many cases.
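One way to check this on a given interpreter is 3!:0, which reports the
internal type code of a noun (1 for boolean, 4 for integer). This is only a
sketch; the argument 'a' is arbitrary, and the results are omitted because
they may differ between J versions.

```j
   3!:0 (0: 'a')      NB. type code of the result of 0:
   3!:0 (0"_ 'a')     NB. type code of the result of 0"_
```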
Most users won't know J well enough to try 0"_ when 0:
fails. Further, 0: and 1: are special [1] and it is
justifiable to treat them specially (especially given that
a lot of J primitives have special code for boolean inputs,
so idioms that leverage 0: and 1: would be more efficient,
automatically).
What is the likelihood of changing 0: and 1: to produce
boolean outputs?
[1] http://www.jsoftware.com/pipermail/programming/2005-November/000049.html
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm