On 7/29/08, Matthew Brand <[EMAIL PROTECTED]> wrote:
> Does anyone know a faster algorithm to do this on such a large file? Can it
> be done in a 32-bit address space? The problem can be solved by streaming
> through the data in C++, but I want know how to do it in J efficiently
> without using explicit loops.
Your file is over a gigabyte -- just writing that much data will take a
lot of time (how much time depends on your disk -- its speed, how
much space it has, and how fragmented that space is).
That said, this could be made to work in a 32-bit address space. The
trick is that you do not have to process your entire file at once.
   require 'csv'
   fixcsv ('0,0,0',LF),'0,0,0',LF
The fixcsv routine takes a CSV character string and converts it
to the corresponding table structure.
Hypothetically speaking, you could read blocks of data in
(using 1!:11), process them, then append the results to an
output file (using 1!:3). You would also want to keep track of
any line fragment (the characters following the last LF in your
block) and prepend it to the next block that you read
in, but that's fairly simple.
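To make that concrete, here is a rough sketch of such a block
loop. The names BLK, proc and processfile are made up for the
example, the per-block transform is just a placeholder round
trip through fixcsv/makecsv, and it assumes every block holds
at least one LF:

   require 'csv'

   BLK =: 1000000                NB. block size in bytes
   proc =: makecsv @ fixcsv      NB. put your real transform here

   processfile =: 3 : 0
     'in out' =. y               NB. y is infile;outfile
     size =. 1!:4 <in            NB. input file size
     frag =. ''                  NB. partial line carried forward
     pos =. 0
     while. pos < size do.
       len =. BLK <. size - pos
       blk =. frag , 1!:11 in;pos,len
       pos =. pos + len
       cut =. 1 + blk i: LF      NB. just past the last LF
       frag =. cut }. blk        NB. save trailing fragment
       (proc cut {. blk) 1!:3 <out
     end.
     if. #frag do. (proc frag) 1!:3 <out end.
   )

   NB. processfile 'big.csv';'fixed.csv'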
With an appropriate block size (maybe a megabyte? 10MB?)
your overhead from J should not be too bad, and your intermediate
results should not be too large.
This will not be particularly quick -- not with that much data -- but
you can take some comfort in being able to watch the result file
growing as it gets processed.
FYI,
--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm