On 7/29/08, Matthew Brand <[EMAIL PROTECTED]> wrote:
> Does anyone know a faster algorithm to do this on such a large file? Can it
> be done in a 32-bit address space?  The problem can be solved by streaming
> through the data in C++, but I want to know how to do it in J efficiently
> without using explicit loops.

Your file is over a gigabyte -- just writing that much data will take a
lot of time (how much time depends on your disk -- its speed, how
much space it has, and how fragmented that space is).

That said, this could be made to work in a 32-bit address space.  The
trick is that you do not have to process your entire file at once.

require'csv'
fixcsv ('0,0,0',LF),'0,0,0',LF

The fixcsv routine takes CSV text (a character list) and converts it
to the corresponding boxed table structure.

Hypothetically speaking, you could read blocks of data in
(using 1!:11 for indexed reads), process them, then append them to a
result file (using 1!:3).  You would also want to keep track of any
line fragment (the characters following the last LF in your
block) and prepend it to the next block that you read
in, but that's fairly simple.
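Here is a rough, untested sketch of that loop.  The names blocksize,
infile and outfile are placeholders, and I'm assuming makecsv (also
from the csv script) is acceptable for serializing each block back out:

   require 'csv'
   blocksize=: 10485760             NB. 10MB per read
   infile=:  'big.csv'
   outfile=: 'result.csv'
   sz=:   1!:4 <infile              NB. total file size in bytes
   frag=: ''                        NB. fragment carried between blocks
   pos=:  0
   while. pos < sz do.
     blk=: frag , 1!:11 infile;pos,blocksize<.sz-pos
     cut=: blk i: LF                NB. position of last LF in block
     frag=: (>:cut) }. blk          NB. save incomplete trailing line
     t=: fixcsv (>:cut) {. blk     NB. complete lines -> table
     NB. ... process t here, then append it to the result file:
     (makecsv t) 1!:3 <outfile
     pos=: pos + blocksize
   end.

This is explicitly looping at the block level, of course, but each
iteration works on the data array-at-a-time, which is where J's
efficiency comes from.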

With an appropriate block size (maybe a megabyte? 10MB?)
your overhead from J should not be too bad, and your intermediate
results should not be too large.

This will not be particularly quick -- not with that much data -- but
you can take some comfort in being able to watch the result file
growing as it gets processed.

FYI,

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
