Re: [Jprogramming] memory mapped tab delimited file

Joe Bogner Thu, 10 Oct 2013 12:23:45 -0700

Thanks, Pascal. The run time is similar, though slightly slower over two runs:
around 115 seconds each on average versus 100.


I had to modify it to make it work. Since you mentioned it was untested, I
assume this is what you had intended:

+/ (<'ABC') = 2&{@:([: <;._1 TAB -.~ ])"1 mf

It needed a cap ([:), I think, judging by playing with 13 :0; without it I
got a domain error.

I'll assume it's as good as it's going to get. Thanks again.
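
For anyone following along outside J, here is a minimal Python sketch of what the expression computes: split each row on TAB, take field 2 (0-based, as with 2 { in J), compare it with ABC, and sum the matches. It is an illustration only, not a port of the J code and not a benchmark. The names count_matches and demo are hypothetical, the sample rows are made up, and it reads newline-terminated lines rather than the padded fixed-width rows of mf.

```python
import mmap
import os
import tempfile


def count_matches(path, column=2, value=b"ABC"):
    """Count rows whose tab-delimited field `column` (0-based) equals `value`."""
    count = 0
    with open(path, "rb") as f:
        # Map the whole file read-only; the OS pages it in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.readline() returns b"" once the end of the map is reached.
            for line in iter(mm.readline, b""):
                fields = line.rstrip(b"\r\n").split(b"\t")
                # Rows with too few fields simply do not match.
                if len(fields) > column and fields[column] == value:
                    count += 1
    return count


def demo():
    """Write a tiny made-up sample file and count the matching rows."""
    rows = [b"a\tb\tABC\td", b"a\tb\tXYZ\td", b"ABC\tb\tABC", b"short"]
    with tempfile.NamedTemporaryFile(delete=False, suffix=".tsv") as tmp:
        tmp.write(b"\n".join(rows) + b"\n")
    try:
        return count_matches(tmp.name)
    finally:
        os.remove(tmp.name)
```

Calling demo() counts the two sample rows whose third field is ABC; a match in any other field, such as the leading ABC in the third sample row, is ignored.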


On Thu, Oct 10, 2013 at 2:27 PM, Pascal Jasmin <[email protected]> wrote:

> +/ (3 :'(2{"1 (< ;._1 TAB -.~ y))=<''ABC''')"1  mf
>
>
> Match '-:' might be faster than =, but overall this is just a tacit version:
>
> +/ (<'ABC') = 2&{@:(<;._1 TAB -.~ ])"1 mf
>
> untested
>
>
> ----- Original Message -----
> From: Joe Bogner <[email protected]>
> To: [email protected]
> Cc:
> Sent: Thursday, October 10, 2013 2:02:07 PM
> Subject: [Jprogramming] memory mapped tab delimited file
>
> I have a 5 gig, 9 million row tab delimited file that I'm working with.
>
> I started with a subset of 300k records and used fapplylines. It took about
> 5 seconds. I shaved 2 seconds off by modifying fapplylines to use memory
> mapped files.
>
> I then applied it to my larger file and found that it was taking about 220
> seconds. Not bad, but I wanted to push for something faster.
>
> Using a memory mapped file was simple enough. I wrote a routine to add a
> column and pad it to the longest column (600 characters).
>
> $ mf
> 9667548 602
>
> I'd like to keep it in a tab delimited file if possible because I'm using
> that file for other purposes.
>
> The file is so large that I don't think I'll be able to cut it up ahead of
> time into an inverted table or otherwise (but maybe?), so I'm effectively
> looping through it.
>
> I've played with different variants and came up with the following
> statement to count the number of rows that have column 2 = ABC
>
> +/ (3 :'(2{"1 (< ;._1 TAB -.~ y))=<''ABC''')"1  mf
>
> This gives the correct result and takes about 102 seconds and only uses
> about 2 gig of memory while running and settles back down to 500mb
>
> I picked off some of the syntax _1 TAB -.~ from other posts.
>
> Are there any ideas on how to make it go faster, or am I up against a hardware
> limit? By the way, I'm impressed with this speed as is. It takes about 348
> seconds to read into R using the heavily optimized data.table fread package
> which also uses memory mapped files. The standard import is more than a few
> hours. I can go from start to finish in J in under 102 seconds.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
