I'm going to be building hashes, but currently it looks like it'll be
faster to generate them after finding the unique values rather than
before.

Hashes come with a great sales pitch, but generating them takes time.

Meanwhile, getting lots and lots of real memory is relatively cheap.
According to https://aws.amazon.com/ec2/pricing/ I can use a machine
with 244 gigabytes of RAM for $3.50 an hour. I do not think I'll need
that much memory, even for a short time.

If it helps, though, there are other ways to get a substring:

   4{.3}. 'abcdefghijk'
defg
   (3+i.4){'abcdefghijk'
defg
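
Wrapped as verbs, those two might look like the sketch below. It assumes
the same x (start and length) convention as substr in the quoted message;
the names substrTD and substrIX are made up for illustration.

```j
NB. Hypothetical names. x: start and length, y: string
substrTD =: 4 : '({: x) {. ({. x) }. y'    NB. drop `start` items, then take `length`
substrIX =: 4 : '(({. x) + i. {: x) { y'   NB. index with start + 0 1 ... length-1

echo 3 4 substrTD 'abcdefghijk'   NB. defg
echo 3 4 substrIX 'abcdefghijk'   NB. defg
```

The two differ when the request runs past the end of the string: {. pads
the result with blanks, while { signals an index error.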

Thanks,

-- 
Raul


On Thu, Apr 10, 2014 at 6:45 PM, Don Guinn <[email protected]> wrote:
>
>
> Sounds like you can format the data as you want once you get it all. Given
> the amount of data there is no way to read the entire file in an
> interactive environment. So it has to be indexed or in a database.
> If you really want to use a segmented string you could keep the index to
> the start of each line to quickly get the line. Also, build a hash for the
> keys you intend to search on rather than trying to search the actual file.
> This index and hash should fit in memory quite easily.
> When you talk about a file that big, mapped or not, it's still going to
> require virtual storage the size of the file, which will need to be moved
> from the file or swap into real storage to process. Scanning the entire
> file will be very slow unless you have lots and lots of real memory.
>
> I haven't seen anyone mention an easy way to substring from a text file. It
> took me a long time to find it in the dictionary. It really exists. And it
> was not obvious when I finally did find it. So I will post it here for
> anyone who hasn't found it.
>
>
>    substr=:4 : 0 NB. x: start and length, y: string
> (,.x)];.0 y
> )
>
>    3 4 substr 'abcdefghijk'
> defg
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
