I'm thinking about a new algorithm for Julia. I'm mostly concerned with how much needs to fit in *RAM*, and curious about what is considered big in RAM (or out of it).
A. For 2D (or higher-dimensional) arrays, dense or sparse (including non-square), is roughly 2 billion along any single dimension a meaningful limit? Note that for a square dense array, at one byte per entry, you can't get more than about 8.4 million × 8.4 million to fit in RAM with 2015-era x86 CPUs, since their physical addressing is capped at 46 bits (2^46 bytes ≈ 70 TB, and √(2^46) ≈ 8.4 million); a theoretical 4 billion × 4 billion would only fit with full 64-bit addressing. In practice the limit is much lower, set by the RAM actually installed. I do see, however, a map-reduce approach: http://infolab.stanford.edu/~ullman/mmds/book.pdf section 2.6.7, "Case Study: Matrix Multiplication". Would that use much less RAM at any point?

B. I'm aware of billion-row tables, but you usually query them (or kind of "stream" them). How much would be limiting to fit in RAM? Would 2 GB (or say 8 or 16 GB) be limiting? https://books.google.is/books?id=BKEoDAAAQBAJ&pg=PA145&lpg=PA145&dq=big+one+dimensional+dataset&source=bl&ots=qkbpp3Ks_T&sig=ewWSbdVp8MUhQHjMqMWfnQh4Rfs&hl=en&sa=X&redir_esc=y#v=onepage&q=big%20one%20dimensional%20dataset&f=false The three billion DNA <https://en.wikipedia.org/wiki/DNA> base pairs <https://en.wikipedia.org/wiki/Base_pair> seem to blow a 2 GB limit, but not if you need less than one byte per base. I also doubt all chromosomes would be kept in the same array. I can't imagine 2 GB being limiting for UTF-8 text.
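A back-of-envelope check of the square-matrix bound in A (a sketch in Python; it assumes one byte per entry and takes the 46-bit physical addressing figure for 2015-era x86 CPUs from the question itself, not from measurement):

```python
# Largest n such that an n x n one-byte-per-entry array fits in the
# address space of a CPU with the given number of physical address bits.
from math import isqrt

def max_square_side(address_bits: int, bytes_per_entry: int = 1) -> int:
    """Largest n with n*n*bytes_per_entry <= 2**address_bits."""
    return isqrt(2**address_bits // bytes_per_entry)

print(max_square_side(46))  # 8388608, i.e. ~8.4 million
print(max_square_side(64))  # 4294967296, i.e. 4 billion (full 64-bit addressing)
```

This reproduces both numbers in A: 2^46 bytes gives exactly 2^23 ≈ 8.4 million per side, and a full 64-bit address space gives 2^32 = 4 billion.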
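On A's map-reduce question: the appeal of the MMDS scheme is that no worker ever needs a full matrix in memory. A single-machine analogue of that idea is blocked multiplication, where only a few small blocks are resident at a time (each block could be loaded from disk). A rough Python sketch with plain nested lists and a hypothetical block size, not the book's actual MapReduce formulation:

```python
# Blocked matrix multiply: the inner loops only touch an O(bs^2) working
# set (one block each of A, B, and C), so RAM use can be decoupled from n
# if blocks are streamed from disk.
def blocked_matmul(A, B, n, bs):
    """Multiply n x n matrices A and B (lists of lists) with block size bs."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, n, bs):
            for k0 in range(0, n, bs):
                # only blocks starting at (i0,k0), (k0,j0), (i0,j0) are touched here
                for i in range(i0, min(i0 + bs, n)):
                    for k in range(k0, min(k0 + bs, n)):
                        aik = A[i][k]
                        for j in range(j0, min(j0 + bs, n)):
                            C[i][j] += aik * B[k][j]
    return C
```

So the answer I'd expect is "yes, much less RAM at any one point, at the cost of more passes over the data" — which is the trade-off I'm asking about.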
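On B's DNA example: at 2 bits per base, 3 billion base pairs pack into 0.75 GB, comfortably under 2 GB. A sketch of such packing (my own toy encoding for illustration, not any standard bioinformatics format):

```python
# 2-bit packing of DNA bases: 4 bases per byte, so 3e9 bases -> 750 MB.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def pack(seq: str) -> bytearray:
    """Pack a string of A/C/G/T into 2 bits per base."""
    out = bytearray((len(seq) + 3) // 4)
    for i, b in enumerate(seq):
        out[i // 4] |= CODE[b] << (2 * (i % 4))
    return out

def unpack(buf: bytearray, n: int) -> str:
    """Recover the first n bases from a packed buffer."""
    return "".join(BASE[(buf[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))

print(len(pack("ACGT" * 10)))   # 10 bytes for 40 bases
print(3_000_000_000 * 2 // 8)   # 750000000 bytes = 0.75 GB for the genome
```

A real implementation would also need to handle ambiguity codes like N, which is one reason formats in the wild use more than 2 bits per base.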