[julia-users] Re: Large HDF5 matrix multiplication

2015-09-28 Thread Jim Garrison
Thank you; both replies have been incredibly helpful.

I was under the mistaken impression that all large HDF5 matrices must be 
stored in chunks, but it turns out that chunking is neither required nor 
even the default for JLD.
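
(As a note to myself, for future experiments with chunked layouts: here is 
roughly how chunking can be requested explicitly through HDF5.jl.  The file 
name, dataset name, and chunk dimensions below are made up for illustration.)

```julia
using HDF5

# Create a 10000x10000 Float64 dataset stored on disk in 1000x1000 chunks.
# (The chunk dimensions here are illustrative, not a recommendation.)
h5open("bigmatrix.h5", "w") do file
    dset = d_create(file, "A", datatype(Float64),
                    dataspace((10000, 10000)), "chunk", (1000, 1000))
    # Fill the dataset one column block at a time rather than all at once.
    for j in 1:1000:10000
        dset[:, j:j+999] = rand(10000, 1000)
    end
end
```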

I also realized shortly after sending my message that my problem is 
equivalent to multiplying two matrices A * B, where B fits in memory but A 
must remain out of core.  (The columns of B are the vectors v[i] I was 
attempting to multiply before.)
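
To make the blocked approach concrete, here is a minimal sketch of what I 
have been benchmarking against mmap (the file name, dataset name, and block 
size are placeholders; error handling omitted):

```julia
using HDF5

# Compute C = A*B, where A (n x m) lives on disk and B (m x k) fits in
# memory.  Since Julia and HDF5.jl store arrays in column-major order,
# contiguous column blocks of A are cheap to read, and each block
# contributes C += A[:, cols] * B[cols, :] to the result.
function outofcore_mul(path::AbstractString, name::AbstractString,
                       B::Matrix{Float64}; blocksize::Int=1024)
    h5open(path, "r") do file
        A = file[name]
        n, m = size(A)
        @assert m == size(B, 1)
        C = zeros(n, size(B, 2))
        for j in 1:blocksize:m
            cols = j:min(j + blocksize - 1, m)
            C += A[:, cols] * B[cols, :]   # one contiguous read per block
        end
        C
    end
end
```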

I've done a bit of benchmarking, and so far mmap seems to have better 
performance than loading each column manually.  However, my matrix is in 
general complex, and `ismmappable` returns `false` for an array of complex 
numbers.  Is the inability to mmap an inherent limitation of storing 
complex numbers as complex datasets, or is it something that would be 
straightforward to solve?
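
In the meantime, one workaround I may try (assuming the limitation stands) 
is to store the matrix as a plain real array with a leading dimension of 2 
and reinterpret the mapped memory as complex.  A rough, untested sketch of 
the idea:

```julia
using HDF5

A = complex(rand(4, 3), rand(4, 3))   # stand-in for the real matrix

# Store the complex matrix as a 2 x n x m Float64 array, a plain native
# type that readmmap should be able to handle.
h5open("A_reim.h5", "w") do file
    write(file, "A_reim",
          reinterpret(Float64, A, (2, size(A, 1), size(A, 2))))
end

# Map it back and view the same bytes as a complex matrix.
h5open("A_reim.h5", "r") do file
    raw = readmmap(file["A_reim"])     # 2 x n x m Float64, memory-mapped
    Amap = reinterpret(Complex128, raw, (size(raw, 2), size(raw, 3)))
    # Amap now behaves like the original n x m complex matrix.
end
```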

Another thing: at times I would also like to multiply the Hermitian 
conjugate of A by B (i.e., Ac_mul_B behavior).  I'm running some benchmarks 
now to see how mmap fares on this slightly modified problem (for a real 
matrix).
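
The helpful observation is that A' * B also only ever touches A by columns, 
since entry (i, j) of A' * B is the dot product of column i of A with column 
j of B.  So the same blocked column reads apply; roughly (placeholder names 
again):

```julia
using HDF5

# Compute A' * B (Ac_mul_B), with A (n x m) on disk and B (n x k) in
# memory.  Block `cols` of the result needs only columns `cols` of A:
#     (A' * B)[cols, :] = A[:, cols]' * B
function outofcore_cmul(path::AbstractString, name::AbstractString,
                        B::Matrix; blocksize::Int=1024)
    h5open(path, "r") do file
        A = file[name]
        n, m = size(A)
        @assert n == size(B, 1)
        C = zeros(eltype(B), m, size(B, 2))
        for j in 1:blocksize:m
            cols = j:min(j + blocksize - 1, m)
            C[cols, :] = A[:, cols]' * B   # still contiguous column reads
        end
        C
    end
end
```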

I also plan to try again with Blosc compression to see whether it speeds up 
the reads from disk at all.  (Of course, in that case mmap will no longer be 
possible, even for a real matrix.)
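
The compression test will look something like the following (assuming JLD's 
`compress` keyword, which I understand uses Blosc):

```julia
using JLD

A = rand(1000, 1000)   # stand-in for the real matrix

# Write the matrix with Blosc compression enabled...
jldopen("A_compressed.jld", "w", compress=true) do file
    write(file, "A", A)
end

# ...and then time reading it back.
@time jldopen("A_compressed.jld", "r") do file
    read(file, "A")
end
```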

Thanks again!

On Friday, September 25, 2015 at 2:30:17 PM UTC-7, Jim Garrison wrote:
>
> I have a very large, dense matrix (too large to fit in RAM) "A" saved in 
> an HDF5/JLD file.  I also have several vectors of the same dimension 
> v[1]..v[n], which *are* able to fit in memory, and I would like to know 
> the result of multiplying each vector by my matrix A. 
>
> My current idea is to do this manually -- simply loading each row or 
> column from the matrix dataset, one at a time, and implementing 
> matrix-vector multiplication myself.  However, it is possible that my 
> large matrix is already stored in some sort of "chunked"/block form in 
> the HDF5 file, and it would be nice to choose my blocks accordingly so 
> the calculation happens as efficiently as possible.  Is there a way to 
> load all blocks of an HDF5 dataset in the most efficient way? 
>
> In terms of the bigger picture, I've also considered that it might be 
> nice to implement in JLD all general matrix-matrix and matrix-vector 
> operations for JldDataset (which would throw an error when the 
> dimensions of the objects do not match, or when the data types do not 
> make sense for multiplication).  But I could also see this being an 
> unwelcome "feature," as it would be quite easy to accidentally call this 
> even when it is possible to load the matrix into memory.  (Also, it 
> would not be the most efficient way to handle my problem, as it would 
> involve loading the dataset n times, once for each vector v[i].) 

[julia-users] Re: Large HDF5 matrix multiplication

2015-09-25 Thread Steven G. Johnson
What you are asking for is called an "out-of-core" matrix-multiplication 
algorithm.  If you google for it, you'll find several papers on this.  The 
key, as is also true for in-memory matrix operations, is to maximize memory 
locality.

However, there doesn't seem to be much free code available for this 
problem.  My suspicion is that it is not so popular because:

1) If your problem doesn't fit in memory, these days the first choice is 
usually to go to a distributed-memory cluster rather than hitting the disk.
2) The cubic scaling of matrix-matrix products means that you can't 
increase the size too much anyway: the flop count grows as the cube of the 
matrix dimension, so if you give me 1000x more processing power, I can only 
handle a 10x bigger matrix rank (assuming dense matrices).

PS. HDF5 lets you specify how an array is "chunked" for storage, and lets 
you read back subsets of an array at a time.  All of this functionality is 
exposed in HDF5.jl, I believe.  Possibly it would be more efficient to 
memory-map the array and let the OS deal with paging it in and out of 
memory.  Note that Julia matrices are stored in column-major format 
(contiguous columns stored one after another), so you'll want to access the 
matrix column-by-column in your matrix-vector products.
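
For example (a rough sketch, with a made-up file and dataset name), letting 
the OS do the paging could be as simple as:

```julia
using HDF5

h5open("bigmatrix.h5", "r") do file
    dset = file["A"]                  # hypothetical dataset name
    if ismmappable(dset)
        A = readmmap(dset)            # an ordinary Array backed by the file
        x = rand(size(A, 2))
        y = A * x                     # plain BLAS; pages columns in as needed
    end
end
```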