Let's continue this discussion at https://github.com/JuliaLang/julia/issues/6708
-viral

On Thursday, May 1, 2014 2:22:42 PM UTC+5:30, Viral Shah wrote:
>
> It could be because of memory usage. I have 1 TB RAM on the machine I was doing. If you were running into swap, it would certainly take much longer.
>
> I will try the other version as soon as the machine is available for me to use (some admin issues), and also look into speeding things up if possible. If the generation of the data is in your control, you can just generate it pre-sorted or in CSC format. I just need to check if we can shortcut pre-sorted data and generate the sparse matrix quickly.
>
> -viral
>
> On 01-May-2014, at 12:59 am, Ryan Gardner <rwgard...@gmail.com> wrote:
>
> > Hmmm. That is much better than I was getting. Thanks Viral.
> >
> > Was it much faster for you to create the column-index, row-index, and value arrays? I would still expect them to be roughly on par in terms of speed.
> >
> > On Wed, Apr 30, 2014 at 2:36 PM, Viral Shah <vi...@mayin.org> wrote:
> > I ran the sprand example, and it took 290 seconds on a machine with enough RAM. Given that it is creating a matrix with half a billion nonzeros, this doesn’t sound too bad.
> >
> > -viral
> >
> > On 30-Apr-2014, at 8:48 pm, Ryan Gardner <rwgard...@gmail.com> wrote:
> >
> > > I've got 16GB of RAM on this machine. Largely, my question, with admittedly little knowledge of the internal structure of the sparse arrays, is why generating the actual SparseMatrixCSC is so much slower than generating what is essentially another sparse matrix representation consisting of the indices and values. (I realize that once we start swapping, which will happen in my example, things slow down a ton, but even the sprand I mention was slow.) Do you observe the same results? Is the reason for the difference clear to someone else?
> > >
> > > Thanks for all the comments. These are helpful. It had not crossed my mind that I could control the data type of the indices.
> > >
> > > Using the SparseMatrixCSC constructor directly would probably be very helpful. Do you learn about that constructor from looking at source code or do you see it somewhere else?
> > >
> > > I'm also curious about where @inbounds was used.
> > >
> > > On Wed, Apr 30, 2014 at 8:59 AM, Tony Kelman <t...@kelman.net> wrote:
> > > If you're assembling the matrix in row-sorted column-major order and there's no duplication, then you can also skip the conversion work by using the SparseMatrixCSC constructor directly.
> > >
> > > On Wednesday, April 30, 2014 1:10:31 AM UTC-7, Viral Shah wrote:
> > > Could you post your code? Will avoid me writing the same. :-)
> > >
> > > Was building the vectors taking all the time, or was it in building the sparse matrix from the triples? Triples to CSC conversion is an expensive operation, and we have spent a fair amount of time making it fast. Of course, there could be more opportunities at speeding the code.
> > >
> > > Where did you use @inbounds and @simd?
> > >
> > > -viral
> > >
> > > On 30-Apr-2014, at 1:11 pm, Dominique Orban <dominiq...@gmail.com> wrote:
> > >
> > > > Downgrading the 700,000 to 70,000 for the sake of not waiting all night, the original implementation takes about 4.3 seconds on my laptop. Preallocating arrays and using @inbounds brings it down to about 0.6 seconds. @simd doesn't seem to provide any further speedup. Building the sparse matrix takes about 3.8 seconds. This may be due to conversion from triple to csc format?!
> > > >
> > > > ps: using the original size of 700,000, Julia reports a memory usage of 11.8GB.
> > > >
> > > > On Wednesday, April 30, 2014 12:26:02 AM UTC-7, Viral Shah wrote:
> > > > I believe the memory requirement should be 700000*700*16 (64-bit nonzeros and row indices) + 700001*8 (64-bit column pointers) = 7.8 GB.
> > > >
> > > > This can be brought down a bit by using 32-bit index values and 64-bit floats, but then you need 5.8 GB. Finally, if you use 32-bit index values with 32-bit floats, you can come down to 4GB. The Julia sparse matrix implementation is quite flexible and allows you to easily do such things.
> > > >
> > > > julia> s = sparse(int32(1:10), int32(1:10), 1.0);
> > > >
> > > > julia> typeof(s)
> > > > SparseMatrixCSC{Float64,Int32} (constructor with 1 method)
> > > >
> > > > julia> s = sparse(int32(1:10), int32(1:10), float32(1.0));
> > > >
> > > > julia> typeof(s)
> > > > SparseMatrixCSC{Float32,Int32} (constructor with 1 method)
> > > >
> > > > -viral
> > > >
> > > > On Wednesday, April 30, 2014 12:36:17 PM UTC+5:30, Ivar Nesje wrote:
> > > > Sorry for pointing out a probably obvious problem, but as there are others that might try debug this issue on their laptop, I ask how much memory do you have? 700000*700 floats + indexes, will spend a minimum of 11 GB (if my math is correct) and possibly more if the asymptotic storage requirement is more than 2 Int64 + 1 Float64 per stored value.
> > > >
> > > > Ivar
> > > >
> > > > On Wednesday, April 30, 2014 at 01:46:22 UTC+2, Ryan Gardner wrote:
> > > > Creating sparse arrays seems exceptionally slow.
> > > >
> > > > I can set up the non-zero data of the array relatively quickly. For example, the following code takes about 80 seconds on one machine.
> > > >
> > > > vec_len = 700000
> > > >
> > > > row_ind = Uint64[]
> > > > col_ind = Uint64[]
> > > > value = Float64[]
> > > >
> > > > for j = 1:700000
> > > >     for k = 1:700
> > > >         ind = k*50
> > > >         push!(row_ind, ind)
> > > >         push!(col_ind, j)
> > > >         push!(value, 5.0)
> > > >     end
> > > > end
> > > >
> > > > but then
> > > >
> > > > a = sparse(row_ind, col_ind, value, 700000, 700000)
> > > >
> > > > takes more than at least about 30 minutes. (I never let it finish.)
> > > >
> > > > It doesn't seem like the numbers I'm using should be that far off the scale. Is there a more efficient way I should be doing what I'm doing? Am I missing something and asking for something that really is impractical?
> > > >
> > > > If not, I may be able to look into the sparse matrix code a little this weekend.
> > > >
> > > > The never-finishing result is the same if I try
> > > >
> > > > sprand(700000, 700000, .001)
> > > >
> > > > or if I try to set 700000*700 values in a sparse matrix of zeros directly. Thanks.
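
For anyone reading this thread later: Tony's suggestion (fill the CSC arrays yourself when the entries are already in column-major, row-sorted order with no duplicates) could look roughly like the sketch below for Ryan's pattern. This is only an illustration, not code from the thread. It assumes current Julia syntax (the `SparseArrays` standard library, `Int32` rather than `int32`), uses 32-bit indices as Viral describes, and is scaled down to 7,000 columns with 70 entries each so it runs quickly. The helper name `build_csc` is made up for this example.

```julia
using SparseArrays

# Hypothetical helper (not from the thread): build the three CSC arrays
# directly for an n-by-n matrix in which every column stores the same
# sorted set of row indices, all with the same value.
function build_csc(n::Int, rows::Vector{Int32}, val::Float64)
    per_col = length(rows)
    nnz_total = n * per_col

    # Preallocate instead of growing the arrays with push!.
    colptr = Vector{Int32}(undef, n + 1)
    rowval = Vector{Int32}(undef, nnz_total)
    nzval = fill(val, nnz_total)

    colptr[1] = 1
    @inbounds for j in 1:n
        offset = (j - 1) * per_col
        for k in 1:per_col
            rowval[offset + k] = rows[k]   # rows are already sorted
        end
        colptr[j + 1] = colptr[j] + per_col
    end

    # No triples-to-CSC conversion happens here; the arrays are used as given.
    return SparseMatrixCSC(n, n, colptr, rowval, nzval)
end

# Scaled-down version of the pattern in the original post (ind = k*50).
n = 7_000
rows = Int32.(50 .* (1:70))
a = build_csc(n, rows, 5.0)
```

The constructor trusts that the row indices within each column are sorted and free of duplicates, so this only applies when the generation of the data is under your control, as Viral notes. With 32-bit indices the full-size problem still fits, since 700000*700 is below the `Int32` maximum.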