On Tue, 20 Feb 2007, Federico Calboli wrote:

> Liaw, Andy wrote:
>> I don't see why making copies of the columns you need inside the loop is
>> "better" memory management. If the data are in a matrix, accessing
>> elements is quite fast. If you're worrying about speed of that, do what
>> Charles suggests: work with the transpose so that you are accessing
>> elements in the same column in each iteration of the loop.
>
> As I said, this is pretty academic, I am not looking for how to do something
> differently.
>
> Having said that, let me present this code:
>
> for(i in gp){
>    new[i,1] = ifelse(srow[i]>0, new[srow[i],zippo[i]], sav[i])
>    new[i,2] = ifelse(drow[i]>0, new[drow[i],zappo[i]], sav[i])
> }
>
> where gp is a large vector and srow and drow are the dummy variables for:
>
> srow = data[,2]
> drow = data[,4]
>
> If instead of the dummy variable I access the array directly (and it's a
> 600000 x 6 array) the loop takes 2/3 days --not sure here, I killed it after
> 48 hours.
>
> If I use dummy variables the code runs in 10 minutes-ish.
>
> Comments?
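
For concreteness, the comparison described above can be sketched on made-up data. The size `n`, the contents of `data`, and the helper names `loop_direct` and `loop_copied` are assumptions for illustration only, not the poster's actual data or code. The sketch does not attempt to reproduce the reported timings, which will depend on the machine, the R version, and how `data` is actually stored.

```r
set.seed(1)
n <- 20000   # assumed size, far smaller than the 600000 x 6 array in the post
data <- matrix(sample.int(n, n * 6, replace = TRUE), nrow = n, ncol = 6)

# Variant 1: index the full matrix inside the loop.
loop_direct <- function(data) {
  out <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    out[i] <- data[i, 2] + data[i, 4]
  }
  out
}

# Variant 2: copy the two needed columns into plain vectors first.
loop_copied <- function(data) {
  srow <- data[, 2]
  drow <- data[, 4]
  out <- numeric(length(srow))
  for (i in seq_along(srow)) {
    out[i] <- srow[i] + drow[i]
  }
  out
}

t_direct <- system.time(res_direct <- loop_direct(data))
t_copied <- system.time(res_copied <- loop_copied(data))

# Both variants compute the same values; only the access pattern differs.
stopifnot(identical(res_direct, res_copied))

# Elapsed seconds for each variant.
print(c(direct = t_direct[["elapsed"]], copied = t_copied[["elapsed"]]))
```

Increasing `n` toward the size in the post gives a better feel for how the two access patterns scale.
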
This is a bit different from your original post (where it appeared that you
were manipulating one row of a matrix at a time), but the issue is the same.

As suggested in my earlier email, this looks like a caching issue, and it is
not peculiar to R. Viz.

   "Most modern CPUs are so fast that for most program workloads the
   locality of reference of memory accesses, and the efficiency of the
   caching and memory transfer between different levels of the hierarchy,
   is the practical limitation on processing speed. As a result, the CPU
   spends much of its time idling, waiting for memory I/O to complete."

   (from http://en.wikipedia.org/wiki/Memory_hierarchy)

The computation you have is challenging to your cache, and dropping the
unused columns of your 'data' object by assigning the columns used to 'srow'
and 'drow' has lightened the load.

If you do not know why SAXPY and friends are written as they are, a little
bit of study will be rewarded by a much better understanding of these
issues. I think Golub and Van Loan's 'Matrix Computations' touches on this
(but I do not have my copy close to hand to check).

>
> Best,
>
> Fede
>
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>

Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]              UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0901
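
As a rough illustration of the locality point: R stores a matrix column by column, so walking down a column touches adjacent memory, while walking along a row strides across it. The sketch below sums the same matrix both ways. The matrix size and the helper names `sum_by_column` and `sum_by_row` are assumptions chosen for the example. Whether the timing gap is visible depends on the machine, and interpreter overhead in R loops can mask much of it.

```r
nr <- 1000   # assumed dimensions, chosen only to keep the example quick
nc <- 1000
m <- matrix(as.numeric(seq_len(nr * nc)), nrow = nr, ncol = nc)

# Column-wise traversal: each m[, j] is a contiguous block of memory.
sum_by_column <- function(m) {
  total <- 0
  for (j in seq_len(ncol(m))) total <- total + sum(m[, j])
  total
}

# Row-wise traversal: each m[i, ] gathers elements nrow(m) apart in memory.
sum_by_row <- function(m) {
  total <- 0
  for (i in seq_len(nrow(m))) total <- total + sum(m[i, ])
  total
}

t_col <- system.time(s_col <- sum_by_column(m))
t_row <- system.time(s_row <- sum_by_row(m))

# Same answer either way; only the memory access order differs.
stopifnot(isTRUE(all.equal(s_col, s_row)))

# Elapsed seconds for each traversal order.
print(c(by_column = t_col[["elapsed"]], by_row = t_row[["elapsed"]]))
```
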