2011/2/25 Gael Varoquaux <[email protected]>:
> On Fri, Feb 25, 2011 at 10:36:42AM +0100, Fred wrote:
>> I have a big array (44 GB) I want to decimate.
>
>> But this array has a lot of NaN (only 1/3 has value, in fact, so 2/3 of
>> NaN).
>
>> If I "basically" decimate it (a la NumPy, ie data[::nx, ::ny, ::nz], for
>> instance), the decimated array will also have a lot of NaN.
>
>> What I would like to have in one cell of the decimated array is the
>> nearest (for instance) value in the big array. This is what I call a
>> "condensated array".
>
> What exactly do you mean by 'decimating'. To me is seems that you are
> looking for matrix factorization or matrix completion techniques, which
> are trendy topics in machine learning currently.
>
> They however are a bit challenging, and I fear that you will have read
> the papers and do some implementation, unless you have a clear
> application in mind that enables for simple tricks to solve it.

Indeed, the following paper by G. Martinsson also has a section on
matrix summarization:

http://arxiv.org/abs/0909.4061
http://www.stanford.edu/group/mmds/slides2010/Martinsson.pdf

The scikit-learn randomized SVD implementation comes from this paper.
It's pretty useful in practice.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
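
To make the randomized SVD idea mentioned above concrete, here is a minimal NumPy-only sketch of the basic scheme (random projection, QR, then a small exact SVD). It is an illustration under stated assumptions, not scikit-learn's actual code: the function name `randomized_svd_sketch` and its parameters are hypothetical, and the input is assumed to be a dense 2-D array with no NaN, so it does not by itself address the missing-value problem in the original question.

```python
import numpy as np


def randomized_svd_sketch(A, rank, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate truncated SVD of A (hypothetical helper, NaN-free input assumed)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample a few more directions than the target rank, for accuracy.
    k = min(rank + n_oversamples, m, n)

    # Project A onto k random Gaussian directions to capture its range.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))

    # Optional power iterations sharpen the basis when singular values
    # decay slowly; re-orthonormalize at each step for stability.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small (k x n) projected matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(42)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd_sketch(A, rank=5)
A_approx = (U * s) @ Vt
rel_error = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print("relative reconstruction error:", rel_error)
```

The oversampling and power-iteration counts here are arbitrary illustrative defaults; for real use, the scikit-learn implementation referred to above is the better starting point.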
