On Tuesday, June 26, 2012, Charles R Harris wrote: > > > On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett <matthew.br...@gmail.com>wrote: > > Hi, > > On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett <matthew.br...@gmail.com> > wrote: > > Hi, > > > > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris > > <charlesr.har...@gmail.com> wrote: > >> > >> > >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett <matthew.br...@gmail.com > > > >> wrote: > >>> > >>> Hi, > >>> > >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett < > matthew.br...@gmail.com> > >>> wrote: > >>> > Hi, > >>> > > >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith <n...@pobox.com> > wrote: > >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris > >>> >> <charlesr.har...@gmail.com> wrote: > >>> >>> > >>> >>> > >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > >>> >>> <matthew.br...@gmail.com> > >>> >>> wrote: > >>> >>>> > >>> >>>> Hi, > >>> >>>> > >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank > for > >>> >>>> matrices that are numerically rank deficient: > >>> >>>> > >>> >>>> If I repeatedly make random matrices, then set the first column > to be > >>> >>>> equal to the sum of the second and third columns: > >>> >>>> > >>> >>>> def make_deficient(): > >>> >>>> X = np.random.normal(size=(40, 10)) > >>> >>>> deficient_X = X.copy() > >>> >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > >>> >>>> return deficient_X > >>> >>>> > >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full > rank > >>> >>>> (10) in about 8 percent of cases (see appended script). > >>> >>>> > >>> >>>> I think this is a tolerance problem. The ``matrix_rank`` > algorithm > >>> >>>> does this by default: > >>> >>>> > >>> >>>> S = spl.svd(M, compute_uv=False) > >>> >>>> tol = S.max() * np.finfo(S.dtype).eps > >>> >>>> return np.sum(S > tol) > >>> >>>> > >>> >>>> I guess we'd we want the lowest tolerance that nearly always or > >>> >>>> always > >>> >>>> identifies numerically rank deficient matrices. I suppose one > way of > >>> >>>> looking at whether the tolerance is in the right range is to > compare > >>> >>>> the calculated tolerance (``tol``) to the minimum singular value > >>> >>>> (``S.min()``) because S.min() in our case should be very small and > >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() for > the > >>> >>>> current algorithm, across many iterations, is about 2.8. We might > >>> >>>> hope this value would be higher than 1, but not much higher, > >>> >>>> otherwise > >>> >>>> we might be rejecting too many columns. > >>> >>>> > >>> >>>> Our current algorithm for tolerance is the same as the 2-norm of > M * > >>> >>>> eps. We're citing Golub and Van Loan for this, but now I look at > our > >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| > >>> >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think the > >>> >>>> Golub > > > I'm fine with that, and agree that it is likely to lead to fewer folks > wondering why Matlab and numpy are different. A good explanation in the > function documentation would be useful. > > Chuck > > One potential problem is that it implies that it will always be the same as any version of matlab's tolerance. What if they change it in a future release? How likely are we to even notice?
Cheers! Ben Root
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion