Re: [Numpy-discussion] svd error checking vs. speed

alex Sat, 15 Feb 2014 15:21:12 -0800

On Sat, Feb 15, 2014 at 6:06 PM, Sebastian Berg
<sebast...@sipsolutions.net> wrote:
> On Sa, 2014-02-15 at 17:35 -0500, josef.p...@gmail.com wrote:
>> On Sat, Feb 15, 2014 at 5:12 PM, Skipper Seabold <jsseab...@gmail.com> wrote:
>> > On Sat, Feb 15, 2014 at 5:08 PM, <josef.p...@gmail.com> wrote:
>> >>
>> >> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg
>> >> <sebast...@sipsolutions.net> wrote:
>> >> > On Sa, 2014-02-15 at 16:37 -0500, alex wrote:
>> >> >> Hello list,
>> >> >>
>> >> >> Here's another idea resurrection from numpy github comments that I've
>> >> >> been advised could be posted here for re-discussion.
>> >> >>
>> >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd
>> >> >> with respect to input checking.  The argument against the change is
>> >> >> raw speed; if you know that you will never feed non-finite input to
>> >> >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd.  An
>> >> >> argument for the change could be to avoid issues reported on github
>> >> >> like crashes, hangs, spurious non-convergence exceptions, etc. from
>> >> >> the undefined behavior of svd of non-finite input.
>> >> >>
>> >> >
>> >> > +1, unless this is a huge speed penalty, correctness (and decent error
>> >> > messages) should come first in my opinion, this is python after all. If
>> >> > this is a noticeable speed difference, a kwarg may be an option (but
>> >> > would think about that some more).
>> >>
>> >> maybe -1
>> >>
>> >> statsmodels is using np.linalg.pinv which uses svd
>> >> I never heard of any crash (*), and the only time I compared with
>> >> scipy I didn't like the slowdown.
>> >> I didn't do any serious timings just a few examples.
>> >>
>> >> (*) not converged, ...
>> >>
>> >> pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y)
>> >>
>> >> numbers ?
>> >
>> >
>> > FWIW, I see this spurious "SVD did not converge" warning very frequently with
>> > ARMA when there is a nan that has crept in. I usually know where to find
>> > the problem, but I think it'd be nice if this error message was a little
>> > better.
>>
>> maybe I'm +1
>>
>> While we don't see crashes, when I run Alex's example I see 13% cpu
>> usage for a hanging process which looks very familiar to me, I see it
>> reasonably often when I'm debugging code.
>>
>> I never tried to track down where it hangs.
>>
>
> If this should not cause big hangs/crashes (just "not converged" after a
> long time or so), then maybe we should just check afterwards to give the
> user a better idea of where to look for the error. I think I remember
> people running into this and being confused (but without crash/hang).


I'm not sure exactly what you mean by this.  Are you suggesting that
if the svd fails with some kind of exception (possibly poorly or
misleadingly worded), the error could be cleaned up after the fact by
checking the input, and that this would avoid the speed penalty
because no check is done when the svd succeeds?  This would not
work on my system because that svd call really does hang, as in some
spin lock inside Fortran code that cannot be interrupted with ctrl-c,
or something like that.  I think the behavior is undefined and it can
crash, although I do not personally have an example of this.  These
modes of failure cannot be recovered from as easily as an exception can.
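
To make the two approaches concrete, here is a minimal sketch, not a
proposed patch.  The wrapper names svd_checked and svd_check_on_failure
are hypothetical and exist only for illustration; np.asarray_chkfinite
and np.linalg.LinAlgError are existing numpy names.  The first wrapper
validates the input up front, the second inspects the input only after
svd has raised.

```python
import numpy as np


def svd_checked(a, check_finite=True, **kwargs):
    """Hypothetical wrapper: validate the input before calling svd."""
    if check_finite:
        # np.asarray_chkfinite raises ValueError if a contains NaN or inf.
        a = np.asarray_chkfinite(a)
    return np.linalg.svd(a, **kwargs)


def svd_check_on_failure(a, **kwargs):
    """Hypothetical wrapper: look at the input only if svd raises."""
    try:
        return np.linalg.svd(a, **kwargs)
    except np.linalg.LinAlgError as exc:
        # Only reached when svd returns control by raising an exception.
        if not np.isfinite(a).all():
            raise ValueError("svd input contains NaN or inf") from exc
        raise
```

The second wrapper costs nothing extra when svd succeeds, but it only
helps if the call actually returns with an exception.  If the call hangs
or crashes inside the LAPACK routine, the except clause is never
reached, which is the failure mode described above.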

Alex
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
