It seems that my implementation strays from the naive implementation quite a lot; p_diff is the proportion of trials in which the two gave different results:
n: 256, m:  30, s: 10 -> p_diff = 0.81
n: 256, m:  50, s: 10 -> p_diff = 0.32
n: 256, m:  70, s: 10 -> p_diff = 0.16
n: 256, m:  90, s: 10 -> p_diff = 0.17
n: 256, m: 110, s: 10 -> p_diff = 0.18
n: 256, m: 130, s: 10 -> p_diff = 0.19
n: 256, m: 150, s: 10 -> p_diff = 0.17
n: 256, m: 170, s: 10 -> p_diff = 0.17
n: 256, m: 190, s: 10 -> p_diff = 0.18

However, given that the tests pass (so the behaviour is deterministic), I
assume this is triggered by something in the data. I am still stumped.

Vlad

On Tue, Oct 18, 2011 at 10:15 AM, Vlad Niculae <[email protected]> wrote:
> Thank you for the observation. I have been looking into this since
> yesterday, when the same thing was reported on my blog by Bob L.
> Strum. At the moment I have no idea what the cause is. Does it behave
> in the same way if you use the Gram solver instead?
>
> Best,
> Vlad
>
> On Tue, Oct 18, 2011 at 5:16 AM, Alejandro Weinstein
> <[email protected]> wrote:
>> Hi:
>>
>> I am observing a behavior of the scikit-learn implementation of OMP
>> (sklearn.linear_model.orthogonal_mp) that I don't understand. I am
>> performing the following experiment:
>>
>> - Generate a dictionary D (input data) with i.i.d. Gaussian entries
>>   (with the column norms normalized to one) and dimensions m by n.
>> - Generate an s-sparse signal x with uniformly random support and
>>   i.i.d. standard Gaussian amplitudes over the support.
>> - Generate a vector y = D * x.
>> - Recover x from D and y using OMP (sklearn.linear_model.orthogonal_mp).
>>
>> For a given set of parameters, I repeat the steps above a number of
>> times (say 500 times), and then I compute the "probability of
>> recovery" as the ratio between the number of successful recoveries and
>> the number of trials (I declare a successful recovery when the norm of
>> the estimate minus the signal is smaller than 1e-6).
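The quoted experiment can be sketched as follows. The dimensions (m rows, n columns, sparsity s) and the 1e-6 success threshold come from the message; the trial count default and the RNG seeding are assumptions, and the helper name is illustrative:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def recovery_probability(n=256, m=70, s=10, n_trials=500, rng=None):
    """Fraction of trials in which OMP recovers the sparse signal."""
    rng = np.random.default_rng(rng)
    successes = 0
    for _ in range(n_trials):
        # Dictionary with i.i.d. Gaussian entries and unit-norm columns.
        D = rng.standard_normal((m, n))
        D /= np.linalg.norm(D, axis=0)
        # s-sparse signal: uniformly random support, Gaussian amplitudes.
        x = np.zeros(n)
        support = rng.choice(n, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        y = D @ x
        # Recover x from D and y with OMP.
        x_hat = orthogonal_mp(D, y, n_nonzero_coefs=s)
        # Success when the estimate is within 1e-6 of the true signal.
        if np.linalg.norm(x_hat - x) < 1e-6:
            successes += 1
    return successes / n_trials
```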
>>
>> As the number of rows of D (n_samples in the algorithm) increases, I
>> expect the probability of recovery to increase until it gets to a
>> value close to 1. However, I observe that the probability doesn't go
>> beyond 0.85.
>>
>> I wrote a "naive" implementation of OMP (no matrix decomposition,
>> solving the least-squares problems using the pseudo-inverse) for
>> comparison. This implementation behaves as I expected. These are the
>> results for n=256, s=10 and m in range(30, 200, 20):
>>
>> n: 256, m:  30, s: 10 -> p_skit = 0.23  p_naive = 0.27
>> n: 256, m:  50, s: 10 -> p_skit = 0.70  p_naive = 0.97
>> n: 256, m:  70, s: 10 -> p_skit = 0.83  p_naive = 1.00
>> n: 256, m:  90, s: 10 -> p_skit = 0.82  p_naive = 0.99
>> n: 256, m: 110, s: 10 -> p_skit = 0.83  p_naive = 0.99
>> n: 256, m: 130, s: 10 -> p_skit = 0.83  p_naive = 1.00
>> n: 256, m: 150, s: 10 -> p_skit = 0.83  p_naive = 1.00
>> n: 256, m: 170, s: 10 -> p_skit = 0.83  p_naive = 0.99
>> n: 256, m: 190, s: 10 -> p_skit = 0.82  p_naive = 1.00
>>
>> where p_skit is the probability of recovery for orthogonal_mp and
>> p_naive is the probability of recovery of my naive implementation.
>>
>> The code is available here: http://pastebin.com/799sent0
>>
>> Am I doing something wrong?
>>
>> Alejandro.
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
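A "naive" OMP along the lines Alejandro describes can be sketched as below: greedily pick the atom most correlated with the residual, then re-fit all coefficients on the current support by least squares via the pseudo-inverse. This is not the pastebin code, and the function name is illustrative:

```python
import numpy as np

def naive_omp(D, y, s):
    """Recover an s-sparse x from y = D @ x by greedy pursuit."""
    m, n = D.shape
    support = []
    residual = y.copy()
    x = np.zeros(n)
    for _ in range(s):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Least-squares fit on the support via the pseudo-inverse
        # (no QR/Cholesky decomposition, unlike sklearn's solver).
        coefs = np.linalg.pinv(D[:, support]) @ y
        residual = y - D[:, support] @ coefs
    x[support] = coefs
    return x
```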
