Thank you for the observation. I have been looking into this since yesterday, when the same issue was reported on my blog by Bob L. Strum. At the moment I have no idea what the cause is. Does it behave in the same way if you use the Gram solver instead?
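Something along these lines should exercise the Gram code path (a sketch only, not a fix: orthogonal_mp_gram takes the precomputed Gram matrix D.T @ D and the product D.T @ y instead of D and y; the toy dimensions here just mirror one row of the experiment quoted below):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp_gram

rng = np.random.default_rng(0)
n, m, s = 256, 110, 10

# Unit-norm Gaussian dictionary, s-sparse signal, y = D @ x,
# as in the experiment described below.
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.standard_normal(s)
y = D @ x_true

# The Gram variant works on D.T @ D and D.T @ y instead of D and y.
x_hat = orthogonal_mp_gram(D.T @ D, D.T @ y, n_nonzero_coefs=s)
recovered = np.linalg.norm(np.ravel(x_hat) - x_true) < 1e-6
```

If the recovery rate differs between the two paths, that would help localize whether the problem is in the Cholesky bookkeeping or somewhere shared.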
Best,
Vlad

On Tue, Oct 18, 2011 at 5:16 AM, Alejandro Weinstein <[email protected]> wrote:
> Hi:
>
> I am observing a behavior of the scikit-learn implementation of OMP
> (sklearn.linear_model.orthogonal_mp) that I don't understand. I am
> performing the following experiment:
>
> - Generate a dictionary D (input data) with i.i.d. Gaussian entries
>   (column norms normalized to one) of dimension m by n.
> - Generate an s-sparse signal x with uniformly random support and
>   i.i.d. standard Gaussian amplitudes over the support.
> - Generate a vector y = D * x.
> - Recover x from D and y using OMP (sklearn.linear_model.orthogonal_mp).
>
> For a given set of parameters, I repeat the steps above a number of
> times (say 500), and then compute the "probability of recovery" as the
> ratio between the number of successful recoveries and the number of
> trials (I declare a recovery successful when the norm of the estimate
> minus the signal is smaller than 1e-6).
>
> As the number of rows of D (n_samples in the algorithm) increases, I
> expect the probability of recovery to increase until it gets close to
> 1. However, I observe that the probability doesn't go beyond 0.85.
>
> For comparison, I wrote a "naive" implementation of OMP (no matrix
> decomposition, solving the least-squares problem with the
> pseudo-inverse). That implementation behaves as I expected.
> These are the results for n=256, s=10, and m in range(30, 200, 20):
>
> n: 256, m:  30, s: 10 -> p_skit = 0.23  p_naive = 0.27
> n: 256, m:  50, s: 10 -> p_skit = 0.70  p_naive = 0.97
> n: 256, m:  70, s: 10 -> p_skit = 0.83  p_naive = 1.00
> n: 256, m:  90, s: 10 -> p_skit = 0.82  p_naive = 0.99
> n: 256, m: 110, s: 10 -> p_skit = 0.83  p_naive = 0.99
> n: 256, m: 130, s: 10 -> p_skit = 0.83  p_naive = 1.00
> n: 256, m: 150, s: 10 -> p_skit = 0.83  p_naive = 1.00
> n: 256, m: 170, s: 10 -> p_skit = 0.83  p_naive = 0.99
> n: 256, m: 190, s: 10 -> p_skit = 0.82  p_naive = 1.00
>
> where p_skit is the probability of recovery for orthogonal_mp and
> p_naive is the probability of recovery of my naive implementation.
>
> The code is available here: http://pastebin.com/799sent0
>
> Am I doing something wrong?
>
> Alejandro.
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2d-oct
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
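For readers following the thread: the "naive" OMP that Alejandro describes (greedy atom selection, then a least-squares refit over the whole active set) can be sketched roughly as below. This is a hypothetical reconstruction, not the code from his pastebin; naive_omp and its argument names are made up for illustration.

```python
import numpy as np

def naive_omp(D, y, s):
    """Naive Orthogonal Matching Pursuit: greedily pick the atom most
    correlated with the residual, then re-fit all active atoms with a
    least-squares solve (pseudo-inverse), for s iterations."""
    n = D.shape[1]
    support = []
    x = np.zeros(n)
    residual = y.astype(float).copy()
    for _ in range(s):
        # Atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit the coefficients on the whole active set.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

# Sanity check on an orthonormal dictionary, where OMP is exact.
D = np.eye(8)
x_true = np.zeros(8)
x_true[[1, 5]] = [2.0, -3.0]
x_hat = naive_omp(D, D @ x_true, s=2)
```

Since this variant refits with a dense least-squares solve at every step, it is slow but numerically straightforward, which makes it a useful reference point against the Cholesky-based solver in scikit-learn.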
