Hi: I am observing a behavior of the scikit-learn implementation of OMP (sklearn.linear_model.orthogonal_mp) that I don't understand. I am performing the following experiment:
- Generate a dictionary D (input data) of dimension m by n with i.i.d. Gaussian entries, with each column normalized to unit norm.
- Generate an s-sparse signal x with uniformly random support and i.i.d. standard Gaussian amplitudes on the support.
- Compute the vector y = D * x.
- Recover x from D and y using OMP (sklearn.linear_model.orthogonal_mp).

For a given set of parameters, I repeat the steps above a number of times (say 500) and compute the "probability of recovery" as the ratio between the number of successful recoveries and the number of trials (I declare a recovery successful when the norm of the estimate minus the signal is smaller than 1e-6).

As the number of rows of D (n_samples in the algorithm) increases, I expect the probability of recovery to increase until it gets close to 1. However, I observe that it doesn't go beyond about 0.85. For comparison, I wrote a "naive" implementation of OMP (no matrix decomposition; the least-squares problem is solved using the pseudo-inverse), and it behaves as I expected. These are the results for n = 256, s = 10 and m in range(30, 200, 20):

  n     m    s      p_skit   p_naive
  256   30   10     0.23     0.27
  256   50   10     0.70     0.97
  256   70   10     0.83     1.00
  256   90   10     0.82     0.99
  256   110  10     0.83     0.99
  256   130  10     0.83     1.00
  256   150  10     0.83     1.00
  256   170  10     0.83     0.99
  256   190  10     0.82     1.00

where p_skit is the probability of recovery for orthogonal_mp and p_naive is the probability of recovery for my naive implementation. The code is available here: http://pastebin.com/799sent0

Am I doing something wrong?

Alejandro.
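For reference, the experiment above can be sketched roughly as follows (the function name, defaults, and seed are my own, not from the pastebin; sparsity is fixed via the n_nonzero_coefs argument of sklearn.linear_model.orthogonal_mp):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def recovery_probability(n, m, s, n_trials=500, tol=1e-6, seed=0):
    """Estimate the probability that OMP exactly recovers an s-sparse signal.

    Sketch of the experiment described above: hypothetical helper, not the
    poster's actual code.
    """
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_trials):
        # Dictionary with i.i.d. Gaussian entries, columns normalized to unit norm.
        D = rng.standard_normal((m, n))
        D /= np.linalg.norm(D, axis=0)
        # s-sparse signal: uniformly random support, Gaussian amplitudes.
        x = np.zeros(n)
        support = rng.choice(n, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        y = D @ x
        # Recover with scikit-learn's OMP, fixing the number of nonzeros to s.
        x_hat = orthogonal_mp(D, y, n_nonzero_coefs=s)
        # Declare success when the estimate is within tol of the true signal.
        if np.linalg.norm(x_hat - x) < tol:
            successes += 1
    return successes / n_trials
```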
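A naive OMP along the lines described (greedy atom selection, least-squares step via numpy.linalg.pinv, no Cholesky/QR updating) might look like this; omp_naive is a hypothetical name for illustration, not the code from the pastebin:

```python
import numpy as np

def omp_naive(D, y, s):
    """Naive orthogonal matching pursuit.

    At each of s iterations: pick the atom most correlated with the residual,
    then solve the least-squares problem on the current support with the
    pseudo-inverse. Illustrative sketch, not the poster's actual code.
    """
    m, n = D.shape
    support = []
    residual = y.copy()
    x = np.zeros(n)
    coeffs = np.zeros(0)
    for _ in range(s):
        # Atom whose correlation with the residual is largest in magnitude.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least squares on the selected atoms via the pseudo-inverse.
        coeffs = np.linalg.pinv(D[:, support]) @ y
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x
```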
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
