Hi:

I am observing a behavior of the scikit.learn implementation of OMP
(sklearn.linear_model.orthogonal_mp) that I don't understand. I am
performing the following experiment:

- Generate a dictionary D (input data) with i.i.d. gaussian entries
(with the column norm normalized to one) with dimension m by n.
- Generate an s-sparse signal x with uniformly random support, and iid
standard gaussian amplitudes over the support.
-  Generate a vector y = D * x
- Recover x from D and y using OMP (sklearn.linear_model.orthogonal_mp).

For a given set of parameters, I repeat the steps above a number of
times (say 500 times), and then I compute the "probability of
recovery" as the ratio between the number of successful recoveries and
the number of trials (I declare a successful recovery when the norm of
the estimate minus the signal is smaller than 1e-6).

As the numbers of rows of D (n_samples in the algorithm) increases, I
expect that the probability of recovery increases until it gets to a
value close to 1. However, I observe that the probability doesn't goes
beyond 0.85.

I wrote a "naive" implementation of OMP (no matrix decomposition,
solving the least-squares problem using the psudo-inverse) for
comparison. This implementation behaves as I expecte. These are the
results for n=256, s=10 and m in range(30, 200, 20):

n: 256, m: 30, s: 10 -> p_skit = 0.23 p_naive = 0.27
n: 256, m: 50, s: 10 -> p_skit = 0.70 p_naive = 0.97
n: 256, m: 70, s: 10 -> p_skit = 0.83 p_naive = 1.00
n: 256, m: 90, s: 10 -> p_skit = 0.82 p_naive = 0.99
n: 256, m: 110, s: 10 -> p_skit = 0.83 p_naive = 0.99
n: 256, m: 130, s: 10 -> p_skit = 0.83 p_naive = 1.00
n: 256, m: 150, s: 10 -> p_skit = 0.83 p_naive = 1.00
n: 256, m: 170, s: 10 -> p_skit = 0.83 p_naive = 0.99
n: 256, m: 190, s: 10 -> p_skit = 0.82 p_naive = 1.00

where p_skit is the probability of recovery for orthogonal_mp and
p_naive is the probability of recovery of my naive implementation.

The code is available here: http://pastebin.com/799sent0

Am I doing something wrong?

Alejandro.

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to