Thank you for the observation. I have been looking into this since
yesterday, when the same issue was reported on my blog by Bob L.
Strum. At the moment I have no idea what the cause is. Does it behave
in the same way if you use the Gram solver instead?
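For reference, the naive pseudo-inverse OMP you describe below can be sketched in plain NumPy. This is an illustrative sketch only (`naive_omp` and the demo parameters are made up here, not the scikit-learn implementation), but it makes the comparison point concrete:

```python
import numpy as np

def naive_omp(D, y, s):
    # Naive OMP: greedy atom selection plus a full pseudo-inverse
    # least-squares refit at every step (no Cholesky/QR updates).
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(s):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # Refit all selected coefficients by least squares (pinv).
        coef = np.linalg.pinv(D[:, support]) @ y
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Noiseless demo in the large-m regime of the experiment below,
# where recovery of a 10-sparse signal is essentially certain.
rng = np.random.RandomState(0)
m, n, s = 150, 256, 10
D = rng.randn(m, n)
D /= np.linalg.norm(D, axis=0)        # unit-norm columns
x_true = np.zeros(n)
idx = rng.choice(n, s, replace=False)
x_true[idx] = rng.randn(s)
y = D @ x_true
x_hat = naive_omp(D, y, s)
print(np.linalg.norm(x_hat - x_true))  # near zero on exact recovery
```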

Best,
Vlad

On Tue, Oct 18, 2011 at 5:16 AM, Alejandro Weinstein
<[email protected]> wrote:
> Hi:
>
> I am observing a behavior of the scikit.learn implementation of OMP
> (sklearn.linear_model.orthogonal_mp) that I don't understand. I am
> performing the following experiment:
>
> - Generate a dictionary D (input data) with i.i.d. Gaussian entries
> (with each column normalized to unit norm), of dimension m by n.
> - Generate an s-sparse signal x with uniformly random support and
> i.i.d. standard Gaussian amplitudes over the support.
> - Generate the vector y = D * x.
> - Recover x from D and y using OMP (sklearn.linear_model.orthogonal_mp).
>
> For a given set of parameters, I repeat the steps above a number of
> times (say 500 times), and then I compute the "probability of
> recovery" as the ratio between the number of successful recoveries and
> the number of trials (I declare a successful recovery when the norm of
> the estimate minus the signal is smaller than 1e-6).
>
> As the number of rows of D (n_samples in the algorithm) increases, I
> expect the probability of recovery to increase until it gets close to
> 1. However, I observe that the probability doesn't go beyond 0.85.
>
> I wrote a "naive" implementation of OMP (no matrix decomposition;
> the least-squares problem is solved with the pseudo-inverse) for
> comparison. This implementation behaves as I expected. These are the
> results for n=256, s=10 and m in range(30, 200, 20):
>
> n: 256, m: 30, s: 10 -> p_skit = 0.23 p_naive = 0.27
> n: 256, m: 50, s: 10 -> p_skit = 0.70 p_naive = 0.97
> n: 256, m: 70, s: 10 -> p_skit = 0.83 p_naive = 1.00
> n: 256, m: 90, s: 10 -> p_skit = 0.82 p_naive = 0.99
> n: 256, m: 110, s: 10 -> p_skit = 0.83 p_naive = 0.99
> n: 256, m: 130, s: 10 -> p_skit = 0.83 p_naive = 1.00
> n: 256, m: 150, s: 10 -> p_skit = 0.83 p_naive = 1.00
> n: 256, m: 170, s: 10 -> p_skit = 0.83 p_naive = 0.99
> n: 256, m: 190, s: 10 -> p_skit = 0.82 p_naive = 1.00
>
> where p_skit is the probability of recovery for orthogonal_mp and
> p_naive is the probability of recovery of my naive implementation.
>
> The code is available here: http://pastebin.com/799sent0
>
> Am I doing something wrong?
>
> Alejandro.
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
