It seems that my implementation strays from the naive implementation quite a lot; p_diff is the proportion of trials in which the two gave different results:
n: 256, m:  30, s: 10 -> p_diff = 0.81
n: 256, m:  50, s: 10 -> p_diff = 0.32
n: 256, m:  70, s: 10 -> p_diff = 0.16
n: 256, m:  90, s: 10 -> p_diff = 0.17
n: 256, m: 110, s: 10 -> p_diff = 0.18
n: 256, m: 130, s: 10 -> p_diff = 0.19
n: 256, m: 150, s: 10 -> p_diff = 0.17
n: 256, m: 170, s: 10 -> p_diff = 0.17
n: 256, m: 190, s: 10 -> p_diff = 0.18

However, given that the tests pass (so the behaviour is deterministic), I
assume this is triggered by something in the data. I am still stumped.

Vlad

On Tue, Oct 18, 2011 at 10:15 AM, Vlad Niculae <[email protected]> wrote:
> Thank you for the observation. I have been looking into this since
> yesterday, when the same thing was reported on my blog by Bob L.
> Strum. At the moment I have no idea what the cause is. Does it behave
> in the same way if you use the Gram solver instead?
>
> Best,
> Vlad
>
> On Tue, Oct 18, 2011 at 5:16 AM, Alejandro Weinstein
> <[email protected]> wrote:
>> Hi:
>>
>> I am observing a behavior of the scikit-learn implementation of OMP
>> (sklearn.linear_model.orthogonal_mp) that I don't understand. I am
>> performing the following experiment:
>>
>> - Generate a dictionary D (input data) with i.i.d. Gaussian entries
>>   (with the column norms normalized to one) and dimensions m by n.
>> - Generate an s-sparse signal x with uniformly random support and
>>   i.i.d. standard Gaussian amplitudes over the support.
>> - Generate a vector y = D * x.
>> - Recover x from D and y using OMP (sklearn.linear_model.orthogonal_mp).
>>
>> For a given set of parameters, I repeat the steps above a number of
>> times (say 500 times), and then I compute the "probability of
>> recovery" as the ratio between the number of successful recoveries and
>> the number of trials (I declare a successful recovery when the norm of
>> the estimate minus the signal is smaller than 1e-6).
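The quoted experiment can be sketched as follows. The dimensions (m rows, n columns, sparsity s) and the 1e-6 success threshold come from the message; the trial count default and the RNG seeding are assumptions, and the helper name is illustrative:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def recovery_probability(n=256, m=70, s=10, n_trials=500, rng=None):
    """Fraction of trials in which OMP recovers the sparse signal."""
    rng = np.random.default_rng(rng)
    successes = 0
    for _ in range(n_trials):
        # Dictionary with i.i.d. Gaussian entries and unit-norm columns.
        D = rng.standard_normal((m, n))
        D /= np.linalg.norm(D, axis=0)
        # s-sparse signal: uniformly random support, Gaussian amplitudes.
        x = np.zeros(n)
        support = rng.choice(n, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        y = D @ x
        # Recover x from D and y with OMP.
        x_hat = orthogonal_mp(D, y, n_nonzero_coefs=s)
        # Success when the estimate is within 1e-6 of the true signal.
        if np.linalg.norm(x_hat - x) < 1e-6:
            successes += 1
    return successes / n_trials
```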
>>
>> As the number of rows of D (n_samples in the algorithm) increases, I
>> expect the probability of recovery to increase until it gets to a
>> value close to 1. However, I observe that the probability doesn't go
>> beyond 0.85.
>>
>> I wrote a "naive" implementation of OMP (no matrix decomposition,
>> solving the least-squares problems using the pseudo-inverse) for
>> comparison. This implementation behaves as I expected. These are the
>> results for n=256, s=10 and m in range(30, 200, 20):
>>
>> n: 256, m:  30, s: 10 -> p_skit = 0.23  p_naive = 0.27
>> n: 256, m:  50, s: 10 -> p_skit = 0.70  p_naive = 0.97
>> n: 256, m:  70, s: 10 -> p_skit = 0.83  p_naive = 1.00
>> n: 256, m:  90, s: 10 -> p_skit = 0.82  p_naive = 0.99
>> n: 256, m: 110, s: 10 -> p_skit = 0.83  p_naive = 0.99
>> n: 256, m: 130, s: 10 -> p_skit = 0.83  p_naive = 1.00
>> n: 256, m: 150, s: 10 -> p_skit = 0.83  p_naive = 1.00
>> n: 256, m: 170, s: 10 -> p_skit = 0.83  p_naive = 0.99
>> n: 256, m: 190, s: 10 -> p_skit = 0.82  p_naive = 1.00
>>
>> where p_skit is the probability of recovery for orthogonal_mp and
>> p_naive is the probability of recovery of my naive implementation.
>>
>> The code is available here: http://pastebin.com/799sent0
>>
>> Am I doing something wrong?
>>
>> Alejandro.
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
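A "naive" OMP along the lines Alejandro describes can be sketched as below: greedily pick the atom most correlated with the residual, then re-fit all coefficients on the current support by least squares via the pseudo-inverse. This is not the pastebin code, and the function name is illustrative:

```python
import numpy as np

def naive_omp(D, y, s):
    """Recover an s-sparse x from y = D @ x by greedy pursuit."""
    m, n = D.shape
    support = []
    residual = y.copy()
    x = np.zeros(n)
    for _ in range(s):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Least-squares fit on the support via the pseudo-inverse
        # (no QR/Cholesky decomposition, unlike sklearn's solver).
        coefs = np.linalg.pinv(D[:, support]) @ y
        residual = y - D[:, support] @ coefs
    x[support] = coefs
    return x
```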
