I believe I have fixed the bug. Please allow me to also design a test
to ensure this.

In short, the problem was that when doing the column swapping for
efficient memory output, the algorithm would get fooled if a low-order
column would take the place of an already-chosen high-order column and
get chosen again.

Here is the output I observe from Alejandro's script after the fix. I
hope that the first values can be attributed to chance.

Comparing scikit.learn version of OMP versus naive implementation
n: 256, m: 30, s: 10 -> p_skit = 0.24 p_naive = 0.28
n: 256, m: 50, s: 10 -> p_skit = 0.92 p_naive = 0.97
n: 256, m: 70, s: 10 -> p_skit = 1.00 p_naive = 0.99
n: 256, m: 90, s: 10 -> p_skit = 1.00 p_naive = 0.99
n: 256, m: 110, s: 10 -> p_skit = 1.00 p_naive = 1.00
n: 256, m: 130, s: 10 -> p_skit = 1.00 p_naive = 0.99
n: 256, m: 150, s: 10 -> p_skit = 1.00 p_naive = 0.99
n: 256, m: 170, s: 10 -> p_skit = 1.00 p_naive = 0.99
n: 256, m: 190, s: 10 -> p_skit = 1.00 p_naive = 0.99


This was a very very subtle bug! Enormous thanks to Alejandro and Bob.
If you would like to play with the fixed code for now, it is on the
"omp_bug" branch on my github. I will merge to master after writing a
test for exactly this case, in case of future refactoring.

Best,
Vlad

On Wed, Oct 19, 2011 at 9:25 PM, Vlad Niculae <[email protected]> wrote:
> Interesting. I've been staring at the code but the algorithm itself
> shouldn't be losing precision. On the other hand, there are those
> "stopping conditions" that I had taken from the C implementation of
> the author of the Cholesky-OMP paper. If it's as you say, it could be
> that when it fails, OMP stops one iteration early, but it should raise
> a warning when doing so. Allow me to run the debugger on the gist.
>
> Vlad
>
> On Wed, Oct 19, 2011 at 6:54 PM, Alejandro Weinstein
> <[email protected]> wrote:
>> On Tue, Oct 18, 2011 at 7:49 PM, Alexandre Gramfort
>> <[email protected]> wrote:
>>> could you open an issue with a small test script with one X and y that
>>> produce a different result using both implementations?
>>
>> Here it is:
>>
>> https://github.com/scikit-learn/scikit-learn/issues/403
>>
>> I notice that when it fails, the support of the orthogonal_mp
>> solutions is "almost" correct. For example, in the issue, the support
>> is missing one index. It is like if orthogonal_mp is exiting the loop
>> one iteration too soon.
>>
>> Alejandro.
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>

------------------------------------------------------------------------------
The demand for IT networking professionals continues to grow, and the
demand for specialized networking skills is growing even more rapidly.
Take a complimentary Learning@Ciosco Self-Assessment and learn 
about Cisco certifications, training, and career opportunities. 
http://p.sf.net/sfu/cisco-dev2dev
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to