The attached graph shows the results for 80/20, 60/40, 40/60, and
20/80 splits of the data set into a training set and a testing set.
I ran the 40/60 and 20/80 splits because the KNN classifier showed the
same performance for 80/20 and 60/40, and I wanted to see when its
accuracy would start to fall off; it does, but only slightly.
The KNN classifier by itself outperforms the SP + KNN combination on
all splits.  My inclination is to work on improving performance on
this task before adding noise to the images, because it seems like
a good algorithm should do this task perfectly.
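As a sanity check on this kind of split evaluation, here is a minimal sketch of a raw-pixel KNN baseline run across the same four splits. The toy Gaussian "images", the function names, and the choice of k=1 are my illustrative assumptions, not the actual experiment code (which lives in nupic.vision):

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=1):
    """Classify each test vector by majority vote among its k nearest
    training vectors (Euclidean distance on raw pixel values)."""
    preds = []
    for x in test_x:
        dists = np.linalg.norm(train_x - x, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

def evaluate_split(images, labels, train_frac, seed=0):
    """Shuffle, split into train/test by train_frac, return test accuracy."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_train = int(train_frac * len(images))
    tr, te = order[:n_train], order[n_train:]
    preds = knn_predict(images[tr], labels[tr], images[te])
    return float(np.mean(preds == labels[te]))

# Toy stand-in data: two well-separated "classes" of 64-pixel images.
rng = np.random.default_rng(42)
a = rng.normal(0.0, 0.1, size=(100, 64))
b = rng.normal(1.0, 0.1, size=(100, 64))
images = np.vstack([a, b])
labels = np.array([0] * 100 + [1] * 100)

for frac in (0.8, 0.6, 0.4, 0.2):
    print(f"{int(frac * 100)}/{int((1 - frac) * 100)} split: "
          f"accuracy = {evaluate_split(images, labels, frac):.2f}")
```

On this easy toy data the accuracy stays flat across splits; the interesting question on the real character images is where it starts to degrade.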

On Wed, Aug 27, 2014 at 1:27 AM, Jim Bridgewater <[email protected]> wrote:
> The code is available at github.com/baroobob/nupic.vision
>
> Today I made data sets for 80/20 and 60/40 splits between training and
> testing; I'll post the results here when I have them.
>
> On Thu, Aug 21, 2014 at 8:47 AM, Subutai Ahmad <[email protected]> wrote:
>>
>> Hi Jim,
>>
>> Thanks for doing this study. I echo Pedro's comments and questions - we need
>> more investigations like this.
>>
>> In terms of future experiments, it would be great to see a result with a
>> typical 80-20 split, i.e. 80% of the data for training and 20% for testing.
>> Ideally it would be nice to see performance with random added noise or other
>> distortions.  Comparing against straight KNN for all of this is a good
>> standard thing to do (one of my pet peeves in machine learning is that
>> people don't do this enough. The KNN is provably almost optimal for large
>> data sets - see the original Duda and Hart book!).
>>
>> My general expectation here is that the SP needs a thousand or more images
>> to start doing a decent job. It probably wouldn't do hugely better or worse
>> than KNN, though it should do better once you start adding noise, dropouts,
>> etc.  The SP won't learn many invariances.   In general I don't expect
>> it to do much better than standard spatial techniques on purely spatial
>> tasks.  The main job of the SP is to create a decent SDR for temporal
>> memory and temporal pooling.
>>
>> Parameter tuning is admittedly difficult here. The specific numbers matter
>> and you have to develop some intuitions. Unfortunately you can't directly
>> apply our swarm - that is currently tied to the OPF and temporal datasets.
>> It would be nice to improve that code to make it more general - it could
>> help a lot in these kinds of investigations.
>>
>> Overall I am glad you started this project! Thank you for keeping us
>> informed throughout, and taking it all the way to a good writeup. Is your
>> code available somewhere in case others want to try additional experiments?
>>
>> --Subutai
>>
>>
>>
>> On Tue, Aug 19, 2014 at 6:52 PM, Jim Bridgewater <[email protected]> wrote:
>>>
>>> Hi Pedro,
>>>
>>> Thank you for the feedback.
>>>
>>> 1.  Previously I ran it using a randomly initialized SP with
>>> learning disabled and got results comparable to those with learning
>>> turned on, which suggests that the spatial pooler as configured is
>>> not generalizing particularly well.  I had never tried sending the bit
>>> vectors directly to the classifier, but since you recommended it I
>>> made a pass-through version of the SP that simply copies the input to
>>> the output, and this produces better results than the real SP
>>> (as I have it configured)!
>>>
>>> 2. I haven't.  In terms of the images it's actually 100% or 0%, but
>>> in terms of the characters the images represent (ground truth) it's
>>> always 100%, which was my rationale for using the small training data
>>> set, since there are only 62 characters in both data sets (0-9, A-Z,
>>> a-z).  I have run a small case where I train on 62 images (normal
>>> font) and test on 124 (normal and bold fonts), and I get around 80%
>>> accuracy, which seems a bit low for what amounts to a pretty simple
>>> generalization task.
>>>
>>> 3. I am aware of MNIST, but I wanted to focus on machine-printed
>>> characters for document recognition.  That, coupled with the fact
>>> that I did not find a place where MNIST was freely available when I
>>> was looking for data sets, was enough to keep me from using it.
>>>
>>> 4. I started with the parameters in Ian's sp_viewer demo, ran a few
>>> simple parameter searches to get a feel for how increment and
>>> decrement values affected the SP, and got some advice from Subutai on
>>> the mailing list.  These parameters are probably not optimal.
>>>
>>>
>>> How well do you guess an optimized SP can do on tasks like these?
>>>
>>> On Mon, Aug 18, 2014 at 8:45 PM, Pedro Tabacof <[email protected]> wrote:
>>> > Hello Jim,
>>> >
>>> > Thank you for your work and report, we need more investigations like
>>> > yours.
>>> > A few suggestions:
>>> >
>>> > Since you're using a KNN classifier, it'd be nice to use it directly
>>> > on the pixels as a baseline. It's an important benchmark to show that
>>> > NuPIC is indeed doing the heavy work.
>>> > Have you tried a more balanced division between training and testing
>>> > sets? Using 100% or 1% of the data to train seems a bit too extreme
>>> > to me.
>>> > Did you look at the MNIST dataset? It's probably the most widely used
>>> > benchmark for computer vision. It's going to be computationally
>>> > demanding (50-60K images), but we would have results that can be
>>> > compared to other machine learning approaches.
>>> > Did you use swarming or grid search to find the best meta-parameters?
>>> >
>>> > A long time ago I used the previous NuPIC implementation for static
>>> > classification (just the spatial pooler) and it was competitive with
>>> > SVMs.
>>> >
>>> > Pedro.
>>> >
>>> >
>>> > On Tue, Aug 19, 2014 at 12:24 AM, Jim Bridgewater <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hi everyone,
>>> >>
>>> >> I've written up a summary of the work I did this summer as part of
>>> >> Season of NuPIC that includes the most recent results.  This summary
>>> >> is attached along with a separate file that contains 8,928 images
>>> >> from 144 fonts.  These images were used to test the spatial pooler.
>>> >> The gist of it is that the SP does very well (>97% accuracy) when
>>> >> you train it on all of the images you test it on, which is good but
>>> >> very time consuming and doesn't require any ability to generalize.
>>> >> When I trained the SP on a much smaller data set of 186 images
>>> >> containing normal, bold, and italic characters not included in the
>>> >> larger data set, the accuracy fell to about 32%.  There are several
>>> >> ways to improve this.  One is reducing the potential radius so
>>> >> columns learn features rather than entire characters.  I tried
>>> >> this, but there appears to be a bug in the SP's potential mapping
>>> >> that currently prevents this technique from helping.  Another is to
>>> >> try different potential mappings, like lines with different
>>> >> orientations, again in an effort to get the SP's columns to learn
>>> >> features rather than entire characters.  I've written a mapping for
>>> >> this but have not tried it.  Yet another way to improve these
>>> >> results would be to add additional SP regions in an effort to get
>>> >> more generalization.
>>> >>
>>> >> I look forward to hearing your comments!
>>> >>
>>> >> --
>>> >> James Bridgewater, PhD
>>> >> Arizona State University
>>> >> 480-227-9592
>>> >>
>>> >> _______________________________________________
>>> >> nupic mailing list
>>> >> [email protected]
>>> >> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Pedro Tabacof
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> James Bridgewater, PhD
>>> Arizona State University
>>> 480-227-9592
>>>
>>
>>
>>
>>
>
>
>



-- 
James Bridgewater, PhD
Arizona State University
480-227-9592

Attachment: accuracy.pdf
Description: Adobe PDF document
