Hey Ian,

Thank you for the feedback! I really appreciate you taking the time to do this. 

> The SP discovers relevant and repeated features in data, which can be 
> composed through hierarchy to form larger features. In this way it also 
> behaves like an autoencoder. Both ways of looking at it are dimensionality 
> reduction, but feature extraction is emphasized in the latter. 

I’ve been thinking about testing the autoencoding capability of the SP as a 
next step, probably comparing it to some established unsupervised learning 
algorithms, like SOM, that are known for that same capability. I haven’t 
settled on the best way to approach it yet, though, especially when it comes 
to setting up a comprehensive set of experiments.

> This doesn't match the current implementation. There are two distributions. 
> 1. Synapses chosen to be connected initially are given permanences very close 
> to the permanence threshold and 2. All the other potential synapses are 
> randomly distributed between 0 and the connected threshold.

Unfortunately, I’ve run into these kinds of discrepancies a lot. Most of my 
understanding of the topic is based on the white paper, and the lack of 
documentation for the implementation has me revisiting that understanding too 
often. I will look into the code to check the exact implementation of the 
initialization process and edit my work accordingly. Thanks for pointing that 
out!
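For my own reference, here is a minimal sketch of the two-distribution 
initialization as I now understand it from your description. All the 
parameter names and values are placeholders of my own, not NuPIC's actual 
ones:

```python
import numpy as np

def init_permanences(num_potential, connected_frac=0.5,
                     connected_thresh=0.1, margin=0.01, seed=42):
    """Toy sketch of the two-distribution initialization Ian described.
    Names and default values here are my own placeholders."""
    rng = np.random.RandomState(seed)
    perms = np.empty(num_potential)
    # Choose which potential synapses start out connected.
    connected = rng.rand(num_potential) < connected_frac
    # 1. Initially connected synapses: permanence just above the threshold.
    perms[connected] = connected_thresh + rng.rand(connected.sum()) * margin
    # 2. Remaining potential synapses: uniform in [0, connected_thresh).
    perms[~connected] = rng.rand((~connected).sum()) * connected_thresh
    return perms, connected
```

If that is right, it also explains the fast learning you mention: a connected 
synapse sits within a decrement or two of disconnection, so a few 
presentations are enough to flip it.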

> It's actually fairly easy to modify the SP to fully support scalar inputs. 
> The input is 0 if the synapse is below connected and then passed through if 
> above connected. The overlap score becomes a sum of inputs, and then a 
> normalized version of this sum becomes the output of the column. I've 
> implemented this in the javascript port of the sp and it works just fine.

I’m curious: have you found any practical reason to use scalars instead? Does 
it bring the algorithm more in line with traditional ANNs and thereby create 
an opportunity to compare the two on equal ground?
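Just to check that I follow the scalar variant you describe, I imagine the 
per-column overlap would look something like this sketch (my own 
reconstruction, not your JavaScript port; the normalization choice is an 
assumption on my part):

```python
import numpy as np

def scalar_overlap(inputs, perms, connected_thresh=0.1):
    """Sketch of the scalar-input overlap Ian described: a synapse passes
    its scalar input through only if its permanence is at or above the
    connected threshold; below it, the input contributes 0. The column
    output is the sum of passed inputs, normalized here by the number of
    connected synapses (my own choice of normalization)."""
    connected = perms >= connected_thresh
    total = np.sum(inputs[connected])
    n = max(int(connected.sum()), 1)  # avoid division by zero
    return total / n
```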

> What do you think about using even fewer columns? Also once boosting kicks 
> in, you'll lose the stability you're using for your convergence metric. The 
> columns will begin to oscillate back and forth as they fight for the chance 
> to become active.
> Also, if you're comparing to clustering algorithms that allow you to specify 
> a maximum number of clusters you can force this behavior in the SP by setting 
> the number of columns = to the number of desired clusters. 

I used as few columns as possible at the 2% sparsity level, and I believe I 
turned boosting off (I haven’t found a reason to use it just yet). 
I do like your idea of using only as many columns as needed for a fixed 
cluster count. It will also make the comparative work more solid, since the 
cluster count will be fixed across all evaluated algorithms. I’ll definitely 
look into making that change to the doc!

> Where is the variance in these experiments coming from? We usually run the SP 
> with a fixed random seed so I'm surprised there is any variance between runs.

I was under the impression that the initialization of the permanence values 
was not repeatable (i.e., not run with a fixed seed), and I thought the 
variance came from that. I will have to look into it.
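One quick check I plan to run first: with a fixed seed, a pseudo-random 
permanence draw is exactly repeatable, so if the runs still differ the 
variance must be entering somewhere else in my pipeline. A minimal sanity 
check (sizes here are arbitrary):

```python
import numpy as np

def draw_perms(seed, n=1024):
    """Draw a vector of pseudo-random permanences from a fixed seed."""
    return np.random.RandomState(seed).rand(n)

# Two draws from the same seed are bit-for-bit identical.
assert np.array_equal(draw_perms(42), draw_perms(42))
```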

Again, thanks a bunch for your comments. I’ll certainly take them into 
consideration for my next re-write!

best,
Nick


On Apr 21, 2014, at 12:23 AM, Ian Danforth <[email protected]> wrote:

> Nick,
> 
>  This is a great write up and a very neat series of experiments. Thank you 
> very much for taking the time to do this work and write it up! Below you'll 
> find some feedback, I hope it is useful.
> 
> "The role of the spatial pooler (SP) in CLA is analogous to that of a 
> clustering algorithm."
> 
> This is very true! However there is another way to view the role of the SP 
> which you may consider mentioning, if just for completeness. The SP discovers 
> relevant and repeated features in data, which can be composed through 
> hierarchy to form larger features. In this way it also behaves like an 
> autoencoder. Both ways of looking at it are dimensionality reduction, but 
> feature extraction is emphasized in the latter. Of course in the SP 
> reconstruction error is substituted with an activity penalty, i.e. those 
> columns with high overlap trend toward even better overlap of a specific, 
> discovered feature, and those with poor overlap are left idle, until a near 
> feature appears or one of the boosting mechanisms is put in play. I don't 
> mention this to undermine the very valid and interesting point of your paper, 
> but just to suggest another angle you could describe for contrast.
> 
> "Each of these synapses is configured with a permanence value chosen pseudo 
> randomly from a range centered on the threshold for synaptic activation with 
> higher values 
> registered closer to the natural center of a column over the input region."
> 
> This doesn't match the current implementation. There are two distributions. 
> 1. Synapses chosen to be connected initially are given permanences very close 
> to the permanence threshold and 2. All the other potential synapses are 
> randomly distributed between 0 and the connected threshold.
> 
> Having initially connected synapses within 1 decrement step of being 
> disconnected has a very strong impact on learning rates. In many cases a new 
> pattern is learned in one presentation due to this initial distribution.
> 
> Figure 2.
> 
> The diagram should probably mention there is usually some overlap in the 
> receptive fields of columns even if the diagram is more clear depicting them 
> as being non-overlapping.
> 
> "Unlike traditional neural networks, CLA has been designed to process binary 
> input. 
> According to Hawkins, the stochastic nature of biological neurons and their 
> firing behaviour 
> renders any precise continuous weighting of their connections unreliable 
> [REF]."
> 
> This isn't quite right. The biology has a hard time supporting a sigmoid 
> function across the entire range of connection strengths. But it has no 
> problem supporting a rectified linear weight. So once a connection is made it 
> can transmit signals of varying strength, but not very precisely. The fact 
> that binary inputs are used frequently with CLA is 1. For computational 
> efficiency and 2. Because no one has shown jeff a compelling example where 
> scalar inputs are better. NOT because there is a belief that inputs should be 
> binary, or biology doesn't support scalar signals.
> 
> It's actually fairly easy to modify the SP to fully support scalar inputs. 
> The input is 0 if the synapse is below connected and then passed through if 
> above connected. The overlap score becomes a sum of inputs, and then a 
> normalized version of this sum becomes the output of the column. I've 
> implemented this in the javascript port of the sp and it works just fine.
> 
> 
> Figure 4.
> 
> There is some confusion over the term "steps" and "passes." It looks like the 
> figure should say "Number of passes" rather than steps?
> 
> "With 64 columns and a forced output sparsity of 2%, only a single column is 
> allowed to represent 
> an input; as opposed to 20 columns with the 1024 column region."
> 
> What do you think about using even fewer columns? Also once boosting kicks 
> in, you'll lose the stability you're using for your convergence metric. The 
> columns will begin to oscillate back and forth as they fight for the chance 
> to become active.
> 
> "we set up a pair scalar encoders of size 128"
> 
> typo: "set up a pair OF scalar"
> 
> Figures 15/16
> 
> Where is the variance in these experiments coming from? We usually run the SP 
> with a fixed random seed so I'm surprised there is any variance between runs.
> 
> Final thoughts:
> 
>  Really really cool! I hope you continue with these experiments and develop 
> graphs relating the various free parameters to performance.
> 
>  Also, if you're comparing to clustering algorithms that allow you to specify 
> a maximum number of clusters you can force this behavior in the SP by setting 
> the number of columns = to the number of desired clusters. (Yes, only 5 
> columns). This is especially true if you're using columns that have 100% 
> potential pool (a receptive field that covers the whole input), and global 
> inhibition. It's only when you have some kind of topology and partial/no 
> overlap of receptive fields that you ever need more than one column per 
> cluster.
> 
> -Ian
> 
> 
> 
> On Thu, Apr 17, 2014 at 7:41 AM, Nicholas Mitri <[email protected]> wrote:
> Hello all.
> 
> I just completed a rough draft (and by rough I mean rough!) of a document on 
> evaluating the spatial pooler as a clustering algorithm.
> I’m attaching the document here for your thoughts or in case anyone is 
> interested.
> Please excuse any inaccuracies or typos. I’ll refine it on the second pass 
> before considering adding it to my thesis.
> 
> best,
> Nick
> 
> 
> 
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
> 
> 

