I definitely think the whitepaper should be updated to reflect this major change. In fact, AFAIK the latest version of the whitepaper is dated 2011 so I'm sure there are other changes to be included...
On Tue, Aug 19, 2014 at 1:52 PM, David Ray <[email protected]> wrote:

> Should we open an issue to fix that portion of the white paper?
>
> Sent from my iPhone
>
> On Aug 19, 2014, at 7:37 AM, Fergal Byrne <[email protected]> wrote:
>
> Hi John,
>
> Good spot. That was an error in the white paper, and nothing ever came
> from attempts to implement it. It has been superseded by the newer theory
> involving sensorimotor memory, in which inter-region feedforward
> communication consists of an SDR of active neurons in L3.
>
> Regards,
>
> Fergal Byrne
>
> On Tue, Aug 19, 2014 at 1:15 PM, John Blackburn <[email protected]> wrote:
>
>> Fergal, thanks for your replies. You have certainly made things clearer
>> to me.
>>
>> However, I think the whitepaper (v0.2.1, Sep 2011) says that both
>> predictive and active cells are passed to the next region:
>>
>> p. 25: "The output of a region is the activity of all cells in the region,
>> including the cells active because of feed-forward input and the cells
>> active in the predictive state. As mentioned earlier, predictions"
>>
>> p. 31: "Note that only cells that are active due to feed-forward input
>> propagate activity within the region, otherwise predictions would lead to
>> further predictions. But all the active cells (feed-forward and predictive)
>> form the output of a region and propagate to the next region in the
>> hierarchy."
>>
>> John.
>>
>> On Tue, Aug 19, 2014 at 1:01 PM, Fergal Byrne <[email protected]> wrote:
>>
>>> Hi Nick,
>>>
>>> Only active states are ever transmitted.
>>>
>>> There are several reasons for this. CLA is a computational model of the
>>> neocortex, so it must conform with neuroscience at certain levels of
>>> detail. In particular, communication between neurons beyond a certain
>>> distance can only occur (in the neocortex) via action potentials
>>> propagating along the axons of firing neurons, or (in the CLA) via a
>>> neuron being active in the current timestep. Any other information is
>>> invisible.
>>>
>>> Much more locally, however, predictive potential does play a role. In
>>> forming an SDR in the CLA, we enforce sparseness by choosing the n%
>>> (usually 2%) highest "potentials" among cells (among columns in NuPIC),
>>> based on their response to the input. We call this "inhibition". In the
>>> neocortex, what actually happens is that each cell is depolarised at a
>>> different rate depending on its synaptic inputs. The cells with the
>>> highest rates reach their firing potential first and fire, triggering a
>>> wave of inhibition that spreads outwards and drastically reduces their
>>> neighbours' rates of depolarisation.
>>>
>>> In NuPIC, potential due to feedforward input alone is used in the SP to
>>> choose the columns, and then potential due to lateral or predictive
>>> inputs is used to choose the active cell within each column. In the
>>> neocortex, and in a more faithful CLA implementation, predictive
>>> depolarisation is combined with feedforward depolarisation to choose
>>> individual cells.
>>>
>>> Regards,
>>>
>>> Fergal Byrne
>>>
>>> On Tue, Aug 19, 2014 at 11:48 AM, Nicholas Mitri <[email protected]> wrote:
>>>
>>>> Thanks Fergal,
>>>>
>>>> I'd like to reiterate John's question though. Are the predictive and
>>>> active states passed on to the next region as '1', or do we follow the
>>>> same paradigm in assuming all relevant information is encoded in the
>>>> active bits, and only propagate those states upward while ignoring
>>>> predictive states?
>>>>
>>>> Nick
>>>>
>>>> On Aug 19, 2014, at 1:40 PM, Fergal Byrne <[email protected]> wrote:
>>>>
>>>> Hi John,
>>>>
>>>> The classifier is extra-cortical - it's a piece of engineering added to
>>>> efficiently extract useful predictions. To explain how it works, let's
>>>> use a concrete example of predicting energy use 10 steps ahead in the
>>>> hotgym use case.
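[Editor's note: the n%-winner inhibition described above can be sketched as
follows. This is a minimal illustration only, not NuPIC's actual SP API; the
function name and the 2% default are assumptions.]

```python
import numpy as np

def inhibit(potentials, sparsity=0.02):
    """Global inhibition: keep only the top `sparsity` fraction of columns,
    ranked by their feedforward potential (overlap with the input)."""
    num_winners = max(1, int(round(len(potentials) * sparsity)))
    # Columns with the highest potentials "fire first" and suppress the rest.
    winners = np.argsort(potentials)[-num_winners:]
    active = np.zeros(len(potentials), dtype=bool)
    active[winners] = True
    return active
```

In a fuller CLA implementation, the same top-k selection would operate on the
sum of feedforward and predictive depolarisation per cell, as Fergal notes.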
>>>>
>>>> Firstly, at the outset you tell NuPIC you want to predict a certain
>>>> field a certain number of steps ahead (you can make multiple
>>>> predictions, but these are just copies of the same process). The
>>>> classifier sets up a virtual histogram for every cell, which will store
>>>> the 10-step predictions of energy use for that cell. For every input
>>>> seen, the classifier looks at the active cells from 10 steps in the
>>>> past and updates their histograms with the current value of energy use.
>>>>
>>>> To extract a prediction for 10 steps in the future, look at all the
>>>> active cells' histograms and combine their predictions.
>>>>
>>>> The reason this (often, usually) works is that the pattern of currently
>>>> active cells (not just columns) identifies the current input in the
>>>> current learned sequence. This very sparse representation statistically
>>>> implies a very limited set of future outcomes, and the layer's
>>>> collective beliefs, derived from combining the histograms, form a good
>>>> estimate of the future of the data.
>>>>
>>>> The pattern of predictive cells in the CLA is a prediction of the next
>>>> SDR(s) one timestep ahead. It could also be used for prediction if
>>>> you're only interested in exactly one step ahead, but it would have to
>>>> be "decoded" to reconstruct the next input for one field; the histogram
>>>> already has its data in the input domain, so it's easier and cheaper
>>>> just to use the histograms.
>>>>
>>>> The predictive pattern is, however, crucial in identifying which cells
>>>> to activate in the next timestep, which then become the sparse set of
>>>> active cells from which we derive the 10-step prediction, so predictive
>>>> states are key to NuPIC's predictive power.
>>>>
>>>> Regards,
>>>>
>>>> Fergal Byrne
>>>>
>>>> On Tue, Aug 19, 2014 at 11:14 AM, John Blackburn <[email protected]> wrote:
>>>>
>>>>> I've been following this discussion with interest.
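[Editor's note: the per-cell histogram mechanism Fergal describes can be
sketched like this. It is a toy illustration; the class name and structure are
assumptions for exposition, not NuPIC's `CLAClassifier` implementation.]

```python
from collections import defaultdict, deque

class SimpleClaClassifier:
    """Per-cell histograms for N-step prediction: each cell accumulates a
    histogram of the bucket values that followed its activity `steps`
    timesteps later."""

    def __init__(self, steps=10):
        self.steps = steps
        self.history = deque()  # rolling buffer of past active-cell sets
        self.histograms = defaultdict(lambda: defaultdict(int))  # cell -> bucket -> count

    def learn(self, active_cells, bucket):
        self.history.append(set(active_cells))
        if len(self.history) > self.steps:
            # The cells active `steps` timesteps ago vote for the current bucket.
            past = self.history.popleft()
            for cell in past:
                self.histograms[cell][bucket] += 1

    def infer(self, active_cells):
        # Combine the active cells' histograms into a bucket distribution.
        votes = defaultdict(float)
        for cell in active_cells:
            for bucket, count in self.histograms[cell].items():
                votes[bucket] += count
        total = sum(votes.values())
        return {b: v / total for b, v in votes.items()} if total else {}
```

At inference time the normalized votes play the role of the likelihoods
discussed later in the thread; the real classifier additionally applies a
rolling average to the counts.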
>>>>> One question: you say only active cells are considered in the
>>>>> classifier, but my understanding is that the input to the next region
>>>>> is the union of active and predictive cells. That is, if a cell is
>>>>> active or predictive, the next region in the hierarchy gets a 1; if it
>>>>> is inactive, it gets a 0. Thus, the next region cannot distinguish
>>>>> between active and predictive cells. Is that still the case? If so,
>>>>> why does the classifier not take the same approach?
>>>>>
>>>>> Many thanks for your advice,
>>>>>
>>>>> John Blackburn
>>>>>
>>>>> On Tue, Aug 19, 2014 at 8:42 AM, Nicholas Mitri <[email protected]> wrote:
>>>>>
>>>>>> Great! Thanks Subutai. Much appreciated.
>>>>>>
>>>>>> On Aug 19, 2014, at 3:32 AM, Subutai Ahmad <[email protected]> wrote:
>>>>>>
>>>>>> Hi Nick,
>>>>>>
>>>>>> I believe your understanding is exactly right. If we are predicting
>>>>>> 10 steps into the future, the classifier has to keep a rolling buffer
>>>>>> of the last 10 sets of active bits. The classifier sort-of outputs
>>>>>> the conditional probability of each bucket given the current
>>>>>> activation. I say "sort-of" because there's a rolling average in
>>>>>> there, so it's really a "recent conditional probability". This is how
>>>>>> the OPF outputs probabilities for each set of predictions.
>>>>>>
>>>>>> I believe the implementation stores only the indices for the
>>>>>> historical buffer. The C++ code for this is in nupic.core, in
>>>>>> FastClaClassifier.hpp/cpp.
>>>>>>
>>>>>> --Subutai
>>>>>>
>>>>>> On Sat, Aug 16, 2014 at 6:14 AM, Nicholas Mitri <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Subutai,
>>>>>>>
>>>>>>> So we're using the predictive state of the cells as a middle step
>>>>>>> (during learning) to encode context into the representation of the
>>>>>>> input pattern using only active bits? But that's the extent of their
>>>>>>> practical use as far as the CLA classifier is concerned.
>>>>>>>
>>>>>>> I understood the point you made that context encoded into the
>>>>>>> active bits gives us all the information we need for prediction, but
>>>>>>> there's still one issue I'm having with the operation of the CLA
>>>>>>> classifier.
>>>>>>>
>>>>>>> If we're only using active bits, then the RADC matrix we're storing
>>>>>>> should maintain and update a coincidence counter between the current
>>>>>>> bucket and the active bits from a previous time step during its
>>>>>>> learning phase. That way, when the classifier is in inference mode,
>>>>>>> the likelihood becomes the conditional probability of a future
>>>>>>> bucket given the current activation. In other words, the
>>>>>>> classifier's learning phase creates a relation between past
>>>>>>> information (the active output of the TP at time t - x) and the
>>>>>>> current input value (the bucket index at time t), so that during
>>>>>>> inference we can use current information (at time t) to predict
>>>>>>> future values (at time t + x). (The attached document isn't very
>>>>>>> clear on that point.)
>>>>>>>
>>>>>>> If that's the case, then the active state of the region should be
>>>>>>> stored for future use. Is any of that accurate? And if so, would we
>>>>>>> be storing the state of every cell or only the indices of the active
>>>>>>> ones?
>>>>>>>
>>>>>>> best,
>>>>>>> Nick
>>>>>>>
>>>>>>> On Aug 15, 2014, at 9:18 PM, Subutai Ahmad <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Nick,
>>>>>>>
>>>>>>> That's a great question, and one we worked through as well. The
>>>>>>> classifier really does only use the active bits. If you think about
>>>>>>> it, the active bits include all the available information about the
>>>>>>> high-order sequence. They include the full dynamic context, and all
>>>>>>> future predictions about this sequence can be derived from the
>>>>>>> active bits.
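[Editor's note: the "recent conditional probability" Subutai mentions can be
illustrated with exponentially decayed coincidence counts. This is a sketch
under assumed names; the real implementation is FastClaClassifier.hpp/cpp in
nupic.core, and its decay scheme differs in detail.]

```python
from collections import defaultdict

class DecayingBucketCounts:
    """Coincidence counts between past active bits and the current bucket,
    decayed each update so recent evidence dominates (a rolling average)."""

    def __init__(self, alpha=0.001):
        self.alpha = alpha  # per-update decay rate
        self.counts = defaultdict(lambda: defaultdict(float))  # bit -> bucket -> weight

    def update(self, past_active_bits, bucket):
        for bit in past_active_bits:
            row = self.counts[bit]
            # Decay this bit's existing counts, then credit the current bucket.
            for b in row:
                row[b] *= (1.0 - self.alpha)
            row[bucket] += 1.0

    def likelihoods(self, active_bits):
        # Recent conditional probability of each bucket given the activation.
        votes = defaultdict(float)
        for bit in active_bits:
            for bucket, w in self.counts[bit].items():
                votes[bucket] += w
        total = sum(votes.values())
        return {b: v / total for b, v in votes.items()} if total else {}
```

With decay, a bit that recently co-occurred with a bucket outweighs older
coincidences, which is why the output is a "recent" rather than a true
conditional probability.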
>>>>>>>
>>>>>>> For example, suppose you've learned different melodies and start
>>>>>>> listening to a song. Once the first few notes are played, there
>>>>>>> could be many different musical pieces that start the same way. The
>>>>>>> active state includes all possible melodies that start with these
>>>>>>> notes.
>>>>>>>
>>>>>>> Once you are in the middle of the melody and it's now unambiguous,
>>>>>>> the active state at any point is unique to that melody as well as to
>>>>>>> the position within it. If you are a musician, you could actually
>>>>>>> stop listening, take over, and play the rest of the song. Similarly,
>>>>>>> a classifier can take that state as input and predict the sequence
>>>>>>> of all those notes into the future with 100% accuracy. This is a
>>>>>>> very cool property. It is a result of the capacity inherent in
>>>>>>> sparse representations, and it is critical to representing
>>>>>>> high-order sequences.
>>>>>>>
>>>>>>> As such, the classifier only needs the active state to predict the
>>>>>>> next N steps.
>>>>>>>
>>>>>>> So what is the predictive state? The predictive state is in fact
>>>>>>> just a function of the active bits and the current set of segments.
>>>>>>> It doesn't add new information. However, it has other uses. The
>>>>>>> predictive state is used in the Temporal Memory to update the set of
>>>>>>> active bits given new sensory information. This helps fine-tune the
>>>>>>> active state as you get new information. It also helps the system
>>>>>>> refine learning as new (possibly unpredicted) information comes in.
>>>>>>>
>>>>>>> --Subutai
>>>>>>>
>>>>>>> On Fri, Aug 15, 2014 at 7:40 AM, Nicholas Mitri <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Subutai,
>>>>>>>>
>>>>>>>> Again, thanks for forwarding the document. It was really helpful.
>>>>>>>>
>>>>>>>> I have a quick question before I delve deeper into the classifier.
>>>>>>>>
>>>>>>>> The document mentions that the classifier makes use of the 'active'
>>>>>>>> bits of the temporal pooler. Are we grouping active and predictive
>>>>>>>> bits under the label 'active' here?
>>>>>>>>
>>>>>>>> If the predictive bits are not mapped into actual values by the
>>>>>>>> classifier, then what module is performing that task when I query
>>>>>>>> for the predicted field value at any time step?
>>>>>>>>
>>>>>>>> If they are, what process is used to decouple multiple simultaneous
>>>>>>>> predictions and map each to its corresponding value, so it can be
>>>>>>>> compared against a value after X time steps? Is it as simple as
>>>>>>>> looking at the normalized RADC table, picking the top 3 buckets
>>>>>>>> with the highest likelihoods, mapping them to their actual values,
>>>>>>>> and then attaching each likelihood to its prediction as a
>>>>>>>> confidence measure?
>>>>>>>>
>>>>>>>> There are clearly some major holes in my understanding of the
>>>>>>>> algorithms at play; I'd appreciate the clarifications :).
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Nick
>>>>>>>>
>>>>>>>> On Aug 13, 2014, at 8:39 PM, Subutai Ahmad <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> Nice diagram! In addition to the video David sent, we have a NuPIC
>>>>>>>> issue to create this document:
>>>>>>>>
>>>>>>>> https://github.com/numenta/nupic/issues/578
>>>>>>>>
>>>>>>>> I found some old documentation in our archives. Scott is planning
>>>>>>>> to update the wiki with this information. I have also attached it
>>>>>>>> here for reference (but be warned, it may be a bit outdated!)
>>>>>>>>
>>>>>>>> --Subutai
>>>>>>>>
>>>>>>>> On Wed, Aug 13, 2014 at 9:03 AM, cogmission1 . <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Nicholas,
>>>>>>>>>
>>>>>>>>> This is the only source with any depth I have seen. Have you seen
>>>>>>>>> this?
>>>>>>>>>
>>>>>>>>> https://www.youtube.com/watch?v=z6r3ekreRzY
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On Wed, Aug 13, 2014 at 10:46 AM, Nicholas Mitri <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> Based on my understanding of the material in the wiki, the CLA
>>>>>>>>>> algorithms can be depicted by the figure below. There's plenty of
>>>>>>>>>> info about the SP and TP, in both theory and implementation
>>>>>>>>>> details. I can't seem to find much information about the
>>>>>>>>>> classifier, though.
>>>>>>>>>>
>>>>>>>>>> If I've understood correctly, this is not a classifier in the
>>>>>>>>>> machine learning sense of the word, but rather a mechanism to
>>>>>>>>>> translate TP output into values of the same data type as the
>>>>>>>>>> input, for comparison purposes.
>>>>>>>>>>
>>>>>>>>>> I'd really appreciate a more involved explanation of the process,
>>>>>>>>>> in terms of what data is stored step to step and how the
>>>>>>>>>> look-up/mapping mechanics are implemented.
>>>>>>>>>>
>>>>>>>>>> best,
>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>> <Screen Shot 2013-12-02 at 4.00.01 PM.png>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> nupic mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>> --
>>>>
>>>> Fergal Byrne, Brenter IT
>>>>
>>>> Author, Real Machine Intelligence with Clortex and NuPIC
>>>> https://leanpub.com/realsmartmachines
>>>>
>>>> Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
>>>> http://euroclojure.com/2014/
>>>> and at LambdaJam Chicago, July 2014: http://www.lambdajam.com
>>>>
>>>> http://inbits.com - Better Living through Thoughtful Technology
>>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>>
>>>> e:[email protected] t:+353 83 4214179
>>>> Join the quest for Machine Intelligence at http://numenta.org
>>>> Formerly of Adnet [email protected] http://www.adnet.ie
