Hi Chetan,

Great questions. These could go into an FAQ on the Wiki page for this new
bit of the theory. I'll give you my read on some answers:

On Sat, Jan 25, 2014 at 3:21 AM, Chetan Surpur <[email protected]> wrote:

> 1. *What problems did the previous design of the temporal pooler have?
> Would this one solve those problems?*
>

There was no previous design for Temporal Pooling in the theory. There is a
bit of the algorithm called the Temporal Pooler (TP), which is misnamed; it
should be called Sequence Recognition or Sequence Prediction (Jeff uses the
term Sequence Memory, so the choice of corrected name is unclear).

Temporal Pooling and Sequence Prediction are two different mechanisms in
the CLA. I'll describe the Sequence Prediction first, and Temporal Pooling
after that, so you can see them side by side.

In Layer 3 (where the current CLA and NuPIC live), Sequence Recognition or
Prediction happens when currently active cells signal to the distal
dendrites of other cells in the same layer (let's assume this for
simplicity). If enough such signals are received by a cell, the dendrite
will spike and raise the potential of (partially depolarise) that cell's
body. We say that the cell has become (partially) predictive. Since such a
cell has already raised its potential, it is more likely to fire if it now
receives a strong feedforward signal.
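A toy sketch of this predictive mechanism in Python (names, the threshold
value, and the data structures are mine for illustration, not NuPIC's):

```python
# Toy sketch of distal-dendrite prediction (illustrative only, not NuPIC code).
# A cell becomes predictive when any one of its distal segments receives
# enough signals from currently active cells in the layer.

SEGMENT_THRESHOLD = 3  # assumed number of active inputs needed for a dendrite spike

def predictive_cells(active_cells, segments):
    """segments maps a cell id to a list of distal segments, each a set of
    presynaptic cell ids. A segment 'spikes' if enough of its presynaptic
    cells are active, partially depolarising the cell."""
    predictive = set()
    for cell, segs in segments.items():
        for seg in segs:
            if len(seg & active_cells) >= SEGMENT_THRESHOLD:
                predictive.add(cell)
                break  # one spiking segment is enough to depolarise the cell
    return predictive

# Example: cells 10, 11 and 12 each have one segment; only cell 10's
# segment listens to all three currently active "E" cells {1, 2, 3}.
segments = {10: [{1, 2, 3}], 11: [{1, 2, 4}], 12: [{7, 8, 9}]}
print(predictive_cells({1, 2, 3}, segments))  # {10}
```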

The matrix of synapse permanences between cells in a layer is learned by
experience of certain sequences happening repeatedly (there are other
causes, but again we'll simplify). If the layer sees a sequence starting
with E, that will excite a certain set of columns which come to represent
an E. Assuming the layer is initially set up randomly (simplification),
there will be connections from E cells to many other cells, and some of
these will become predictive. When the layer sees a V next, the cells which
just happened to become predictive due to E AND are in columns excited by
V, will become active now. Those cells will also strengthen their synapses
coming from the E cells. Thus the sequence fragment EV is learned (a little
better).
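The learning step could be sketched like this (again my own toy version;
the real CLA tracks per-synapse permanences with increments, decrements and
a connection threshold):

```python
# Toy sketch of sequence learning (illustrative only, not NuPIC code).
# When a cell fires out of the predictive state, the synapses from the
# cells that were active on the previous step are strengthened slightly.

PERM_INC = 0.1  # assumed permanence increment per correct prediction

def reinforce(permanences, prev_active, now_active):
    """permanences maps (pre, post) cell pairs to a value in [0, 1].
    Strengthen synapses from previously active cells (the E cells) onto
    cells that just became active after being predictive (the V cells)."""
    for post in now_active:
        for pre in prev_active:
            key = (pre, post)
            permanences[key] = min(1.0, permanences.get(key, 0.0) + PERM_INC)
    return permanences

perms = reinforce({}, prev_active={1, 2}, now_active={10})
# Each E->V synapse is nudged up by 0.1; every repetition of the sequence
# EV strengthens them a little more, so EV is learned "a little better".
```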

But now the V cells signal too (to many others), and some of those will
become predictive of the next letter. Of those cells, the ones predicting
an I (which turns up next) will again become active. Now the sequence EVI
has been seen, and the connections which predict each step of that have
been strengthened slightly. This continues for as long as the sequence
does.

It is possible that no cells are correctly predictive when an input is
seen (because no connections "just happen" to be above the permanence
threshold for that transition right now, or because this is a new
sequence). In that case all the cells in the active columns, let's say
for L, will fire. This has two effects: first, all the outgoing connections
from all the L cells will signal, and again whichever of those are
predictive and get the right input (an S) will continue the learning of the
sequence; secondly, all the L cells which had some predictive signals from
I will strengthen those synapses, and thus the sequence EVIL will have been
learned (or at least learned a little better).
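This "burst when unpredicted" rule can be sketched as follows (a toy
version with my own names; the real algorithm also selects learning cells
within a bursting column):

```python
# Toy sketch of column bursting (illustrative only, not NuPIC code).
# If a feedforward-active column contained a predictive cell, only that
# cell fires; otherwise every cell in the column fires ("bursts").

def activate(active_columns, predictive):
    """active_columns maps column id -> list of cell ids in that column
    (all assumed to be receiving feedforward input); predictive is the set
    of cells depolarised by the previous step. Returns the active cells."""
    active = set()
    for col, cells in active_columns.items():
        predicted = [c for c in cells if c in predictive]
        if predicted:
            active.update(predicted)  # the predicted cells win the column...
        else:
            active.update(cells)      # ...otherwise the whole column bursts
    return active

# Column 0 had a predictive cell (10), column 1 had none and bursts:
print(activate({0: [10, 11], 1: [20, 21]}, {10}))  # {10, 20, 21}
```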

This process happens again and again and the connection matrix gradually
learns the actual sequences which are seen. There are a few more things
about this but this is the mechanism of learning variable order sequences.
Note that the activation of cells is "threaded" through all the connections
in the layer, passing from one cell for each input to the next in the
sequence. The threads branch as the experienced sequences do (EVIL and
EVOLVE have the first two cells in their threads in common, but after that
the threads diverge).

This is what the "TP" does. The sequence is learned through successive
prediction and confirmation of single steps - the current input confirms
the previous prediction, and in turn predicts the next input.

In NuPIC we could record all the cell activations, and thus we could say
that a certain sequence of cells indicates a certain sequence of inputs.
Alternatively, we could attach a data structure to each cell which recorded
the "sequence so far" and use that to help us identify the sequence it
represents.

The brain cannot do such things (the neurons only react to current, local
input; the "memory" is intrinsic in the synaptic connections), so we wish
to find the mechanism, somewhere, which identifies "this sequence" and
represents it to the brain. This mechanism is the real Temporal Pooler.
It is what represents words when we see letters, what represents a face
when we see a nose, an eye, a spot, and a nose again.

Most of us thought the Temporal Pooler was done *in* a higher region (a
"word" region) which is observing the current "letter" region. Jeff is now
saying that a simpler, more local mechanism may be doing this first in the
"letter" region. He posits this as the function of Layer 4, which he says
is getting extra information about the context of the input (namely a
recent or planned motor movement).

Forgive me Jeff for butchering this, but I'll simplify again and say that
Layer 4 receives feedforward signals for the letter V in EVIL (it gets the
"input" feedforward just as Layer 3 above does), and it also gets the motor
signal "one letter to the right". This will cause Layer 4's Spatial Pooler
to form a slightly different pattern (some close ties in activation will go
the other way due to the motor inputs), and so Layer 4 will learn a
different prediction pattern for the combined sequence [E>][V>][I>][L>].

In particular, let's say we don't just read forward all the time, but in
fact shuffle backwards a letter or two, or skip a letter as we try to
identify one particular word. Then we'll get a set of permutations of the
letters EVIL, mostly in order, but sometimes with repetitions and
omissions. Layer 3 will be confused by this, because it doesn't have any
context for the transitions and just sees the individual steps. So Layer 3
treats EVIL and EVEVIL as different sequences (each presentation of the
word is spaced apart in time).

Layer 4, however, sees [E>][V>][I>][L>] and [E>][V>][E<][V>][I>][L>], which
is a different experience. It can now learn to predict the "missteps" or
"misspellings" in the presentation of the letters, because it has
privileged access to the motor command which led to the transition. Now,
crucially, this requires that there are cycles in the sequence memory in
Layer 4, in other words, the thread for [E>V>I>L>] has loops which have
learned to predict [E<] when [V>] or [V<] have been seen (as well as the
more likely [I>]). If we also assume that a cell continues to fire for an
extra timestep or two, then we will have simultaneous activation of most of
the cells for the various E's, V's, I's, and L's in Layer 4. Layer 4 will
now have a pattern of activation which continues to represent the word EVIL
regardless of the exact order of presentation, and this representation is
also stable from shortly after the beginning of the sequence until shortly
after it is past (when a new stable representation appears for the next
sequence).
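The effect of cells firing for an extra timestep or two can be sketched
like this (a toy model; the persistence value and names are my guesses,
not anything from the theory or NuPIC):

```python
# Toy sketch of a pooled, stable representation (illustrative only).
# If each cell keeps firing for a few extra timesteps, the union of
# recently active cells forms a slowly changing pattern that comes to
# represent the whole word rather than any single letter.

PERSIST = 3  # assumed number of timesteps a cell's activity lingers

def pooled(per_step_active):
    """per_step_active is a list of sets of active cell ids, one per input.
    Yields, for each step, the set of cells active within the last
    PERSIST steps -- the 'pooled' pattern seen by a downstream observer."""
    history = []
    for step in per_step_active:
        history.append(step)
        yield set().union(*history[-PERSIST:])

steps = [{1}, {2}, {3}, {4}]  # cells for E, V, I, L in turn
print(list(pooled(steps)))
# [{1}, {1, 2}, {1, 2, 3}, {2, 3, 4}] -- heavily overlapping patterns,
# stable-ish from shortly after the start of the sequence
```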

Thus a single region can learn both letters (and simple sequences) and
words (even if a little jumbled) and present either or both as "output" to
higher regions, which "see" one of those representations. In addition, the
combination of the stable EVIL in Layer 4 and the V->I in Layer 3, as well
as the input [V>] to the region, can be used (by Layer 5 if I remember,
Jeff) to order [>] to the eye motors and predict [I>].

I vaguely remember guessing that we have a "cameraman" function and a
"viewing" function which together perform saccades. This combination of
Layers 3-5, using something along these lines, could achieve this.


>
> 2. If I understand correctly, the purpose of the temporal pooler is to
> have a single neuron or set of neurons in a higher level active throughout
> a lower-level sequence. Whenever a predicted activation at the lower level
> occurs, it "excites" the higher level neuron to make it stay active for
> longer.
>
> *How long does this "excitement" last – into the very next activation, or
> the next few?*
>
> Also, *wouldn't a neuron representing a lower-level sequence stay active
> for some time even after the sequence is over, since it's "excited" by the
> last activation in the sequence*? In fact, this should be able to be
> experimentally verified, just see if neurons are active for a little longer
> than the sequences they represent. Is there any evidence for this?
>
> This also means that the neuron representing a lower-level sequence won't
> become active until the sequence is recognized and predicted by the lower
> level. In other words, a neuron representing the word CALIFORNIA won't
> necessarily become active until after the lower level sees the letters CAL
> (for example). *Is that expected?*
>

Yes to most of this, except Jeff is now saying the "higher" level is
actually Layer 4 (technically below Layer 3) in the same region. And the
continued excitation is actually caused by loops in the thread of
connections for a particular sequence, in combination with a different,
longer lasting synaptic mechanism. This will take a step or two while the
sequence is identified, and will last until a better sequence (the next
one) ramps up its loopy thread of activity and dominates.


>
> 3. *Robustness to noise comes with longer and longer activations of
> higher level neurons...right? *
>
> So let's say a neuron representing CALIFORNIA has so far seen CALIF, and
> "excited" by the correct prediction of the F (and therefore prepared to be
> active longer). Even if it sees an X next instead of a O, it'll remain
> active, because so far the predictions have been correct. If it keeps
> seeing misspellings, its "excitement" will run out, and it will stop being
> active. But if the sequence resumes correctly after this misspelling, the
> word neuron will stay active and be active throughout the (slightly)
> misspelled word.
>
> Is this what would happen under this new design? This assumes that the
> "excitement" can last beyond just the next activation, if the predictions
> have been very accurate so far. Is that the case? This comes back to
> question #2.
>

Yes, there are (at least two) mechanisms at work here.

Firstly, the Layer 4 cells, seeing [F>], will predict [O>] strongly and
also [I<] (and possibly [F-], ie the same letter with no movement). Layer 5
will decide to send [>] and predict [O>] (it's learned to produce a
representation of the motor signal and predict the resulting letter), or it
may say [<] and predict [I<] (it's stochastic after all). As long as these
layers have learned the sequence well (the one out in the world, not the
sequences of presentation), and as long as things go to plan (ie no-one has
flipped the flash-card to show "TEXAS"), then this cycle of looping around
the sequence will continue to cause the right representations.

The second component is the longer lasting excitation, using slower-acting
neurotransmitters. Note that in Layer 4, the C's, A's, L's, I's and so on
are all active, and they're all sending a reverberating, self-sustaining
cycle of throbs or signals throughout their network of mutual connections.
The activation patterns will pulse preferentially in sympathy with the most
common path through the network, but the layer will average out at showing
the word CALIFORNIA. Layer 5 will want to use this pattern and the activity
of Layer 3 (now showing F), along with the last motor input [>], and it
will use all of this to decide whether to go right or left next (perhaps
using the "confidence" of Layer 4 cells to assist in boosting Layer 4's
flagging spirit).



>
> 4. It would be easier to understand and visualize the entire process with
> a toy example. *Could you describe the learning and inference involving
> SP/TP for a simple example, like reading the word HORSE over and over
> again?* It would be greatly appreciated :)
>

See above.


>
> 5. Why do regions need to be split into layers? Only if each layer got
> different information than the region as a whole did, right? For instance,
> if layer 4 of each region got motor information more directly from lower
> regions than layer 3. In that case, *how are the layers connected to each
> other; what information is each one getting from other layers and regions*
> ?
>

Regions are split into layers because they want to perform a number of
(complementary) computations using combinations of inputs, feedback, past
and issued motor commands, and so on. The exact "program" executed by a
region depends on the data it is sent to handle and the outputs (signals
up, motor and feedback down) it is called upon to learn to produce. No
doubt much of the "setup" of the "program" is done genetically, but there
is also a mechanism whereby a region can "reprogram" itself given the right
connectivity and exposure to the right data. The basic components are as
follows:

Layer 3 gets the least input, just the facts (the direct sensory input from
senses, or the direct value sent by a lower region).
Layer 4 gets the same, plus the associated motor (or feedback) signal (plus
Layer 3 output, which predicts the next data input).
Layer 5 gets data input, motor signal, and activity in Layers 3 and 4.

There are further input sources (which mainly come in over Layer 1) and
output pathways, but let's ignore them for now.
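One way to picture those per-layer input combinations (a plain-data sketch
with my own naming, not anything defined in NuPIC):

```python
# Plain-data sketch of the per-layer inputs described above (names are mine).
LAYER_INPUTS = {
    "L3": ["feedforward_data"],
    "L4": ["feedforward_data", "motor_or_feedback", "L3_output"],
    "L5": ["feedforward_data", "motor_or_feedback", "L3_activity", "L4_activity"],
}

# Sanity check: each layer down the list sees everything the previous one does.
assert set(LAYER_INPUTS["L3"]) <= set(LAYER_INPUTS["L4"])
```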


> 6. *How is necessity for *ordered* sequences reduced when given a copy of
> the motor command?*
>
> You mentioned this happens in layer 4. I'm not sure I fully understand,
> although I'm guessing that a copy of the motor command will allow for all
> combinations of motor=>inference sequences to be temporally pooled into a
> stable representation of the underlying object being perceived. So the
> necessity for order is implicitly being reduced in that way. Not sure if I
> explained that well though, I can try to rephrase if necessary.
>

This is explained above, as the predictions [O>] and [I<] result from
seeing CALIF.

>
> Much appreciated,
> Chetan
>

Thanks again for the great questions Chetan, hope that clarifies.

Regards,

Fergal Byrne

-- 

Fergal Byrne, Brenter IT

http://inbits.com - Better Living through Thoughtful Technology

e:[email protected] t:+353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org