Hi Chetan,

Great questions. These could go into an FAQ on the Wiki page for this new bit of the theory. I'll give you my read on some answers:
On Sat, Jan 25, 2014 at 3:21 AM, Chetan Surpur <[email protected]> wrote:

> 1. *What problems did the previous design of the temporal pooler have?
> Would this one solve those problems?*

There was no previous design for Temporal Pooling in the theory. There is a part of the algorithm called the Temporal Pooler (TP) which is misnamed; it should be called Sequence Recognition or Sequence Prediction (Jeff uses the term Sequence Memory, so the choice of corrected name is unclear). Temporal Pooling and Sequence Prediction are two different mechanisms in the CLA. I'll describe Sequence Prediction first, and Temporal Pooling after that, so you can see them side by side.

In Layer 3 (where the current CLA and NuPIC live), Sequence Recognition or Prediction happens when currently active cells signal to the distal dendrites of other cells in the same layer (let's assume this for simplicity). If enough such signals are received by a cell, the dendrite will spike and raise the potential of (partially depolarise) the cell body of that cell. We say that the cell has become (partially) predictive. Since such a cell has already raised its potential, it is more likely to fire if it now receives a strong feedforward signal.

The matrix of synapse permanences between cells in a layer is learned by experiencing certain sequences happening repeatedly (there are other causes, but again we'll simplify). If the layer sees a sequence starting with E, that will excite a certain set of columns which come to represent an E. Assuming the layer is initially set up randomly (a simplification), there will be connections from E cells to many other cells, and some of these will become predictive. When the layer sees a V next, the cells which just happened to become predictive due to E AND are in columns excited by V will become active now. Those cells will also strengthen their synapses coming from the E cells. Thus the sequence fragment EV is learned (a little better). But now the V cells signal too (to many others), and some of those will become predictive of the next letter. Of those cells, the ones predicting an I (which turns up next) will again become active. Now the sequence EVI has been seen, and the connections which predict each step of it have been strengthened slightly. This continues for as long as the sequence does.

It is possible that no cells are correctly predicting an input when it is seen (because no connections "just happen" to be above the permanence threshold for that transition right now, or because this is a new sequence). In that case all the cells in the active columns, let's say for L, will fire. This has two effects: first, all the outgoing connections from all the L cells will signal, and again whichever of those are predictive and get the right input (an S) will continue the learning of the sequence; secondly, all the L cells which had some predictive signals from I will strengthen those synapses, and thus the sequence EVIL will have been learned (or at least learned a little better).

This process happens again and again, and the connection matrix gradually learns the actual sequences which are seen. There are a few more things about this, but that is the mechanism of learning variable order sequences. Note that the activation of cells is "threaded" through all the connections in the layer, passing from one cell for each input to the next in the sequence.
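If it helps to see that as code, here's a minimal toy sketch of one step of the cycle just described: activate the predicted cells (or burst the columns), reinforce the lateral synapses that made the correct prediction, and work out which cells are now predictive. To be clear, this is not NuPIC's implementation; the class name, the parameter values and the dict-of-dicts permanence matrix are all invented purely for illustration.

    import random

    class ToySequenceMemory:
        """Toy sketch of the Layer 3 predict/confirm/learn step (not NuPIC code)."""

        def __init__(self, threshold=0.5, increment=0.1):
            self.threshold = threshold   # permanence at or above this counts as connected
            self.increment = increment   # how much a confirmed synapse is strengthened
            self.permanence = {}         # permanence[pre][post] = strength of lateral connection
            self.active = set()          # cells that fired on the previous input
            self.predictive = set()      # cells partially depolarised by the previous input

        def _connected_from(self, pre):
            return {post for post, p in self.permanence.get(pre, {}).items()
                    if p >= self.threshold}

        def step(self, column_cells):
            """column_cells: set of all cells in the columns driven by the current input."""
            # 1. Predicted cells in the active columns win; if nothing was predicted,
            #    the whole set of columns "bursts" and every cell in them fires.
            winners = (column_cells & self.predictive) or set(column_cells)

            # 2. Learning: strengthen the lateral synapses from the previously active
            #    cells onto the cells that just turned out to be correct.
            for pre in self.active:
                row = self.permanence.setdefault(pre, {})
                for post in winners:
                    row[post] = min(1.0, row.get(post, random.uniform(0.3, 0.5)) + self.increment)

            # 3. Whichever cells now receive enough lateral input become predictive
            #    (partially depolarised) for the next input.
            self.predictive = set()
            for pre in winners:
                self.predictive |= self._connected_from(pre)

            self.active = winners
            return winners, self.predictive

Feed it the (toy) column sets for E, V, I, L over and over and the E->V->I->L thread of connections slowly strengthens; present a letter nothing predicted and its columns simply burst.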
The threads branch as the experienced sequences do (EVIL and EVOLVE have the first two cells in their threads in common, but after that the threads diverge). This is what the "TP" does. The sequence is learned through successive prediction and confirmation of single steps: the current input confirms the previous prediction, and in turn predicts the next input.

In NuPIC we could record all the cell activations, and thus we could say that a certain sequence of cells indicates a certain sequence of inputs. Alternatively, we could attach a data structure to each cell which recorded the "sequence so far" and use that to help us identify the sequence it represents. The brain cannot do such things (the neurons only react to current, local input; the "memory" is intrinsic in the synaptic connections), so we need to find, somewhere, a mechanism which identifies "this sequence" and represents it to the brain. This mechanism is the real Temporal Pooler. It is what represents words when we see letters, what represents a face when we see a nose, an eye, a spot, and a nose again.

Most of us thought Temporal Pooling was done *in* a higher region (a "word" region) which is observing the current "letter" region. Jeff is now saying that a simpler, more local mechanism may be doing this first in the "letter" region. He posits this as the function of Layer 4, which he says is getting extra information about the context of the input (namely a recent or planned motor movement).

Forgive me Jeff for butchering this, but I'll simplify again and say that Layer 4 receives feedforward signals for the letter V in EVIL (it gets the "input" feedforward just as Layer 3 above does), and it also gets the motor signal "one letter to the right". This will cause Layer 4 to SP (spatially pool) a slightly different pattern (some close ties in activation will go the other way due to the motor inputs), and so Layer 4 will learn a different prediction pattern for the combined sequence [E>][V>][I>][L>].

In particular, let's say we don't just read forward all the time, but in fact shuffle backwards a letter or two, or skip a letter, as we try to identify one particular word. Then we'll get a set of permutations of the letters EVIL, mostly in order, but sometimes with repetitions and omissions. Layer 3 will be confused by this, because it doesn't have any context for the transitions and just sees the individual steps. So Layer 3 treats EVIL and EVEVIL as different sequences (each presentation of the word is spaced apart in time). Layer 4, however, sees [E>][V>][I>][L>] and [E>][V>][E<][V>][I>][L>], which is a different experience. It can now learn to predict the "missteps" or "misspellings" in the presentation of the letters, because it has privileged access to the motor command which led to each transition.

Now, crucially, this requires that there are cycles in the sequence memory in Layer 4; in other words, the thread for [E>][V>][I>][L>] has loops which have learned to predict [E<] when [V>] or [V<] have been seen (as well as the more likely [I>]). If we also assume that a cell continues to fire for an extra timestep or two, then we will have simultaneous activation of most of the cells for the various E's, V's, I's, and L's in Layer 4. Layer 4 will now have a pattern of activation which continues to represent the word EVIL regardless of the exact order of presentation, and this representation is also stable from shortly after the beginning of the sequence until shortly after it is past (when a new stable representation appears for the next sequence).
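Again purely as an illustration, and explicitly not the real mechanism (which, as above, Jeff attributes to loops in the Layer 4 sequence memory plus a longer-lasting synaptic excitation), here is a toy sketch of the outward behaviour of Temporal Pooling: a pooled representation that accumulates and persists while the incoming steps keep being correctly predicted, and fades when they don't. Every name and number is made up.

    class ToyTemporalPooler:
        """Toy sketch of Temporal Pooling's behaviour, not its mechanism (not NuPIC code)."""

        def __init__(self, persistence=3):
            self.persistence = persistence  # steps a pooled cell outlives its last confirmation
            self.pool = {}                  # pooled cell -> remaining lifetime

        def step(self, active_cells, input_was_predicted):
            if input_was_predicted:
                # A correctly predicted step joins the pool and refreshes everything
                # already in it, standing in for the longer-lasting excitation.
                for cell in active_cells:
                    self.pool[cell] = self.persistence
                for cell in self.pool:
                    self.pool[cell] = self.persistence
            else:
                # An unpredicted (bursting) step doesn't join; the pool just decays,
                # so one misstep is survivable but a run of them empties the pool.
                for cell in list(self.pool):
                    self.pool[cell] -= 1
                    if self.pool[cell] <= 0:
                        del self.pool[cell]
            return set(self.pool)           # the (hopefully stable) representation

Presented with [E>][V>][E<][V>][I>][L>], the pool quickly fills with the E, V, I and L cells and then stays essentially constant until the next word's cells start taking over, which is the stable representation described above.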
Thus a single region can learn both letters (and simple sequences) and words (even if a little jumbled), and present either or both as "output" to higher regions, which "see" one of those representations. In addition, the combination of the stable EVIL in Layer 4 and the V->I in Layer 3, as well as the input [V>] to the region, can be used (by Layer 5 if I remember, Jeff) to order [>] to the eye motors and predict [I>]. I vaguely remember guessing that we have a "cameraman" function and a "viewing" function which together perform saccades. This combination of Layers 3-5, using something along these lines, could achieve this.

> 2. If I understand correctly, the purpose of the temporal pooler is to
> have a single neuron or set of neurons in a higher level active throughout
> a lower-level sequence. Whenever a predicted activation at the lower level
> occurs, it "excites" the higher level neuron to make it stay active for
> longer.
>
> *How long does this "excitement" last – into the very next activation, or
> the next few?*
>
> Also, *wouldn't a neuron representing a lower-level sequence stay active
> for some time even after the sequence is over, since it's "excited" by the
> last activation in the sequence*? In fact, this should be able to be
> experimentally verified, just see if neurons are active for a little longer
> than the sequences they represent. Is there any evidence for this?
>
> This also means that the neuron representing a lower-level sequence won't
> become active until the sequence is recognized and predicted by the lower
> level. In other words, a neuron representing the word CALIFORNIA won't
> necessarily become active until after the lower level sees the letters CAL
> (for example). *Is that expected?*

Yes to most of this, except Jeff is now saying the "higher" level is actually Layer 4 (technically below Layer 3) in the same region. And the continued excitation is actually caused by loops in the thread of connections for a particular sequence, in combination with a different, longer-lasting synaptic mechanism. This will take a step or two while the sequence is identified, and will last until a better sequence (the next one) ramps up its loopy thread of activity and dominates.

> 3. *Robustness to noise comes with longer and longer activations of
> higher level neurons...right?*
>
> So let's say a neuron representing CALIFORNIA has so far seen CALIF, and
> "excited" by the correct prediction of the F (and therefore prepared to be
> active longer). Even if it sees an X next instead of a O, it'll remain
> active, because so far the predictions have been correct. If it keeps
> seeing misspellings, its "excitement" will run out, and it will stop being
> active. But if the sequence resumes correctly after this misspelling, the
> word neuron will stay active and be active throughout the (slightly)
> misspelled word.
>
> Is this what would happen under this new design? This assumes that the
> "excitement" can last beyond just the next activation, if the predictions
> have been very accurate so far. Is that the case? This comes back to
> question #2.

Yes, there are (at least) two mechanisms at work here. Firstly, the Layer 4 cells, seeing [F>], will predict [O>] strongly and also [I<] (and possibly [F-], i.e. the same letter with no movement). Layer 5 will decide to send [>] and predict [O>] (it has learned to produce a representation of the motor signal and predict the resulting letter), or it may say [<] and predict [I<] (it's stochastic after all).
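A small aside before I go on: that Layer 5 choice can be pictured as a weighted coin-flip over whichever (motor command, expected letter) transitions are currently predicted. A toy sketch, with made-up names and probabilities:

    import random

    def choose_saccade(predicted_transitions):
        """Pick a (motor, expected letter) pair, weighted by its (toy) prediction strength."""
        transitions = list(predicted_transitions.keys())
        weights = list(predicted_transitions.values())
        return random.choices(transitions, weights=weights, k=1)[0]

    # After CALIF: [O>] is predicted strongly, [I<] more weakly, [F-] weakly.
    motor, expected_letter = choose_saccade({(">", "O"): 0.80, ("<", "I"): 0.15, ("-", "F"): 0.05})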
As long as these layers have learned the sequence well (the one out in the world, not the sequences of presentation), and as long as things go to plan (i.e. no one has flipped the flash-card to show "TEXAS"), this cycle of looping around the sequence will continue to produce the right representations.

The second component is the longer-lasting excitation, using slower-acting neurotransmitters. Note that in Layer 4 the C's, A's, L's, I's and so on are all active, and they're all sending a reverberating, self-sustaining cycle of throbs or signals throughout their network of mutual connections. The activation patterns will pulse preferentially in sympathy with the most common path through the network, but the layer will average out at showing the word CALIFORNIA. Layer 5 will want to use this pattern and the activity of Layer 3 (now showing F), along with the last motor input [>], and it will use all of this to decide whether to go right or left next (perhaps using the "confidence" of Layer 4 cells to assist in boosting Layer 4's flagging spirit).

> 4. It would be easier to understand and visualize the entire process with
> a toy example. *Could you describe the learning and inference involving
> SP/TP for a simple example, like reading the word HORSE over and over
> again?* It would be greatly appreciated :)

See above.

> 5. Why do regions need to be split into layers? Only if each layer got
> different information than the region as a whole did, right? For instance,
> if layer 4 of each region got motor information more directly from lower
> regions than layer 3. In that case, *how are the layers connected to each
> other; what information is each one getting from other layers and regions*?

Regions are split into layers because they need to perform a number of (complementary) computations using combinations of inputs, feedback, past and issued motor commands, and so on. The exact "program" executed by a region depends on the data it is sent to handle and the outputs (signals up, motor and feedback down) it is called upon to learn to produce. No doubt much of the "setup" of the "program" is done genetically, but there is also a mechanism whereby a region can "reprogram" itself given the right connectivity and exposure to the right data.

The basic components are as follows. Layer 3 gets the least input: just the facts (the direct sensory input from the senses, or the direct value sent by a lower region). Layer 4 gets the same, plus the associated motor (or feedback) signal (plus Layer 3 output, which predicts the next data input). Layer 5 gets the data input, the motor signal, and the activity in Layers 3 and 4. There are further input sources (which mainly come in over Layer 1) and output pathways, but let's ignore them for now. (I'll put a minimal code sketch of this wiring at the end of this mail.)

> 6. *How is necessity for *ordered* sequences reduced when given a copy of
> the motor command?*
>
> You mentioned this happens in layer 4. I'm not sure I fully understand,
> although I'm guessing that a copy of the motor command will allow for all
> combinations of motor=>inference sequences to be temporally pooled into a
> stable representation of the underlying object being perceived. So the
> necessity for order is implicitly being reduced in that way. Not sure if I
> explained that well though, I can try to rephrase if necessary.

This is explained above: the predictions [O>] and [I<] both result from seeing CALIF, so the transitions no longer have to arrive in one fixed order for the pooled representation to hold.

> Much appreciated,
> Chetan

Thanks again for the great questions Chetan, hope that clarifies.
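Oh, and here's the minimal code sketch of the layer wiring I promised under question 5. Hypothetical names throughout, Layer 1 feedback ignored; it's just the list above restated.

    def region_inputs(sensory, motor, l3_output, l4_output):
        """Who sees what within one region, per the answer to question 5."""
        return {
            "L3": {"data": sensory},                           # just the facts
            "L4": {"data": sensory, "motor": motor,            # facts plus motor/feedback context
                   "L3_prediction": l3_output},
            "L5": {"data": sensory, "motor": motor,            # everything; it chooses the next motor command
                   "L3": l3_output, "L4": l4_output},
        }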
Regards,
Fergal Byrne

--
Fergal Byrne, Brenter IT <http://www.examsupport.ie>
http://inbits.com - Better Living through Thoughtful Technology
e: [email protected]  t: +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
