Hi Rik,

That's great!  You figured out the math and also ran simulations to check
it. This is something I've also wanted to do at some point, thank you! Your
reasoning looks correct to me. Perhaps someone else can check it too?

In terms of the CLA, I think you are right. This starts to give some
insight into how many different futures the TP might be able to predict
simultaneously.  So far this is an idealized random scenario.  To make it
more realistic, there are (at least) a couple of complications:

1) You mentioned that matches often look at percent match rather than an
absolute match. This is true but reduces the accuracy. For example, if a
50% match is sufficient, then you will start getting false matches sooner.
It would be interesting to show how accuracy degrades with this percentage.
Is there also a sudden fall off?

2) In practice SDR's are NOT random. Bits contain some semantic meaning.
Similar inputs will have similar representations. This increases the chance
of "false positives" but is also a good thing since you want the system to
generalize. It makes the analysis harder though.  Among other things we
need to consider how the bits are generated through the SP. And what do you
really mean by a false positive? It would be nice if we could work towards
an end-to-end theory around the "capacity" of a CLA.

Regardless of 1 and 2, there is definitely a limit as to how many patterns
a single TP level can predict simultaneously. I believe one of the reasons
the brain needs a hierarchy is to overcome this limit.

--Subutai

P.S. BTW, there is a question around "what is the output of the temporal
pooler?". I intentionally glossed over it in my talk but I'm glad people
started asking about it more at the hackathon.

One possible output is in column space.  Each bottom up input has a
representation that has 40 out of 2048 inputs on. If a TP cell in a column
is ON, that means the TP is predicting that column to be ON. You can take
the OR of the active cells along each column. This results in a 2048 long
vector with a 1 for each column that is predicted.  Since the TP can
predict multiple inputs, this representation contains the OR of all the
predicted bottom-up inputs. (Note: there are two OR's going on here, the OR
of active cells and the OR across predicted inputs.)  This representation
is the easiest to think about and corresponds to your analysis above. It's
the representation Matt used in his CEPT demo to retrieve predicted words.

A second possible output is the set of active TP cells. With our hotgym
parameters, this would be a vector that has 32*2048 dimensions. This
represents the OR of the predicted bottom up inputs with full temporal
context.

A third possible output is the set of predicted cells. With our current
mapping to biology this information is internal to the cell and could not
be transmitted to other cells directly.  But, this contains the "pooling"
information which is helpful to pass up a hierarchy.   We haven't resolved
this issue.

P.P.S. The theory of Bloom
filters<http://en.wikipedia.org/wiki/Bloom_filter>is very related, as
well as Kanerva's work.


On Mon, Nov 4, 2013 at 5:54 AM, Rik <[email protected]> wrote:

> Hi,
>
> Yesterday in the CLA presentation at the hackathon Subutai was asking
> "how many 40-out-of-2048 SDRs can you store in an SDR?" where "storing"
> meaning or'ing them all up (as in Boolean "or") and then comparing the
> combined SDR to a sample SDR. (The combined SDR may actually not be so
> sparse anymore but I stick to calling all bit vectors "SDR" here.) All
> "1" bits of the sample SDR should only be present in the result if the
> sample was part of the set in the first place, otherwise it'd be a false
> positive. A matching random SDR would almost certainly be a false
> positive as any realistic set of input SDRs are a vanishingly small
> subset of all possible SDRs.
>
> Since false positives are inevitable and their probability increases the
> more SDRs you "or" together, the question should really ask for a
> capacity given a desired probability of false positives, or rather a
> desired accuracy which is the complement (1 - x) of false positives
> (there being no false negatives). So how does the accuracy relate to the
> capacity? And what's the capacity for the above 40/2048 assuming a
> desired accuracy of, say, 99%?
>
> The attached Mathematica notebook "sdr_capacity.nb" seeks to work this
> out. First the accuracy is determined as a function of length, density
> and capacity and then solved for capacity.
>
> Starting with the example 40-out-of-2048, the density is 40/2048 = .0195
> meaning 1.95% of bits are "1" and the density of 0's (called "density0")
> is 1 - density = .9804. Because two bits or'ed together are only 0 if
> both bits are 0, the density0 of two SDRs or'ed together is the product
> of the original density0's. Or for the case of, say, 100 SDRs of the
> same density0 or'ed together: density0^100 = .1142. The density (of 1's)
> is then 1 - density0 = .8858 (= 1814 bits for a length of 2048).
>
> To determine the probability of a random SDR to have all its 1-bits
> present in the combined SDR we take its first 1-bit position and check
> for a presence of 1 in the combined SDR. With a density of .8858 the
> probability is .8858. If true we can then picture taking that 1-bit out
> of the combined SDR leaving an empty spot (not a 0-bit). The remaining
> SDR shorted by one bit pretty much still has the same density, this is
> an approximation for low densities in the original and sample SDRs. So
> for all 40 bits the probability is again the product of all 40
> probabilities = .8858^40 = .0078.
>
> (In reality the density goes down as you remove 1's so this
> approximation overestimates the probability of false positives a little
> bit and underestimates the accuracy. The exact calculation is left as an
> exercise for the reader.)
>
> The accuracy then is the complement (1 - x) of the probability of a
> false positive = .9922 or 99.22%. In general, in Mathematica notation:
>
> PoolerAccuracy[length_, density_, capacity_] := N[1 - (1 - (1 -
> density)^capacity)^(length*density)]
>
> Plotted for capacities from 0 to 1000, the attached graph "accuracy.png"
> shows the expected sigmoid function with a sharp drop off in accuracy
> starting from around 120.
>
> Mathematica then solves for capacity and for the values of length =
> 2048, density 40/2048 and accuracy .99 yields a capacity of 112. The
> attached plots "capacity_vs_density.png" and "capacity_vs_length.png"
> show that decreasing density or increasing length increase capacity.
>
> To test my own math I perform a little experiment in attached
> Mathematica notebook "sdr_experiment.nb". I combined 100 SDRs using
> Boolean "or" and then test 10,000 sample SDRs against the combined SDR.
> This yields an accuracy of 9892 out of 10,000 = 99% as expected.
>
> All correct? And how does this relate to CLA? The temporal pooler feeds
> an or'ed SDR of the current input and all predictions into the spatial
> pooler with its learned one SDR per each column. This doesn't test for
> exact matches, it's more like a certain percentage of bits has to match
> and that percentage depends on match rates of the neighbors. This
> requires some other model of "capacity" than the above. Correct?
>
> Cheers
>
> Rik
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

Reply via email to