Re: [nupic-dev] SDR capacity

Rik Mon, 04 Nov 2013 06:29:58 -0800

Try [1]http://www.wolfram.com/products/player/ to view, transcribe into
whatever utility suits you. I doubt though yours has "Solve". You'll
have to do some thinking yourself then. :)






On Tue, Nov 5, 2013, at 01:21 AM, Ian Danforth wrote:

Rik,

 This is awesome, exactly what I wanted to see following Subutai's
talk. Unfortunately I don't have access to Mathematica. Any chance we
could get an iPython Notebook version? Ultimately I think an
interactive graphing tool in JS would be the perfect way to visualize
these properties and let people explore the parameters.

 Thanks again,

 Ian



On Mon, Nov 4, 2013 at 5:54 AM, Rik <[2][email protected]> wrote:

Hi,



Yesterday in the CLA presentation at the hackathon Subutai was asking

"how many 40-out-of-2048 SDRs can you store in an SDR?" where "storing"

meaning or'ing them all up (as in Boolean "or") and then comparing the

combined SDR to a sample SDR. (The combined SDR may actually not be so

sparse anymore but I stick to calling all bit vectors "SDR" here.) All

"1" bits of the sample SDR should only be present in the result if the

sample was part of the set in the first place, otherwise it'd be a
false

positive. A matching random SDR would almost certainly be a false

positive as any realistic set of input SDRs are a vanishingly small

subset of all possible SDRs.



Since false positives are inevitable and their probability increases
the

more SDRs you "or" together, the question should really ask for a

capacity given a desired probability of false positives, or rather a

desired accuracy which is the complement (1 - x) of false positives

(there being no false negatives). So how does the accuracy relate to
the

capacity? And what's the capacity for the above 40/2048 assuming a

desired accuracy of, say, 99%?



The attached Mathematica notebook "sdr_capacity.nb" seeks to work this

out. First the accuracy is determined as a function of length, density

and capacity and then solved for capacity.



Starting with the example 40-out-of-2048, the density is 40/2048 =
.0195

meaning 1.95% of bits are "1" and the density of 0's (called
"density0")

is 1 - density = .9804. Because two bits or'ed together are only 0 if

both bits are 0, the density0 of two SDRs or'ed together is the product

of the original density0's. Or for the case of, say, 100 SDRs of the

same density0 or'ed together: density0^100 = .1142. The density (of
1's)

is then 1 - density0 = .8858 (= 1814 bits for a length of 2048).



To determine the probability of a random SDR to have all its 1-bits

present in the combined SDR we take its first 1-bit position and check

for a presence of 1 in the combined SDR. With a density of .8858 the

probability is .8858. If true we can then picture taking that 1-bit out

of the combined SDR leaving an empty spot (not a 0-bit). The remaining

SDR shorted by one bit pretty much still has the same density, this is

an approximation for low densities in the original and sample SDRs. So

for all 40 bits the probability is again the product of all 40

probabilities = .8858^40 = .0078.



(In reality the density goes down as you remove 1's so this

approximation overestimates the probability of false positives a little

bit and underestimates the accuracy. The exact calculation is left as
an

exercise for the reader.)



The accuracy then is the complement (1 - x) of the probability of a

false positive = .9922 or 99.22%. In general, in Mathematica notation:



PoolerAccuracy[length_, density_, capacity_] := N[1 - (1 - (1 -

density)^capacity)^(length*density)]



Plotted for capacities from 0 to 1000, the attached graph
"accuracy.png"

shows the expected sigmoid function with a sharp drop off in accuracy

starting from around 120.



Mathematica then solves for capacity and for the values of length =

2048, density 40/2048 and accuracy .99 yields a capacity of 112. The

attached plots "capacity_vs_density.png" and "capacity_vs_length.png"

show that decreasing density or increasing length increase capacity.



To test my own math I perform a little experiment in attached

Mathematica notebook "sdr_experiment.nb". I combined 100 SDRs using

Boolean "or" and then test 10,000 sample SDRs against the combined SDR.

This yields an accuracy of 9892 out of 10,000 = 99% as expected.



All correct? And how does this relate to CLA? The temporal pooler feeds

an or'ed SDR of the current input and all predictions into the spatial

pooler with its learned one SDR per each column. This doesn't test for

exact matches, it's more like a certain percentage of bits has to match

and that percentage depends on match rates of the neighbors. This

requires some other model of "capacity" than the above. Correct?



Cheers



Rik



_______________________________________________

nupic mailing list

[3][email protected]

[4]http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org




_______________________________________________

nupic mailing list

[5][email protected]

[6]http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

References

1. http://www.wolfram.com/products/player/
2. mailto:[email protected]
3. mailto:[email protected]
4. http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
5. mailto:[email protected]
6. http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

Re: [nupic-dev] SDR capacity

Reply via email to