Hi Scott, thank you for the response!
On Fri, Nov 22, 2013 at 11:00 PM, Scott Purdy <sc...@numenta.org> wrote:
>
> On Fri, Nov 22, 2013 at 6:55 AM, Marek Otahal <markota...@gmail.com> wrote:
>
>> Guys,
>>
>> ...
>> The top cap is (100 choose 20) which is some crazy number of 5*10^20. All
>> these SDRs will be sparse, but not distributed (right??) because a change
>> in one bit will already be another pattern.
>
> The number of possible unique SP outputs is (1000 choose 20), or ~10^41.

Yes, I missed a zero there.

> These are all 2% sparsity. Changing one input bit doesn't necessarily
> result in a different SP output, though. There could be many more input
> bit patterns than combinations of 20 SP columns. For instance, 1000 input
> bits have 10^300 possible patterns. And regardless of that, the semantic
> information learned by the SP is distributed across the 1000 columns, so
> it would still be distributed.

I wasn't clear there - I was thinking top-down, so a 1-bit change to the
SP's output would be the representation of a different learned input
pattern.

This raises a question: is the "robustness" property of SDRs related to the
input bits (I perturb some of the input bits and still expect to get the
same SP output), or to the output ON bits? That is, when the representations
are drawn from (1000 choose 20), even if I flip 3-5 of the output bits, is
there still a good chance the result is closest to my original input and not
to some other input? And is "robust" == "distributed"? Or does distributed
mean that 2^1000 states are represented by (only) (1000 choose 20) states?

>> So my question is, what is the "usable" capacity where all outputs are
>> still sparse (they all are) and distributed (= robust to noise). Is there
>> a percentage of bits (say, 20% of bits noisy while the pattern is still
>> recognized) that is still considered distributed/robust?
>
> This is still a valid question for real world datasets but is completely
> dependent on the particular dataset.
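As an aside, a quick sanity check on these numbers and on the
flip-3-5-output-bits scenario - just a sketch using only the Python standard
library (math.comb needs Python 3.8+), nothing NuPIC-specific:

```python
import math

n, w = 1000, 20                      # 1000 columns, 20 active (2% sparsity)

# Number of distinct SDRs with w of n bits active:
print(math.comb(100, 20))            # ~5.4e20 (the n=100 figure)
print(math.comb(n, w))               # ~3.4e41 (the n=1000 figure)

# Number of possible binary patterns on 1000 input bits: 2^1000 ~ 10^301,
# vastly more than the number of distinct 20-of-1000 outputs.
assert 2 ** 1000 > 10 ** 300

# Robustness of the output code: two *random* SDRs share w^2/n bits on
# average (hypergeometric overlap), far below the 15 bits that survive
# when 5 of an SDR's 20 ON bits are flipped off.
print(w * w / n)                     # expected overlap: 0.4 bits

# P(a random SDR overlaps a fixed one in >= 15 bits), i.e. the chance of
# confusing a 5-bit-damaged code with an unrelated one:
p = sum(math.comb(w, k) * math.comb(n - w, w - k)
        for k in range(15, w + 1)) / math.comb(n, w)
print(p)                             # astronomically small
```

So at this sparsity, flipping a few output bits leaves the damaged code
overwhelmingly closer to its original than to any other random code.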
> For instance, regardless of the SP parameters, the dataset may have 10000
> input bits but only ~50 of them change regularly. The tolerance to noise
> at this point is limited by the dataset.

Nice point, I hadn't considered that. And what if all of the bits carry
information? Which is what I believe happens at the higher levels of the
regions (?) - the useless data is cropped out.

Do we have some data (from biology?) showing that there has to be at least,
say, 5% (= R) of robustness in the bits of the output SDR (e.g. because of
errors at the synapses, etc.)? So, for example, input_A causes SDR_A, and if
I turn 5 of the 20 ON bits off, input_A should still be the most likely
match. This would lower the maximum number of patterns, because for
(1000 choose 20) I'd effectively require (1000 choose 25).

>> Or is it the other way around, and the SP tries to maximize this
>> robustness for the given number of patterns it is presented with? If I
>> feed it a huge number of patterns, will I pay the obvious price of
>> reducing the border between two patterns?
>
> I think the answer to the first question is yes but to the second no. The
> SP attempts to maximize the distance between the column input bits
> relative to the actual data (rather than the entire input space). But
> feeding many patterns in doesn't necessarily have an impact on this. If
> the input data are not random, then the more data fed into the SP, the
> more I would expect the columns to converge to the optimal
> representations.

This is true, I was (falsely) assuming random input. But in real use cases
the SP will find "patterns" in the input patterns, so even for larger
amounts of input data we may actually see a drop in entropy, as the SP will
find some rule that separates the inputs.

>> Either way, is there a reasonable way to measure what I defined as
>> capacity?
>> I was thinking something like:
>>
>> for 10 repetitions:
>>     for p in patterns_to_present:
>>         sp.input(p)
>>
>> sp.disableLearning()
>> for p in patterns_to_present:
>>     # what should the percentage be? see above
>>     p_mod = randomize_some_percentage_of_pattern(p, percentage)
>>     if sp.input(p) == sp.input(p_mod):
>>         # ok, it's the same, pattern learned
>
> This seems like a good methodology for determining how tolerant the model
> is to noise for this particular dataset. The amount of data fed in before
> disabling learning will have a large impact on the noise tolerance (but
> with diminishing returns).

I think your answers have helped me clear this up, so a short summary: does
robustness to noise correlate reciprocally with the total number of input
patterns I'm able to distinguish? (1/(robustness tolerance) ~ #patterns)
From what has been said, I think it does not have to for real-world
datasets.

PS: Is there a (lower-bound) limit on the number of columns in the SP?
Would a 20-column SP work? That way, I could achieve (20 choose 3) and
reach the state of an info-full SP.

Regards,
Mark

>> Thanks for your replies,
>> Mark
>>
>> --
>> Marek Otahal :o)
>>
>> _______________________________________________
>> nupic mailing list
>> nupic@lists.numenta.org
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Marek Otahal :o)
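PPS: for the curious, here is roughly how the measurement loop above might
look as runnable code. ToySP, randomize_fraction, and noise_tolerance are
hypothetical names of mine; ToySP is only a stand-in (fixed random
connections, top-k winners, no learning), and the real NuPIC SpatialPooler
exposes compute() rather than input()/disableLearning() - so this sketches
the methodology, not the actual API:

```python
import random


class ToySP:
    """Hypothetical stand-in for a spatial pooler: each column has a fixed
    random set of input connections, and the k columns with the largest
    overlap with the input win. No learning - this is NOT the NuPIC SP."""

    def __init__(self, n_in=1000, n_cols=1000, k=20, seed=42):
        rng = random.Random(seed)
        self.k = k
        self.conn = [frozenset(rng.sample(range(n_in), 50))
                     for _ in range(n_cols)]

    def output(self, pattern):
        """Map a set of active input bits to the frozenset of k winners."""
        scores = sorted(((len(c & pattern), i)
                         for i, c in enumerate(self.conn)), reverse=True)
        return frozenset(i for _, i in scores[:self.k])


def randomize_fraction(pattern, n_in, frac, rng):
    """Turn off frac of the active bits and turn on random bits elsewhere
    (collisions possible - fine for a sketch)."""
    pattern = set(pattern)
    for b in rng.sample(sorted(pattern), int(len(pattern) * frac)):
        pattern.discard(b)
        pattern.add(rng.randrange(n_in))
    return pattern


def noise_tolerance(sp, patterns, frac, n_in, trials=50, seed=1):
    """Fraction of noisy presentations whose SP output is still closest
    (by column overlap, ties allowed) to its own clean output."""
    rng = random.Random(seed)
    outs = [sp.output(p) for p in patterns]
    hits = 0
    for _ in range(trials):
        i = rng.randrange(len(patterns))
        noisy = sp.output(randomize_fraction(patterns[i], n_in, frac, rng))
        best = max(len(o & noisy) for o in outs)
        hits += (len(outs[i] & noisy) == best)
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n_in = 1000
    patterns = [frozenset(rng.sample(range(n_in), 20)) for _ in range(10)]
    sp = ToySP(n_in=n_in)
    for frac in (0.0, 0.1, 0.2, 0.5):
        print(frac, noise_tolerance(sp, patterns, frac, n_in))
```

With no noise the tolerance is 1.0 by construction; sweeping frac upward
shows where recognition starts to degrade for a given set of patterns.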