Hi Wakan,

It may be helpful for you to study the HTM white paper [1] to learn the
terminology of the HTM algorithms.  My comments assume that you
understand the basic terminology and algorithms described in that paper.

A "column" refers to a column in the HTM, which is either active or not.
Input to the HTM may cause a column to become active (think, binary "1"
value) or inactive (think, binary "0" value).  The cells in a column
determine if the HTM predicts the column to become active on the next
step.  The details of this are described clearly in the white paper.

The HTM works on bit patterns, not numbers.  Every number you provide to
the HTM must be translated to a bit pattern.  That is what the encoders
do.  I have never used OPF, so I can't advise you on it.  Similarly, the
prediction of the HTM is a bit pattern (predicted columns are 1's and not
predicted columns are 0's).  To get back to a number, the bit pattern must
be decoded.  You may find it helpful to study the comments and source code
for the Scalar Encoder [2] to get a feel for the translation of numbers to
bit patterns.  It is very important to understand how that works if you
want to understand HTM.
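
To give a feel for the idea (this is only a toy sketch; the parameter
names and the encoding details here are my own simplification, not
NuPIC's actual ScalarEncoder), a scalar encoder maps a number onto a
fixed-width window of 1 bits inside a larger bit pattern, so that
nearby numbers produce overlapping patterns:

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n=14, w=3):
    """Toy scalar encoder: map a number to an n-bit pattern containing
    w contiguous 1's.  Nearby values yield overlapping patterns.
    (Illustration only; the real ScalarEncoder has many more options.)"""
    # Clamp the value into the encoder's range.
    value = max(min_val, min(max_val, value))
    # Number of distinct positions the window of 1's can occupy.
    buckets = n - w + 1
    # Scale the value to pick the window's starting position.
    i = int(round((value - min_val) / (max_val - min_val) * (buckets - 1)))
    return [1 if i <= j < i + w else 0 for j in range(n)]
```

With this sketch, encode_scalar(10) and encode_scalar(12) share most of
their 1 bits, while encode_scalar(10) and encode_scalar(90) share none,
which is exactly why similar numbers look similar to the HTM.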

My point was that the HTM may make precise predictions or diffuse
predictions and the anomaly score may be low in both cases.  A precise
prediction corresponds to a small number of predicted columns (a predicted
bit pattern with few 1's).  A diffuse prediction corresponds to a large
number of predicted columns (a bit pattern with many 1's).  In the former
case, the HTM only knows of a single or small number of next possible
patterns in the temporal sequence, while in the latter case, it may know
many possible next steps in the sequence.  Both scenarios can result in a
low anomaly score as long as the actual bit pattern (the active columns in
the next step) highly overlaps with the predicted bit pattern (the
predicted columns in the previous step).
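
To make that concrete, here is a small Python sketch using the anomaly
formula quoted below, (#active - #activeAndPredicted) / #active, with
column activity represented as sets of column indices (a simplification
for illustration, not how NuPIC stores it internally):

```python
def anomaly_score(active, predicted):
    """Fraction of active columns that were NOT predicted on the
    previous step.  `active` and `predicted` are sets of column
    indices; this mirrors (#active - #activeAndPredicted) / #active."""
    if not active:
        return 0.0
    return len(active - predicted) / len(active)

# Precise prediction: few predicted columns, all of which become active.
active = {3, 7, 12, 20}
precise = {3, 7, 12, 20}
# Diffuse prediction: many predicted columns (a superposition of many
# possible next patterns) that happen to cover the active ones.
diffuse = set(range(50))

print(anomaly_score(active, precise))   # 0.0
print(anomaly_score(active, diffuse))   # also 0.0
```

Note that the diffuse case scores just as low as the precise one: the
formula never penalizes predicting extra columns that did not become
active.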

[1] http://numenta.org/resources/HTM_CorticalLearningAlgorithms.pdf
[2] https://github.com/numenta/nupic/blob/master/src/nupic/encoders/scalar.py

Regards,
Daniel

On Fri, Nov 6, 2015 at 4:18 PM, Wakan Tanka <[email protected]> wrote:

> Hello Daniel,
>
>
> On 11/03/2015 09:56 PM, Daniel McDonald wrote:
>
>> There's one aspect of this thread that I don't feel has been touched on,
>> which may help in understanding prediction and the anomaly score.  I
>> learned this at the spring hackathon in NY.
>>
>> If you look at how the anomaly score is implemented [1], you'll see that
>> it computes the ratio of the difference of the number of active columns
>> and the number of active columns which were also predicted to the number
>> of actives.  That is, (#active - #activeAndPredicted) / #active.  Note
>> that this formula does not depend on the total number of predicted
>> columns.  In fact, if the HTM predicts all columns, the anomaly score
>> will be 0 for any subsequent input.  In this case, the HTM would be
>> completely uncertain about the next step in the sequence, so it predicts
>> a superposition of all possible patterns; therefore, any subsequent
>> input is not anomalous.
>>
> What do you mean by "active columns" and "number of actives"? Are you
> saying the anomaly score depends on the number of columns (values or
> dimensions, if you want) which are predicted (e.g. memory consumption, cpu
> consumption ...)? Is this what you mean? Sorry for my bad English.
>
>> That is exactly what happened to my Market Patterns hack at the
>> hackathon.  After training the HTM on years of stock market data, the
>> anomaly score dropped quite low; however, when I looked carefully at
>> what was going on, the HTM had, in fact, saturated and was predicting
>> more than half of the columns to be active at each step in the
>> sequence.  In effect, it was saying that the sequences were
>> unpredictable and anything was possible in the next step (we already
>> knew that about the stock market, right?).  Consequently, whatever
>> happened next was not anomalous.
>>
> Are you saying that it is possible to have a low anomaly score on nearly
> every value that is predicted, because there are many possible predictions
> which are equally confident?
>
>
>> When I look at your example data, I read it this way:
>>
>> At 175, 0.0 was read and 0.0 is the prediction for the next step.  The
>> anomaly score of 0.325 is meaningless, because we don't have data from
>> the previous step.
>>
>> At 176, 62 was read, which doesn't match the prediction of 0.0 (from
>> 175), so it is anomalous (0.65).  52 is predicted for the next step.
>>
>> At 177, 402 is read.  It is completely anomalous (1.0).  That is there
>> is no overlap in the columns predicted for the value 52 and the columns
>> active for the value 402.  If you are using a scalar encoder, that makes
>> sense, since the bit patterns for such different numbers likely have no
>> overlap in the encoding or in the SDR produced by the SP.  0.0 is
>> predicted for the next step.
>>
> How can I know if I'm using a scalar encoder? Can I somehow see the SDR
> of 402 or 52; is there any mapping for this? I'm using OPF.
>
>
>> At 178, 0 is read, and the anomaly score drops low (0.125), since the
>> actual matches closely to what was predicted at the previous step.  The
>> score isn't exactly 0, because the predicted SDR from the previous step
>> and the encoded SDR for the new input may differ in some columns.  In
>> other words, in the previous step, when 0.0 was reported as the
>> prediction, this was only an approximate translation of a predicted SDR,
>> where 0.0 was the closest decoded representation.  0.0 is predicted for
>> the next step.
>>
>> At 179, 402 is read, which is completely anomalous (1.0) because the
>> predicted SDR for 0.0 had no column in common with the encoded SDR for
>> 402.
>>
>> 180 is similar to 178, and 0.0 is predicted.
>>
>> At 181, 3 is read.  The anomaly score is low (0.05), because the scalar
>> encoder produces overlapping patterns for similar numbers, so there is
>> likely overlap in the SDR's for 0 and 3.  402 is predicted.
>>
>> At 182, 50 is read.  The anomaly score is low (0.1), which is a bit
>> puzzling; however, it may be due to saturation.  The prediction of 402
>> could represent a case where many columns were predicted representing a
>> superposition of possible states, and 402 was just the strongest one
>> (i.e., had the highest overlap of the encoded SDR for 402 with the
>> predicted columns).  That is, 52 may have also been predicted, but to a
>> lesser degree than 402.  It may be helpful to look at how many columns
>> are predicted vs. active in each step to see when this happens.  If the
>> number of predicted columns suddenly jumps, it means that the HTM is
>> uncertain about the next step (or, that it sees many possible next steps
>> given the current context).
>>
> What do you mean by "how many columns are predicted vs. active"?
>
>
> Thank you very much
>
>
