It seems like you figured out resets directly with the CLA model, but just
for future reference: you can also specify resets in the input data for
description.py files through a special type of column. The file format has
three header lines, and the third is the "FieldMetaSpecial" line.

You can mark a column "S" (for sequence) to have a reset inserted right
before any new value. In other words, you put the same value in that column
for every row of a sequence, and when the OPF sees a new value it knows a
new sequence has started and inserts a reset before that record.
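
As a sketch of the "S" convention (the field names and values here are
hypothetical, not from an actual dataset), the file carries the same
sequence ID on every row of a sentence, and a reset is implied wherever
that ID changes:

```python
# Hypothetical input file with the three header lines:
# field names, field types, and the special flags ("S" marks the
# sequence column).
data = """letter,seq_id
string,string
,S
H,sent1
i,sent1
B,sent2
y,sent2
e,sent2
"""

def reset_rows(text):
    """Return the letters that get a reset inserted before them:
    those whose seq_id differs from the previous row's."""
    rows = [line.split(",") for line in text.strip().split("\n")[3:]]
    resets = []
    prev = None
    for letter, seq_id in rows:
        if prev is not None and seq_id != prev:
            resets.append(letter)
        prev = seq_id
    return resets

print(reset_rows(data))  # → ['B']
```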

Alternatively, you can mark a boolean column "R" (for reset); a true value
means a reset is inserted before that record. This is a different method
for achieving the same result.
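
Likewise, a sketch of the "R" convention (again with hypothetical names):
the reset column is boolean, and a set flag marks a record that a reset is
inserted before:

```python
# Hypothetical rows of (letter, reset_flag); in the actual file, the third
# header line would carry "R" over the reset_flag column.
rows = [("H", 0), ("i", 0), ("B", 1), ("y", 0), ("e", 0)]

def reset_rows_from_flags(rows):
    """Return the letters whose reset flag is set, i.e. the records
    that start a new sequence."""
    return [letter for letter, flag in rows if flag]

print(reset_rows_from_flags(rows))  # → ['B']
```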

See the FieldMetaSpecial class here:
py/nupic/data/fieldmeta.py


On Thu, Nov 14, 2013 at 8:48 PM, Marek Otahal <[email protected]> wrote:

> Hey, thanks a lot!!
>
> First I wanted direct access to TP/SP, but looking at the model I found
> this, which is great!
>
> https://github.com/numenta/nupic/blob/master/py/nupic/frameworks/opf/clamodel.py#L239
>
> When calling the reset for sentence separators (!,.,?,:,",....), the
> results look much more accurate: see below.
>
> Btw, the C++ implementation of the SP serves Linguist well. I'll send a
> PR to your branch tomorrow.
>
> Best regards, Mark
>
>
> ----------------Linguist on
> child-stories.txt-------------------------------------
> [27944]  s ==>  would come toge (0.59 | 0.43 | 0.44 | 0.43 | 0.43 | 0.45 |
> 0.64 | 0.45 | 0.60 | 0.43 | 0.45 | 0.65 | 0.47 | 0.43 | 0.43 | 0.45)
> [27945]  . ==> |If a boat saile (0.92 | 0.87 | 0.87 | 0.87 | 0.60 | 0.69 |
> 0.60 | 0.65 | 0.60 | 0.60 | 0.64 | 0.62 | 0.60 | 0.60 | 0.60 | 0.65)
> DEBUG:  Result of PyRegion::executeCommand : 'None'
> reset
> [27946]  | ==> If a boat sailed (0.84 | 0.84 | 0.85 | 0.57 | 0.62 | 0.57 |
> 0.57 | 0.57 | 0.57 | 0.61 | 0.58 | 0.58 | 0.58 | 0.57 | 0.58 | 0.58)
> [27947]  T ==> hey were as high (1.00 | 1.00 | 0.92 | 0.93 | 0.91 | 0.85 |
> 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.86 | 0.85 | 0.86 | 0.85 | 0.86)
> [27948]  h ==> e  were as high  (0.96 | 0.41 | 0.55 | 0.40 | 0.36 | 0.36 |
> 0.37 | 0.53 | 0.36 | 0.36 | 0.44 | 0.36 | 0.37 | 0.37 | 0.42 | 0.41)
> [27949]  e ==>   atee  the rock (0.52 | 0.36 | 0.36 | 0.33 | 0.29 | 0.25 |
> 0.25 | 0.25 | 0.28 | 0.24 | 0.24 | 0.46 | 0.27 | 0.27 | 0.27 | 0.27)
> [27950]  s ==> .|Thed come toge (0.51 | 0.51 | 0.35 | 0.35 | 0.35 | 0.45 |
> 0.64 | 0.45 | 0.60 | 0.43 | 0.45 | 0.65 | 0.47 | 0.43 | 0.43 | 0.45)
> [27951]  e ==> therhehd break t (0.26 | 0.25 | 0.25 | 0.25 | 0.36 | 0.36 |
> 0.26 | 0.34 | 0.36 | 0.32 | 0.31 | 0.31 | 0.31 | 0.31 | 0.32 | 0.31)
> [27952]    ==> poeces.|T esetle (0.23 | 0.46 | 0.32 | 0.23 | 0.28 | 0.26 |
> 0.26 | 0.26 | 0.23 | 0.29 | 0.26 | 0.23 | 0.27 | 0.29 | 0.25 | 0.54)
> [27953]  r ==> ocks wo ld tome  (0.36 | 0.36 | 0.65 | 0.36 | 0.35 | 0.35 |
> 0.35 | 0.37 | 0.35 | 0.35 | 0.42 | 0.37 | 0.35 | 0.35 | 0.35 | 0.37)
> [27954]  o ==> at  ooeeder aehe (0.25 | 0.25 | 0.47 | 0.40 | 0.22 | 0.40 |
> 0.21 | 0.29 | 0.26 | 0.30 | 0.30 | 0.21 | 0.21 | 0.37 | 0.38 | 0.38)
> [27955]  c ==> ese These ro and (0.34 | 0.63 | 0.34 | 0.34 | 0.34 | 0.34 |
> 0.34 | 0.35 | 0.36 | 0.35 | 0.34 | 0.35 | 0.53 | 0.48 | 0.48 | 0.48)
> [27956]  k ==> s would come tog (0.66 | 0.66 | 0.64 | 0.65 | 0.64 | 0.64 |
> 0.65 | 0.75 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64)
>
>
>
> On Fri, Nov 15, 2013 at 4:33 AM, Chetan Surpur <[email protected]> wrote:
>
>> Mark,
>>
>> Linguist doesn't use the OPF other than for swarming. It directly calls
>> methods on the CLA model. If you want to have it reset the sequence when it
>> reads a particular character, you can just add that logic to the Linguist
>> code.
>>
>> - Chetan
>>
>>
>> On Thu, Nov 14, 2013 at 6:51 PM, Marek Otahal <[email protected]> wrote:
>>
>>> This problem touches on text prediction/generation, but it is a general
>>> NuPIC algorithmic topic.
>>>
>>> Playing with Chetan's linguist repo
>>> https://github.com/chetan51/linguist/issues/1 , I discussed the
>>> (relatively poor) results with Chetan and Scott. (conversation below)
>>>
>>> Then I realized we do not do resets in the text streams. And text
>>> streams are one case where resets are both reasonable and well defined.
>>>
>>> From what I recall, the OPF allows forcing a TP reset at periodic time
>>> intervals, which is unusable here (at worst, I could set it to the average
>>> sentence length). The other case where the OPF does a reset is at the end
>>> of the dataset and the start of a new epoch; that is why the results are
>>> relatively good on trivial "Hello World!" datasets.
>>>
>>> Ideally, I'd like to define a set of "terminators" = ['!', '.', '?'] and
>>> call reset() whenever the new char is one of them. Is there a reasonable
>>> way to extend the OPF (and where?) to allow this behavior?
>>>
>>> Related to the OPF & API thread: that's why I'd like the OPF, or its
>>> successor, to offer a 'fnName': 'listOfParams' setting, where fnName would
>>> be executed each round with the parameters in listOfParams. That way, I
>>> could simply pass def _checkTerminate(c, listTerm): if c in listTerm:
>>> TP.reset()
>>>
>>>
>>> You may say I then shouldn't use the OPF at all. For this case I probably
>>> will, since it makes it easy to chain encoder|SP|TP, and the OPF does some
>>> extra work for the inference etc.; see Scott's notes below.
>>>
>>> Cheers! Mark.
>>>
>>>
>>> ---------------------------------------------
>>>
>>> The temporal pooler will have a set of cells predicted at each step
>>> (multiple simultaneous predictions). The classifier converts the predicted
>>> cells back to letters. So when it sees "m" it may be predicting the TP
>>> cells for both "a" in "made" and "a" in "matches". The classifier is
>>> guessing that the "m" is the start of "made" but when the "a" comes the TP
>>> doesn't necessarily lock on to just the "made" sequence. So in the next
>>> step the classifier is still guessing whether you are in the "made"
>>> sequence or the "matches" sequence.
>>>
>>> I am sort of spitballing here but it seems like the behavior seen, while
>>> not intuitive, could be correct, at least for some of the letters.
>>>
>>> The spatial pooler and the CLA classifier make it a little hard to
>>> reason about the results. Perhaps an alternative would be to use just the
>>> temporal pooler. You could have 40 or so columns for each character that
>>> you want to include. I would limit the characters you include (convert
>>> everything to lowercase, for instance). If you have 30 characters with 40
>>> columns per character, then you need a TP with 1200 columns. Assign the
>>> first 40 columns to "a", the next 40 to "b", etc. And you can directly map
>>> the predicted cells/columns back into predicted letters (and the more
>>> predicted columns for a given letter, the more likely you can say that
>>> letter will come next).
>>>
>>> The downside is that you can only predict one step ahead. So I'm not sure
>>> if you want to move to this, but it would make it easier to reason about
>>> the results. You can see examples of using the TP directly here:
>>> https://github.com/numenta/nupic/tree/master/examples/tp
>>>
>>> Hope that helps a little.
>>>
>>>
>>> --
>>> Marek Otahal :o)
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>>>
>>
>
>
> --
> Marek Otahal :o)
>
