It seems like you figured out resets directly with the CLA model, but just for future reference: you can specify resets in the data files used with description.py through a specific type of column. The file format has three header lines, and the third is the "FieldMetaSpecial" line.
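For concreteness, here is a small sketch of that file layout and the reset semantics it implies. The field names and values are invented for illustration, and the parser below is a stand-in, not the actual OPF reader:

```python
import csv
import io

# A hypothetical input file in the OPF CSV format: three header rows
# (field names, field types, FieldMetaSpecial flags), then data rows.
# The "S" flag marks `seqId` as the sequence column: whenever its value
# changes, a reset is implied before that record.
DATA = """\
seqId,letter
string,string
S,
A,h
A,i
B,o
B,k
"""

def find_reset_rows(text):
    """Return indices of data rows where a new sequence (i.e. a reset) begins."""
    rows = list(csv.reader(io.StringIO(text)))
    names, types, specials = rows[0], rows[1], rows[2]
    seq_col = specials.index('S')      # column flagged as the sequence id
    resets, prev = [], object()        # sentinel so the first row always resets
    for i, row in enumerate(rows[3:]):
        if row[seq_col] != prev:
            resets.append(i)
        prev = row[seq_col]
    return resets
```

Running `find_reset_rows(DATA)` flags rows 0 and 2, i.e. the first "A" record and the first "B" record, which is where the OPF would insert resets.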
You can specify "S" (for sequence) to have a reset inserted right before any new value. In other words, you put the same value in the column for every row in the sequence, and when the OPF sees a new value it knows that a new sequence is starting and inserts a reset before that record. Alternatively, you can specify "R" for a boolean reset field; this is a different method for achieving the same result. See the FieldMetaSpecial class here: py/nupic/data/fieldmeta.py

On Thu, Nov 14, 2013 at 8:48 PM, Marek Otahal <[email protected]> wrote:

> Hey, thanks a lot!
>
> First I wanted direct access to the TP/SP, but looking at the model I found
> this, which is great!
>
> https://github.com/numenta/nupic/blob/master/py/nupic/frameworks/opf/clamodel.py#L239
>
> When calling the reset for sentence separators (!, ., ?, :, ", ...), the
> results look much more accurate; see below.
>
> Btw, the C++ impl of the SP serves Linguist well. I'll send a PR to your
> branch tomorrow.
>
> Best regards, Mark
>
> ---------------- Linguist on child-stories.txt ----------------
> [27944] s ==> would come toge (0.59 | 0.43 | 0.44 | 0.43 | 0.43 | 0.45 | 0.64 | 0.45 | 0.60 | 0.43 | 0.45 | 0.65 | 0.47 | 0.43 | 0.43 | 0.45)
> [27945] . ==> |If a boat saile (0.92 | 0.87 | 0.87 | 0.87 | 0.60 | 0.69 | 0.60 | 0.65 | 0.60 | 0.60 | 0.64 | 0.62 | 0.60 | 0.60 | 0.60 | 0.65)
> DEBUG: Result of PyRegion::executeCommand : 'None'
> reset
> [27946] | ==> If a boat sailed (0.84 | 0.84 | 0.85 | 0.57 | 0.62 | 0.57 | 0.57 | 0.57 | 0.57 | 0.61 | 0.58 | 0.58 | 0.58 | 0.57 | 0.58 | 0.58)
> [27947] T ==> hey were as high (1.00 | 1.00 | 0.92 | 0.93 | 0.91 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.86 | 0.85 | 0.86 | 0.85 | 0.86)
> [27948] h ==> e were as high (0.96 | 0.41 | 0.55 | 0.40 | 0.36 | 0.36 | 0.37 | 0.53 | 0.36 | 0.36 | 0.44 | 0.36 | 0.37 | 0.37 | 0.42 | 0.41)
> [27949] e ==> atee the rock (0.52 | 0.36 | 0.36 | 0.33 | 0.29 | 0.25 | 0.25 | 0.25 | 0.28 | 0.24 | 0.24 | 0.46 | 0.27 | 0.27 | 0.27 | 0.27)
> [27950] s ==> .|Thed come toge (0.51 | 0.51 | 0.35 | 0.35 | 0.35 | 0.45 | 0.64 | 0.45 | 0.60 | 0.43 | 0.45 | 0.65 | 0.47 | 0.43 | 0.43 | 0.45)
> [27951] e ==> therhehd break t (0.26 | 0.25 | 0.25 | 0.25 | 0.36 | 0.36 | 0.26 | 0.34 | 0.36 | 0.32 | 0.31 | 0.31 | 0.31 | 0.31 | 0.32 | 0.31)
> [27952]   ==> poeces.|T esetle (0.23 | 0.46 | 0.32 | 0.23 | 0.28 | 0.26 | 0.26 | 0.26 | 0.23 | 0.29 | 0.26 | 0.23 | 0.27 | 0.29 | 0.25 | 0.54)
> [27953] r ==> ocks wo ld tome (0.36 | 0.36 | 0.65 | 0.36 | 0.35 | 0.35 | 0.35 | 0.37 | 0.35 | 0.35 | 0.42 | 0.37 | 0.35 | 0.35 | 0.35 | 0.37)
> [27954] o ==> at ooeeder aehe (0.25 | 0.25 | 0.47 | 0.40 | 0.22 | 0.40 | 0.21 | 0.29 | 0.26 | 0.30 | 0.30 | 0.21 | 0.21 | 0.37 | 0.38 | 0.38)
> [27955] c ==> ese These ro and (0.34 | 0.63 | 0.34 | 0.34 | 0.34 | 0.34 | 0.34 | 0.35 | 0.36 | 0.35 | 0.34 | 0.35 | 0.53 | 0.48 | 0.48 | 0.48)
> [27956] k ==> s would come tog (0.66 | 0.66 | 0.64 | 0.65 | 0.64 | 0.64 | 0.65 | 0.75 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64)
>
> On Fri, Nov 15, 2013 at 4:33 AM, Chetan Surpur <[email protected]> wrote:
>
>> Mark,
>>
>> Linguist doesn't use the OPF other than for swarming.
>> It directly calls methods on the CLA model. If you want to have it reset
>> the sequence when it reads a particular character, you can just add that
>> logic to the Linguist code.
>>
>> - Chetan
>>
>> On Thu, Nov 14, 2013 at 6:51 PM, Marek Otahal <[email protected]> wrote:
>>
>>> This problem touches text prediction/generation, but it is a general
>>> NuPIC algorithmic topic.
>>>
>>> Playing with Chetan's linguist repo
>>> https://github.com/chetan51/linguist/issues/1 , I discussed the
>>> (relatively poor) results with Chetan and Scott (conversation below).
>>>
>>> Then I realized we do not do resets in the text streams, and text
>>> streams are one example where resets are both reasonable and well
>>> defined.
>>>
>>> From what I recall, the OPF allows you to force a TP reset after periodic
>>> time intervals, which is unusable here (worst case, I could set it to an
>>> average sentence length). The other case where the OPF does a reset is at
>>> the end of the dataset and the start of a new epoch; that's why we see
>>> relatively good results on trivial "Hello World!" datasets.
>>>
>>> Ideally, I'd like to define a set of "terminators" = ['!', '.', '?'] and
>>> call reset() whenever the new char is one of those. Is there a reasonable
>>> way to rewrite the OPF (where?) to allow this behavior?
>>>
>>> Related to the OPF & API thread, that's why I'd like the OPF, or its
>>> successor, to have a choice for a 'fnName' : 'listOfParams' setting,
>>> where fnName would be executed each round with parameters listOfParams.
>>> This way, I could simply pass:
>>>
>>>     def _checkTerminate(c, listTerm):
>>>         if c in listTerm:
>>>             TP.reset()
>>>
>>> You may say I shouldn't use the OPF then. For this case I probably will,
>>> as it's easy to chain encoder|SP|TP, and the OPF does some improved
>>> things for the inference etc.; see Scott below.
>>>
>>> Cheers! Mark.
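The terminator-driven reset described above can be sketched outside the OPF. The `RecordingModel` stub below is purely illustrative; `resetSequenceStates` is the method exposed by the CLAModel linked earlier, but the feeding loop itself is an assumption, not Linguist code:

```python
# Characters that end a sequence (sentence terminators).
TERMINATORS = set('!.?')

class RecordingModel:
    """Minimal stand-in for a CLA model, recording calls to show the ordering."""
    def __init__(self):
        self.calls = []
    def run(self, ch):
        self.calls.append(('run', ch))
    def resetSequenceStates(self):
        self.calls.append(('reset', None))

def feed(model, text):
    """Feed characters one at a time, resetting sequence state after terminators."""
    for ch in text:
        model.run(ch)
        if ch in TERMINATORS:
            # The next character starts a fresh sequence.
            model.resetSequenceStates()
```

Feeding `"Hi!ok"` produces a reset right after the `'!'` record, so `'o'` begins a new sequence.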
>>>
>>> ---------------------------------------------
>>>
>>> The temporal pooler will have a set of cells predicted at each step
>>> (multiple simultaneous predictions). The classifier converts the
>>> predicted cells back to letters. So when it sees "m" it may be
>>> predicting the TP cells for both the "a" in "made" and the "a" in
>>> "matches". The classifier is guessing that the "m" is the start of
>>> "made", but when the "a" comes the TP doesn't necessarily lock on to
>>> just the "made" sequence. So in the next step the classifier is still
>>> guessing whether you are in the "made" sequence or the "matches"
>>> sequence.
>>>
>>> I am sort of spitballing here, but it seems like the behavior seen,
>>> while not intuitive, could be correct, at least for some of the letters.
>>>
>>> The spatial pooler and the CLA classifier make it a little hard to
>>> reason about the results. Perhaps an alternative would be to use just
>>> the temporal pooler. You could have 40 or so columns for each character
>>> that you want to include. I would limit the characters you include
>>> (convert everything to lowercase, for instance). If you have 30
>>> characters with 40 columns per character, then you need a TP with 1200
>>> columns. Assign the first 40 columns to "a", the next 40 to "b", etc.
>>> Then you can directly map the predicted cells/columns back into
>>> predicted letters (and the more predicted columns for a given letter,
>>> the more likely you can say that letter will come next).
>>>
>>> The downside is that you can only predict one step ahead. So I am not
>>> sure if you want to move to this, but it would make it easier to reason
>>> about the results. You can see examples of using the TP directly here:
>>> https://github.com/numenta/nupic/tree/master/examples/tp
>>>
>>> Hope that helps a little.
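The column-per-character encoding Scott describes can be sketched directly. This is an illustrative mapping only; the alphabet choice and helper names are assumptions, and a real setup would feed these column indices into a TP instance:

```python
import string

COLS_PER_CHAR = 40
# A limited, lowercase alphabet, as suggested; the exact set is an assumption.
ALPHABET = string.ascii_lowercase + " .,"

def char_to_columns(ch):
    """Map a character to its contiguous block of TP column indices."""
    i = ALPHABET.index(ch)
    return range(i * COLS_PER_CHAR, (i + 1) * COLS_PER_CHAR)

def columns_to_char_scores(predicted_columns):
    """Count predicted columns per character; more columns => more likely next."""
    scores = {}
    for col in predicted_columns:
        ch = ALPHABET[col // COLS_PER_CHAR]
        scores[ch] = scores.get(ch, 0) + 1
    return scores
```

With this alphabet the TP would need `len(ALPHABET) * COLS_PER_CHAR` columns, matching the 30-characters-times-40-columns arithmetic in the message above.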
>>>
>>> --
>>> Marek Otahal :o)
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
> --
> Marek Otahal :o)
