Mark, Linguist doesn't use the OPF other than for swarming. It directly calls methods on the CLA model. If you want to have it reset the sequence when it reads a particular character, you can just add that logic to the Linguist code.
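A minimal sketch of that reset logic, using the terminator set Marek proposes in the quoted message below. The `StubModel` and its `feed`/`reset` methods are hypothetical stand-ins for the CLA model that Linguist drives directly, not Linguist's or NuPIC's actual API:

```python
# Hypothetical sketch: reset the sequence whenever a terminator character
# is read. StubModel stands in for the CLA model; names are illustrative.

TERMINATORS = {'!', '.', '?'}

class StubModel:
    """Stand-in for the CLA model; records calls for illustration."""
    def __init__(self):
        self.resets = 0
        self.fed = []

    def feed(self, char):
        self.fed.append(char)

    def reset(self):
        self.resets += 1

def feed_text(model, text, terminators=TERMINATORS):
    """Feed characters one at a time, resetting after each terminator."""
    for char in text:
        model.feed(char)
        if char in terminators:
            model.reset()

model = StubModel()
feed_text(model, "Hi! Bye.")
print(model.resets)  # 2 resets: one after '!', one after '.'
```

This is essentially Marek's `_checkTerminate` idea folded into the feed loop, which is the "just add that logic to the Linguist code" approach rather than changing the OPF.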
- Chetan

On Thu, Nov 14, 2013 at 6:51 PM, Marek Otahal <[email protected]> wrote:
> This problem touches text prediction/generation, but it is a general NuPIC
> algorithmic topic.
>
> Playing with Chetan's linguist repo
> https://github.com/chetan51/linguist/issues/1 , I discussed the
> (relatively poor) results with Chetan and Scott (conversation below).
>
> Then I realized we do not do resets in the text streams, and text streams
> are one case where resets are both reasonable and well defined.
>
> From what I recall, the OPF allows forcing a TP reset after periodic time
> intervals, which is unusable here (at worst, I could set it to an average
> sentence length). The other case where the OPF does a reset is at the end
> of the dataset and the start of a new epoch. That's why we see relatively
> good results on trivial "Hello World!" datasets.
>
> Ideally, I'd like to define a set of "terminators" = ['!', '.', '?'] and
> call reset() whenever the new char equals one of them. Is there a
> reasonable way to rewrite the OPF (and where?) to allow this behavior?
>
> Related to the OPF & API thread, this is why I'd like the OPF, or its
> successor, to offer a 'fnName' : 'listOfParams' setting, where fnName
> would be executed each round with the parameters in listOfParams. That
> way, I could simply pass:
>
>     def _checkTerminate(c, listTerm):
>         if c in listTerm:
>             TP.reset()
>
> You may say I shouldn't use the OPF then. For this case I probably will,
> since it's easy to chain encoder | SP | TP, but the OPF does some improved
> things for inference etc.; see Scott's reply below.
>
> Cheers! Mark.
>
> ---------------------------------------------
>
> The temporal pooler will have a set of cells predicted at each step
> (multiple simultaneous predictions). The classifier converts the predicted
> cells back to letters. So when it sees "m" it may be predicting the TP
> cells for both the "a" in "made" and the "a" in "matches".
> The classifier is guessing that the "m" is the start of "made", but when
> the "a" comes the TP doesn't necessarily lock on to just the "made"
> sequence. So in the next step the classifier is still guessing whether you
> are in the "made" sequence or the "matches" sequence.
>
> I am sort of spitballing here, but it seems like the behavior seen, while
> not intuitive, could be correct, at least for some of the letters.
>
> The spatial pooler and the CLA classifier make it a little hard to reason
> about the results. Perhaps an alternative would be to use just the
> temporal pooler. You could have 40 or so columns for each character that
> you want to include. I would limit the characters you include (convert
> everything to lowercase, for instance). If you have 30 characters with 40
> columns per character, then you need a TP with 1200 columns. Assign the
> first 40 columns to "a", the next 40 to "b", etc. Then you can directly
> map the predicted cells/columns back into predicted letters (and the more
> predicted columns for a given letter, the more likely you can say that
> letter will come next).
>
> The downside is that you can only predict one step ahead, so I'm not sure
> whether you want to move to this, but it would make it easier to reason
> about the results. You can see examples of using the TP directly here:
> https://github.com/numenta/nupic/tree/master/examples/tp
>
> Hope that helps a little.
>
> --
> Marek Otahal :o)
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
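Scott's direct-TP scheme quoted above (a fixed 40-column block per character, with predicted columns decoded back to ranked letters) could be sketched as follows. The alphabet choice is an assumption, and the TP itself is omitted; this only shows the encode/decode mapping, not NuPIC code:

```python
# Sketch of the per-character column-block encoding Scott describes.
# 30 characters x 40 columns per character = 1200 TP columns.
import string
from collections import Counter

COLS_PER_CHAR = 40
# Assumed 30-character alphabet: lowercase letters plus a few punctuation marks.
ALPHABET = string.ascii_lowercase + " .!?"

def encode(char):
    """Return the column indices for `char` (its 40-column block)."""
    i = ALPHABET.index(char)
    return list(range(i * COLS_PER_CHAR, (i + 1) * COLS_PER_CHAR))

def decode(predicted_columns):
    """Map predicted columns back to letters, ranked by column count."""
    counts = Counter(ALPHABET[c // COLS_PER_CHAR] for c in predicted_columns)
    return counts.most_common()

print(encode("b")[0], encode("b")[-1])  # 40 79 -- "b" owns columns 40..79

# If the TP predicts 35 columns in "a"'s block and 10 in "c"'s block:
predicted = list(range(0, 35)) + list(range(80, 90))
print(decode(predicted))  # [('a', 35), ('c', 10)]
```

As Scott notes, the more predicted columns fall in a letter's block, the more confidently that letter can be called as the next character, at the cost of only predicting one step ahead.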
