This problem touches on text prediction/generation, but it is a general NuPIC algorithmic topic.
Playing with Chetan's linguist repo https://github.com/chetan51/linguist/issues/1 , I discussed the (relatively poor) results with Chetan and Scott (conversation below). Then I realized we do not do resets in the text streams, and text streams are one example where resets are both reasonable and well defined.

From what I recall, OPF allows forcing a TP reset after periodic time intervals, which is unusable here (at worst, I could set it to the average sentence length). The other case where OPF resets is at the end of the dataset and the start of a new epoch. That's why the results on trivial "Hello World!" datasets are relatively good.

Ideally, I'd like to define a set of "terminators" = ['!', '.', '?'] and call reset() whenever the new char is one of them. Is there a reasonable way to rewrite OPF (and where?) to allow this behavior?

Related to the OPF & API thread: this is why I'd like OPF, or its successor, to offer a 'fnName' : 'listOfParams' setting, where fnName would be executed each round with parameters listOfParams. This way, I could simply pass:

    def _checkTerminate(c, listTerm):
        if c in listTerm:
            TP.reset()

You may say I shouldn't use OPF then. For this case I probably will, as it's easy to chain encoder|SP|TP, but OPF does some improved things for the inference etc.; see Scott below.

Cheers! Mark.

---------------------------------------------

The temporal pooler will have a set of cells predicted at each step (multiple simultaneous predictions). The classifier converts the predicted cells back to letters. So when it sees "m" it may be predicting the TP cells for both the "a" in "made" and the "a" in "matches". The classifier is guessing that the "m" is the start of "made", but when the "a" comes the TP doesn't necessarily lock on to just the "made" sequence. So in the next step the classifier is still guessing whether you are in the "made" sequence or the "matches" sequence.
I am sort of spitballing here, but it seems like the behavior seen, while not intuitive, could be correct, at least for some of the letters. The spatial pooler and the CLA classifier make it a little hard to reason about the results.

Perhaps an alternative would be to use just the temporal pooler. You could have 40 or so columns for each character that you want to include. I would limit the characters you include (convert everything to lowercase, for instance). If you have 30 characters with 40 columns per character, then you need a TP with 1200 columns. Assign the first 40 columns to "a", the next 40 to "b", etc. Then you can directly map the predicted cells/columns back into predicted letters (and the more predicted columns for a given letter, the more likely you can say that letter will come next). The downside is that you can only predict one step ahead. So I'm not sure if you want to move to this, but it would make it easier to reason about the results.

You can see examples of using the TP directly here: https://github.com/numenta/nupic/tree/master/examples/tp

Hope that helps a little.

--
Marek Otahal :o)
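Mark's terminator-reset idea above could be sketched roughly like this. Note this is a minimal illustration, not the actual OPF/TP API: the `step()` method and the dummy model are stand-ins I made up for whatever per-record compute call the real model exposes; only the terminator set and the reset-on-terminator logic come from the email.

```python
TERMINATORS = {'!', '.', '?'}

def feed_with_resets(model, text, terminators=TERMINATORS):
    """Feed characters to the model one at a time, calling model.reset()
    after each sentence terminator so learned sequences end at sentence
    boundaries instead of running across the whole stream."""
    resets = 0
    for ch in text:
        model.step(ch)          # hypothetical per-character compute call
        if ch in terminators:
            model.reset()       # clear the temporal pooler's sequence state
            resets += 1
    return resets

class _DummyTP(object):
    """Stand-in for a real temporal pooler; only records the calls made."""
    def __init__(self):
        self.steps = 0
        self.resets = 0
    def step(self, ch):
        self.steps += 1
    def reset(self):
        self.resets += 1

tp = _DummyTP()
n = feed_with_resets(tp, "Hello World! How are you? Fine.")
print(n)  # one reset each for '!', '?', '.'
```

The same check is what the proposed 'fnName' : 'listOfParams' hook would run each round, with the character and terminator list as its parameters.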
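Scott's direct column-to-character mapping could be sketched as follows. The `encode`/`decode` helpers and the exact character set are illustrative names I chose; only the layout (30 lowercase-ish characters, 40 columns each, 1200 TP columns, score by number of predicted columns) comes from the email.

```python
import string

CHARS = string.ascii_lowercase + " .!?"   # 30 characters after lowercasing
COLS_PER_CHAR = 40                        # 30 * 40 = 1200 TP columns total

def encode(ch):
    """Return the column indices assigned to one character:
    'a' -> 0..39, 'b' -> 40..79, etc."""
    i = CHARS.index(ch)
    return set(range(i * COLS_PER_CHAR, (i + 1) * COLS_PER_CHAR))

def decode(predicted_columns):
    """Map predicted columns back to characters, scored by how many of each
    character's columns are predicted (more columns -> more likely next)."""
    scores = {}
    for col in predicted_columns:
        ch = CHARS[col // COLS_PER_CHAR]
        scores[ch] = scores.get(ch, 0) + 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Suppose the TP predicts 30 of 'a''s columns and 10 of 'b''s:
preds = set(range(0, 30)) | set(range(40, 50))
print(decode(preds))  # [('a', 30), ('b', 10)]
```

Because each character owns a disjoint block of columns, no spatial pooler or CLA classifier sits between the TP and the letters, which is what makes the predictions easy to reason about; the trade-off, as noted, is one-step-ahead prediction only.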
_______________________________________________ nupic mailing list [email protected] http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
