This problem touches text prediction/generation, but it concerns a general
NuPIC algorithmic topic.

Playing with Chetan's linguist repo
https://github.com/chetan51/linguist/issues/1 , I discussed the (relatively
poor) results with Chetan and Scott (conversation below).

Then I realized we do not do resets in the text streams. Text streams are
one example where resets are both reasonable to do and well defined.

From what I recall, OPF allows forcing a TP reset after periodic time
intervals, but that is unusable here (at worst, I could set it to the
average sentence length). The other place where OPF does reset is at the end
of the dataset, when a new epoch starts. That's why the results on trivial
"Hello World!" datasets are relatively good.

Ideally, I'd like to define a set of "terminators" = ['!', '.', '?'] and
call reset() whenever the new char == one of those. Is there a reasonable
way to rewrite OPF (and where?) to allow this behavior?
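To make the idea concrete, here is a minimal sketch of terminator-driven resets. The model object and its `run()`/`resetSequenceStates()` methods are stand-ins for whatever the OPF model actually exposes; a stub model is used so the call pattern is visible without NuPIC installed.

```python
# Sketch: reset the sequence state whenever a sentence terminator arrives.
# The model API here (run, resetSequenceStates) is an assumption standing in
# for the real OPF model; substitute your model's actual methods.

TERMINATORS = {'!', '.', '?'}

def feed_char(model, ch, terminators=TERMINATORS):
    """Feed one character to the model; reset sequence state after a terminator."""
    result = model.run({'token': ch})
    if ch in terminators:
        model.resetSequenceStates()  # next char starts a fresh sequence
    return result

# Minimal stub standing in for a real model, just to exercise the loop.
class StubModel(object):
    def __init__(self):
        self.resets = 0
    def run(self, record):
        return record
    def resetSequenceStates(self):
        self.resets += 1

model = StubModel()
for ch in "Hello World!":
    feed_char(model, ch)
print(model.resets)  # one terminator ('!') in the stream -> 1
```

The point is only where the reset call sits relative to the per-character step; the learning/inference calls themselves are unchanged.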

Related to the OPF & API thread, that's why I'd like OPF, or its successor,
to support a 'fnName' : 'listOfParams' setting, where fnName would be
executed each round with the parameters listOfParams. This way, I could
simply pass:

def _checkTerminate(c, listTerm):
    if c in listTerm:
        TP.reset()


You may say I shouldn't use OPF then. For this case I probably will, as it
makes it easy to chain encoder|SP|TP, and OPF does some extra things for
inference etc.; see Scott's reply below.

Cheers! Mark.


---------------------------------------------

The temporal pooler will have a set of cells predicted at each step
(multiple simultaneous predictions). The classifier converts the predicted
cells back to letters. So when it sees "m" it may be predicting the TP
cells for both "a" in "made" and "a" in "matches". The classifier is
guessing that the "m" is the start of "made" but when the "a" comes the TP
doesn't necessarily lock on to just the "made" sequence. So in the next
step the classifier is still guessing whether you are in the "made"
sequence or the "matches" sequence.

I am sort of spitballing here but it seems like the behavior seen, while
not intuitive, could be correct, at least for some of the letters.

The spatial pooler and the CLA classifier make it a little hard to reason
about the results. Perhaps an alternative would be to use just the temporal
pooler. You could have 40 or so columns for each character that you want to
include. I would limit the characters you include (convert everything to
lowercase, for instance). If you have 30 characters with 40 columns per
character, then you need a TP with 1200 columns. Assign the first 40 columns
to "a", the next 40 to "b", etc. And you can directly map the predicted
cells/columns back into predicted letters (and the more predicted columns
for a given letter, the more likely you can say that letter will come next).
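The encode/decode arithmetic above can be sketched directly. The 30-character alphabet below is an assumption (26 lowercase letters plus a few punctuation marks) chosen only to hit the 30-characters-times-40-columns example.

```python
# Sketch of the direct column-per-letter scheme: 40 columns per character,
# predictions decoded by counting predicted columns in each letter's slot.
# The alphabet is an illustrative assumption, not from the original post.

COLS_PER_CHAR = 40
ALPHABET = "abcdefghijklmnopqrstuvwxyz .,!"   # 30 characters, all lowercase

def encode(ch):
    """Return the set of column indices assigned to a character."""
    start = ALPHABET.index(ch) * COLS_PER_CHAR
    return set(range(start, start + COLS_PER_CHAR))

def decode(predicted_columns):
    """Map predicted columns back to letters, scored by how many of each
    letter's columns are predicted (more columns => more likely)."""
    scores = {}
    for col in predicted_columns:
        ch = ALPHABET[col // COLS_PER_CHAR]
        scores[ch] = scores.get(ch, 0) + 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Total TP columns needed:
print(len(ALPHABET) * COLS_PER_CHAR)  # 30 chars * 40 columns = 1200

# A prediction covering most of "a"'s columns and a few of "b"'s:
preds = set(range(0, 35)) | set(range(40, 45))
print(decode(preds))  # [('a', 35), ('b', 5)]
```

Because each letter owns a fixed, disjoint block of columns, no spatial pooler or CLA classifier is needed to interpret the TP's predicted columns.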

The downside is that you can only predict one step ahead. So I'm not sure if
you want to move to this, but it would make it easier to reason about the
results. You can see examples of using the TP directly here:
https://github.com/numenta/nupic/tree/master/examples/tp

Hope that helps a little.


-- 
Marek Otahal :o)
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
