Regarding giving it feedback with a reward or error metric, I had written an email to this mailing list a while ago about using Emotion as a supervisor, a natural sort of feedback loop.
http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2013-August/000738.html

It doesn't look like NuPIC has this mechanism yet, but I believe it would
be useful to have when implementing a goal-seeking (or game-playing) AI
based on the CLA.

On Mon, Aug 26, 2013 at 2:26 AM, Matt Keith <[email protected]> wrote:

> Ian,
>
> Thank you for the reply. Please see my comments in-line below.
>
> On Aug 25, 2013, at 11:47 PM, Ian Danforth <[email protected]> wrote:
>
>> 1. Is there a string encoder yet? Originally, I wanted to just send the
>> model the full ski slope line, but converted it to an int array to pass
>> in scalar values.
>
> Strings are encoded as categories, so not really. A category is
> (generally) a non-overlapping set of bits in the encoded vector. This
> removes a lot of semantic closeness as compared to the scalar encoder.
> For your world there might be value in having greater similarity at the
> encoder level between states.
>
> What type of encoded values would work best? I initially encoded the line
> as padding (space before the left tree), width (space between the trees),
> and skier position. Then I thought that it might be better to have them
> all be positional values, so I switched to tree, skier, tree. Would it be
> better for the SDRs to limit the line size to 64 and bit-encode the slope
> line with 1s for any object (tree or skier)? To a human, that would just
> look like a bunch of random numbers, but when viewed in binary, the
> patterns would emerge. The other option would be to encode it as 80
> scalar values, one for each character in the slope line.
>
>> 2. The trained model seems to work for a little bit, but then stops as
>> it keeps learning during the live run.
>
> Learning can be turned off. You need a learning flag field 'L', and when
> you want to toggle learning on or off, you have that field contain a
> non-None value for that record.
> NOTE: I haven't done this; this is just my reading of clamodel.py (374)
> and record_stream.py (102).
>
> I thought about that, but it doesn't really address the intent of my
> test. Ideally, I would like to have the model learn how to play on its
> own without being trained beforehand, so I would like to have some type
> of metric for the model to optimize against.
>
>> I would like the model to "see" the results of its prediction (i.e.,
>> how it moved the skier to complete the feedback loop), but I also want
>> to have some kind of error value so it knows that its prediction was
>> not optimal. Does a mechanism like this exist in the current code?
>
> Could you describe this sequence of events more precisely?
>
> Sure, maybe an example will help illustrate my thought process. Let's
> say the model is presented with the following slope line:
>
>     |  H          |
>
> After being trained on data from perfect runs, I thought that the model
> would predict this line:
>
>     |      H      |
>
> However, since the skier can only move one space at a time, it would
> actually generate this line:
>
>     |   H         |
>
> I would like the model to know that the line was improved, but it is
> still three spaces from the optimal position.
>
> On the flip side of this, if the skier crashes, is there a way to flag
> the bad event so the last move is unlearned and a different outcome can
> occur next time?
>
>> 3. Most of the model settings are just copied directly from the hotgym
>> example. Should I change some of the values to work better in this
>> scenario?
>
> Almost certainly; however, the best way to find those settings is using
> swarming ... not sure if that's fully available yet.
>
> I saw a bunch of swarming files get checked into the master branch, and
> I look forward to hearing more about that feature.
>
>> 4. Any other comments or suggestions to improve the demonstration?
> Is your first goal to have it learn a single track (probably simpler) or
> to learn in general how to deal with any track?
>
> I never seem to choose the simple path. :) I would like to have it learn
> how to play the game on any track if it is given a goal (stay in the
> middle). I was thinking that we could implement the goal through an
> error metric (the number of spaces away from the center space). In this
> way, it is kind of like mixing a genetic algorithm with an HTM. If we
> have a reward or error metric in place, then the system should
> continually move toward an optimized solution.
>
> Thanks,
>
> Matt
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
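The error metric Matt proposes (the number of spaces the skier sits from
the center of the gap between the trees) can be sketched in a few lines.
The slope-line format ('|' for trees, 'H' for the skier) is taken from the
thread; the function name is hypothetical, not anything in NuPIC:

```python
def skier_error(line, skier="H", tree="|"):
    """Distance (in spaces) of the skier from the center of the gap."""
    left = line.index(tree)      # position of the left tree
    right = line.rindex(tree)    # position of the right tree
    center = (left + right) / 2.0
    return abs(line.index(skier) - center)

# A line with the skier three spaces left of center scores 3.0:
print(skier_error("|   H         |"))  # -> 3.0
```

A score of 0 would mean the skier is perfectly centered; how that value
gets fed back into the model is exactly the open question in this thread.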
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
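As an aside on the encoding question earlier in the thread, the "1s for
any object" option Matt describes might look like the sketch below. The
64-bit width comes from his message; the function name is illustrative and
not part of the NuPIC encoder API:

```python
def encode_slope_line(line, width=64):
    """Bit-encode a slope line: 1 wherever a tree or skier sits, else 0."""
    bits = [0] * width
    for i, ch in enumerate(line[:width]):
        if ch in ("|", "H"):   # any object: tree or skier
            bits[i] = 1
    return bits

print(encode_slope_line("| H |", width=8))  # -> [1, 0, 1, 0, 1, 0, 0, 0]
```

Note that a single active bit per position gives adjacent skier positions
no overlapping bits, so for real semantic similarity between nearby states
each object would likely need to be widened to several bits, much as the
scalar encoder does.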
