Ian,

Thank you for the reply. Please see my comments in-line below.
On Aug 25, 2013, at 11:47 PM, Ian Danforth <[email protected]> wrote:

>> 1. Is there a string encoder yet? Originally, I wanted to just send the
>> model the full ski slope line, but converted it to the int array to pass in
>> scalar values.
>
> Strings are encoded as categories, so not really. A category is (generally) a
> non-overlapping set of bits in the encoded vector. This removes a lot of
> semantic closeness as compared to the scalar encoder. For your world there
> might be value in having greater similarity at the encoder level between
> states.

What type of encoder values would work best? I initially encoded the line as
padding (the space before the left tree), width (the space between the trees),
and skier position. Then I thought that it might be better to have them all be
positional values, so I switched to tree, skier, tree.

Would it be better for the SDRs to limit the line size to 64 and bit-encode the
slope line with 1s for any object (tree or skier)? To a human, that would just
look like a bunch of random numbers, but when viewed in binary, the patterns
would emerge. The other option would be to encode it as 80 scalar values, one
for each character in the slope line.

>> 2. The trained model seems to work for a little bit, but then stops as it
>> keeps learning during the live run.
>
> Learning can be turned off. You need a learning flag field 'L', and when you
> want to toggle learning on or off you have that field contain a non-None
> value for that record. NOTE: I haven't done this; this is just my reading of
> clamodel.py (374) and record_stream.py (102).

I thought about that, but it doesn't really address the intent of my test.
Ideally, I would like the model to learn how to play on its own without being
trained beforehand, so I would like some type of metric for the model to
optimize on for improvements.

>> I would like the model to "see" the results of its prediction (i.e. how it
>> moved the skier) to complete the feedback loop, but I also want to have some
>> kind of error value so it knows that its prediction was not optimal. Does a
>> mechanism like this exist in the current code?
>
> Could you describe this sequence of events more precisely?

Sure, maybe an example will help illustrate my thought process. Let's say the
model is presented with the following slope line:

| H         |

After being trained on data from perfect runs, I thought that the model would
predict this line:

|     H     |

However, since the skier can only move one space at a time, it would actually
generate this line:

|  H        |

I would like the model to know that the line was improved, but that it is still
three spaces from the optimal position. On the flip side of this, if the skier
crashes, is there a way to flag the bad event so that the last move is
unlearned and a different outcome can occur next time?

>> 3. Most of the model settings are just copied directly from the hotgym
>> example. Should I change some of the values to work better in this scenario?
>
> Almost certainly; however, the best way to find those settings is using
> swarming ... not sure if that's fully available yet.

I saw a bunch of swarming files get checked into the master branch, and I look
forward to hearing more about that feature.

>> 4. Any other comments or suggestions to improve the demonstration?
>
> Is your first goal to have it learn a single track (probably simpler) or to
> learn in general how to deal with any track?

I never seem to choose the simple path. :) I would like to have it learn how to
play the game on any track if it is given a goal (stay in the middle). I was
thinking that we could implement the goal through an error metric (the number
of spaces away from the center space). In this way, it is kind of like mixing a
genetic algorithm with an HTM. If we have a reward or error metric in place,
then the system should continually move toward an optimized solution.

Thanks,
Matt
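P.S. To make the bit-encoding idea from #1 and the error metric from #4 a
little more concrete, here is a rough sketch. Everything in it (the helper
names, the 'H'/'|' characters, the 64-position width) is just my own
illustration of what I mean, not anything from the NuPIC API:

```python
# Illustrative sketch only -- these helpers are mine, not part of NuPIC.
LINE_WIDTH = 64  # limit the slope line to 64 positions, as suggested in #1


def encode_line(line):
    """Bit-encode a slope line: 1 for any object (tree '|' or skier 'H'),
    0 for empty space, padded/truncated to a fixed width."""
    padded = line.ljust(LINE_WIDTH)[:LINE_WIDTH]
    return [1 if ch in "|H" else 0 for ch in padded]


def distance_error(line):
    """Error metric from #4: how many spaces the skier is from the
    center of the gap between the two trees."""
    left = line.index("|")         # left tree
    right = line.rindex("|")       # right tree
    skier = line.index("H")
    center = (left + right) / 2.0  # optimal (middle) position
    return abs(skier - center)


line = "|  H        |"
bits = encode_line(line)
print(sum(bits))             # three objects -> three active bits
print(distance_error(line))  # skier is 3 spaces from center
```

The idea would be to feed the bit vector to the model each step and use the
distance error as the reward signal the system tries to drive toward zero.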
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
