Regarding giving it feedback via a reward or error metric: I wrote an
email to this mailing list a while ago about using emotion as a
supervisor, a natural sort of feedback loop.

http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2013-August/000738.html

It doesn't look like NuPIC has this mechanism yet, but I believe it would
be useful to have when implementing a goal-seeking (or game-playing) AI
based on the CLA.
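To make that idea concrete, here is a minimal, hypothetical sketch of such a feedback loop in plain Python. None of these names come from NuPIC; the toy predictor is just a stand-in for a CLA model, and the error metric plays the role of the supervisory signal.

```python
# Hypothetical sketch of a supervisory feedback loop. The predictor
# stands in for a CLA model; the error metric is the "supervisor"
# that judges each move.

def error_metric(position, target):
    """How far the skier is from the optimal position."""
    return abs(position - target)

def toy_predict(position, target):
    """Stand-in predictor: move one space toward the target,
    mimicking the one-space-per-tick constraint of the game."""
    if position == target:
        return 0
    return 1 if position < target else -1

def run_episode(position, target, steps):
    """Apply each predicted move, then record the resulting error
    as the supervisory signal fed back to the learner."""
    errors = []
    for _ in range(steps):
        position += toy_predict(position, target)
        errors.append(error_metric(position, target))
    return position, errors

final, errors = run_episode(position=3, target=7, steps=6)
```

In a real goal-seeking setup, the recorded errors would modulate learning (reinforce moves that reduce the error, penalize moves that increase it) rather than just being collected.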


On Mon, Aug 26, 2013 at 2:26 AM, Matt Keith <[email protected]> wrote:

> Ian,
>
> Thank you for the reply.  Please see my comments in-line below.
>
> On Aug 25, 2013, at 11:47 PM, Ian Danforth <[email protected]>
> wrote:
>
>
>> 1. Is there a string encoder yet?  Originally, I wanted to just send the
>> model the full ski slope line, but converted it to an int array to pass in
>> scalar values.
>>
>
> Strings are encoded as categories, so not really. A category is
> (generally) a non-overlapping set of bits in the encoded vector. This
> removes a lot of semantic closeness as compared to the scalar encoder. For
> your world there might be value in having greater similarity at the encoder
> level between states.
>
>
> What type of encoded values would work best?  I initially encoded the line
> as padding (space before the left tree), width (space between the trees),
> and skier position.  Then I thought it might be better to have them all be
> positional values, so I switched to tree, skier, tree.  Would it be better
> for the SDRs to limit the line size to 64 and bit-encode the slope line
> with 1s for any object (tree or skier)?  To a human, that would just look
> like a bunch of random numbers, but when viewed in binary, the patterns
> would emerge.  The other option would be to encode it as 80 scalar values,
> one for each character in the slope line.
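The semantic-similarity point raised above can be illustrated in plain Python (this is a sketch of the encoding idea, not NuPIC's actual encoder API): under a sliding-window scalar encoding, nearby skier positions share active bits, while a category-style encoding would share none.

```python
def scalar_encode(pos, n=16, w=3):
    """Sliding-window encoding: w consecutive active bits whose
    placement tracks the value, so nearby values overlap."""
    bits = [0] * n
    start = min(pos, n - w)
    for i in range(start, start + w):
        bits[i] = 1
    return bits

def overlap(a, b):
    """Count of active bits the two encodings share."""
    return sum(x & y for x, y in zip(a, b))

near = overlap(scalar_encode(5), scalar_encode(6))   # adjacent positions
far = overlap(scalar_encode(5), scalar_encode(12))   # distant positions
```

Adjacent positions overlap in two of their three active bits, while distant ones share none, which is exactly the "semantic closeness" that a non-overlapping category encoding throws away.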
>
>> 2. The trained model seems to work for a little while, but then stops
>> working as it keeps learning during the live run.
>
>
> Learning can be turned off. You need a learning-flag field ('L'), and when
> you want to toggle learning on or off, you have that field contain a
> non-None value for that record. NOTE: I haven't done this; this is just my
> reading of clamodel.py (374) and record_stream.py (102).
>
>
> I thought about that, but it doesn't really address the intent of my test.
> Ideally, I would like to have the model learn how to play on its own
> without being trained beforehand, so I would like some type of metric for
> the model to optimize against.
>
>> I would like the model to "see" the results of its prediction (i.e., how
>> it moved the skier to complete the feedback loop), but I also want to have
>> some kind of error value so it knows that its prediction was not optimal.
>> Does a mechanism like this exist in the current code?
>>
>
> Could you describe this sequence of events more precisely?
>
>
> Sure, maybe an example will help illustrate my thought process.  Let's say
> the model is presented with the following slope line:
>      |  H          |
> After being trained on data from perfect runs, I thought that the model
> would predict this line:
>      |      H      |
> However, since the skier can only move one space at a time, it would
> actually generate this line:
>      |   H         |
> I would like the model to know that the line was improved, but it is still
> three spaces from the optimal position.
>
> On the flip side of this, if the skier crashes, is there a way to flag the
> bad event so the last move is unlearned and a different outcome can occur
> next time?
>
>> 3. Most of the model settings are just copied directly from the hotgym
>> example.  Should I change some of the values to work better in this
>> scenario?
>>
>
> Almost certainly; however, the best way to find those settings is with
> swarming ... not sure if that's fully available yet.
>
>
> I saw a bunch of swarming files get checked into the master branch and I
> look forward to hearing more about that feature.
>
>> 4. Any other comments or suggestions to improve the demonstration?
>>
>
> Is your first goal to have it learn a single track (probably simpler) or
> to learn in general how to deal with any track?
>
>
> I never seem to choose the simple path. :)  I would like to have it learn
> how to play the game on any track if it is given a goal (stay in the
> middle).  I was thinking that we could implement the goal through an error
> metric (the number of spaces away from the center space).  In this way, it
> is kind of like mixing a genetic algorithm with an HTM.  If we have a
> reward or error metric in place, then the system should continually move
> toward an optimized solution.
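That error metric is easy to compute directly from a slope line in the format used earlier in the thread (walls as '|', skier as 'H'); the function name here is hypothetical:

```python
def center_error(line):
    """Number of spaces between the skier 'H' and the midpoint of
    the gap between the two walls '|'."""
    left = line.index('|')
    right = line.rindex('|')
    skier = line.index('H')
    center = (left + right) // 2
    return abs(skier - center)

# The example lines from earlier in the thread:
optimal_err = center_error('|      H      |')  # skier centered
actual_err = center_error('|   H         |')   # three spaces off
```

This reproduces the earlier example: the centered line scores 0, and the line the skier can actually reach in one move scores 3.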
>
> Thanks,
>
> Matt
>
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>