Hi,

Thought it would be best to start a new thread about this.

I've been going through the code this morning and it turns out that the
NonUniformScalarEncoder (in the encoders directory) is performing a version
of the encoding scheme I was describing yesterday.

This class has a set of bins, which represent ranges of data values. The
encodeIntoArray method finds the bin for the input and returns the usual
ScalarEncoder encoding for the bin's index.

It thus transforms a non-uniform data distribution into an encoding in
which every bit is (roughly) equally likely to be on.
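To make the idea concrete, here's a minimal sketch of that style of encoding — not the actual NuPIC class, just the mechanism: look up which bin the value falls into, then emit the usual scalar-style run of active bits at the bin's index. The bin ranges and width `w` here are made up for illustration.

```python
import numpy as np

def encode_with_bins(value, bins, w=3):
    """Toy sketch of NonUniformScalarEncoder-style encoding.

    `bins` is a list of (lo, hi) value ranges. The output is a
    scalar-encoder-style encoding of the *bin index*: a contiguous
    run of `w` active bits starting at the bin's position.
    """
    # Find the bin whose range contains the value.
    index = next(i for i, (lo, hi) in enumerate(bins) if lo <= value < hi)
    n = len(bins) + w - 1            # total number of output bits
    out = np.zeros(n, dtype=int)
    out[index:index + w] = 1         # run of w bits at the bin index
    return out

bins = [(0, 1), (1, 3), (3, 10)]     # deliberately non-uniform ranges
print(encode_with_bins(2.0, bins))   # 2.0 falls in the middle bin
```

Note that the bins can be any widths at all; once the value is mapped to a bin index, the encoding itself is uniform.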

The ComputeBins utility function takes an array of data values and uses an
inverse cumulative distribution interpolation to spread the bins out in
almost exactly the way I described yesterday.
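The inverse-CDF idea can be sketched in a few lines: place the bin edges at evenly spaced quantiles of a data sample, so each bin covers roughly the same fraction of the observed values. This is my own illustration of the technique, not the ComputeBins implementation.

```python
import numpy as np

def compute_bins(data, n_bins):
    """Equal-mass binning via the inverse empirical CDF: edges sit at
    evenly spaced quantiles, so each bin holds ~1/n_bins of the sample."""
    edges = np.quantile(np.asarray(data), np.linspace(0.0, 1.0, n_bins + 1))
    return list(zip(edges[:-1], edges[1:]))

# Heavily skewed sample: most mass near 0, with a long tail.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=10_000)
bins = compute_bins(sample, 4)
# The bins come out narrow where the data is dense and wide in the tail.
```

The effect is exactly the equalisation described above: dense regions of the distribution get many narrow bins, sparse regions get a few wide ones.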

There are a couple of important differences. First, ComputeBins is a
batch process rather than an online one: it uses a sample of data to set
up the whole mapping in one go. The min and max are likewise fixed at
that point. There's no way to adapt the bins gradually as new data
arrives.

I'm guessing you could add an update step every k new inputs, which
would adjust the bins gradually to reflect the new statistics of the
data. Keeping the previous cumulative weight array and folding in the
new data would probably do it (someone familiar with the code can
confirm).
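One hypothetical shape for that update step — all names and the windowing scheme here are my assumptions, not anything in the codebase — is to keep a sliding window of recent values and recompute the quantile edges every k inputs:

```python
import numpy as np

class AdaptiveBins:
    """Hypothetical sketch of the gradual update suggested above: keep
    a bounded history of recent inputs, and every k new inputs re-derive
    equal-mass bin edges from evenly spaced quantiles of that window."""

    def __init__(self, n_bins, k=100, max_history=10_000):
        self.n_bins = n_bins
        self.k = k                    # recompute edges every k inputs
        self.max_history = max_history
        self.history = []
        self.edges = None

    def update(self, value):
        self.history.append(value)
        if len(self.history) > self.max_history:
            self.history.pop(0)       # forget the oldest value
        if len(self.history) % self.k == 0:
            # Re-derive equal-mass edges from the current window.
            self.edges = np.quantile(
                self.history, np.linspace(0.0, 1.0, self.n_bins + 1))

ab = AdaptiveBins(n_bins=4, k=100)
for v in range(200):
    ab.update(float(v))
# ab.edges now reflects the quantiles of the last 200 inputs.
```

A weighted running histogram would avoid storing the raw window, at the cost of some approximation; either way the point is that the edges track the data rather than being frozen at setup time.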

It looks like this encoder is an orphan. It's not an allowable choice in
the mainstream NuPIC code, and the only recent activity on it was a
response to a failing test.


-- 

Fergal Byrne

Brenter IT <http://www.examsupport.ie>
[email protected] +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org