Hi,

I thought it would be best to start a new thread about this.
I've been going through the code this morning, and it turns out that the NonUniformScalarEncoder (in the encoders directory) performs a version of the encoding scheme I was describing yesterday. The class has a set of bins, each representing a range of data values. The encodeIntoArray method finds the bin for the input and returns the usual ScalarEncoder encoding of that bin's index. It thus takes a non-uniform data distribution and produces one in which every bit is equally likely to be on.

The ComputeBins utility function takes an array of data values and uses inverse cumulative distribution interpolation to spread the bins out in almost exactly the way I described yesterday. There are a couple of important differences. First, ComputeBins is a batch process rather than an online one: it uses a sample of data to set up the whole mapping in one go. Second, the min and max are fixed in the same way. There's no way to adapt the bins gradually as new data arrives. I'm guessing you could add an update step every k new inputs, which would adjust the bins gradually to reflect the new statistics of the data. Keeping the previous cumulative weight array and adding in the new data would probably do it (someone familiar with the code can confirm).

It looks like this encoder is an orphan: it isn't an allowable choice in the mainstream NuPIC code, and the only thing that's been done to it recently is some response to a failed test.

--
Fergal Byrne
http://www.examsupport.ie
Brenter IT
[email protected]
+353 83 4214179
Formerly of Adnet
[email protected]
http://www.adnet.ie
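For anyone who hasn't read the source yet, here's a rough sketch of the idea in plain Python: equal-probability bins are computed from a data sample by interpolating the inverse CDF, and a value is encoded as the ScalarEncoder-style encoding of its bin index. The function and parameter names here are mine, not the actual nupic API.

```python
import bisect
import statistics

def compute_bins(samples, num_bins):
    """Split a data sample into num_bins equal-probability bins.

    Returns the interior cut points (num_bins - 1 of them), found by
    linear interpolation of the inverse cumulative distribution --
    similar in spirit to the batch ComputeBins utility.
    """
    return statistics.quantiles(samples, n=num_bins)

def bin_index(value, cutpoints):
    """Map a value to the index of its bin (0 .. len(cutpoints))."""
    return bisect.bisect_right(cutpoints, value)

def encode(value, cutpoints, w=3):
    """Encode the bin index as a contiguous block of w active bits,
    the way a ScalarEncoder would encode the index itself."""
    num_bins = len(cutpoints) + 1
    n = num_bins + w - 1          # total bits in the output
    bits = [0] * n
    i = bin_index(value, cutpoints)
    for j in range(i, i + w):
        bits[j] = 1
    return bits
```

Because each bin holds roughly the same number of sample points, each bin index (and hence each output bit) is roughly equally likely, regardless of how skewed the input distribution is.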
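The online update step I'm suggesting could look something like the following: buffer incoming values, and every k inputs recompute the quantile cut points from the buffer and blend them with the current ones. This is just one possible adaptation scheme to illustrate the point, not code from nupic, and the class and parameter names are hypothetical.

```python
import statistics

class AdaptiveBins:
    """Sketch of an online variant of the equal-probability binning:
    every k inputs, recompute quantile cut points from a buffer of
    recent values and blend them with the current cut points."""

    def __init__(self, initial_samples, num_bins, k=100, rate=0.1):
        self.num_bins = num_bins
        self.k = k                # recompute every k new inputs
        self.rate = rate          # how fast bins drift toward new data
        self.buffer = []
        self.cutpoints = statistics.quantiles(initial_samples, n=num_bins)

    def update(self, value):
        """Record one new input; periodically adjust the bins."""
        self.buffer.append(value)
        if len(self.buffer) >= self.k:
            new_cuts = statistics.quantiles(self.buffer, n=self.num_bins)
            # Exponential blend keeps the adjustment gradual.
            self.cutpoints = [
                (1 - self.rate) * old + self.rate * new
                for old, new in zip(self.cutpoints, new_cuts)
            ]
            self.buffer.clear()
```

A blend like this only drifts the existing cut points; the approach I mentioned of keeping the previous cumulative weight array and adding in the new data would instead grow the underlying histogram and re-derive the cuts from it, which someone familiar with the code could judge better.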
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
