On Fri, Aug 22, 2014 at 3:28 AM, Cavan Day-Lewis <[email protected]> wrote:
> > In fact, the more correlated a field is with the predicted field, the
> > more likely that it is unnecessary and will be left out. (This is the
> > opposite of what happens in most machine learning applications.)
>
> This is very interesting, why is this so?

It's because of the problem formulation with streaming data. Suppose you have two variables, x and y, and suppose x is the predicted field. In the OPF, the HTM is solving the following problem.

Given:   x(t-2) y(t-2) x(t-1) y(t-1) x(t) y(t)
Predict: x(t+1)

Because of sequence learning, the HTM is good at exploiting information from time t and past timestamps, and it has access to all of that data. If y(t) and x(t) are perfectly correlated, y(t) adds no additional value over x(t). The important thing is to have temporal correlation between y(t) and x(t+1) that is above and beyond the correlation between x(t) and x(t+1). With fast-moving data streams, I've found that temperature often changes so slowly that the effect of including it is minimal, because its information is already contained within x(t).

In static machine learning problems, the formulation is usually: given y(t) (and possibly other variables from time t), predict x(t). It's a very different formulation.

BTW, I agree with Matt's comment: you don't need to swarm over all the data. Just swarm over a couple of thousand records, then use the resulting parameters on the full dataset.

--Subutai
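The "correlated but unnecessary" point can be sketched numerically in plain Python (synthetic data; the variable names are illustrative, not from NuPIC). When y(t) is an exact function of x(t), y(t) correlates strongly with the target x(t+1), yet explains nothing about it beyond what x(t) already does: regress x(t+1) on x(t) and check that y(t) is uncorrelated with the residual.

```python
import random
import statistics

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

random.seed(0)
# x is a slow-moving stream (random walk); y is perfectly correlated with x.
x = [0.0]
for _ in range(999):
    x.append(x[-1] + random.gauss(0, 0.1))
y = [v * 2.0 + 1.0 for v in x]   # y(t) carries no information beyond x(t)

x_t  = x[:-1]   # x(t)
y_t  = y[:-1]   # y(t)
x_t1 = x[1:]    # x(t+1), the value to predict

# y(t) is strongly correlated with the target ...
print(corr(y_t, x_t1))        # close to 1.0

# ... but regress x(t+1) on x(t) and y(t) explains none of the residual,
# i.e. no temporal correlation above and beyond that of x(t) itself.
b = corr(x_t, x_t1) * statistics.stdev(x_t1) / statistics.stdev(x_t)
a = statistics.mean(x_t1) - b * statistics.mean(x_t)
resid = [t1 - (a + b * t0) for t0, t1 in zip(x_t, x_t1)]
print(abs(corr(y_t, resid)))  # essentially 0
```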
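The swarm-over-a-subset advice amounts to slicing the head off the input CSV and swarming over that. A minimal sketch (the function name and file paths are hypothetical; NuPIC OPF input CSVs carry three header rows, field names, field types, and special flags, which must be copied along with the data):

```python
def head_of_stream(src_path, dst_path, n_records=2000, n_header=3):
    """Copy the header rows plus the first n_records data rows.

    NuPIC OPF input CSVs have three header rows (field names, field
    types, special flags); swarming over a couple of thousand records
    is usually enough to find good model parameters.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for i, line in enumerate(src):
            if i >= n_header + n_records:
                break
            dst.write(line)
```

The parameters found by swarming over the small file can then be used to instantiate a model that runs over the full stream.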
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
