Hi Sam,

I'm not sure how much data you have, but typically we do not run the swarm on all the data. We usually swarm on just the first one to two thousand rows, then use the resulting model parameters to run through the whole data stream, starting from the beginning. That should avoid overfitting. If you want to be extra careful, you can ignore the first N predictions, where N is the number of rows used for swarming.
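
The "ignore the first N predictions" step can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call any NuPIC API, the function names `split_for_swarm` and `mean_abs_error_after_warmup` are made up for this example, and it assumes the actual and predicted values are already available as two equal-length lists.

```python
def split_for_swarm(rows, n_swarm):
    """Return the leading rows that would be handed to the swarm (hypothetical helper)."""
    return rows[:n_swarm]


def mean_abs_error_after_warmup(actual, predicted, n_swarm):
    """Mean absolute error over the rows that come after the first n_swarm rows.

    The first n_swarm rows were used to tune the parameters, so their
    predictions are left out of the reported score.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    pairs = list(zip(actual, predicted))[n_swarm:]
    if not pairs:
        raise ValueError("no rows left after skipping the swarm rows")
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy numbers for illustration only, not real results.
    actual = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [0.0, 0.0, 3.5, 4.5, 4.0]
    print(mean_abs_error_after_warmup(actual, predicted, n_swarm=2))
```

The model still sees the whole stream from the beginning, so it learns on the early rows as usual; only the scoring skips them.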
Another option for your specific situation is to run the swarm on some of the subjects, but report results only for the other subjects.

--Subutai
-----
@SubutaiAhmad

On Tue, Apr 12, 2016 at 2:20 PM, Samuel O Heiserman <[email protected]> wrote:

> Hey Nupic!
>
> I'm wondering: when running data through Nupic, should I avoid using the
> same file to build the model that I used to swarm for the parameters? Since
> the parameters were tuned to that exact data, it seems like a potential
> overfitting risk. The data is a series of control actions of subjects
> playing a simple game. What I'm trying to do is train a model on
> subject 1's data, save that model, and use it to forecast for subjects 1-20.
>
> I hope to show that the HTM can learn the individual behavioral
> patterns of a given subject, distinct from the others. I plan to show
> this capacity with a result where the model does well forecasting for all
> subjects, but especially well forecasting for the subject it was trained
> on. However, I wonder whether, when testing the model on subject 1, I should
> use different subject 1 data than I used to swarm for the parameters. Thanks
> again!
>
> -- Sam
