Re: [nupic-dev] Training/testing data ...

Pedro Tabacof Tue, 24 Sep 2013 12:01:05 -0700

Hi Subutai,

I'm not sure I agree with the "retraining" methodology you proposed. The
classical time-series prediction algorithms were designed to be fed the
prediction error, not to be retrained with it, so I think it's unfair to
compare them with an online learner such as NuPIC when you are using them
in a different way. Perhaps it'd be better to compare the CLA's online
learning with other online learning algorithms, such as Recursive Least
Squares applied to an ESN [1]. It'd also be very informative to compare
NuPIC with learning on and off on the test data to see the actual
difference.
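
A minimal sketch of what such an online baseline could look like. It is a
simplification: Recursive Least Squares is applied here to a plain window of
lagged values, not to ESN reservoir states as in [1]. The function name and
the parameters `order`, `lam` (forgetting factor) and `delta` (initial
covariance scale) are hypothetical. The error for each step is recorded
before the weights are updated with that step.

```python
import math


def rls_one_step_errors(series, order=2, lam=0.99, delta=100.0):
    """Online one-step prediction with Recursive Least Squares.

    Inputs are the previous `order` values; the error for each step is
    recorded before the weights are updated with that step.
    """
    n = order
    w = [0.0] * n
    # Inverse-correlation estimate, initialised to delta * identity.
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    errors = []
    for t in range(n, len(series)):
        x = series[t - n:t]
        y = series[t]
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        err = y - prediction
        errors.append(err)
        # Standard RLS update with forgetting factor lam.
        Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(xi * pxi for xi, pxi in zip(x, Px))
        k = [pxi / denom for pxi in Px]
        w = [wi + ki * err for wi, ki in zip(w, k)]
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return errors, w


# Toy usage: a noise-free sinusoid, which an order-2 linear model can fit.
series = [math.sin(0.3 * t) for t in range(300)]
errors, weights = rls_one_step_errors(series)
```

Swapping the lagged window for reservoir states would give the RLS-trained
ESN readout; the predict-then-update loop stays the same.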


Do you have the dataset used to generate the CLA vs ARIMA comparison? I
searched for it on the GitHub page but was unable to find it. I'd like to
try out a simple ESN implementation on it (no online learning) and see the
results.

By the way, I found something that seems promising: a two-year dataset
with real temperature and energy consumption data that was used in a
competition a few years back:
http://neuron.tuke.sk/competition/index.php
http://www.eunite.org/knowledge/Competitions/1st_competition/Introduction/Introduction.htm
It's temporal and multivariable (temperature and energy load), with discrete
information (day of the week and holidays), which seems very appropriate
for the CLA. I think it would be very interesting to compare NuPIC's
performance with that of the competition winners. This would have to be
done with learning off and multistep (30) prediction to make a fair
comparison.
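
One common way to produce a multistep forecast with learning off is to
freeze a one-step predictor and feed its own outputs back in over the whole
horizon. A minimal sketch, assuming that recursive strategy and assuming
mean absolute percentage error as the score (neither is confirmed here as
the competition's protocol); `recursive_forecast`, `predict_next` and `mape`
are hypothetical names:

```python
def recursive_forecast(history, predict_next, horizon=30):
    """Forecast `horizon` steps ahead with a frozen one-step predictor.

    Each forecast is appended to the window and used as input for the
    next step; the predictor itself is never updated.
    """
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        nxt = predict_next(window)
        forecasts.append(nxt)
        window.append(nxt)
    return forecasts


def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


# Toy usage: a predictor that always adds one to the last value.
toy_forecast = recursive_forecast([1, 2, 3], lambda w: w[-1] + 1, horizon=3)
toy_score = mape([4, 5, 6], toy_forecast)
```

Errors compound under the recursive strategy, so a 30-step score is usually
much worse than the one-step score for the same model.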

Pedro.

[1] Küçükemre, Ali Uygar. *Echo state networks for adaptive filtering*.
Diss. University of Applied Sciences, 2006.
http://organic.elis.ugent.be/sites/organic.elis.ugent.be/files/Kucukemre.pdf


On Tue, Sep 24, 2013 at 1:15 PM, Subutai Ahmad <[email protected]> wrote:

>
> Hi Pedro,
>
> As long as we use the right type of dataset, the approach you outline
> could work. It's critical to maintain temporal continuity in the training
> and test set. A couple of tweaks:
>
> 1) If your dataset is noisy, you don't want to repeatedly train the CLA
> with the same data. It can overfit to random sequences in the data. This
> is really important for temporal domains and sequence learning (more so
> than for non-temporal domains). You need to ensure the original dataset is
> large enough that you can do just one pass.
>
> 2) You can actually keep learning turned on. This is because the error is
> measured for each time step before any learning occurs for that time step.
> For other time series techniques that are not online, you can do the
> following. Train the algorithm on all data up to time T. Make a prediction
> for time T+1 and calculate the error against the actual value for T+1. Then
> retrain the algorithm on all data up to time T+1. Make a prediction for
> time T+2 and calculate the error against the actual value for T+2. And so
> on. This is slower, but I'd rather keep continuous learning since it is a
> core feature of the CLA.
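
The retrain-and-predict loop described in 2) can be sketched as follows.
This is an illustration only: `walk_forward_errors`, `fit_ar1` and
`predict_ar1` are hypothetical names, and the toy AR(1) model stands in for
whatever batch technique (ARIMA, for example) is being evaluated.

```python
def fit_ar1(history):
    """Least-squares AR(1) through the origin: x[t] ~ phi * x[t-1]."""
    num = sum(a * b for a, b in zip(history[1:], history[:-1]))
    den = sum(b * b for b in history[:-1])
    return num / den if den else 0.0


def predict_ar1(phi, history):
    """One-step forecast from the last observed value."""
    return phi * history[-1]


def walk_forward_errors(series, fit, predict, start):
    """Refit on all data before each step, then score that step.

    For every t >= start the model sees only series[:t], so the error at
    time t is measured before that value is used for training.
    """
    errors = []
    for t in range(start, len(series)):
        history = series[:t]
        model = fit(history)
        forecast = predict(model, history)
        errors.append(series[t] - forecast)
    return errors


def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Toy usage: a doubling series, which AR(1) fits exactly.
series = [1, 2, 4, 8, 16, 32]
errors = walk_forward_errors(series, fit_ar1, predict_ar1, start=2)
```

The cost is one full refit per time step, which is why this protocol is
slow for batch methods.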
>
> I did this for CLAs and compared against ARIMA using our hotgym dataset.
> This is a real-world energy dataset that we received from customers. For
> CLAs I used our swarm to find the parameters. For ARIMA I used R's
> auto.arima to fit the parameters. I have attached a PDF of some slides I
> did several months ago with the results.
>
> *Note*: CLAs do quite well here. I am not an ARIMA expert, but I believe
> with hand tuning you could do better than auto.arima. That could be true
> for CLAs too. You have to take all these comparisons with a grain of salt.
> However, this is still an encouraging result.
>
> --Subutai
>
>
>
> On Tue, Sep 24, 2013 at 6:59 AM, Pedro Tabacof <[email protected]> wrote:
>
>> Has NuPIC been compared to any other time-series prediction approach?
>>
>> I believe it's easy to separate training and test data for NuPIC. For
>> example, you could use the first 80% of the samples for training and the
>> last 20% for testing. Just input the first 80% as many times as necessary
>> and the last 20% once with learning off, and then calculate the one-step
>> prediction error.
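
A minimal sketch of this split-and-score protocol, assuming a chronological
(unshuffled) split and a toy drift model in place of NuPIC. All names are
hypothetical. The model is fitted once on the training part and then frozen;
during testing it still receives each actual value as input, but its
parameter is never updated.

```python
def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so temporal order is preserved."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def fit_drift(train):
    """Toy model: the average step between consecutive training values."""
    return (train[-1] - train[0]) / (len(train) - 1)


def frozen_one_step_errors(train, test, drift):
    """One-step errors on the test part with the fitted drift held fixed."""
    errors = []
    previous = train[-1]
    for actual in test:
        errors.append(actual - (previous + drift))
        previous = actual  # the true value is fed in, but nothing is learned
    return errors


# Toy usage: the last two of ten samples are held out for testing.
series = [0, 2, 4, 6, 8, 10, 12, 14, 17, 18]
train, test = chronological_split(series)
errors = frozen_one_step_errors(train, test, fit_drift(train))
```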
>>
>> This way, I think you could compare it to other interesting approaches
>> such as the classical time-series predictor ARMA or state-of-the-art
>> recurrent neural networks such as ESN or LSTM.
>>
>>
>> On Tue, Sep 24, 2013 at 10:51 AM, Matthew Taylor <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> Unlike many other ML techniques, we do not have separate training data
>>> sets for NuPIC. It is an online-learning system, so it will update its
>>> representations of input data on-the-fly.
>>>
>>> We do have a list of potential data sources on our wiki:
>>> https://github.com/numenta/nupic/wiki/Data-Sets-for-NuPIC
>>>
>>> While receiving new input, NuPIC judges how well it is doing by
>>> comparing its previous predictions to actual values as they occur. It
>>> continuously updates its representations based on how well it is making
>>> predictions.
>>>
>>>
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>>
>>>
>>> On Mon, Sep 23, 2013 at 2:23 PM, mraptor <[email protected]> wrote:
>>>
>>>> What sort of training/test data do you guys use? Is there some sort
>>>> of public domain data that can be used?
>>>> What metrics do you use to compare how well the original patterns
>>>> were reproduced? Is it just simple deviation from the original
>>>> data, or some more advanced technique?
>>>>
>>>> thanks
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Pedro Tabacof,
>> Unicamp - Eng. de Computação 08.
>>
>>
>>
>
>
>


-- 
Pedro Tabacof,
Unicamp - Eng. de Computação 08.

