Hello NuPIC,
I would like to clarify a few questions:
1a. In the "Predicting a Sine Waves with NuPIC" video at 30:00, Matt says
that the predicted data are a bit shifted compared to the input data (I
suppose this is due to the step of 1 that was chosen). Matt also says
that as NuPIC sees more and more data, the predicted wave should have
fewer bumps and the anomaly score should have fewer spikes. In my
experiments I have a huge anomaly score at the beginning (nearly 1.0 for
the whole 1st period of the sine), which seems logical: NuPIC needs one
full period before it can see a repeating pattern, so everything before
the end of the 1st period is new/unknown to NuPIC and the high anomaly
score is the result of that. But how do you explain the following: in my
experiments I often see a high anomaly score together with a nice
predicted wave (no bumps), and vice versa: parts where the predicted
wave does not follow the input data nicely and has a lot of bumps have a
low anomaly score. Is this OK, and if yes, why? How does the anomaly
score actually work, and what is the relationship between the anomaly
score and the predicted wave?
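To make the "shifted" part of this question concrete, here is a minimal
sketch in plain Python (not NuPIC code) of how one-step-ahead predictions
could be lined up with the inputs before comparing them. The function
names, the toy data and the assumption that predictions[i] is the value
predicted at time i for time i + 1 are all mine, not taken from the video
or from NuPIC.

```python
import math


def align_one_step(inputs, predictions, steps=1):
    """Pair each prediction with the input it was trying to predict.

    Assumption: predictions[i] was made at time i for time i + steps.
    """
    return list(zip(inputs[steps:], predictions[:len(predictions) - steps]))


def mean_abs_error(pairs):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


# Toy data: a sine wave and a hypothetical "perfect" one-step-ahead predictor.
inputs = [math.sin(2 * math.pi * i / 100) for i in range(300)]
predictions = inputs[1:] + [0.0]  # the last entry is a dummy, nothing follows it

# Comparing row by row makes even a perfect predictor look shifted.
naive_error = mean_abs_error(list(zip(inputs, predictions)))
# After removing the one-step shift the error disappears.
aligned_error = mean_abs_error(align_one_step(inputs, predictions))
```

With the shift removed, any bumps that remain are real prediction errors
rather than an artifact of plotting the prediction one step early.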
1b. Somewhere I've also read (or seen in the videos) that if NuPIC does
not know what to predict (or does not have enough data), it simply
repeats the input. Can I somehow tell when NuPIC is repeating and when
it is predicting? Should I worry about this?
1c. In my sine experiments I see a nice prediction at the beginning (I
suppose NuPIC is repeating), then some bumps (I suppose NuPIC is
predicting), then a nice prediction again, and then some bumps again. My
(very simplified) understanding of this is as follows (in ASCII art):
In the first 1/4 of the period everything is new for NuPIC (and thus the
anomaly score is high) and it is repeating what it sees:
/
Then, by 2/4 of the period, it starts to see some repeating pattern:
/\
After the first period it strengthens its confidence in what it sees
(the same pattern two times):
/\/
As it is constantly learning, it does not know at any given moment what
the data will look like next (so it still reports some anomaly score).
Maybe the repeating pattern is the following: 1 period of frequency N +
1 period of frequency N/2, as shown here:
   __
/\/  \__/
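To make this scenario concrete, here is a minimal sketch in plain Python
(not NuPIC code) that generates such a signal: one period at a base
frequency followed by one period at half that frequency, repeated a few
times. The function name and its parameters are hypothetical, chosen only
for illustration.

```python
import math


def two_frequency_wave(samples_per_period=100, repeats=3):
    """Return a sine signal alternating between frequency N and N/2."""
    values = []
    for _ in range(repeats):
        # One period at the base frequency N.
        for i in range(samples_per_period):
            values.append(math.sin(2 * math.pi * i / samples_per_period))
        # One period at frequency N/2, which takes twice as many samples.
        slow_period = 2 * samples_per_period
        for i in range(slow_period):
            values.append(math.sin(2 * math.pi * i / slow_period))
    return values


wave = two_frequency_wave()
```

A model that has only seen the first fast period of this signal has no
way of knowing that a slow period follows, which is the situation I am
describing.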
This leads me to the following conclusion: there will never be enough
data (you know, maybe a frequency change will happen after the 2nd, 3rd,
..., 1000th, ... period, who knows) for NuPIC to say with 100% certainty
that something has an anomaly score of 0 and was totally expected. There
will always be some unpredictability. But in my sine experiments I often
see an anomaly score equal to 0. My understanding is that the anomaly
score was computed only after the whole dataset was seen. But if the
anomaly score was calculated after the whole dataset was seen, why is
the anomaly score not 0 in the whole file except the 1st period? So how
and when was it computed?
1d. I've found this interesting article
https://github.com/subutai/nupic.subutai/blob/master/swarm_examples/README.md
which refers to Jeff Hawkins' mail about improving the prediction:
http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2013-June/000327.html
That is 2-year-old material. Jeff mentions data encoding there (sampling
rate, noise...) and also an evolutionary algorithm that would experiment
with different parameters until it found a set that worked well for
solving the exact problem specified. He said: "It is our intention
to put this PSO code into NuPIC but we haven't been able to do that yet",
so I would like to ask whether this problem is already solved and PSO is
in NuPIC now. Isn't the swarm process (finding the best model
parameters) what is called PSO there? Should I worry about tuning the
prediction, and if yes, how? Or is it part of NuPIC nowadays?
1e. Do I need to know the details of HTM or some other low-level stuff
to fully understand these questions?
Thank you
PS: I'm a non-native English speaker, so please bear with me ;)
Best Regards
Name: Wakan Tanka a.k.a. Wakatana a.k.a. MackoP00h
Location: Europe
Contact:
[email protected]
http://stackoverflow.com/users/1616488/wakan-tanka
https://github.com/wakatana
https://twitter.com/MackoP00h