henrikingo commented on PR #96:
URL: https://github.com/apache/otava/pull/96#issuecomment-3539323799
> Regarding your question:
>
> > Do I understand correctly that running this Kappa from 0 to T is exactly
the same as if I would start with two points, then append one point at a time
to the timeseries, re-running otava between each step, and then keeping all
change points found along the way?
>
> It's kind of a loaded question, but the short answer is no. However, I
think you'll be interested in the long answer.
>
> It's not **exactly** the same because as we add a point to the end of
series it might cause a different point to become the best candidate. The
minimal example that I came up with is `[0, 29, 60]` and adding `27` to it:
>
When asking the question, I had slightly misunderstood where this happens: this is about generating the set of Q-values, not the set of change points found (whether weak or regular).
So I think the correct use of kappa enlarges the set of Q-values, and therefore of candidate change points, so that the change points the Hunter paper describes as missing, or rather as disappearing, could be found. But you're right that only the best one will be picked, and of course in the next iteration things have changed, so it is not guaranteed that varying kappa will produce all the same change points as would be found by running the algorithm over all of {series[:2], series[:3] ... series[:N]} and keeping the union of all change points found along the way.
Even so, the effect of:
`0 < tau < kappa <= T`, where kappa goes from 2 to T (your implementation)
seems to me very close to
`0 < tau < kappa = t`, where t goes from 2 to T (my question)
Still, as you point out, in the first case we may not actually pick all the change points that would be found in the second case. I do feel the potential is there, since the first case should generate the same "peaks" of Q-values, but it's not guaranteed, only more likely.
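To make the two cases concrete, here is a minimal sketch of both sweeps on the `[0, 29, 60]` plus `27` example quoted above. It is not otava's implementation: `q_stat` is a deliberately simplified stand-in for the real Q-statistic (just the absolute difference of the means on either side of the split), and the function names are mine.

```python
import numpy as np

def q_stat(series, tau):
    """Toy stand-in for the Q-statistic: |mean before split - mean after split|."""
    left, right = series[:tau], series[tau:]
    return abs(left.mean() - right.mean())

def kappa_sweep(series):
    """Case 1: 0 < tau < kappa <= T. For every kappa, record the tau with the
    highest Q over series[:kappa], i.e. the 'peaks' of Q-values."""
    series = np.asarray(series, dtype=float)
    peaks = {}
    for kappa in range(2, len(series) + 1):
        qs = {tau: q_stat(series[:kappa], tau) for tau in range(1, kappa)}
        peaks[kappa] = max(qs, key=qs.get)
    return peaks

def prefix_union(series):
    """Case 2: 0 < tau < kappa = t. Re-run a single-best-pick detection on
    every prefix series[:t] and keep the union of the picks."""
    return set(kappa_sweep(series).values())

# The minimal example from the quoted comment: [0, 29, 60], then append 27.
print(kappa_sweep([0, 29, 60]))      # with this toy statistic: {2: 1, 3: 2}
print(kappa_sweep([0, 29, 60, 27]))  # {2: 1, 3: 2, 4: 1} -- the best tau moves to 1
print(prefix_union([0, 29, 60, 27])) # {1, 2}
```

In this toy, where each prefix run picks only its single best tau, the two cases coincide by construction; the real algorithm picks only the best candidate, and in the next iteration things have changed, which is why the equivalence is not guaranteed.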
> Next, if I understand correctly why you are asking this question, it is
because of the issues described in the [Hunter
paper](https://arxiv.org/abs/2301.03034). Namely, the claim:
My motivation for the question was to understand whether this fully explains the phenomenon of change points that are first found and then disappear. It seems to me it mostly does, but we cannot say for certain that it "fully" does so in all scenarios.
> >>> def figure1_test(N):
> ...     base = 440 + np.random.randn(N) * 5
> ...     drop = 400 + np.random.randn(N) * 5
> ...     recover = 445 + np.random.randn(N) * 5
> ...     series = np.concatenate((base, drop, recover))
I think to generate a data set like the one the Hunter paper was concerned with, you need the drop to be short, maybe even 1-2 points only:
`drop = 400 + np.random.randn(2) * 5`
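For example, a sketch of such a generator (the function name, default lengths, and the final `return` are my additions, since the quoted snippet above is truncated):

```python
import numpy as np

def figure1_short_drop(N=50, drop_len=2):
    """Variant of the quoted figure1_test where the dropped segment is only
    drop_len points long (1-2), i.e. the kind of short-lived dip whose change
    points the Hunter paper describes as disappearing again."""
    base = 440 + np.random.randn(N) * 5
    drop = 400 + np.random.randn(drop_len) * 5
    recover = 445 + np.random.randn(N) * 5
    return np.concatenate((base, drop, recover))

series = figure1_short_drop()
print(len(series), series[48:54])  # with N=50 the two low points sit at indices 50-51
```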