henrikingo commented on PR #96:
URL: https://github.com/apache/otava/pull/96#issuecomment-3539323799
> Regarding your question:
>
> > Do I understand correctly that running this Kappa from 0 to T is exactly
the same as if I would start with two points, then append one point at a time
to the timeseries, re-running otava between each step, and then keeping all
change points found along the way?
>
> It's kind of a loaded question, but the short answer is no. However, I
think you'll be interested in the long answer.
>
> It's not **exactly** the same because as we add a point to the end of
series it might cause a different point to become the best candidate. The
minimal example that I came up with is `[0, 29, 60]` and adding `27` to it:
>
When asking the question, I had slightly misunderstood where this happens: this is about generating the set of Q-values, not the set of change points found (whether weak or regular).
So I think the correct use of kappa enlarges the set of Q-values, and therefore of candidate change points, so that the change points the Hunter paper describes as missing, or rather as disappearing, could be found. But you're right that only the best one will be picked, and of course in the next iteration things have changed, so it is not guaranteed that varying kappa will produce all the same change points as would be found by running the algorithm over all of {series[:2], series[:3] ... series[:N]} and keeping the union of all change points found along the way.
Even so, the effect of:
`0 < tau < kappa <= T`, where kappa goes from 2 to T (your implementation)
seems to me very close to
`0 < tau < kappa = t`, where t goes from 2 to T (my question)
Still, as you point out, in the first case we may not actually pick all the change points that would be found in the second case. I do feel the potential is there, since the first case should generate the same "peaks" of Q-values, but it's not guaranteed, only more likely.
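To make the two cases concrete, here is a minimal sketch of both sweeps on the `[0, 29, 60]` plus `27` example quoted above. It is not otava's implementation: `q_stat` is a deliberately simplified stand-in for the real Q-statistic (just the absolute difference of the means on either side of the split), and the function names are mine.

```python
import numpy as np

def q_stat(series, tau):
    """Toy stand-in for the Q-statistic: |mean before split - mean after split|."""
    left, right = series[:tau], series[tau:]
    return abs(left.mean() - right.mean())

def kappa_sweep(series):
    """Case 1: 0 < tau < kappa <= T. For every kappa, record the tau with the
    highest Q over series[:kappa], i.e. the 'peaks' of Q-values."""
    series = np.asarray(series, dtype=float)
    peaks = {}
    for kappa in range(2, len(series) + 1):
        qs = {tau: q_stat(series[:kappa], tau) for tau in range(1, kappa)}
        peaks[kappa] = max(qs, key=qs.get)
    return peaks

def prefix_union(series):
    """Case 2: 0 < tau < kappa = t. Re-run a single-best-pick detection on
    every prefix series[:t] and keep the union of the picks."""
    return set(kappa_sweep(series).values())

# The minimal example from the quoted comment: [0, 29, 60], then append 27.
print(kappa_sweep([0, 29, 60]))      # with this toy statistic: {2: 1, 3: 2}
print(kappa_sweep([0, 29, 60, 27]))  # {2: 1, 3: 2, 4: 1} -- the best tau moves to 1
print(prefix_union([0, 29, 60, 27])) # {1, 2}
```

In this toy, where each prefix run picks only its single best tau, the two cases coincide by construction; the real algorithm picks only the best candidate, and in the next iteration things have changed, which is why the equivalence is not guaranteed.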
> Next, if I understand correctly why you are asking this question, it is
because of the issues described in the [Hunter
paper](https://arxiv.org/abs/2301.03034). Namely, the claim:
My motivation for the question was to understand whether this fully explains the phenomenon of change points that are first found and then disappear. It seems to me it mostly does, but we cannot say for certain that it "fully" does so in all scenarios.
> >>> def figure1_test(N):
> ...     base = 440 + np.random.randn(N) * 5
> ...     drop = 400 + np.random.randn(N) * 5
> ...     recover = 445 + np.random.randn(N) * 5
> ...     series = np.concatenate((base, drop, recover))
I think to generate a data set like the one the Hunter paper was concerned with, you need the drop to be short, maybe even 1-2 points only:
`drop = 400 + np.random.randn(2) * 5`
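For example, a sketch of such a generator (the function name, default lengths, and the final `return` are my additions, since the quoted snippet above is truncated):

```python
import numpy as np

def figure1_short_drop(N=50, drop_len=2):
    """Variant of the quoted figure1_test where the dropped segment is only
    drop_len points long (1-2), i.e. the kind of short-lived dip whose change
    points the Hunter paper describes as disappearing again."""
    base = 440 + np.random.randn(N) * 5
    drop = 400 + np.random.randn(drop_len) * 5
    recover = 445 + np.random.randn(N) * 5
    return np.concatenate((base, drop, recover))

series = figure1_short_drop()
print(len(series), series[48:54])  # with N=50 the two low points sit at indices 50-51
```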