henrikingo opened a new issue, #84:
URL: https://github.com/apache/otava/issues/84

   In Otava, Piotr introduced window_len as a user facing parameter, and for a 
given data point, only `window_len/2` data points are included in the 
computation. This gives better quality results, in particular it makes it 
possible to find two adjacent change points. This is because computation is 
focused on the local neighborhood of each candidate change point, and data 
points far away are completely ignored.
   
   In Piotr's version, the input data is split into `2*datapoints/window_len` 
windows and the regular e-divisive is run on each window. After this there is a 
process to merge the small windows together. The windows also overlap, so the 
same point could be found once or twice, with different p-value for each.
   
   Mathematically it would be more elegant, and consistent, and potentially 
also a performance improvement (because we don't count every point twice 
because no need for overlap) to rather just do a single pass, as in the 
original e-divisive, but incorporate window length into the core e-divisive 
code. This could be something like this:
   
   
https://github.com/mongodb/signal-processing-algorithms/blob/74d6a8fa3a377ee5dce4821ba9dfd2f2528790b2/src/signal_processing_algorithms/e_divisive.py#L222-L250
   
           length = window_len
           n = 1   # Note: this is 2 in the quoted code, but that is 
unnecessarily conservative.
           w_start = max( n - window_len / 2, 0)
           w_stop = min( n + window_len / 2, len(series) )
           m = w_stop - n
   
           # term1 = sum(diffs[i][j] for i in range(w_start, n) for j in 
range(n, w_stop))
           term1 = np.sum(diffs[w_start:n, n:w_stop])
   
           # term2 = sum(diffs[i][k] for i in range(w_start, n) for k in 
range(i + 1, n))
           term2 = np.sum(np.triu(diffs[w_start:n, w_start:n], 0))
   
           # term3 = sum(diffs[j][k] for j in range(n, w_stop)
           #                         for k in range(j + 1, w_stop)) 
           term3 = np.sum(np.triu(diffs[n:w_stop, n + 1 : w_stop], 0))
   
           qhat_values[n] = self.calculate_q(term1, term2, term3, m, n)
   
           for n in range(3, (length - 2)):
               w_start = max( n - window_len / 2, 0)
               w_stop = min( n + window_len / 2, len(series) )
               m = w_stop - n
   
               # TODO: may need a +1 and / or -1 somewhere
               column_delta = np.sum(diffs[n - 1, w_start : n - 1])
               row_delta = np.sum(diffs[n:w_stop, n - 1])
   
               term1 = term1 - column_delta + row_delta
               term2 = term2 + column_delta
               term3 = term3 - row_delta
   
               qhat_values[n] = self.calculate_q(term1, term2, term3, m, 
n-w_start)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to