henrikingo opened a new issue, #84: URL: https://github.com/apache/otava/issues/84
In Otava, Piotr introduced `window_len` as a user-facing parameter. For a given data point, only `window_len/2` data points are included in the computation. This gives better-quality results; in particular, it makes it possible to find two adjacent change points, because the computation focuses on the local neighborhood of each candidate change point and completely ignores data points far away.

In Piotr's version, the input data is split into `2*datapoints/window_len` windows and regular e-divisive is run on each window. A separate step then merges the small windows back together. The windows also overlap, so the same point can be found once or twice, with a different p-value each time.

It would be mathematically more elegant and consistent, and potentially also faster (without overlap, no point has to be counted twice), to do a single pass as in the original e-divisive, but with the window length incorporated into the core e-divisive code. Starting from this code:

https://github.com/mongodb/signal-processing-algorithms/blob/74d6a8fa3a377ee5dce4821ba9dfd2f2528790b2/src/signal_processing_algorithms/e_divisive.py#L222-L250

it could look something like this:

    length = window_len
    n = 1  # Note: this is 2 in the quoted code, but that is unnecessarily conservative.
    # Integer division, so that the window bounds are valid slice indices.
    w_start = max(n - window_len // 2, 0)
    w_stop = min(n + window_len // 2, len(series))
    m = w_stop - n

    # term1 = sum(diffs[i][j] for i in range(w_start, n) for j in range(n, w_stop))
    term1 = np.sum(diffs[w_start:n, n:w_stop])

    # term2 = sum(diffs[i][k] for i in range(w_start, n) for k in range(i + 1, n))
    term2 = np.sum(np.triu(diffs[w_start:n, w_start:n], 0))

    # term3 = sum(diffs[j][k] for j in range(n, w_stop)
    #                         for k in range(j + 1, w_stop))
    term3 = np.sum(np.triu(diffs[n:w_stop, n + 1 : w_stop], 0))

    qhat_values[n] = self.calculate_q(term1, term2, term3, m, n)

    for n in range(3, (length - 2)):
        w_start = max(n - window_len // 2, 0)
        w_stop = min(n + window_len // 2, len(series))
        m = w_stop - n
        # TODO: may need a +1 and / or -1 somewhere
        column_delta = np.sum(diffs[n - 1, w_start : n - 1])
        row_delta = np.sum(diffs[n:w_stop, n - 1])

        term1 = term1 - column_delta + row_delta
        term2 = term2 + column_delta
        term3 = term3 - row_delta

        qhat_values[n] = self.calculate_q(term1, term2, term3, m, n - w_start)
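A brute-force reference could help settle the open +1/-1 question in the incremental update. The sketch below recomputes all three terms from scratch for every candidate split, using only the `window_len // 2` points on each side. It is an illustration under stated assumptions, not Otava's or MongoDB's implementation: the function name `windowed_qhat` is hypothetical, and the statistic is the textbook e-divisive form with alpha = 1, which may differ in detail from the existing `calculate_q`.

```python
import numpy as np


def windowed_qhat(series, window_len):
    """Brute-force windowed q-hat for each candidate split point.

    Hypothetical helper for illustration. For a split at index t, the left
    segment is series[start:t] and the right segment is series[t:stop],
    where each side holds at most window_len // 2 points.
    """
    x = np.asarray(series, dtype=float)
    size = len(x)
    half = window_len // 2
    # Pairwise absolute differences (alpha = 1 is an assumption here).
    diffs = np.abs(x[:, None] - x[None, :])
    qhat = np.zeros(size)

    for t in range(2, size - 1):
        start = max(t - half, 0)
        stop = min(t + half, size)
        n = t - start  # number of points left of the split
        m = stop - t   # number of points right of the split
        if n < 2 or m < 2:
            continue  # the within-segment averages need at least one pair

        between = diffs[start:t, t:stop].sum()
        # k=1 keeps the strict upper triangle: each unordered pair once.
        within_left = np.triu(diffs[start:t, start:t], 1).sum()
        within_right = np.triu(diffs[t:stop, t:stop], 1).sum()

        energy = (
            (2.0 / (m * n)) * between
            - (2.0 / (n * (n - 1))) * within_left
            - (2.0 / (m * (m - 1))) * within_right
        )
        qhat[t] = (m * n / (m + n)) * energy

    return qhat


# A short step up followed by a step back down: two adjacent change points.
series = [0.0] * 20 + [10.0] * 5 + [0.0] * 20
qhat = windowed_qhat(series, window_len=10)
print(np.flatnonzero(np.isclose(qhat, qhat.max())))
```

On this synthetic series, the two boundaries of the short plateau (indices 20 and 25) come out as equally strong candidates, which is the adjacent-change-point behavior the window is meant to enable. Each split costs O(window_len^2) here, so this is only useful as a correctness oracle for the incremental version, not as a replacement for it.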
