I follow the reasoning, and it seems sound. Having the external dependency limits our options, and if a donation is unlikely, writing the remaining components ourselves seems reasonable.
Possibly related: ultimately, I am looking forward to Otava running on a current version of Python, that is, one that is at least still supported and receiving updates.

On Mon, Sep 22, 2025 at 11:37 AM Henrik Ingo <[email protected]> wrote:

> Ok so let me restart this discussion...
>
> After a successful first release as an ASF incubating project, we started
> discussing what to do with the main dependency, the
> signal_processing_algorithms repo. The driving motivation here is that it
> is rather central to what Otava is doing, and long term it will be better
> for development if we can easily make changes in both halves.
>
> The guidance from our mentors was that the repo is too large to just copy
> paste into Otava (2600 lines). For such additions, ASF usually prefers to
> receive a copyright transfer/donation in writing from the original author /
> copyright holder.
>
> So the guidance was that someone from the Otava IPMC (me) should contact
> MongoDB to find out whether they would be open to such a transfer. For
> context, we were already in contact with MongoDB when we drafted the
> project proposal a year ago. While they were mostly enthusiastic, in
> hindsight it seems formally joining Apache Otava (incubating) wasn't
> enough of a priority for it to actually happen. So chances are, the same
> dialogue would play out again: general excitement, but a high risk that the
> legal department has other priorities, and in the end we would just have
> wasted time on talking instead of programming.
>
> So, I looked deeper into what we have in front of us, and have discussed
> this off-list with Alex.
>
> While the whole codebase in the signal_processing_algorithms repo is indeed
> over 2k lines, most of that is code we don't use in Otava, or don't need.
> Also, the number is inflated, because the repo contains multiple different
> implementations, all doing exactly the same thing.
>
> In particular:
>
> * Piotr already replaced the significance test (which is roughly the latter
> half of what e-divisive does) with a Student's t-test.
> * Also for the main part of the algorithm, Piotr introduced the windowing
> approach, which is novel and not in the MongoDB code.
>
> While Piotr's implementation kind of wraps around the original MongoDB
> e-divisive implementation, it could have been done more elegantly and
> efficiently if it had modified the e-divisive code directly. So that's
> where we get into the discussion about why we don't just do that then.
>
> * Finally, in Otava we have my optimization from last year, the incremental
> e-divisive implementation, which is also novel, and the MongoDB code has
> nothing like it. However, it still uses the very core part of the
> e-divisive algorithm, so from a "code coverage" perspective, it does not
> further reduce the number of lines that we depend on in the signal
> processing repo.
>
> When all the above is accounted for, there are about 100 lines of code that
> execute the very heart of e-divisive: pairwise comparison of the data
> points in a time series. This could be rewritten by someone just by
> implementing line by line the math from the Matteson & James (2013) paper
> (formulas 5 and 6). Given that Piotr's and my work also reduces the
> amount of needed computation a lot, for a first version we don't need to
> implement this in C, nor use fancy numpy functions; it could just be the
> double for loop that you get when implementing the \sum ... \sum (xi-xj)^2
> from the paper.
>
> There aren't a lot of drawbacks with this idea. Realistically, we drop
> support for the --orig-edivisive mode, as that by definition depends on the
> original signal_processing code.
>
> Let me know what you think
> henrik
>
> --
> *nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*
>
> Henrik Ingo, CEO
> [email protected] LinkedIn: www.linkedin.com/in/heingo
> +358 40 569 7354 Twitter: twitter.com/h_ingo
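
For reference, the "double for loop" mentioned in the quoted message could look roughly like the sketch below. This is only an illustration in plain Python, not code from Otava or from the signal_processing_algorithms repo. It assumes the statistic takes the energy-distance form with |xi - xj|^alpha (alpha between 0 and 2), which should be checked against formulas 5 and 6 of the paper before anyone relies on it. The function names (pairwise_sum, q_hat, best_split) are made up for this example, and the split search is simplified: it always compares the left part against the whole remainder of the series.

```python
def pairwise_sum(xs, ys, alpha=1.0):
    """Plain double for loop: sum of |x - y|**alpha over all pairs."""
    total = 0.0
    for x in xs:
        for y in ys:
            total += abs(x - y) ** alpha
    return total


def q_hat(left, right, alpha=1.0):
    """Scaled divergence between two segments (assumed energy-distance form)."""
    n, m = len(left), len(right)
    if n < 2 or m < 2:
        raise ValueError("each segment needs at least two points")
    between = 2.0 / (n * m) * pairwise_sum(left, right, alpha)
    # The full double sum counts every unordered pair twice, so dividing by
    # n * (n - 1) is the same as averaging over the n-choose-2 pairs.
    within_left = pairwise_sum(left, left, alpha) / (n * (n - 1))
    within_right = pairwise_sum(right, right, alpha) / (m * (m - 1))
    e_hat = between - within_left - within_right
    return (n * m) / (n + m) * e_hat


def best_split(series, min_size=2, alpha=1.0):
    """Return (index, statistic) of the split that maximizes q_hat."""
    best_tau, best_q = None, float("-inf")
    for tau in range(min_size, len(series) - min_size + 1):
        q = q_hat(series[:tau], series[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q


if __name__ == "__main__":
    data = [1.0] * 5 + [10.0] * 5
    print(best_split(data))
```

As written this recomputes every pairwise distance for every candidate split, so it is only meant to show how small the core is; the windowing and incremental approaches described in the quoted message are what would keep the amount of computation manageable.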
