For reference, our last conversation about the state of replication was https://lists.apache.org/thread.html/ra65ecbfcdb26af2672b7a064d313c0db0285b7d9f228c09559a14842%40%3Cdev.accumulo.apache.org%3E ; in that thread, I tried to make the community aware of the issues involving the long-running and frequently broken ITs that were becoming a burden and interfering with progress in other areas of our code. After that discussion, we disabled the consistently failing tests, with a call for somebody to volunteer to pick up the maintenance burden. Since then, nobody has volunteered.

I do think we need to:

1. Communicate to users the current state, so they don't have high expectations for its reliability when we know differently, and
2. Make a plan to deprecate and remove the feature (as it currently exists, anyway) from Accumulo, in order to prevent the technical debt and tight coupling to critical WAL code from inhibiting other development work in Accumulo.

We can do #1 by updating the properties for the feature to Experimental and/or Deprecated. Both states are reversible if the status quo changes, but I think it's important users aren't misled into thinking the feature is more stable and well-maintained than we know it to be.

For #2, I think it would be okay to deprecate it in the next minor release, and remove it in the next major release after that. Again, the deprecated state can be reversed if the status quo substantially changes.

On Tue, Oct 19, 2021 at 8:19 PM Ed Coleman <[email protected]> wrote:
>
> I started a general thread concerning topics for the next release. One major topic raised was the state of replication and trying to determine if there is consensus for a way forward. I started this thread so that replication discussions can occur in a single thread for continuity. From the general email thread:
>
> It is hard to know what the state of replication is and maybe we need to mark it as either experimental or deprecated to convey that to users. The replication tests have been unstable and failing with transient errors and have been removed from the regular build process – this reduced the automated build time by over 2 hours. A recent example is accumulo-testing issue #164 (https://github.com/apache/accumulo-testing/issues/164). Without the test running regularly, it is hard to state with any confidence that replication works reliably in a production environment. This should not be interpreted as advocating that we remove replication at this point, but we need a way forward. Maybe someone volunteers to examine the tests and fixes them so that they run reliably and in a reasonable time, or maybe we begin to explore other approaches – for example, maybe some kind of NiFi connector or something else entirely. I really don’t know, but it seems we need to clearly communicate the current state to any users that may be using or considering using replication in the next release, and to signal possible future intentions.
