Re: Replication-related IT failures

Adam J. Shook Fri, 31 Jan 2020 09:52:22 -0800

I see the value of having the replication system remain intact in Accumulo
and would vote to keep it.  From my personal experience of using it, it works
well, but I ultimately ended up disabling replication in production due to the
high latency.  It is still used in lower non-production environments where
latency is less of a concern.  Additionally, since we don't know the full
user base of Accumulo, I cannot personally recommend the feature be phased
out.


As far as the flakey integration tests are concerned, if no one steps up to
work on them, I am +1 on adding @Ignore.

On Fri, Jan 31, 2020 at 10:30 AM Josh Elser <[email protected]> wrote:

> I'm really upset that you think suggesting removal of the feature is
> appropriate.
>
> More installations than not of HBase (which IMO should be considered
> Accumulo's biggest competitor) use replication. The only users of HBase
> I see without a disaster recovery plan are developer-focused
> instances with zero uptime guarantees. I'll even go further to say: any
> user who deploys a database into a production scenario would *require* a
> D/R solution for that database before it would be allowed to be called
> "production".
>
> Yes, there are D/R solutions that can be implemented at the data
> processing layer, but this is almost always less ideal as the cost of
> reprocessing and shipping the raw data is much greater than what
> Accumulo replication could do.
>
> While I am deflated that no other developers have seen this and have any
> interest in helping work through bugs/issues, they are volunteers and I
> can only be sad about this. However, I will not let an argument which
> equates to "we should junk the car because it has a flat tire" go
> without response.
>
> On 1/28/20 10:58 PM, Christopher wrote:
> > As succinctly as I can:
> >
> > 1. Replication-related ITs have been flakey for a long time,
> > 2. The feature is not actively maintained (critical, or at least,
> > untriaged issues exist dating back to 2014 in JIRA),
> > 3. No volunteers have stepped up thus far to maintain them and make
> > them reliable or to develop/maintain replication,
> > 4. I don't have time to fix the flakey ITs, and don't have interest or
> > use case for maintaining the feature,
> > 5. The IT breakages interfere with build testing on CI servers and for
> > releases.
> >
> > Therefore:
> >
> > A. I want to @Ignore the flakey ITs, so they don't keep interfering
> > with test builds,
> > B. We can re-enable the ITs if/when a volunteer contributes
> > reliability fixes for them,
> > C. If nobody steps up, we should have a separate conversation about
> > possibly phasing out the feature and what that would look like.
> >
> > The conversation I suggest in "C" is a bit premature right now. I'm
> > starting with this email to see if any volunteers want to step up.
> >
> > Even if somebody steps up immediately, they may not have a fix
> > immediately. So, if there are no objections, I'm going to disable the
> > flakey tests soon by adding the '@Ignore' JUnit annotation until a fix
> > is contributed, so they don't keep getting in the way of
> > troubleshooting other build-related issues. We already know they are
> > flakey... the constant failures aren't telling us anything new, so the
> > tests aren't useful as is.
> >
>
