I think FaTE ensures that the transaction is started and it waits for it to
finish. It must be the case that a failure is not being propagated back up
to fail the transaction. Are you seeing FaTE restarting the same compaction
over and over again, or are the multiple IN_PROGRESS transactions from
different compactions (my guess is the latter)? It would be interesting to
see if the Iterator Test Harness[1,2] exposes the issue in your iterator.
You can delete the FaTE transactions, but you will need to shut down the
Manager (Master) to do so.
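
To make the "failure is not being propagated" guess concrete, here is a toy
model in plain Java. It is only a sketch and not Accumulo code: the class, the
Attempt interface, both method names, and the attempt/failure limits are all
hypothetical, and only the status name IN_PROGRESS comes from this thread. It
contrasts a wait loop that swallows every error with one that gives up and
marks the transaction failed after a few errors.

```java
/** Toy model only: none of these names are Accumulo APIs. */
final class CompactionRetryModel {

    /** Simplified transaction states for the model. */
    enum Status { SUCCESSFUL, FAILED, IN_PROGRESS }

    /** Hypothetical stand-in for one compaction attempt on a tablet. */
    interface Attempt {
        void run() throws Exception;
    }

    /**
     * Retries until the attempt succeeds and swallows every error.
     * The attempt budget stands in for "forever" so the model terminates;
     * an attempt that always throws leaves the status IN_PROGRESS.
     */
    static Status waitWithoutPropagation(Attempt attempt, int attemptBudget) {
        for (int i = 0; i < attemptBudget; i++) {
            try {
                attempt.run();
                return Status.SUCCESSFUL;
            } catch (Exception e) {
                // Error is dropped here, so the caller never learns about it.
            }
        }
        return Status.IN_PROGRESS;
    }

    /**
     * Retries as well, but counts failures and reports FAILED once the
     * (assumed) failure limit is reached, so the transaction can end.
     */
    static Status waitWithPropagation(Attempt attempt, int maxFailures) {
        int failures = 0;
        while (true) {
            try {
                attempt.run();
                return Status.SUCCESSFUL;
            } catch (Exception e) {
                failures++;
                if (failures >= maxFailures) {
                    return Status.FAILED;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Models an iterator that always throws during initialization.
        Attempt brokenIterator = () -> {
            throw new IllegalStateException("iterator init failed");
        };
        System.out.println(waitWithoutPropagation(brokenIterator, 1000));
        System.out.println(waitWithPropagation(brokenIterator, 3));
    }
}
```

If the real code path behaves like the first method, an iterator that always
throws would keep the transaction IN_PROGRESS no matter how long you wait,
which would match what you are describing.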

[1]
https://accumulo.apache.org/1.10/accumulo_user_manual.html#_iterator_testing
[2]
https://accumulo.apache.org/docs/2.x/development/development_tools#iterator-test-harness

On Wed, Jul 6, 2022 at 10:59 PM Christopher <ctubb...@apache.org> wrote:

> The behavior in case of error is likely undefined, so I'm not entirely
> surprised it's behaving this way. There may be things we can do to try to
> handle errors more gracefully for user-initiated compactions when an
> iterator throws an exception, but it's definitely a good idea to write
> custom iterators so that they handle their own errors as much as
> possible.
>
> On Wed, Jul 6, 2022, 20:42 Logan Jones <lo...@codescratch.com> wrote:
>
> > Thanks Chris for the quick reply. I'll explain the behavior I'm seeing,
> > and then maybe you all could either confirm this is the intended
> > behavior, or decide it's maybe not that great.
> >
> > My understanding of the happy case for running a user-initiated
> > compaction is that a fate transaction gets created in ZooKeeper, and the
> > Accumulo master node ends up farming out the compactions to the correct
> > tablet servers. Once the tablets have been compacted, the fate
> > transactions in ZooKeeper somehow get cleaned up.
> >
> > I experienced a problem, however, in the unhappy case for compactions,
> > which I have since reproduced. We had a custom iterator configured for a
> > table, and that custom iterator was in a bad state (i.e. it was always
> > throwing an exception during initialization). What we noticed is that the
> > fate transactions are stuck IN_PROGRESS indefinitely and never go away in
> > this case. Effectively we have a poison pill, and if you issue too many
> > compactions against that table, you can cause other bad problems.
> >
> > I created a repo to demonstrate the problem as succinctly as I could
> > manage:
> >
> > https://github.com/loganasherjones/accumulo-iterator-failures
> >
> > I thought initially that maybe it was due to the fact that our iterator
> > was throwing an error during initialization, but this appears to be
> > happening for any error on next, seek, or init calls.
> >
> > So my questions are:
> >
> > 1. Is it expected that a failure in a seek, next, or init call in an
> > iterator during a user-initiated compaction would cause Accumulo to
> > retry the compaction nonstop?
> > 2. If so, could you help me understand why?
> >
> > Thanks in advance,
> >
> > - Logan
> >
> >
> >
> > On Wed, Jul 6, 2022 at 6:31 PM Christopher <ctubb...@apache.org> wrote:
> >
> > > Yes, either here (especially if it's related to a bug or proposed code
> > > change) or at user@ would work, if it's more of a user question. Here
> > > is fine if you're not sure.
> > >
> > > On Wed, Jul 6, 2022, 16:35 Logan Jones <lo...@codescratch.com> wrote:
> > >
> > > > Hello:
> > > >
> > > > I would like to discuss what happens when iterators cause
> > > > user-initiated compactions to fail, specifically in relation to the
> > > > fate transactions. Is this the right list for this discussion?
> > > >
> > > > Thanks,
> > > >
> > > > - Logan
> > > >
> > >
> >
>
