I think FaTE ensures that the transaction is started and then waits for it to finish. If so, the failure must not be getting propagated back up to fail the transaction. Are you seeing FaTE restart the same compaction over and over again, or do the multiple IN_PROGRESS transactions come from different compactions? My guess is the latter.

It would be interesting to see whether the Iterator Test Harness [1][2] exposes the issue in your iterator. You can delete the FaTE transactions, but you will need to shut down the Manager (Master) to do so.

[1] https://accumulo.apache.org/1.10/accumulo_user_manual.html#_iterator_testing
[2] https://accumulo.apache.org/docs/2.x/development/development_tools#iterator-test-harness

On Wed, Jul 6, 2022 at 10:59 PM Christopher <ctubb...@apache.org> wrote:

> The behavior in case of error is likely undefined, so I'm not entirely
> surprised it's behaving this way. There may be things we can do to try to
> handle errors more gracefully for user initiated compactions when an
> iterator throws an exception, but it's definitely a good idea to write
> custom iterators in a way that tries to handle its own errors as much as
> possible.
>
> On Wed, Jul 6, 2022, 20:42 Logan Jones <lo...@codescratch.com> wrote:
>
> > Thanks Chris for the quick reply. I'll explain the behavior I'm seeing,
> > and then maybe you all could either confirm this is the intended
> > behavior, or decide it's maybe not that great.
> >
> > My understanding of the happy case for running a user-initiated
> > compaction is that a fate/transaction gets created in zookeeper, and the
> > Accumulo master node ends up farming off the compactions to the correct
> > tablet servers, once the tablets have been completed, somehow the
> > fates/transactions in zookeeper get cleaned up.
> >
> > I experienced a problem, however, in the unhappy case for compactions
> > which I have since reproduced. We had a custom iterator configured for a
> > table, and that custom iterator was in a bad state (i.e. it was always
> > throwing an exception during initialization). What we noticed is that
> > the fates are indefinitely stuck IN_PROGRESS and never go away in this
> > case. Effectively we have a poison pill, and if you issue too many
> > compactions against that table, you can cause other bad problems.
> >
> > I created a repo to demonstrate the problem as succinctly as I could
> > manage:
> >
> > https://github.com/loganasherjones/accumulo-iterator-failures
> >
> > I thought initially that maybe it was due to the fact that our iterator
> > was throwing an error during initialization, but this appears to be
> > happening for any error on next, seek, or init calls.
> >
> > So my questions are
> >
> > 1. Is it expected that a failure in a seek, next, or init in an iterator
> > during a user-initiated compaction would cause accumulo to non-stop
> > retry the compaction
> > 2. If so, could you help me understand why?
> >
> > Thanks in advance,
> >
> > - Logan
> >
> > On Wed, Jul 6, 2022 at 6:31 PM Christopher <ctubb...@apache.org> wrote:
> >
> > > Yes, either here (especially if it's related to a bug or proposed code
> > > change) or at user@ would work, if it's more of a user question. Here
> > > is fine if you're not sure.
> > >
> > > On Wed, Jul 6, 2022, 16:35 Logan Jones <lo...@codescratch.com> wrote:
> > >
> > > > Hello:
> > > >
> > > > I would like to discuss what happens when iterators cause
> > > > user-initiated compactions to fail, specifically in relation to the
> > > > fate transactions. Is this the right list for this discussion?
> > > >
> > > > Thanks,
> > > >
> > > > - Logan