Thanks a lot for the explanation. If I understand it correctly, it's basically
back pressure from C*: it's telling me that it's overloaded and that I need
to back off.
I'd better start a few more nodes, I guess.
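
Backing off on the client side could look roughly like the sketch below. It is plain Java under stated assumptions, not the API of cql-rb or any other driver: `OverloadedError`, `withBackoff` and `delayMillis` are hypothetical names, and the base delay, cap and attempt count are arbitrary placeholders. The idea is to retry a write that was rejected as overloaded, waiting twice as long before each new attempt, up to a cap.

```java
import java.util.function.Supplier;

public class BackoffSketch {
    /** Hypothetical stand-in for whatever "overloaded" error the driver surfaces. */
    public static class OverloadedError extends RuntimeException {
        public OverloadedError(String message) { super(message); }
    }

    /** Delay before retrying after failed attempt number `attempt` (0-based): base * 2^attempt, capped. */
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(attempt, 20); // keep the shift small so it cannot overflow for sane bases
        return Math.min(baseMillis << shift, capMillis);
    }

    /** Runs `write`, retrying with exponential backoff while it reports overload. */
    public static <T> T withBackoff(Supplier<T> write, int maxAttempts,
                                    long baseMillis, long capMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return write.get();
            } catch (OverloadedError e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: let the caller see the overload
                }
                try {
                    Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that is rejected twice before it succeeds.
        String result = withBackoff(() -> {
            if (calls[0]++ < 2) {
                throw new OverloadedError("Too many in flight hints");
            }
            return "written";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only slows a single caller down, so with many concurrent writers it would probably also need a limit on the number of writes in flight.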
T#
On Thu, May 30, 2013 at 10:47 PM, Robert Coli wrote:
> On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg wrote:
> > I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
> > my application is talking to it over the binary protocol (I'm using JRuby
> > and the cql-rb driver). I get this error quite frequently: "Too many in
> > flight hints: 2411" (the exact number varies)
> >
> > Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
> > with writes (but no reads at all).
>
> The code that produces this message (below) sets the bound based on
> the number of available processors. It is a bound on the number of
> in-progress hints. An in-progress hint (for some reason redundantly
> referred to as "in flight") is a hint which has been submitted to the
> executor which will ultimately write it to local disk. If you get
> OverloadedException, this means that you were trying to write hints to
> this executor so fast that you risked OOM, so Cassandra refused to
> submit your hint to the hint executor and therefore (partially) failed
> your write.
>
> "
> private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
> [... snip ...]
> for (InetAddress destination : targets)
> {
>     // avoid OOMing due to excess hints. we need to do this check even
>     // for "live" nodes, since we can still generate hints for those if
>     // it's overloaded or simply dead but not yet known-to-be-dead.
>     // The idea is that if we have over maxHintsInProgress hints in
>     // flight, this is probably due to a small number of nodes causing
>     // problems, so we should avoid shutting down writes completely to
>     // healthy nodes. Any node with no hintsInProgress is considered
>     // healthy.
>     if (totalHintsInProgress.get() > maxHintsInProgress
>         && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
>     {
>         throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
>     }
> "
>
> If Cassandra didn't return this exception, it might OOM while
> enqueueing your hints to be stored. Instead, it gives up on enqueueing
> a hint for the failed write. The solution is to reduce your write
> rate, ideally by enough that you don't even queue hints in the first
> place.
>
> =Rob
>