Re: Getting error "Too many in flight hints"

2013-05-30 Thread Theo Hultberg
Thanks a lot for the explanation. If I understand it correctly, it's basically
back pressure from C*: it's telling me that it's overloaded and that I need
to back off.

I better start a few more nodes, I guess.

T#


On Thu, May 30, 2013 at 10:47 PM, Robert Coli wrote:

> On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg wrote:
> > I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
> > my application is talking to it over the binary protocol (I'm using JRuby
> > and the cql-rb driver). I get this error quite frequently: "Too many in
> > flight hints: 2411" (the exact number varies)
> >
> > Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
> > with writes (but no reads at all).
>
> The code that produces this message (below) sets the bound based on
> the number of available processors. It is a bound on the total number
> of in-progress hints. An in-progress hint (for some reason redundantly
> referred to as "in flight") is a hint which has been submitted to the
> executor which will ultimately write it to local disk. If you get
> OverloadedException, this means that you were trying to write hints to
> this executor so fast that you risked OOM, so Cassandra refused to
> submit your hint to the hint executor and therefore (partially) failed
> your write.
>
> "
> private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
> [... snip ...]
> for (InetAddress destination : targets)
> {
>     // avoid OOMing due to excess hints.  we need to do this check even for "live" nodes, since we can
>     // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
>     // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
>     // a small number of nodes causing problems, so we should avoid shutting down writes completely to
>     // healthy nodes.  Any node with no hintsInProgress is considered healthy.
>     if (totalHintsInProgress.get() > maxHintsInProgress
>         && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
>     {
>         throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
>     }
> "
>
> If Cassandra didn't return this exception, it might OOM while
> enqueueing your hints to be stored. Instead, it gives up on enqueueing
> a hint for the failed write. The solution is to reduce your write
> rate, ideally by enough that you don't even queue hints in the first
> place.
>
> =Rob
>


Re: Getting error "Too many in flight hints"

2013-05-30 Thread Robert Coli
On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg wrote:
> I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
> my application is talking to it over the binary protocol (I'm using JRuby
> and the cql-rb driver). I get this error quite frequently: "Too many in
> flight hints: 2411" (the exact number varies)
>
> Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
> with writes (but no reads at all).

The code that produces this message (below) sets the bound based on
the number of available processors. It is a bound on the total number
of in-progress hints. An in-progress hint (for some reason redundantly
referred to as "in flight") is a hint which has been submitted to the
executor which will ultimately write it to local disk. If you get
OverloadedException, this means that you were trying to write hints to
this executor so fast that you risked OOM, so Cassandra refused to
submit your hint to the hint executor and therefore (partially) failed
your write.
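
To make the bound concrete, here is a rough, self-contained sketch of the
check described above. It is not Cassandra's code: the class name
HintBoundSketch, the submitHint and wouldReject helpers, and the string node
names are all made up for illustration, and it assumes 2 available processors
(what I would expect an m1.large to report). With that assumption the bound
comes out at 1024 * 2 = 2048, which would be consistent with seeing the error
at 2411 hints.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of the hint bound; not Cassandra's actual classes.
class HintBoundSketch {
    final int maxHintsInProgress;
    final AtomicInteger totalHintsInProgress = new AtomicInteger();
    // Per-destination counters; a node with no entry is treated as having 0 hints (a simplification).
    final Map<String, AtomicInteger> hintsInProgress = new ConcurrentHashMap<>();

    HintBoundSketch(int availableProcessors) {
        // Same formula as the quoted source: 1024 hints per available processor.
        this.maxHintsInProgress = 1024 * availableProcessors;
    }

    // Models a hint being handed to the hint executor for this destination.
    void submitHint(String destination) {
        totalHintsInProgress.incrementAndGet();
        hintsInProgress.computeIfAbsent(destination, d -> new AtomicInteger()).incrementAndGet();
    }

    // True when a write to this destination would be refused with "Too many in flight hints".
    boolean wouldReject(String destination, boolean shouldHint) {
        AtomicInteger perNode = hintsInProgress.get(destination);
        int pending = (perNode == null) ? 0 : perNode.get();
        return totalHintsInProgress.get() > maxHintsInProgress && pending > 0 && shouldHint;
    }

    public static void main(String[] args) {
        HintBoundSketch sketch = new HintBoundSketch(2); // assumed processor count
        for (int i = 0; i < 2411; i++) {
            sketch.submitHint("slow-node");
        }
        System.out.println(sketch.maxHintsInProgress);                 // 2048
        System.out.println(sketch.wouldReject("slow-node", true));     // true
        System.out.println(sketch.wouldReject("healthy-node", true));  // false
    }
}
```

Note that the node with no hints in progress is still accepted even though the
total is over the bound; only destinations that already have hints queued get
refused.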

"
private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
[... snip ...]
for (InetAddress destination : targets)
{
    // avoid OOMing due to excess hints.  we need to do this check even for "live" nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
    // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
    // a small number of nodes causing problems, so we should avoid shutting down writes completely to
    // healthy nodes.  Any node with no hintsInProgress is considered healthy.
    if (totalHintsInProgress.get() > maxHintsInProgress
        && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
    {
        throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
    }
"

If Cassandra didn't return this exception, it might OOM while
enqueueing your hints to be stored. Instead, it gives up on enqueueing
a hint for the failed write. The solution is to reduce your write
rate, ideally by enough that you don't even queue hints in the first
place.
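
One way a client might reduce its write rate is to back off whenever it sees
the overload error. The following is only a minimal sketch under stated
assumptions: WriteBackoffSketch, delayMillis and writeWithBackoff are
hypothetical names, the OverloadedException here is a local stand-in rather
than any driver's real error type, and the delays are arbitrary. A real driver
(cql-rb included) would surface the error in its own way.

```java
import java.util.concurrent.Callable;

// Hypothetical client-side backoff helper; not part of any Cassandra driver.
class WriteBackoffSketch {
    // Local stand-in for whatever error the driver raises on "Too many in flight hints".
    static class OverloadedException extends RuntimeException {
        OverloadedException(String message) { super(message); }
    }

    // Exponential delay: base, 2*base, 4*base, ... capped at capMillis.
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // clamp the shift to avoid overflow
        return Math.min(delay, capMillis);
    }

    // Runs the write, sleeping and retrying when the cluster reports overload.
    static <T> T writeWithBackoff(Callable<T> write, int maxAttempts,
                                  long baseMillis, long capMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return write.call();
            } catch (OverloadedException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do
                }
                Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated write that is refused twice before succeeding.
        String result = writeWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new OverloadedException("Too many in flight hints: 2411");
            }
            return "ok";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only papers over short bursts. If the error keeps coming back, the
sustained write rate is still too high for the cluster and needs to come down
(or the cluster needs more capacity).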

=Rob