Hi all,

I put a link to this in a JIRA comment, but figured I'd send a note to dev@
as well since it's easy to miss JIRA comments on issues you aren't watching.

Here's a document which covers an election storm issue that we've been
seeing in some of the more heavily loaded test clusters at Cloudera, and
particularly badly in one where we are testing DWH-like workloads (TPC-DS,
TPC-H):

https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit

I've seen some users on the mailing list and Slack complaining of issues
which might be attributed to this as well, so I think it's important to
make some improvements in this area sooner rather than later.

The design document contains some info on how to reproduce and measure the
issue, as well as a list of ideas which could help fix the problem. I see
it more as a "roadmap of incremental improvements" rather than a "we must
complete 100% of these items". Perhaps if we just tackle the top items (in
terms of bang-for-buck), the problem will be sufficiently addressed that we
won't need to do the more difficult items.

Please take a look and feel free to leave comments/suggestions.
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera