Hi all,

  Drill's current scheduling policy seems a little too simple. The
SimpleParallelizer assigns endpoints in a round-robin fashion, which ignores
system load and other factors. In a critical scenario, some drillbits
suffer from frequent full GCs, which block their control RPCs. The
current assignment does not exclude these drillbits when assigning
subsequent queries, so the problem gets worse.
  I propose adding a ZooKeeper path to hold bad drillbits. The Foreman would
identify bad drillbits by timeout (that is, when a control response from an
intermediate fragment times out) and then update the bad-drillbits path.
Subsequent queries would exclude these drillbits from the assignment list.
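  To make the idea concrete, here is a rough sketch in Java. It is not actual
Drill code: the class name BlacklistAwareAssigner and the methods reportTimeout
and assign are hypothetical, endpoints are plain strings, and an in-memory set
stands in for the proposed ZooKeeper path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; none of these names exist in Drill. */
class BlacklistAwareAssigner {
    // In-memory stand-in for the proposed ZooKeeper path of bad drillbits.
    private final Set<String> badDrillbits = ConcurrentHashMap.newKeySet();

    /** Assumed hook: called when a control response from a drillbit times out. */
    public void reportTimeout(String endpoint) {
        badDrillbits.add(endpoint);
    }

    /** Round-robin assignment over the endpoints that are not marked bad. */
    public List<String> assign(List<String> endpoints, int fragmentCount) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no endpoints available");
        }
        List<String> healthy = new ArrayList<>();
        for (String endpoint : endpoints) {
            if (!badDrillbits.contains(endpoint)) {
                healthy.add(endpoint);
            }
        }
        // If every drillbit is marked bad, fall back to the full list
        // so that queries can still be scheduled.
        List<String> candidates = healthy.isEmpty() ? endpoints : healthy;
        List<String> assignment = new ArrayList<>(fragmentCount);
        for (int i = 0; i < fragmentCount; i++) {
            assignment.add(candidates.get(i % candidates.size()));
        }
        return assignment;
    }
}
```

  A real implementation would also need a way to remove a drillbit from the
bad list once it recovers; that part is left out of the sketch.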
  What do you think, or do you have any suggestions? If this sounds OK, I
will file a JIRA and work on a contribution.
