Keuntae Park created TAJO-613:
---------------------------------
Summary: Hedging against unusually slow TajoWorker
Key: TAJO-613
URL: https://issues.apache.org/jira/browse/TAJO-613
Project: Tajo
Issue Type: Improvement
Reporter: Keuntae Park
When one of the disks in my Tajo cluster becomes unhealthy (that is, its
response time becomes slow due to a hardware problem), query processing
becomes extremely slow.
The following is the kernel log of the server that has the unhealthy disk:
{noformat}
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Unhandled error code
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] CDB: Read(16): 88 00 00 00 00 01 57 ec 66 32 00 00 01 00 00 00
...
{noformat}
Because of this problem, a TaskRunner that normally takes less than 3 seconds
for the given query takes 1700 seconds, and the total query execution time
rises to 1750 seconds from the usual 70 seconds.
I think Tajo needs a mechanism like the speculative execution in MapReduce.
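To make the idea concrete, here is a minimal sketch of the speculative-execution pattern in plain Java. It is not Tajo code: the class SpeculativeSketch, its methods, and the slowdownFactor threshold are all hypothetical names chosen for illustration. It flags an attempt as a straggler when its elapsed time exceeds a multiple of the median duration of completed attempts, and it runs two attempts of the same task and keeps whichever finishes first. For brevity both attempts start at the same time; a real scheduler would presumably launch the backup only after detecting the straggler, and on a different worker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names exist in Tajo.
class SpeculativeSketch {

    // Median duration (ms) of completed attempts; for an even count this
    // picks the upper of the two middle values. The list must be non-empty.
    static long medianMillis(List<Long> completed) {
        List<Long> sorted = new ArrayList<>(completed);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    // An attempt is treated as a straggler once its elapsed time exceeds
    // slowdownFactor times the median of the attempts that already finished.
    static boolean isStraggler(long elapsedMillis, List<Long> completed,
                               double slowdownFactor) {
        if (completed.isEmpty()) {
            return false; // no baseline yet, so nothing to compare against
        }
        return elapsedMillis > slowdownFactor * medianMillis(completed);
    }

    // Runs the original attempt and a backup attempt concurrently and returns
    // the result of whichever completes successfully first. invokeAny cancels
    // the attempt that has not finished.
    static <T> T runWithBackup(Callable<T> original, Callable<T> backup)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<T>> attempts = new ArrayList<>();
            attempts.add(original);
            attempts.add(backup);
            return pool.invokeAny(attempts);
        } finally {
            pool.shutdownNow(); // interrupt any attempt still running
        }
    }
}
```

With the numbers from this report, completed attempts of about 3 seconds and a factor of 3.0 (an arbitrary value picked for the example) would flag the 1700-second TaskRunner long before it finished.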