[jira] [Created] (IMPALA-9224) Blacklist nodes with faulty disks

Sahil Takiar (Jira) Mon, 09 Dec 2019 16:51:24 -0800

Sahil Takiar created IMPALA-9224:
------------------------------------

             Summary: Blacklist nodes with faulty disks
                 Key: IMPALA-9224
                 URL: https://issues.apache.org/jira/browse/IMPALA-9224
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Sahil Takiar



Similar to IMPALA-8339 and IMPALA-9137, Impala should blacklist nodes with 
faulty disks. Specifically, if a query fails because of a disk error, the node 
with that disk should be blacklisted and the query should be retried.

We shouldn't need to blacklist nodes that fail to read from HDFS / S3, since 
those storage systems have their own internal mechanisms for recovering from 
faulty disks. We should only blacklist nodes when a read from or write to a 
*local* disk fails.

The two main components of Impala that read from and write to local disk are 
the spill-to-disk and data caching features. Whenever a query fails because of 
a disk failure during spill-to-disk, the node should be blacklisted.

Reads from and writes to the data cache are a bit different. If a cache read 
fails due to a disk error, the error will be printed out and the Lookup() call 
to the cache will return 0 bytes read, which means it couldn't find the data in 
the cache. This should cause the scan to fall back to a normal, un-cached read. 
While this doesn't affect query correctness or the ability of a query to 
complete, it can affect performance. Since cache failures don't result in query 
failures, we might consider having a threshold of data cache read / write 
errors before blacklisting a node.
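
A minimal sketch of what such a threshold could look like, assuming a simple 
per-node error counter. CacheErrorTracker, RecordError() and the threshold 
value are hypothetical names for illustration, not existing Impala code:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not part of Impala): counts data cache I/O errors seen
// on a node and reports when a configurable threshold has been reached.
class CacheErrorTracker {
 public:
  explicit CacheErrorTracker(int64_t threshold) : threshold_(threshold) {}

  // Records one failed cache read or write. Returns true once the total
  // number of recorded errors has reached the threshold, meaning the node
  // is a candidate for blacklisting.
  bool RecordError() {
    // fetch_add returns the previous value, so add 1 to get the new count.
    return num_errors_.fetch_add(1) + 1 >= threshold_;
  }

  int64_t num_errors() const { return num_errors_.load(); }

 private:
  const int64_t threshold_;
  std::atomic<int64_t> num_errors_{0};
};
```

The counter is atomic on the assumption that cache reads and writes can fail 
concurrently on several threads. Whether errors should also expire over time 
is left open here.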

We need to be careful to only capture specific disk failures. For example, 
disk quota, permission denied, etc. errors shouldn't result in blacklisting, as 
they are typically a result of system misconfiguration.
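
One way to make that distinction, sketched under the assumption that the 
failure surfaces as a POSIX errno value. The DiskErrorKind enum, both function 
names, and the choice of EIO as the "faulty disk" signal are illustrative 
assumptions, not Impala's actual policy:

```cpp
#include <cerrno>

// Hypothetical classification of local disk errors (not existing Impala code).
enum class DiskErrorKind { kFaultyDisk, kMisconfiguration, kOther };

inline DiskErrorKind ClassifyDiskError(int err) {
  switch (err) {
    case EIO:     // Low-level I/O error; assumed here to indicate a bad disk.
      return DiskErrorKind::kFaultyDisk;
    case EDQUOT:  // Disk quota exceeded.
    case EACCES:  // Permission denied.
    case EPERM:   // Operation not permitted.
      return DiskErrorKind::kMisconfiguration;
    default:      // Everything else is left undecided in this sketch.
      return DiskErrorKind::kOther;
  }
}

// Only errors classified as a faulty disk should lead to blacklisting.
inline bool ShouldBlacklistNode(int err) {
  return ClassifyDiskError(err) == DiskErrorKind::kFaultyDisk;
}
```

With this mapping, ShouldBlacklistNode(EIO) is true, while EDQUOT and EACCES 
leave the node in the cluster.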



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

