[ https://issues.apache.org/jira/browse/IMPALA-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712325#comment-16712325 ]
ASF subversion and git services commented on IMPALA-7738:
---------------------------------------------------------

Commit 938be0e840c84263a2b47fb89e655d998363b819 in impala's branch refs/heads/master from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=938be0e ]

IMPALA-7738: Implement timeouts for HDFS open calls

This is part 1 of a push to add timeouts for all HDFS operations. It adds
timeouts for opening an HDFS file handle.

It introduces a new SynchronousThreadPool, which executes an operation in a
thread pool and waits up to a specified timeout for the operation to complete.
This type of thread pool can accept any subclass of SynchronousWorkItem, and a
single thread pool can process different types of work items. It is tested by
a new test case in thread-pool-test.

This also introduces a new HdfsMonitor which implements timeouts for HDFS
operations, currently limited to hdfsOpenFile(). This is implemented using a
SynchronousThreadPool. The timeout for HDFS operations is specified by
hdfs_operation_timeout_sec, which defaults to 5 minutes.

Testing:
1. Added a test to thread-pool-test for the new SynchronousThreadPool.
2. Core tests
3. Added a custom cluster test that does "kill -STOP" for the NameNode and
   verifies that a subsequent hdfsOpenFile operation times out.
Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454
Reviewed-on: http://gerrit.cloudera.org:8080/11874
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Implement timeouts for HDFS calls
> ---------------------------------
>
>                 Key: IMPALA-7738
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7738
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0,
>                      Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> Currently, there is no timeout on the various HDFS calls (e.g. hdfsOpen(),
> hdfsRead()) we make in libhdfs.so in either the disk-io-mgr thread or scanner
> thread context. Various users of Impala have complained in the past about hung
> queries which eventually boiled down to stuck HDFS calls. HDFS maintainers
> have been slow to find the root cause of those hangs. To make this kind of
> stuck-query problem easier to identify in the future, we should just
> enforce a timeout in various HDFS calls so the queries will fail when certain
> HDFS calls take longer than a designated timeout period.
> There may be multiple layers at which this timeout can be enforced:
> * at the Impala level, we can have a fixed-size thread pool which handles all
> HDFS calls. The existing HDFS calls will be wrapped with a timeout.
> * at libhdfs.so, enforce a timeout at places in the HDFS client code which
> may block forever.
> The second option is probably beyond the charter of the Apache Impala project.
> cc'ing [~tarmstr...@cloudera.com], [~joemcdonnell]
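
For readers unfamiliar with the pattern, the first option in the issue
description (a fixed-size thread pool whose callers wait with a timeout) can be
sketched roughly as follows. This is only an illustration under stated
assumptions and is not Impala's SynchronousThreadPool or HdfsMonitor code: the
class name TimedPool, the method RunWithTimeout, and the int-returning
operation are all hypothetical. The caller hands an operation to a worker
thread and waits on a future for at most the given timeout; if the wait
expires, the caller gives up while the worker stays blocked on the operation.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch of a fixed-size pool with caller-side timeouts.
// Not taken from the Impala source tree.
class TimedPool {
 public:
  explicit TimedPool(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains the queue and joins all workers. A truly hung operation would
  // block here; a real implementation needs its own shutdown policy.
  ~TimedPool() {
    {
      std::lock_guard<std::mutex> l(lock_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Runs 'op' on a worker thread and waits up to 'timeout' for it to finish.
  // Returns true and sets '*result' if it finished in time. Returns false on
  // timeout; the operation keeps running and its result is discarded.
  bool RunWithTimeout(std::function<int()> op,
                      std::chrono::milliseconds timeout, int* result) {
    // shared_ptr keeps the task alive even if the caller times out first.
    auto task = std::make_shared<std::packaged_task<int()>>(std::move(op));
    std::future<int> done = task->get_future();
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    if (done.wait_for(timeout) != std::future_status::ready) return false;
    *result = done.get();
    return true;
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> l(lock_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shutting down and nothing left to run.
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();  // Run outside the lock so other workers can dequeue.
    }
  }

  std::mutex lock_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool shutdown_ = false;
};
```

In this sketch a call such as pool.RunWithTimeout(op, timeout, &result) would
wrap the blocking operation, and a false return would be turned into a query
error. One consequence of the design is that a timed-out operation still
occupies a worker thread until it returns, so the pool size bounds how many
stuck calls can pile up before new calls start timing out in the queue.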