[ 
https://issues.apache.org/jira/browse/IMPALA-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726316#comment-16726316
 ] 

ASF subversion and git services commented on IMPALA-7946:
---------------------------------------------------------

Commit 9a52dd67bad7b8eb84fdeb6fb193505af7af931e in impala's branch 
refs/heads/master from Joe McDonnell
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9a52dd6 ]

IMPALA-7946: Use original timeout in THREAD_POOL_TASK_TIMED_OUT message

When SynchronousThreadPool::SynchronousOffer() times out, it can
sometimes print the wrong timeout in the error message. This happens
because it is enforcing a total timeout across multiple operations.
For example, if there is a total timeout of 5 seconds and the first
step takes 1 second, the remaining step is given a 4 second timeout
to enforce the total timeout. However, this 4 second timeout should
not be expressed in the THREAD_POOL_TASK_TIMED_OUT error message
if the task times out. Instead, SynchronousOffer() should always use
the original timeout, because the internal timeout is unimportant to users.

This changes the code to make the error message always use the
original timeout.

Change-Id: Ib7bc31f58a8d29abfdc24959dc2730a0ae24ec56
Reviewed-on: http://gerrit.cloudera.org:8080/12062
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
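
The behavior described in the commit message above can be illustrated with a
minimal Python sketch. It is not the Impala C++ implementation: the names
synchronous_offer, submit, wait, clock and TaskTimedOut are hypothetical
stand-ins chosen for illustration. The sketch only shows the idea of enforcing
the remaining time internally while reporting the original timeout.

```python
import time


class TaskTimedOut(Exception):
    """Hypothetical stand-in for a THREAD_POOL_TASK_TIMED_OUT status."""


def synchronous_offer(submit, wait, timeout_s, clock=time.monotonic):
    """Run submit() and then wait() under a single total timeout.

    Assumptions for this sketch: submit() blocks until the task is queued,
    and wait(remaining_s) returns True if the task finished in time.
    """
    start = clock()
    submit()
    elapsed = clock() - start

    # Enforce the total timeout by giving the second step only what is left.
    remaining_s = max(0.0, timeout_s - elapsed)
    if not wait(remaining_s):
        # Report the original timeout, not the internal remaining_s value.
        raise TaskTimedOut(
            f"task failed to finish before the {timeout_s} second timeout")
```

With a 5 second total timeout and a submit step that takes 1 second, wait()
receives 4 seconds, but the raised message still says 5 seconds.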


> SynchronousThreadPool::SynchronousOffer() can return a timeout Status with 
> the wrong time limit
> -----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7946
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7946
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>              Labels: broken-build, flaky
>
> A recent core build failed on custom_cluster/test_hdfs_timeout.py with this 
> test output:
> {noformat}
> custom_cluster/test_hdfs_timeout.py:82: in test_hdfs_open_timeout
>     assert len(re.findall(error_pattern, str(ex))) > 0
> E   assert 0 > 0
> E    +  where 0 = len([])
> E    +    where [] = <function findall at 0x7f09aaa4e938>('hdfsOpenFile\\(\\) 
> for.*failed to finish before the 5 second timeout', 
> 'ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for 
> hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt 
> failed to finish before the 4 second timeout\n\n')
> E    +      where <function findall at 0x7f09aaa4e938> = re.findall
> E    +      and   'ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for 
> hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt 
> failed to finish before the 4 second timeout\n\n' = 
> str(ImpalaBeeswaxException()){noformat}
> When executing SynchronousOffer(), two different operations count towards the 
> timeout. The first is submitting the task by calling Offer() with the 
> SynchronousWorkItem. The second is waiting for the task to complete by 
> calling SynchronousWorkItem::Wait(). If the first part takes any 
> measurable time, then SynchronousOffer() reduces the timeout that it passes 
> into SynchronousWorkItem::Wait() so that the total timeout is respected. The 
> enforcement of the new timeout is correct, but it results in an incorrect 
> error message (in this case, showing 4 seconds rather than 5).
> SynchronousOffer() should instead pass in the original timeout and the 
> current elapsed time. This would allow for correct enforcement with a 
> correct error message.
> The failure is flaky because it only occurs when submitting the task takes 
> measurable time.
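
The test failure quoted above can be reproduced in isolation. The following
sketch reuses the pattern and the error text from the test output; the
variable names and the "expected" string (the same message with 5 in place
of 4) are assumptions added for illustration.

```python
import re

# Pattern taken from the assertion in the quoted test output.
error_pattern = (
    r"hdfsOpenFile\(\) for.*failed to finish before the 5 second timeout")

# Message from the failed run, which reports the internal 4 second timeout.
reported = (
    "ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for "
    "hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/"
    "091101.txt failed to finish before the 4 second timeout\n\n")

# Assumed message once the original 5 second timeout is reported instead.
expected = reported.replace("4 second", "5 second")

matches_reported = re.findall(error_pattern, reported)
matches_expected = re.findall(error_pattern, expected)
```

Here matches_reported is empty, which is why the assertion
len(re.findall(...)) > 0 failed, while matches_expected contains one match.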



