[jira] [Commented] (IMPALA-10050) DCHECK was hit possibly while executing TestFailpoints::test_failpoints

2020-08-06 Thread Attila Jeges (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172412#comment-17172412
 ] 

Attila Jeges commented on IMPALA-10050:
---

[~rizaon] randomly assigning to you. Please feel free to reassign. 

> DCHECK was hit possibly while executing TestFailpoints::test_failpoints
> ---
>
> Key: IMPALA-10050
> URL: https://issues.apache.org/jira/browse/IMPALA-10050
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Riza Suminto
>Priority: Blocker
>  Labels: broken-build, crash
> Fix For: Impala 4.0
>
>
> A DCHECK was hit during  ASAN core e2e tests. Time-frame suggests that it 
> happened while executing TestFailpoints::test_failpoints e2e test.
> {code}
> 10:56:38  TestFailpoints.test_failpoints[protocol: beeswax | table_format: 
> avro/snap/block | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | 
> location: PREPARE | action: MEM_LIMIT_EXCEEDED | query: select 1 from 
> alltypessmall a join alltypessmall b on a.id = b.id] 
> 10:56:38 failure/test_failpoints.py:128: in test_failpoints
> 10:56:38 self.execute_query(query, vector.get_value('exec_option'))
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:811:
>  in wrapper
> 10:56:38 return function(*args, **kwargs)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:843:
>  in execute_query
> 10:56:38 return self.__execute_query(self.client, query, query_options)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:909:
>  in __execute_query
> 10:56:38 return impalad_client.execute(query, user=user)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_connection.py:205:
>  in execute
> 10:56:38 return self.__beeswax_client.execute(sql_stmt, user=user)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:187:
>  in execute
> 10:56:38 handle = self.__execute_query(query_string.strip(), user=user)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:365:
>  in __execute_query
> 10:56:38 self.wait_for_finished(handle)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:386:
>  in wait_for_finished
> 10:56:38 raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> 10:56:38 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 10:56:38 EQuery aborted:RPC from 127.0.0.1:27000 to 127.0.0.1:27002 failed
> 10:56:38 E   TransmitData() to 127.0.0.1:27002 failed: Network error: Client 
> connection negotiation failed: client connection to 127.0.0.1:27002: connect: 
> Connection refused (error 111)
> {code}
> Impalad log:
> {code}
> Log file created at: 2020/08/05 01:52:56
> Running on machine: 
> impala-ec2-centos74-r5-4xlarge-ondemand-017c.vpc.cloudera.com
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> F0805 01:52:56.979769 17313 query-state.cc:803] 
> 3941a3d92a71e242:15c963f3] Check failed: is_cancelled_.Load() == 1 (0 
> vs. 1) 
> {code}
> Stack trace
> {code}
> Thread 368 (crashed)
>  0  libc-2.17.so + 0x351f7
> rax = 0x   rdx = 0x0006
> rcx = 0x   rbx = 0x0004
> rsi = 0x43a1   rdi = 0x37e4
> rbp = 0x7efcd4c53080   rsp = 0x7efcd4c52d08
>  r8 = 0xr9 = 0x7efcd4c52b80
> r10 = 0x0008   r11 = 0x0206
> r12 = 0x093de7c0   r13 = 0x0086
> r14 = 0x093de7c4   r15 = 0x093d6de0
> rip = 0x7f05c9d231f7
> Found by: given as instruction pointer in context
>  1  impalad!google::LogMessage::Flush() + 0x1eb
> rbp = 0x7efcd4c53250   rsp = 0x7efcd4c53090
> rip = 0x05727e3b
> Found by: previous frame's frame pointer
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
> rbx = 0x7efcd4c532a0   rbp = 0x7efcd4c53310
> rsp = 0x7efcd4c53130   r12 = 0x0fe01a982628
> r13 = 0x61d000da0a6c   r14 = 0x7efcd4c53250
> r15 = 0x7efcd4c53270   rip = 0x0572ba39
> Found by: call frame info
>  3  impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 803 + 
> 0x45]
> rbx = 0x7efcd4c532a0  

[jira] [Commented] (IMPALA-10050) DCHECK was hit possibly while executing TestFailpoints::test_failpoints

2020-08-06 Thread Thomas Tauber-Marshall (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172571#comment-17172571
 ] 

Thomas Tauber-Marshall commented on IMPALA-10050:
-

I suspect this may have been caused by https://gerrit.cloudera.org/#/c/16215/, 
so [~wzhou] might be the right person to take a look.

> DCHECK was hit possibly while executing TestFailpoints::test_failpoints
> ---
>
> Key: IMPALA-10050
> URL: https://issues.apache.org/jira/browse/IMPALA-10050
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Riza Suminto
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0

[jira] [Commented] (IMPALA-10050) DCHECK was hit possibly while executing TestFailpoints::test_failpoints

2020-08-06 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172684#comment-17172684
 ] 

Wenzhe Zhou commented on IMPALA-10050:
--

*01:52:58* 
failure/test_failpoints.py::TestFailpoints::test_failpoints[protocol: beeswax | 
table_format: avro/snap/block | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | 
location: PREPARE | action: MEM_LIMIT_EXCEEDED | query: select 1 from 
alltypessmall a join alltypessmall b on a.id = b.id] FAILED

h3. Error Message

ImpalaBeeswaxException: ImpalaBeeswaxException:
Query aborted:RPC from 127.0.0.1:27000 to 127.0.0.1:27002 failed
TransmitData() to 127.0.0.1:27002 failed: Network error: Client connection 
negotiation failed: client connection to 127.0.0.1:27002: connect: Connection 
refused (error 111)

h3. Stacktrace

failure/test_failpoints.py:128: in test_failpoints
    self.execute_query(query, vector.get_value('exec_option'))
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:811: in wrapper
    return function(*args, **kwargs)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:843: in execute_query
    return self.__execute_query(self.client, query, query_options)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:909: in __execute_query
    return impalad_client.execute(query, user=user)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:365: in __execute_query
    self.wait_for_finished(handle)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:386: in wait_for_finished
    raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    Query aborted:RPC from 127.0.0.1:27000 to 127.0.0.1:27002 failed
E    TransmitData() to 127.0.0.1:27002 failed: Network error: Client connection negotiation failed: client connection to 127.0.0.1:27002: connect: Connection refused (error 111)

 

> DCHECK was hit possibly while executing TestFailpoints::test_failpoints
> ---
>
> Key: IMPALA-10050
> URL: https://issues.apache.org/jira/browse/IMPALA-10050
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Wenzhe Zhou
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0

[jira] [Commented] (IMPALA-10050) DCHECK was hit possibly while executing TestFailpoints::test_failpoints

2020-08-18 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180309#comment-17180309
 ] 

Wenzhe Zhou commented on IMPALA-10050:
--

The issue can be reproduced by running the following script for a few hours:

    #!/bin/bash

    set -euo pipefail

    # Run the failpoint test repeatedly; with "set -e" the loop stops at the
    # first failing iteration.
    for iter in $(seq 1 100); do
        "${IMPALA_HOME}/bin/impala-py.test" \
            tests/failure/test_failpoints.py::TestFailpoints::test_failpoints
    done

After adding more debug messages to the code, I found that when an executor 
reports its status with an error, it occasionally receives a response with 
status 'ok'. If the executor is in the terminal state and receives 'ok' in 
response to a report that carried an error, it does not call "Cancel()" and 
therefore hits the DCHECK. The root cause is a bug in the coordinator, not in 
the executor.
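
The executor-side behaviour described above can be sketched with a toy model. This is an illustration only, not Impala's code: the class `ExecutorQueryState`, its methods, the helper `run_executor`, and the status strings are all hypothetical names chosen for the example.

```python
class ExecutorQueryState:
    """Toy model of an executor's per-query state (hypothetical, not Impala's QueryState)."""

    def __init__(self):
        self.is_cancelled = False
        self.error = None  # set when a fragment instance fails

    def cancel(self):
        self.is_cancelled = True

    def handle_report_response(self, response_status):
        # Assumption for this sketch: the executor only cancels itself when
        # the coordinator answers a status report with a non-OK status.
        if response_status != "OK":
            self.cancel()

    def terminal_state_invariant_holds(self):
        # Models the failed check: a query that ended with an error is
        # expected to have been cancelled by the time it reaches the
        # terminal state.
        return self.error is None or self.is_cancelled


def run_executor(response_status):
    """Fail the query, deliver the coordinator's response, then test the invariant."""
    state = ExecutorQueryState()
    state.error = "MEM_LIMIT_EXCEEDED"  # the failpoint action used by the test
    state.handle_report_response(response_status)
    return state.terminal_state_invariant_holds()


if __name__ == "__main__":
    print("response OK        ->", run_executor("OK"))
    print("response CANCELLED ->", run_executor("CANCELLED"))
```

In this model an 'ok' response to an error report leaves `is_cancelled` unset, which is the state in which the DCHECK fires; any non-OK response triggers `cancel()` and the invariant holds.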

The patch [https://gerrit.cloudera.org/#/c/16215/] only applies to the 
executor, so it did not introduce this issue. I ran the above script with the 
code rolled back to before that patch, and the issue could still be 
reproduced. 

I ran more tests with the script and confirmed that this issue was introduced 
by patch [https://gerrit.cloudera.org/#/c/16192/] 
([IMPALA-6788|http://issues.apache.org/jira/browse/IMPALA-6788]: Abort 
ExecFInstance() RPC loop early after query failure). 

> DCHECK was hit possibly while executing TestFailpoints::test_failpoints
> ---
>
> Key: IMPALA-10050
> URL: https://issues.apache.org/jira/browse/IMPALA-10050
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Wenzhe Zhou
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0

[jira] [Commented] (IMPALA-10050) DCHECK was hit possibly while executing TestFailpoints::test_failpoints

2020-08-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186680#comment-17186680
 ] 

ASF subversion and git services commented on IMPALA-10050:
--

Commit 3733c4cc2cfb78d7f13463fb1ee9e1c4560d4a3d in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3733c4c ]

IMPALA-10050: Fixed DCHECK error for backend in terminal state.

A recent patch for IMPALA-6788 makes the coordinator cancel in-flight
query fragment instances when it receives a failure report from one
backend. It is possible that BackendState::Cancel() is called for
a fragment instance before the first execution status report
from its backend is received and processed by the coordinator.
Since the status of the BackendState is set to Cancelled after
Cancel() is called, the execution of the fragment instance is treated
as Done in that case, so the status report will NOT be processed.
Hence the backend receives an OK response from the coordinator even
though it sent a report with an execution error. This makes the
backend hit a DCHECK error if it is in the terminal state with an
error.
This patch fixes the issue by making the coordinator send a CANCELLED
status in the response to a status report if the backend status is not
ok and the execution status report is not applied.

Testing:
 - The issue could be reproduced by running test_failpoints for about
   20 iterations. Verified the fix by running test_failpoints for over
   200 iterations without a DCHECK failure.
 - Passed TestProcessFailures::test_kill_coordinator.
 - Passed TestRPCException::test_state_report_error.
 - Passed exhaustive tests.

Change-Id: Iba6a72f98c0f9299c22c58830ec5a643335b966a
Reviewed-on: http://gerrit.cloudera.org:8080/16303
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 
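
The coordinator-side race and the fix described in the commit message above can be modeled roughly as follows. This is a simplified sketch, not the actual patch: the Python class `BackendState` only borrows the name used in the commit message, and `apply_exec_status_report`, `respond_to_report`, the `with_fix` flag, and the status strings are hypothetical. Modeling every other response as a plain OK is also an assumption.

```python
class BackendState:
    """Toy stand-in for the coordinator's per-backend state (hypothetical)."""

    def __init__(self):
        self.status = "OK"

    def cancel(self):
        # Cancel() may run before the backend's first status report arrives.
        self.status = "CANCELLED"

    def is_done(self):
        # Once cancelled, the backend is treated as done.
        return self.status != "OK"

    def apply_exec_status_report(self, report_status):
        # Returns True if the report was processed, False if it was skipped
        # because the backend is already considered done.
        if self.is_done():
            return False
        self.status = report_status
        return True


def respond_to_report(backend, report_status, with_fix):
    """Return the status the coordinator sends back for one status report."""
    applied = backend.apply_exec_status_report(report_status)
    if with_fix and not applied and backend.status != "OK":
        # The fix: the report was not applied and the backend status is not
        # ok, so answer CANCELLED instead of OK.
        return "CANCELLED"
    # Assumption: all remaining cases are modeled as a plain OK response.
    return "OK"


if __name__ == "__main__":
    for with_fix in (False, True):
        backend = BackendState()
        backend.cancel()  # cancelled before its first report is processed
        reply = respond_to_report(backend, "MEM_LIMIT_EXCEEDED", with_fix)
        print("with_fix =", with_fix, "->", reply)
```

Without the fix, a backend that was cancelled before its first report gets OK back even though it reported an error; with the fix it gets CANCELLED, which lets it cancel itself before reaching the terminal state.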


> DCHECK was hit possibly while executing TestFailpoints::test_failpoints
> ---
>
> Key: IMPALA-10050
> URL: https://issues.apache.org/jira/browse/IMPALA-10050
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Wenzhe Zhou
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0