[ https://issues.apache.org/jira/browse/IMPALA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784082#comment-16784082 ]
Tim Armstrong commented on IMPALA-6326:
---------------------------------------

I have a strong suspicion that the root cause of at least some of the issues is the way run_query forks off a thread in _hash_result(), because that could end up with two threads accessing the same underlying thrift connection. I might try to inject some failures there to see whether the symptoms reproduce more frequently.

> segfault during impyla HiveServer2Cursor.cancel_operation() over SSL
> --------------------------------------------------------------------
>
>                 Key: IMPALA-6326
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6326
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: Impala 2.10.0, Impala 2.11.0
>            Reporter: Matthew Mulder
>            Priority: Major
>         Attachments: test_fork_crash.py
>
> During a stress test on a secure cluster, one of the clients crashed in HiveServer2Cursor.cancel_operation().
> The stress test debug log shows:
> {code}
> 2017-12-13 16:50:52,624 21607 Query Consumer DEBUG:concurrent_select[579]:Requesting memory reservation
> 2017-12-13 16:50:52,624 21607 Query Consumer DEBUG:concurrent_select[245]:Reserved 102 MB; 1455 MB available; 95180 MB overcommitted
> 2017-12-13 16:50:52,625 21607 Query Consumer DEBUG:concurrent_select[581]:Received memory reservation
> 2017-12-13 16:50:52,658 21607 Query Consumer DEBUG:concurrent_select[865]:Using tpcds_300_decimal_parquet database
> 2017-12-13 16:50:52,658 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: USE tpcds_300_decimal_parquet
> 2017-12-13 16:50:52,825 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: SET ABORT_ON_ERROR=1
> 2017-12-13 16:50:53,060 21607 Query Consumer DEBUG:concurrent_select[877]:Setting mem limit to 102 MB
> 2017-12-13 16:50:53,060 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: SET MEM_LIMIT=102M
> 2017-12-13 16:50:53,370 21607 Query Consumer DEBUG:concurrent_select[881]:Running query with 102 MB mem limit at vc0704.test with timeout secs 52:
> select
>   dt.d_year,
>   item.i_category_id,
>   item.i_category,
>   sum(ss_ext_sales_price)
> from
>   date_dim dt,
>   store_sales,
>   item
> where
>   dt.d_date_sk = store_sales.ss_sold_date_sk
>   and store_sales.ss_item_sk = item.i_item_sk
>   and item.i_manager_id = 1
>   and dt.d_moy = 11
>   and dt.d_year = 2000
> group by
>   dt.d_year,
>   item.i_category_id,
>   item.i_category
> order by
>   sum(ss_ext_sales_price) desc,
>   dt.d_year,
>   item.i_category_id,
>   item.i_category
> limit 100;
> 2017-12-13 16:51:08,491 21607 Query Consumer DEBUG:concurrent_select[889]:Query id is b6425b84aa45f633:9ce7cad900000000
> 2017-12-13 16:51:15,337 21607 Query Consumer DEBUG:concurrent_select[900]:Waiting for query to execute
> 2017-12-13 16:51:22,316 21607 Query Consumer DEBUG:concurrent_select[900]:Waiting for query to execute
> 2017-12-13 16:51:27,266 21607 Fetch Results b6425b84aa45f633:9ce7cad900000000 DEBUG:concurrent_select[1009]:Fetching result for query with id b6425b84aa45f633:9ce7cad900000000
> 2017-12-13 16:51:44,625 21607 Query Consumer DEBUG:concurrent_select[940]:Attempting cancellation of query with id b6425b84aa45f633:9ce7cad900000000
> 2017-12-13 16:51:44,627 21607 Query Consumer INFO:hiveserver2[259]:Canceling active operation
> {code}
> The impalad log shows:
> {code}
> I1213 16:50:54.287511 136399 admission-controller.cc:510] Schedule for id=b6425b84aa45f633:9ce7cad900000000 in pool_name=root.systest cluster_mem_needed=816.00 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B
> I1213 16:50:54.289767 136399 admission-controller.cc:515] Stats: agg_num_running=184, agg_num_queued=0, agg_mem_reserved=1529.63 GB, local_host(local_mem_admitted=132.02 GB, num_admitted_running=21, num_queued=0, backend_mem_reserved=194.58 GB)
> I1213 16:50:54.291550 136399 admission-controller.cc:531] Admitted query id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:50:54.296922 136399 coordinator.cc:99] Exec() query_id=b6425b84aa45f633:9ce7cad900000000 stmt=/*
> Mem: 102 MB. Coordinator: vc0704.test. */
> select
>   dt.d_year,
>   item.i_category_id,
>   item.i_category,
>   sum(ss_ext_sales_price)
> from
>   date_dim dt,
>   store_sales,
>   item
> where
>   dt.d_date_sk = store_sales.ss_sold_date_sk
>   and store_sales.ss_item_sk = item.i_item_sk
>   and item.i_manager_id = 1
>   and dt.d_moy = 11
>   and dt.d_year = 2000
> group by
>   dt.d_year,
>   item.i_category_id,
>   item.i_category
> order by
>   sum(ss_ext_sales_price) desc,
>   dt.d_year,
>   item.i_category_id,
>   item.i_category
> limit 100;
> I1213 16:50:59.263310 136399 query-state.cc:151] Using query memory limit from query options: 102.00 MB
> I1213 16:50:59.267033 136399 mem-tracker.cc:189] Using query memory limit: 102.00 MB
> I1213 16:50:59.272271 136399 coordinator.cc:357] starting execution on 8 backends for query b6425b84aa45f633:9ce7cad900000000
> I1213 16:51:07.525143 136399 coordinator.cc:370] started execution on 8 backends for query b6425b84aa45f633:9ce7cad900000000
> I1213 16:51:08.358772 136399 impala-hs2-server.cc:490] ExecuteStatement(): return_val=TExecuteStatementResp {
>   01: status (struct) = TStatus {
>     01: statusCode (i32) = 0,
>   },
>   02: operationHandle (struct) = TOperationHandle {
>     01: operationId (struct) = THandleIdentifier {
>       01: guid (string) = "3\xf6E\xaa\x84[B\xb6\x00\x00\x00\x00\xd9\xca\xe7\x9c",
>       02: secret (string) = "3\xf6E\xaa\x84[B\xb6\x00\x00\x00\x00\xd9\xca\xe7\x9c",
>     },
>     02: operationType (i32) = 0,
>     03: hasResultSet (bool) = true,
>   },
> }
> I1213 16:51:27.310685 136399 impala-hs2-server.cc:725] GetResultSetMetadata(): query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:51:27.316963 136399 impala-hs2-server.cc:759] GetResultSetMetadata(): return_val=TGetResultSetMetadataResp {
>   01: status (struct) = TStatus {
>     01: statusCode (i32) = 0,
>   },
>   02: schema (struct) = TTableSchema {
>     01: columns (list) = list<struct>[4] {
>       [0] = TColumnDesc {
>         01: columnName (string) = "d_year",
>         02: typeDesc (struct) = TTypeDesc {
>           01: types (list) = list<struct>[1] {
>             [0] = TTypeEntry {
>               01: primitiveEntry (struct) = TPrimitiveTypeEntry {
>                 01: type (i32) = 3,
>               },
>             },
>           },
>         },
>         03: position (i32) = 0,
>       },
>       [1] = TColumnDesc {
>         01: columnName (string) = "i_category_id",
>         02: typeDesc (struct) = TTypeDesc {
>           01: types (list) = list<struct>[1] {
>             [0] = TTypeEntry {
>               01: primitiveEntry (struct) = TPrimitiveTypeEntry {
>                 01: type (i32) = 3,
>               },
>             },
>           },
>         },
>         03: position (i32) = 1,
>       },
>       [2] = TColumnDesc {
>         01: columnName (string) = "i_category",
>         02: typeDesc (struct) = TTypeDesc {
>           01: types (list) = list<struct>[1] {
>             [0] = TTypeEntry {
>               01: primitiveEntry (struct) = TPrimitiveTypeEntry {
>                 01: type (i32) = 7,
>               },
>             },
>           },
>         },
>         03: position (i32) = 2,
>       },
>       [3] = TColumnDesc {
>         01: columnName (string) = "sum(ss_ext_sales_price)",
>         02: typeDesc (struct) = TTypeDesc {
>           01: types (list) = list<struct>[1] {
>             [0] = TTypeEntry {
>               01: primitiveEntry (struct) = TPrimitiveTypeEntry {
>                 01: type (i32) = 15,
>                 02: typeQualifiers (struct) = TTypeQualifiers {
>                   01: qualifiers (map) = map<string,struct>[2] {
>                     "precision" -> TTypeQualifierValue {
>                       01: i32Value (i32) = 38,
>                     },
>                     "scale" -> TTypeQualifierValue {
>                       01: i32Value (i32) = 2,
>                     },
>                   },
>                 },
>               },
>             },
>           },
>         },
>         03: position (i32) = 3,
>       },
>     },
>   },
> }
> I1213 16:51:27.426277 136399 coordinator.cc:789] Coordinator waiting for backends to finish, 2 remaining
> I1213 16:52:42.215745 136399 coordinator.cc:794] All backends finished successfully.
> I1213 16:52:42.223140 136399 coordinator.cc:1090] Release admssion control resources for query b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.230653 136399 impala-hs2-server.cc:677] CancelOperation(): query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.230882 136399 impala-server.cc:1075] Cancel(): query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.233353 136399 coordinator.cc:895] Cancel() query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.234686 136399 coordinator.cc:905] CancelBackends() query_id=b6425b84aa45f633:9ce7cad900000000, tried to cancel 0 backends
> I1213 16:52:42.276818 136399 impala-server.cc:1796] Connection from client 172.28.194.153:60218 closed, closing 2 associated session(s)
> I1213 16:52:42.277026 136399 impala-server.cc:992] UnregisterQuery(): query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.279270 136399 impala-server.cc:1075] Cancel(): query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.924029 136399 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=b6425b84aa45f633:9ce7cad900000000 refcnt=1
> I1213 16:52:43.094966 136399 thrift-util.cc:123] SSL_shutdown: tlsv1 alert record overflow
> {code}
> The test host kernel log shows:
> {code}
> Dec 14 00:52:42 ip-172-28-194-153 kernel: [ 6295.010585] show_signal_msg: 36 callbacks suppressed
> Dec 14 00:52:42 ip-172-28-194-153 kernel: [ 6295.010592] python[22107]: segfault at 7f35001d4000 ip 00007f3557110794 sp 00007f350dfe4308 error 4 in libc-2.19.so[7f3557076000+1be000]
> {code}
> Although the stress test debug log shows that it started fetching results, none were saved to the output file for that query.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
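The race hypothesized in the comment above (a forked result-hashing thread and the cancelling thread sharing one thrift transport) can be sketched with a toy model. This is a minimal illustration, not impyla's or the stress test's actual code: FakeThriftConnection, run(), and exercise() are invented names, and the "corrupted" flag merely stands in for the interleaved protocol frames that, over SSL, show up as "tlsv1 alert record overflow" or a segfault.

```python
import threading

class FakeThriftConnection:
    """Toy stand-in for a non-thread-safe thrift transport. It flags any
    call that begins while another call is still in progress, analogous
    to two threads interleaving RPC frames on one SSL socket."""
    def __init__(self):
        self._busy = False
        self.corrupted = False

    def call(self, rpc_name):
        if self._busy:
            self.corrupted = True   # a second thread entered mid-RPC
        self._busy = True
        for _ in range(10_000):     # pretend time spent on the wire
            pass
        self._busy = False

def run(conn, rpc_name, n, lock=None):
    # Issue n RPCs, optionally serializing them through a lock.
    for _ in range(n):
        if lock is not None:
            with lock:
                conn.call(rpc_name)
        else:
            conn.call(rpc_name)

def exercise(lock=None):
    """One 'fetch' thread plus one 'cancel' thread against a single
    connection, mirroring the forked _hash_result() thread and the
    main thread's cancel_operation()."""
    conn = FakeThriftConnection()
    fetcher = threading.Thread(target=run, args=(conn, "FetchResults", 500, lock))
    canceller = threading.Thread(target=run, args=(conn, "CancelOperation", 500, lock))
    fetcher.start()
    canceller.start()
    fetcher.join()
    canceller.join()
    return conn.corrupted

unguarded = exercise()                 # may or may not flag overlap, timing-dependent
guarded = exercise(threading.Lock())   # serialized: overlap is impossible
print("unguarded corrupted:", unguarded)
print("guarded corrupted:", guarded)
```

The point of the sketch is only that a shared connection needs external serialization (a per-connection lock, or a separate connection per thread); whether the unguarded run trips the flag depends on thread scheduling, which matches the intermittent nature of the crashes reported here.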