[ https://issues.apache.org/jira/browse/IMPALA-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862627#comment-16862627 ]
WangSheng commented on IMPALA-7906:
-----------------------------------

Our Impala cluster hit the same problem, but we cannot reproduce it. Our Impala version is 2.12.0. Here is the error info from hs_err_pid*.log:
{code:java}
[error occurred during error reporting (printing register info), id 0xb]

Stack: [0x00007fce79458000,0x00007fce79559000], sp=0x00007fce795577a0, free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x984ab2] oopDesc* PSPromotionManager::copy_to_survivor_space<false>(oopDesc*)+0x162
V [libjvm.so+0x988276] StealTask::do_it(GCTaskManager*, unsigned int)+0x2f6
V [libjvm.so+0x5d459f] GCTaskThread::run()+0x12f
V [libjvm.so+0x91d9d8] java_start(Thread*)+0x108
{code}

> Crash in JVM PSPromotionManager::copy_to_survivor_space
> -------------------------------------------------------
>
> Key: IMPALA-7906
> URL: https://issues.apache.org/jira/browse/IMPALA-7906
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: broken-build, crash
> Attachments: hs_err_pid6290.log
>
>
> {noformat}
> #0 0x00007f44ca5261f7 in raise () from /lib64/libc.so.6
> #1 0x00007f44ca5278e8 in abort () from /lib64/libc.so.6
> #2 0x00007f44cd726185 in os::abort(bool) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #3 0x00007f44cd8c8593 in VMError::report_and_die() () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #4 0x00007f44cd8c8a7e in crash_handler(int, siginfo*, void*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #5 0x00007f44cd724f72 in os::Linux::chained_handler(int, siginfo*, void*) ()
> from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #6 0x00007f44cd72b5f6 in JVM_handle_linux_signal () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #7 0x00007f44cd721be3 in signalHandler(int, siginfo*, void*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #8 <signal handler called>
> #9 0x00007f44cd713e95 in oopDesc::print_on(outputStream*) const () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #10 0x00007f44cd72afdb in os::print_register_info(outputStream*, void*) ()
> from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #11 0x00007f44cd8c6c13 in VMError::report(outputStream*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #12 0x00007f44cd8c818a in VMError::report_and_die() () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #13 0x00007f44cd72b68f in JVM_handle_linux_signal () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #14 0x00007f44cd721be3 in signalHandler(int, siginfo*, void*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #15 <signal handler called>
> #16 0x00007f44cd78f562 in oopDesc*
> PSPromotionManager::copy_to_survivor_space<false>(oopDesc*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #17 0x00007f44cd7924a5 in PSRootsClosure<false>::do_oop(oopDesc**) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #18 0x00007f44cd716a96 in InterpreterOopMap::iterate_oop(OffsetClosure*)
> const () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #19 0x00007f44cd38f789 in frame::oops_interpreted_do(OopClosure*,
> CLDClosure*, RegisterMap const*, bool) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #20 0x00007f44cd86eaa1 in JavaThread::oops_do(OopClosure*, CLDClosure*,
> CodeBlobClosure*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #21 0x00007f44cd79270f in ThreadRootsTask::do_it(GCTaskManager*, unsigned
> int) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #22 0x00007f44cd3d7ecf in GCTaskThread::run() () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #23 0x00007f44cd727338 in java_start(Thread*) () from
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #24 0x00007f44ca8bbe25 in start_thread () from /lib64/libpthread.so.0
> #25 0x00007f44ca5e934d in clone () from /lib64/libc.so.6
> {noformat}
> These are the tests running at the time
> {noformat}
> 006:53:04 [gw1] PASSED
> query_test/test_mem_usage_scaling.py::TestQueryMemLimitScaling::test_mem_usage_scaling[mem_limit:
> -1 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold':
> 0} | table_format: parquet/none]
> 06:53:07
> query_test/test_mem_usage_scaling.py::TestQueryMemLimitScaling::test_mem_usage_scaling[mem_limit:
> 400m | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold':
> 0} | table_format: parquet/none]
> 06:53:07 [gw5] PASSED
> query_test/test_analytic_tpcds.py::TestAnalyticTpcds::test_analytic_functions_tpcds[batch_size:
> 1 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold':
> 0} | table_format: parquet/none]
> 06:53:08
> query_test/test_cancellation.py::TestCancellationParallel::test_cancel_select[protocol:
> beeswax | table_format: text/gzip/block | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | query_type: SELECT | wait_action:
> 0:GETNEXT:WAIT | cancel_delay: 0.01 | cpu_limit_s: 100000 | query: select *
> from lineitem limit 50 | fail_rpc_action:
> COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL | join_before_close: True |
> buffer_pool_limit: 0]
> 06:53:08 [gw5] PASSED
> query_test/test_cancellation.py::TestCancellationParallel::test_cancel_select[protocol:
> beeswax | table_format: text/gzip/block | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | query_type: SELECT | wait_action:
> 0:GETNEXT:WAIT | cancel_delay: 0.01 | cpu_limit_s: 100000 | query: select *
> from lineitem limit 50 | fail_rpc_action:
> COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL | join_before_close: True |
> buffer_pool_limit: 0]
> 06:53:08 [gw2] PASSED
> query_test/test_decimal_casting.py::TestDecimalCasting::test_min_max_zero_null[cast_from:
> number | decimal_type: (31, 14) | exec_option: {'decimal_v2': 'true'}]
> 06:53:09
> query_test/test_decimal_casting.py::TestDecimalCasting::test_min_max_zero_null[cast_from:
> number | decimal_type: (31, 22) | exec_option: {'decimal_v2': 'true'}]
> 06:54:07
> query_test/test_cancellation.py::TestCancellationParallel::test_cancel_select[protocol:
> beeswax | table_format: kudu/none | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | query_type: SELECT | wait_action:
> 0:GETNEXT:WAIT | cancel_delay: 0 | cpu_limit_s: 100000 | query: compute stats
> lineitem | fail_rpc_action: COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL |
> join_before_close: True | buffer_pool_limit: 0]
> 06:54:08 [gw6] FAILED
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_decimal_ops[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0}]
> 06:54:08
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0}]
> 06:54:08 [gw6] FAILED
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0}]
> 06:54:08
> query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol:
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0,
> 'disable_codegen': 'false', 'decimal_v2': 'false', 'batch_size': 0} |
> table_format: text/none]
> 06:54:08 [gw6] ERROR
> query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol:
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0,
> 'disable_codegen': 'false', 'decimal_v2': 'false', 'batch_size': 0} |
> table_format: text/none]
> 06:54:08
> query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol:
> hs2 | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen':
> 'true', 'decimal_v2': 'false', 'batch_size': 0} | table_format: parquet/none]
> {noformat}
> One thing that's a little interesting is that it's running select
> repeat('AZ', 128 * 1024 * 1024), which passes a large string from the backend
> to frontend - maybe something went wrong there?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org