[ https://issues.apache.org/jira/browse/TRAFODION-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218296#comment-15218296 ]
Hans Zeller commented on TRAFODION-1910:
----------------------------------------

The issue in this bug occurred when a JDBC client disconnected. During disconnect, we delete the HiveClient_JNI object with this stack trace:

{noformat}
#1 0x00007ffff25ad180 in HiveClient_JNI::~HiveClient_JNI (this=0x7fffdf726900, __in_chrg=<value optimized out>) at ../executor/HBaseClient_JNI.cpp:4882
#2 0x00007ffff25ad0d3 in HiveClient_JNI::deleteInstance () at ../executor/HBaseClient_JNI.cpp:4870
#3 0x00007ffff3eda59a in SQL_EXEC_DeleteHbaseJNI () at ../cli/CliExtern.cpp:6319
#4 0x00007fffeba7466e in CmpStatement::process (this=0x7fffcd1d6420, es=...) at ../arkcmp/CmpStatement.cpp:1314
#5 0x00007fffeba61f86 in CmpContext::compileDirect (this=0x7fffdeb80090, data=0x7fffe0beca30 "\002", data_len=4, outHeap=0x7fffe0189138, charset=15, op=CmpMessageObj::END_SESSION, gen_code=@0x7fffe0beca18, gen_code_len=@0x7fffe0beca14, parserFlags=4194304, parentQid=0x0, parentQidLen=0, diagsArea=0x0) at ../arkcmp/CmpContext.cpp:894
#6 0x00007ffff3e7678f in ContextCli::endMxcmpSession (this=0x7fffe0189128, cleanupEsps=0, clearCmpCache=0) at ../cli/Context.cpp:3907
#7 0x00007ffff3e76c6e in ContextCli::endSession (this=0x7fffe0189128, cleanupEsps=0, cleanupEspsOnly=0, cleanupOpens=0) at ../cli/Context.cpp:4053
#8 0x00007ffff232cfbd in ExSetSessionDefaultTcb::work (this=0x7fffdf764a48) at ../executor/ex_control.cpp:815
#9 0x00007ffff2357c2d in ex_tcb::sWork (tcb=0x7fffdf764a48) at ../executor/ex_tcb.h:103
#10 0x00007ffff24e343f in ExSubtask::work (this=0x7fffdf764f80) at ../executor/ExScheduler.cpp:754
#11 0x00007ffff24e2802 in ExScheduler::work (this=0x7fffdf7645b0, prevWaitTime=0) at ../executor/ExScheduler.cpp:331
#12 0x00007ffff23b2b2a in ex_root_tcb::execute (this=0x7fffdf765000, cliGlobals=0x1086dc0, glob=0x7fffdf76f2d8, input_desc=0x0, diagsArea=@0x7fffe0bee220, reExecute=0) at ../executor/ex_root.cpp:1058
#13 0x00007ffff3ebea2b in CliStatement::execute (this=0x7fffdf7827e0, cliGlobals=0x1086dc0, input_desc=0x0, diagsArea=..., execute_state=CliStatement::INITIAL_STATE_, fixupOnly=0, cliflags=0) at ../cli/Statement.cpp:4525
#14 0x00007ffff3e407f4 in SQLCLI_PerformTasks(CliGlobals *, ULng32, SQLSTMT_ID *, SQLDESC_ID *, SQLDESC_ID *, Lng32, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *, SQLCLI_PTR_PAIRS *) (cliGlobals=0x1086dc0, tasks=606, statement_id=0x30613e8, input_descriptor=0x0, output_descriptor=0x0, num_input_ptr_pairs=0, num_output_ptr_pairs=0, ap=0x7fffe0bee890, input_ptr_pairs=0x0, output_ptr_pairs=0x0) at ../cli/Cli.cpp:3297
#15 0x00007ffff3e418f6 in SQLCLI_ExecDirect2(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID *, Int32, SQLDESC_ID *, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *) (cliGlobals=0x1086dc0, statement_id=0x30613e8, sql_source=0x7fffe0beeae0, prepFlags=0, input_descriptor=0x0, num_ptr_pairs=0, ap=0x7fffe0bee890, ptr_pairs=0x0) at ../cli/Cli.cpp:3731
#16 0x00007ffff3ed4c17 in SQL_EXEC_ExecDirect2 (statement_id=0x30613e8, sql_source=0x7fffe0beeae0, prep_flags=0, input_descriptor=0x0, num_ptr_pairs=0) at ../cli/CliExtern.cpp:2329
#17 0x00007ffff69ceb3d in SRVR::WSQL_EXEC_ExecDirect (statement_id=0x30613e8, sql_source=0x7fffe0beeae0, input_descriptor=0x0, num_ptr_pairs=0) at SQLWrapper.cpp:363
#18 0x00007ffff69b594f in SRVR::EXECDIRECT (pSrvrStmt=0x3060dd0) at sqlinterface.cpp:4521
#19 0x00007ffff6941e73 in SRVR::ControlProc (pParam=0x3060dd0) at csrvrstmt.cpp:763
#20 0x00007ffff69414b1 in SRVR_STMT_HDL::ExecDirect (this=0x3060dd0, inCursorName=0x0, inSqlString=0x61c430 "SET SESSION DEFAULT SQL_SESSION 'END'", inStmtType=1, inSqlStmtType=0, inSqlAsyncEnable=0, inQueryTimeout=0) at csrvrstmt.cpp:445
#21 0x00007ffff69b614b in SRVR::EXECDIRECT (pSqlStr=0x61c430 "SET SESSION DEFAULT SQL_SESSION 'END'", WriteError=0) at sqlinterface.cpp:4702
#22 0x0000000000581162 in SRVR::SrvrSessionCleanup () at SrvrConnect.cpp:4080
#23 0x0000000000580d14 in odbc_SQLSvc_TerminateDialogue_ame_ (objtag_=0x10785e0, call_id_=0x1078638, dialogueId=903221211) at SrvrConnect.cpp:3950
#24 0x000000000051ce04 in SQLDISCONNECT_IOMessage (objtag_=0x10785e0, call_id_=0x1078638) at Interface/odbcs_srvr.cpp:653
#25 0x000000000051eff4 in DISPATCH_TCPIPRequest (objtag_=0x10785e0, call_id_=0x1078638, operation_id=3002) at Interface/odbcs_srvr.cpp:1775
#26 0x0000000000465928 in BUILD_TCPIP_REQUEST (pnode=0x10785e0) at ../Common/TCPIPSystemSrvr.cpp:606
#27 0x000000000046586f in PROCESS_TCPIP_REQUEST (pnode=0x10785e0) at ../Common/TCPIPSystemSrvr.cpp:584
#28 0x00000000004b32b0 in CNSKListenerSrvr::CheckTCPIPRequest (this=0xf2d850, ipnode=0x10785e0) at Interface/Listener_srvr.cpp:64
#29 0x00000000004c4939 in CNSKListenerSrvr::tcpip_listener (arg=0xf2d850) at Interface/linux/Listener_srvr_ps.cpp:403
#30 0x00007ffff43752f4 in sb_thread_sthr_disp (pp_arg=0x1073660) at threadl.cpp:256
#31 0x00007ffff4141a51 in start_thread () from /lib64/libpthread.so.0
#32 0x00007ffff467793d in clone () from /lib64/libc.so.6
{noformat}

My assumption is that this disconnect path does not get rid of the CLI context itself. The problem is that we cache the pointer to the HiveClient_JNI object in the compiler (in class HiveMetaData, used by NATableDB), so the cached pointer outlives the object it points to. If we do get rid of the CLI context, that will still delete the HiveClient_JNI object, because that path does not go through the CLI call I changed.
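
To illustrate the caching hazard described above, here is a minimal, self-contained sketch. Every name in it (FakeHiveClient, FakeClientRegistry, FakeMetaDataCache, demoReconnect) is a hypothetical stand-in and not Trafodion code. It also uses std::weak_ptr purely to make the stale cache observable without undefined behavior; the real code caches a raw pointer, and this sketch is not the fix proposed in this comment.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for HiveClient_JNI (assumption, not the real class).
struct FakeHiveClient {
    std::string getTableStr(const std::string& tbl) const { return "desc:" + tbl; }
};

// Hypothetical stand-in for the process-wide instance that a
// deleteInstance()-style call destroys on disconnect.
class FakeClientRegistry {
public:
    static std::shared_ptr<FakeHiveClient> getInstance() {
        if (!instance_) instance_ = std::make_shared<FakeHiveClient>();
        return instance_;
    }
    static void deleteInstance() { instance_.reset(); }
private:
    static std::shared_ptr<FakeHiveClient> instance_;
};
std::shared_ptr<FakeHiveClient> FakeClientRegistry::instance_;

// Hypothetical stand-in for a compiler-side cache such as HiveMetaData.
// It keeps its own reference to the client across sessions.
class FakeMetaDataCache {
public:
    void connect() { client_ = FakeClientRegistry::getInstance(); }

    // True while the cached client object still exists.
    bool clientAlive() const { return !client_.expired(); }

    // Re-acquires the client if the cached one has been deleted.
    std::string getTableDesc(const std::string& tbl) {
        std::shared_ptr<FakeHiveClient> c = client_.lock();
        if (!c) {
            connect();
            c = client_.lock();
        }
        return c->getTableStr(tbl);
    }
private:
    std::weak_ptr<FakeHiveClient> client_;  // a raw pointer here would dangle
};

// Simulates: connect, disconnect (client deleted), then a query after reconnect.
bool demoReconnect() {
    FakeMetaDataCache cache;
    cache.connect();
    bool aliveBefore = cache.clientAlive();
    FakeClientRegistry::deleteInstance();          // simulated JDBC disconnect
    bool aliveAfterDelete = cache.clientAlive();   // cache is now stale
    std::string desc = cache.getTableDesc("mytable");
    assert(cache.clientAlive());                   // cache was refreshed
    return aliveBefore && !aliveAfterDelete && desc == "desc:mytable";
}
```

Calling demoReconnect() walks through the same order of events as the bug: the cache is filled, the client is deleted behind its back, and the next lookup has to notice that. With a raw cached pointer there is nothing to notice, which is why either the cache must be invalidated together with the client or the client must be kept alive for as long as the cache is.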

My hope is that we will do the following:

* JDBC disconnect:
** Call SQL_EXEC_DeleteHbaseJNI() through a SET SESSION command
** Delete HBaseClient_JNI
** Keep HiveClient_JNI, which is also cached in NATableDB/HiveMetaData
* Destroy CLI context
** Delete both HBaseClient_JNI and HiveClient_JNI from ContextCli::deleteMe()

> mxosrvr crashes on Hive query after reconnect
> ---------------------------------------------
>
>                 Key: TRAFODION-1910
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1910
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 1.3-incubating
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>
> This is a problem Wei-Shiun found when running tests with many connections
> that use Hive queries. He sees intermittent core dumps with this stack trace:
> #0 0x00007f47cb0dd625 in raise () from /lib64/libc.so.6
> #1 0x00007f47cb0ded8d in abort () from /lib64/libc.so.6
> #2 0x00007f47cc613a55 in os::abort(bool) ()
>     from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
> #3 0x00007f47cc793f87 in VMError::report_and_die() ()
>     from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
> #4 0x00007f47cc61896f in JVM_handle_linux_signal ()
>     from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
> #5 <signal handler called>
> #6 0x00007f47c92bd5ee in HiveMetaData::recordError (this=0x7f47a5e50088,
>     errCode=122, errMethodName=0x7f47c935aaa3 "HiveClient_JNI::getTableStr()")
>     at ../executor/hiveHook.cpp:228
> #7 0x00007f47c92bf613 in HiveMetaData::getTableDesc (this=0x7f47a5e50088,
>     schemaName=0x7f47b858e798 "mytest5", tblName=0x7f47b858e7c8 "mytable")
>     at ../executor/hiveHook.cpp:806
> #8 0x00007f47c4056307 in NATableDB::get (this=0x7f47b652d3c0,
>     corrName=...,
>     bindWA=0x7f47b85912d0, inTableDescStruct=<value optimized out>)
>     at ../optimizer/NATable.cpp:8377
> #9 0x00007f47c3db0743 in BindWA::getNATable (this=0x7f47b85912d0,
>     corrName=..., catmanCollectTableUsages=1, inTableDescStruct=0x0)
>     at ../optimizer/BindRelExpr.cpp:1514
> #10 0x00007f47c3db3290 in Describe::bindNode (this=0x7f47a2aae440,
>     bindWA=0x7f47b85912d0) at ../optimizer/BindRelExpr.cpp:13565
> #11 0x00007f47c3d989f7 in RelExpr::bindChildren (this=0x7f47a2aaf5f8,
>     bindWA=0x7f47b85912d0) at ../optimizer/BindRelExpr.cpp:2258
> #12 0x00007f47c3dccbce in RelRoot::bindNode (this=0x7f47a2aaf5f8,
>     bindWA=0x7f47b85912d0) at ../optimizer/BindRelExpr.cpp:5204
> #13 0x00007f47c577e84e in CmpMain::compile (this=0x7f47b8593c40,
>     input_str=0x7f47a5e0b690 "showddl mytable", charset=15,
>     queryExpr=@0x7f47b8593b78, gen_code=0x7f47a5e0c1a8,
>     gen_code_len=0x7f47a5e0c1a0, heap=0x7f47b70bbc00, phase=CmpMain::END,
>     fragmentDir=0x7f47b8593d98, op=3004, useQueryCache=1,
>     cacheable=0x7f47b8593b88, begTime=0x7f47b8593b60, shouldLog=0)
>     at ../sqlcomp/CmpMain.cpp:2071
> #14 0x00007f47c578168c in CmpMain::sqlcomp (this=0x7f47b8593c40,
>     input_str=0x7f47a5e0b690 "showddl mytable", charset=15,
>     queryExpr=@0x7f47b8593b78, gen_code=0x7f47a5e0c1a8,
>     gen_code_len=0x7f47a5e0c1a0, heap=0x7f47b70bbc00, phase=CmpMain::END,
>     fragmentDir=0x7f47b8593d98, op=3004, useQueryCache=1,
>     cacheable=0x7f47b8593b88, begTime=0x7f47b8593b60, shouldLog=0)
>     at ../sqlcomp/CmpMain.cpp:1684
> #15 0x00007f47c5782998 in CmpMain::sqlcomp (this=0x7f47b8593c40,
>     input=...,
>     gen_code=0x7f47a5e0c1a8, gen_code_len=0x7f47a5e0c1a0, heap=0x7f47b70bbc00,
>     phase=CmpMain::END, fragmentDir=0x7f47b8593d98, op=3004)
>     at ../sqlcomp/CmpMain.cpp:819
> #16 0x00007f47c33a8898 in CmpStatement::process (this=0x7f47a5e52f10,
>     sqltext=<value optimized out>) at ../arkcmp/CmpStatement.cpp:499
> #17 0x00007f47c339b48c in CmpContext::compileDirect (this=0x7f47b6525090,
>     data=0x7f47b7112db8 "\200", data_len=144, outHeap=0x7f47b7b2e128,
>     charset=15, op=CmpMessageObj::SQLTEXT_COMPILE, gen_code=@0x7f47b8594320,
>     gen_code_len=@0x7f47b8594328, parserFlags=4194304, parentQid=0x0,
>     parentQidLen=0, diagsArea=0x7f47b7112e50) at ../arkcmp/CmpContext.cpp:841
> #18 0x00007f47caa0dd38 in CliStatement::prepare2 (this=0x7f47b70d4028,
>     source=0x7f47b711ab18 "showddl mytable", diagsArea=...,
>     passed_gen_code=<value optimized out>, passed_gen_code_len=3081953576,
>     charset=15, unpackTdbs=1, cliFlags=129) at ../cli/Statement.cpp:1775
> #19 0x00007f47ca9bac94 in SQLCLI_Prepare2 (cliGlobals=0x27bcbb0,
>     statement_id=0x370a9c8, sql_source=0x7f47b8594610, gencode_ptr=0x0,
>     gencode_len=0, ret_gencode_len=0x0, query_cost_info=0x370abf8,
>     query_comp_stats_info=0x370ac48, uniqueStmtId=<value optimized out>,
>     uniqueStmtIdLen=0x370ab2c, flags=1) at ../cli/Cli.cpp:5927
> #20 0x00007f47caa1b1ae in SQL_EXEC_Prepare2 (statement_id=0x370a9c8,
>     sql_source=0x7f47b8594610, gencode_ptr=0x0, gencode_len=0,
>     ret_gencode_len=0x0, query_cost_info=0x370abf8, comp_stats_info=0x370ac48,
>     uniqueStmtId=0x370ab30 "", uniqueStmtIdLen=0x370ab2c, flags=1)
>     at ../cli/CliExtern.cpp:5034
> #21 0x00007f47cd4e31d9 in SRVR::WSQL_EXEC_Prepare2
>     (statement_id=0x370a9c8,
>     sql_source=<value optimized out>, gencode_ptr=<value optimized out>,
>     gencode_len=<value optimized out>, ret_gencode_len=<value optimized out>,
>     query_cost_info=<value optimized out>, comp_stats_info=0x370ac48,
>     uniqueQueryId=0x370ab30 "", uniqueQueryIdLen=0x370ab2c, flags=1)
>     at SQLWrapper.cpp:803
> #22 0x00007f47cd4d7b45 in SRVR::PREPARE2 (pSrvrStmt=0x370a3b0,
>     isFromExecDirect=248) at sqlinterface.cpp:5057
> #23 0x00007f47cd508370 in odbc_SQLSvc_Prepare2_sme_ (inputRowCnt=0,
>     sqlStmtType=1, stmtLabel=<value optimized out>,
>     sqlString=0x2ba7254 "showddl mytable", holdableCursor=0,
>     returnCode=0x7f47b8594b08, sqlWarningOrErrorLength=0x7f47b8594b04,
>     sqlWarningOrError=@0x7f47b8594ae0, sqlQueryType=0x7f47b8594afc,
>     stmtHandle=0x7f47b8594ac0, estimatedCost=0x7f47b8594af8,
>     inputDescLength=0x7f47b8594af0, inputDesc=@0x7f47b8594ad0,
>     outputDescLength=0x7f47b8594aec, outputDesc=@0x7f47b8594ac8,
>     isFromExecDirect=true) at srvrothers.cpp:939
> #24 0x00000000004c6ca2 in odbc_SQLSrvr_ExecDirect_ame_ (objtag_=0x55e6ec0,
>     call_id_=0x55e6f18, dialogueId=259570813, stmtLabel=0x2ba7270 "SQL_CUR_7",
>     cursorName=0x0, stmtExplainLabel=<value optimized out>, stmtType=0,
>     sqlStmtType=1, sqlString=0x2ba7254 "showddl mytable", sqlAsyncEnable=0,
>     queryTimeout=0, inputRowCnt=0, txnID=0, holdableCursor=0)
>     at SrvrConnect.cpp:7894
> #25 0x0000000000495886 in SQLEXECUTE_IOMessage (objtag_=0x55e6ec0,
>     call_id_=0x55e6f18, operation_id=3012) at Interface/odbcs_srvr.cpp:1731
> #26 0x0000000000495934 in DISPATCH_TCPIPRequest (objtag_=0x55e6ec0,
>     call_id_=0x55e6f18, operation_id=<value optimized out>)
>     at Interface/odbcs_srvr.cpp:1796
> #27 0x0000000000434532 in BUILD_TCPIP_REQUEST (pnode=0x55e6ec0)
>     at ../Common/TCPIPSystemSrvr.cpp:606
> #28 0x0000000000434ecd in PROCESS_TCPIP_REQUEST (pnode=0x55e6ec0)
>     at ../Common/TCPIPSystemSrvr.cpp:584
> #29 0x00000000004631a6 in CNSKListenerSrvr::tcpip_listener (arg=0x2663560)
>     at Interface/linux/Listener_srvr_ps.cpp:403
> #30 0x00007f47cae91314 in sb_thread_sthr_disp (pp_arg=0x27a94a0)
>     at threadl.cpp:256
> #31 0x00007f47cac5da51 in start_thread () from /lib64/libpthread.so.0
> #32 0x00007f47cb19393d in clone () from /lib64/libc.so.6
> The problem does not happen with sqlci.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)