[ https://issues.apache.org/jira/browse/TRAFODION-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839020#comment-15839020 ]
ASF GitHub Bot commented on TRAFODION-2455:
-------------------------------------------

Github user selvaganesang commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/929#discussion_r97919059

    --- Diff: core/sql/executor/HBaseClient_JNI.cpp ---
    @@ -1319,17 +1334,21 @@ HBC_RetCode HBaseClient_JNI::estimateRowCount(const char* tblName,
       jint jPartialRowSize = partialRowSize;
       jint jNumCols = numCols;
    +  jint jRetryLimitMilliSeconds = retryLimitMilliSeconds;
       jlongArray jRowCount = jenv_->NewLongArray(1);
       tsRecentJMFromJNI = JavaMethods_[JM_EST_RC].jm_full_name;
       jboolean jresult = jenv_->CallBooleanMethod(javaObj_, JavaMethods_[JM_EST_RC].methodID,
                                                   js_tblName, jPartialRowSize,
    -                                              jNumCols, jRowCount);
    +                                              jNumCols, jRetryLimitMilliSeconds, jRowCount);
       jboolean isCopy;
       jlong* arrayElems = jenv_->GetLongArrayElements(jRowCount, &isCopy);
       rowCount = *arrayElems;
       if (isCopy == JNI_TRUE)
         jenv_->ReleaseLongArrayElements(jRowCount, arrayElems, JNI_ABORT);
    +  jenv_->DeleteLocalRef(js_tblName);
    --- End diff --

PopLocalFrame would do this for you.
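The reviewer's point rests on how JNI local reference frames behave: `PopLocalFrame` frees every local reference created since the matching `PushLocalFrame`, so a `DeleteLocalRef(js_tblName)` issued just before the pop is redundant. The sketch below is a toy C++ model of that rule, not the JNI API. `ToyLocalRefTable` and `callWithFrame` are hypothetical names, and the sketch assumes (as the comment implies) that `estimateRowCount` creates `js_tblName` and `jRowCount` after pushing a local frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model of a JNI local reference table (NOT the real JNI API). It mimics
// one documented rule: PopLocalFrame frees every local reference created
// since the matching PushLocalFrame.
class ToyLocalRefTable {
public:
    ToyLocalRefTable() : frames_(1) {}  // frame 0 stands in for the native call's own frame

    // Like creating a jstring or jlongArray: the new ref belongs to the top frame.
    int newLocalRef() {
        frames_.back().push_back(nextId_);
        return nextId_++;
    }

    // Like DeleteLocalRef: frees one reference early; returns false if it was not live.
    bool deleteLocalRef(int ref) {
        for (auto& frame : frames_) {
            auto it = std::find(frame.begin(), frame.end(), ref);
            if (it != frame.end()) {
                frame.erase(it);
                return true;
            }
        }
        return false;
    }

    // Like PushLocalFrame: opens a new, empty frame.
    void pushLocalFrame() { frames_.emplace_back(); }

    // Like PopLocalFrame(nullptr): frees everything still live in the top frame.
    void popLocalFrame() {
        if (frames_.size() == 1) {
            throw std::logic_error("PopLocalFrame without a matching PushLocalFrame");
        }
        frames_.pop_back();
    }

    // Number of references currently live across all frames.
    std::size_t liveCount() const {
        std::size_t n = 0;
        for (const auto& frame : frames_) n += frame.size();
        return n;
    }

private:
    std::vector<std::vector<int>> frames_;
    int nextId_ = 1;
};

// Follows the shape of estimateRowCount in the diff: a table-name string and a
// long[] result are created inside a frame. Returns the refs still live afterwards.
std::size_t callWithFrame(ToyLocalRefTable& env, bool alsoDeleteExplicitly) {
    env.pushLocalFrame();
    const int tblName = env.newLocalRef();  // stands in for js_tblName
    env.newLocalRef();                      // stands in for jRowCount
    if (alsoDeleteExplicitly) {
        const bool found = env.deleteLocalRef(tblName);  // the extra call added in the diff
        assert(found);
        (void)found;
    }
    env.popLocalFrame();  // releases whatever is still live in this frame
    return env.liveCount();
}
```

With or without the explicit delete, `callWithFrame` leaves zero live references, which is the reviewer's argument. Explicit deletes still matter when a loop creates many references inside one frame, because they keep the local reference table from growing before the frame is popped.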
Again, I cleaned up this code earlier to remove the unnecessary calls to DeleteLocalRef when PushLocalFrame/PopLocalFrame is used.

> Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from estimator, fails with timeouts by doing select count (*)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-2455
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2455
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: A cluster large enough to host a 22 billion row table
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> When loading a scale factor 73728 Order Entry database, if UPDATE STATISTICS
> is done soon after the load on one particular table (the largest table,
> having 22 billion rows), we get the following failure:
> SQLEXCEPTION on Statement, Error Code = -9200
> update statistics for table trafodion.javabench.oe_orderline_73728 on
> every column, (OL_W_ID, OL_I_ID), (OL_D_ID, OL_W_ID), (OL_D_ID, OL_I_ID)
> sample
> *** ERROR[9200] UPDATE STATISTICS for table
> TRAFODION.JAVABENCH.OE_ORDERLINE_73728 encountered an error (8448) from
> statement getRow(). [2017-01-09 02:07:22]
> *** ERROR[8448] Unable to access Hbase interface. Call to
> ExpHbaseInterface::coProcAggr returned error HBASE_ACCESS_ERROR(-706). Cause:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=3, exceptions:
> Mon Jan 09 01:47:21 PST 2017,
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3},
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=73, waitTime=600001, operationTimeout=600000 expired.
> Mon Jan 09 01:57:21 PST 2017,
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3},
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=185, waitTime=600001, operationTimeout=600000 expired.
> Mon Jan 09 02:07:22 PST 2017,
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3},
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=310, waitTime=600001, operationTimeout=600000 expired.
>
> A subsequent update statistics command succeeds, but these failures take a
> half hour or more.
>
> Enabling logging for update stats shows that getrowcount returns 0, so update
> stats assumes the table is small enough to do a select count (*). The plan
> for this select count (*) (perhaps suffering from the same issue that causes
> getrowcount to return a non-estimate) chooses the HBase aggregate
> coprocessor. The table in question has 22 billion rows, so the
> coprocessor isn't a good choice, and the query times out. But the real issue
> is: why can't the table get a rowcount estimate?
>
> Rerunning UPDATE STATS on this table a few hours later succeeds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
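The timestamps in the quoted RetriesExhaustedException line up with the "half hour or more" in the report: each of the three attempts waited out the full operationTimeout of 600000 ms (10 minutes). The sketch below checks that arithmetic using only values copied from the log. It assumes the attempts ran back to back for exactly operationTimeout each and ignores the 100 ms retry pause and other per-call overhead, so its times can trail the logged ones by about a second. The helper names (`totalTimeoutMs`, `attemptEndTimesMs`, `toPstClock`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Values copied from the RetriesExhaustedException quoted in the issue.
constexpr std::int64_t kGlobalStartTimeMs = 1483954641419;  // globalStartTime (epoch ms)
constexpr std::int64_t kOperationTimeoutMs = 600000;        // operationTimeout: 10 minutes
constexpr int kAttempts = 3;                                 // "Failed after attempts=3"

// Total time spent waiting on timeouts before the exception surfaces.
constexpr std::int64_t totalTimeoutMs() { return kAttempts * kOperationTimeoutMs; }

static_assert(totalTimeoutMs() == 30 * 60 * 1000, "three 10-minute timeouts add up to 30 minutes");

// Approximate end time of each attempt, assuming back-to-back attempts that
// each run for exactly operationTimeout (retry pause and overhead ignored).
std::vector<std::int64_t> attemptEndTimesMs() {
    std::vector<std::int64_t> ends;
    for (int i = 1; i <= kAttempts; ++i) {
        ends.push_back(kGlobalStartTimeMs + i * kOperationTimeoutMs);
    }
    return ends;
}

// Formats an epoch-millisecond timestamp as HH:MM:SS in PST (UTC-8; January, so no DST).
std::string toPstClock(std::int64_t epochMs) {
    const std::int64_t pstSecs = epochMs / 1000 - 8 * 3600;
    const std::int64_t secOfDay = ((pstSecs % 86400) + 86400) % 86400;
    char buf[16];
    const int n = std::snprintf(buf, sizeof buf, "%02d:%02d:%02d",
                                static_cast<int>(secOfDay / 3600),
                                static_cast<int>(secOfDay / 60 % 60),
                                static_cast<int>(secOfDay % 60));
    assert(n == 8);  // "HH:MM:SS" is always eight characters
    (void)n;
    return std::string(buf);
}
```

Under these assumptions, `toPstClock(kGlobalStartTimeMs)` gives 01:37:21, and the three computed end times are 01:47:21, 01:57:21 and 02:07:21, against 01:47:21, 01:57:21 and 02:07:22 in the log. `totalTimeoutMs()` is 1800000 ms, or 30 minutes, before UPDATE STATISTICS even reports the error.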