[ https://issues.apache.org/jira/browse/TRAFODION-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088293#comment-15088293 ]
ASF GitHub Bot commented on TRAFODION-1740:
-------------------------------------------

Github user DaveBirdsall commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/253#discussion_r49136585

--- Diff: core/sql/ustat/hs_globals.cpp ---
@@ -3762,6 +3765,68 @@ Lng32 HSSample::make(NABoolean rowCountIsEstimate, // input
   // For Hive tables the sample table used is a Trafodion table
   if (hs_globals->isHbaseTable || hs_globals->isHiveTable)
     {
+     // The optimal degree of parallelism for the LOAD or UPSERT is
+     // the number of partitions of the original table. Force that.
+     // Note that when the default for AGGRESSIVE_ESP_ALLOCATION_PER_CORE
+     // is permanently changed to 'ON', we may be able to remove this CQD.
+     if (hs_globals->objDef->getNumPartitions() > 1)
+       {
+         char temp[40]; // way more space than needed, but it's safe
+         sprintf(temp,"'%d'",hs_globals->objDef->getNumPartitions());
+         NAString EspsCQD = "CONTROL QUERY DEFAULT PARALLEL_NUM_ESPS ";
+         EspsCQD += temp;
+         HSFuncExecQuery(EspsCQD);
+         EspCQDUsed = TRUE; // remember to reset later
+       }
+
+     // If the table is very large, we risk HBase time-outs because the
+     // sample scan doesn't return rows fast enough. In this case, we
+     // want to reduce the HBase row cache size to a smaller number to
+     // force more frequent returns. Experience shows that a value of
+     // '10' worked well with a 17.7 billion row table with 128 regions
+     // on six nodes (one million row sample). We'll assume a workable
+     // HBase cache size value scales linearly with the sampling ratio.
+     // That is, we'll assume the model:
+     //
+     //   workable value = (sample row count / actual row count) * c,
+     //   where c is chosen so that we get 10 when the sample row count
+     //   is 1,000,000 and the actual row count is 17.7 billion.
+     //
+     // Solving for c, we get c = 10 * (17.7 billion / 1 million).
+     //
+     // Note that the Generator does a similar calculation in
+     // Generator::setHBaseNumCacheRows. The calculation here is more
+     // conservative because we care more about getting UPDATE STATISTICS
+     // done without a timeout, trading off possible speed improvements
+     // by using a smaller cache size.
+     //
+     // Note that when we move to HBase 1.1, with its heartbeat protocol,
+     // this time-out problem goes away and we can remove these CQDs.
+     if (hs_globals->isHbaseTable)
+       {
+         double sampleRatio = (double)(sampleRowCnt) / hs_globals->actualRowCount;
+         double calibrationFactor = 10 * (17700000000/1000000);
+         Int64 workableCacheSize = (Int64)(sampleRatio * calibrationFactor);
+         if (workableCacheSize < 1)
+           workableCacheSize = 1; // can't go below 1 unfortunately
+
+         Int32 max = getDefaultAsLong(HBASE_NUM_CACHE_ROWS_MAX);
+         if ((workableCacheSize < 10000) && // don't bother if 10000 works
+             (max == 10000)) // don't do it if user has already set this CQD
+           {
+             char temp1[40]; // way more space than needed, but it's safe
+             Lng32 wcs = (Lng32)workableCacheSize;
+             sprintf(temp1,"'%d'",wcs);
+             NAString minCQD = "CONTROL QUERY HBASE_NUM_CACHE_ROWS_MIN ";
--- End diff --

Bug: Need to add the DEFAULT keyword after CONTROL QUERY.

> Use CQDs to prevent time-outs and speed sampling in UPDATE STATS
> ----------------------------------------------------------------
>
>              Key: TRAFODION-1740
>              URL: https://issues.apache.org/jira/browse/TRAFODION-1740
>          Project: Apache Trafodion
>       Issue Type: Bug
>       Components: sql-cmp
> Affects Versions: 2.0-incubating, 1.3-incubating
>      Environment: All; however this is particularly noticeable in clusters
>         Reporter: David Wayne Birdsall
>         Assignee: David Wayne Birdsall
>
> When doing UPDATE STATISTICS on large tables (one example had 17.7 billion
> rows), we sometimes time out with an error like the following:
>
> SQL>update statistics for table table1 on every column sample;
>
> *** ERROR[9200] UPDATE STATISTICS for table TRAFODION.SEABASE.TABLE1 encountered
> an error (8448) from statement Process_Query. [2015-12-15 12:18:45]
>
> *** ERROR[8448] Unable to access Hbase interface. Call to
> ExpHbaseInterface::nextRow returned error HBASE_ACCESS_ERROR(-706). Cause:
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Tue Dec 15 17:18:19 GMT 2015, null, java.net.SocketTimeoutException:
> callTimeout=60000, callDuration=60303: row '?. [2015-12-15 12:18:45]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
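For reference, the cache-size model quoted in the diff above can be checked with a small standalone sketch. This is not the actual Trafodion code: std::string and printf stand in for NAString and HSFuncExecQuery, the row counts are the example figures from the comment (a one-million-row sample of a 17.7-billion-row table), and the CQD text includes the DEFAULT keyword that the review comment says the patch is missing.

// Standalone sketch of the cache-size model described in the diff; not the
// actual Trafodion code. std::string replaces NAString, and the CQD text is
// printed instead of being passed to HSFuncExecQuery.
#include <cstdint>
#include <cstdio>
#include <string>

int main()
{
  // Example figures from the comment: one-million-row sample, 17.7 billion rows.
  const double sampleRowCnt   = 1000000.0;
  const double actualRowCount = 17700000000.0;

  // workable value = (sample row count / actual row count) * c,
  // where c = 10 * (17.7 billion / 1 million), so this example lands near 10.
  const double sampleRatio       = sampleRowCnt / actualRowCount;
  const double calibrationFactor = 10.0 * (17700000000.0 / 1000000.0);

  // Truncate to an integer, as the patch does, and clamp at 1.
  int64_t workableCacheSize =
      static_cast<int64_t>(sampleRatio * calibrationFactor);
  if (workableCacheSize < 1)
    workableCacheSize = 1;  // the row cache size cannot go below 1

  // Build the CQD statement; note the DEFAULT keyword after CONTROL QUERY.
  char buf[80];
  std::snprintf(buf, sizeof(buf),
                "CONTROL QUERY DEFAULT HBASE_NUM_CACHE_ROWS_MIN '%lld'",
                static_cast<long long>(workableCacheSize));
  std::string minCQD(buf);

  std::printf("%s\n", minCQD.c_str());
  return 0;
}

With those figures the model yields a value of about 10, the cache size the comment reports working well on the 17.7-billion-row table.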