[ https://issues.apache.org/jira/browse/TRAFODION-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303043#comment-15303043 ]

ASF GitHub Bot commented on TRAFODION-2017:
-------------------------------------------

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/505

    [TRAFODION-2017] Use a maximum HBase row cache of 50 for USTAT sample scans

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2017

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/505.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #505
    
----
commit d03526a2a508097dc8f23ac6fd63abb756a2a98e
Author: Dave Birdsall <dbirds...@apache.org>
Date:   2016-05-26T22:03:10Z

    [TRAFODION-2017] Use a maximum HBase row cache of 50 for USTAT sample scans

----


> Tune HBase row cache sizes so UPDATE STATS completes
> ----------------------------------------------------
>
>                 Key: TRAFODION-2017
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2017
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.0-incubating
>         Environment: All, though prevalent on clusters with larger tables 
> and/or heavier loads
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>             Fix For: 2.1-incubating
>
>
> UPDATE STATISTICS often fails with an HBase socket timeout exception and/or a 
> scanner timeout exception when run with sampling on larger tables, or on clusters 
> with heavy concurrent workloads.
> We have experimented in the past with setting various CQDs on large tables to 
> reduce these failures; however, we were loath to set them all the time for 
> fear that doing so would lengthen elapsed time in non-failure scenarios. 
> Recent work by Carol Pearson, however, shows that the increase in elapsed 
> time is negligible for smaller tables, and that in failure scenarios the failure 
> does not occur quickly, so paying a small penalty in elapsed time to increase 
> the probability of success is the better trade-off.
> Carol's work involved tables of fewer than 1 billion rows; the existing CQD 
> logic is still required for larger tables. But for tables of fewer than 1 
> billion rows, she recommends setting HBASE_ROWS_CACHED_MIN and 
> HBASE_ROWS_CACHED_MAX to '50'. This JIRA is written to cover that change. 
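> For illustration only (a sketch, not part of the committed change, and the 
> table name is hypothetical): the recommended settings could be applied 
> manually in a Trafodion SQL session before running a sampled UPDATE 
> STATISTICS, like so:
>
>     -- Cap the HBase scanner row cache for this session's scans
>     CONTROL QUERY DEFAULT HBASE_ROWS_CACHED_MIN '50';
>     CONTROL QUERY DEFAULT HBASE_ROWS_CACHED_MAX '50';
>     UPDATE STATISTICS FOR TABLE mytable ON EVERY COLUMN SAMPLE;
>
> The fix itself makes UPDATE STATISTICS apply an equivalent cache limit 
> internally for its sample scans, so no manual CQDs are needed.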



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
