Samarth Jain created PHOENIX-2189:
-------------------------------------

             Summary: Starting from HBase 1.x, Phoenix probably shouldn't
override the hbase.client.scanner.caching attribute
                 Key: PHOENIX-2189
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2189
             Project: Phoenix
          Issue Type: Bug
            Reporter: Samarth Jain


After PHOENIX-2188 is fixed, we need to think about whether it makes sense to
override the scanner cache size in Phoenix for the HBase 1.x branches. For
example, in HBase 1.1, the default value of hbase.client.scanner.caching is
now Integer.MAX_VALUE.

{code:xml}
<property>
    <name>hbase.client.scanner.caching</name>
    <value>2147483647</value>
    <description>Number of rows that we try to fetch when calling next
    on a scanner if it is not served from (local, client) memory. This
    configuration works together with hbase.client.scanner.max.result.size
    to try and use the network efficiently. The default value is
    Integer.MAX_VALUE so that the network will fill the chunk size defined
    by hbase.client.scanner.max.result.size rather than be limited by a
    particular number of rows, since the size of rows varies table to
    table. If you know ahead of time that you will not require more than a
    certain number of rows from a scan, this configuration should be set
    to that row limit via Scan#setCaching. Higher caching values will
    enable faster scanners but will eat up more memory, and some calls of
    next may take longer and longer times when the cache is empty. Do not
    set this value such that the time between invocations is greater than
    the scanner timeout; i.e. hbase.client.scanner.timeout.period</description>
  </property>
{code}

From the comments it sounds like, by default, HBase is going to provide an
upper bound on the scanner cache size in bytes and not in number of records.
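
For illustration, here is a minimal Java sketch (not Phoenix code; the table
name and the 2 MB figure are made-up placeholders) of the two knobs on the
HBase 1.x client side, the row-based cap versus the byte-based cap:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ScannerCachingExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("T"))) {
            Scan scan = new Scan();
            // Row-based cap: this is what an hbase.client.scanner.caching
            // override amounts to. A fixed cap like 1000 under-fills each
            // RPC for narrow rows.
            scan.setCaching(1000);
            // Byte-based cap: the limit HBase 1.1 relies on by default, so
            // each next() fills roughly this many bytes regardless of how
            // wide the rows are.
            scan.setMaxResultSize(2 * 1024 * 1024);
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    // process rows
                }
            }
        }
    }
}
{code}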

If we end up overriding hbase.client.scanner.caching to 1000, then for
narrower rows we will likely fetch too few rows per RPC. For wider rows, the
byte limit will likely kick in to make sure we don't end up caching too much
on the client.
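
If we do drop the override, a minimal sketch of the alternative (this is not
Phoenix's actual code; configureScan and its guard condition are illustrative
assumptions) would be to apply a caching value only when one was explicitly
configured and otherwise let the byte-based default govern:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Scan;

public class NoCachingOverride {
    // Hypothetical helper: only set a row cap if a real one was configured;
    // otherwise leave the Scan alone so the HBase 1.x byte-based limit
    // (hbase.client.scanner.max.result.size) governs chunking.
    static void configureScan(Scan scan, Configuration conf) {
        int caching = conf.getInt(HConstants.HBASE_CLIENT_SCANNER_CACHING, -1);
        if (caching > 0 && caching != Integer.MAX_VALUE) {
            scan.setCaching(caching);
        }
    }

    public static void main(String[] args) {
        Scan scan = new Scan();
        configureScan(scan, HBaseConfiguration.create());
        System.out.println("caching = " + scan.getCaching());
    }
}
{code}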

Maybe we shouldn't be using the scanner caching override at all? Thoughts? 
[~jamestaylor], [~lhofhansl]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
