Samarth Jain created PHOENIX-2189:
-------------------------------------

             Summary: Starting from HBase 1.x, phoenix shouldn't probably override the hbase.client.scanner.caching attribute
                 Key: PHOENIX-2189
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2189
             Project: Phoenix
          Issue Type: Bug
            Reporter: Samarth Jain

After PHOENIX-2188 is fixed, we need to think about whether it makes sense for Phoenix to override the scanner cache size on the HBase 1.x branches. For example, in HBase 1.1 the default value of hbase.client.scanner.caching is now Integer.MAX_VALUE.

{code:xml}
<property>
  <name>hbase.client.scanner.caching</name>
  <value>2147483647</value>
  <description>Number of rows that we try to fetch when calling next
  on a scanner if it is not served from (local, client) memory. This configuration
  works together with hbase.client.scanner.max.result.size to try and use the
  network efficiently. The default value is Integer.MAX_VALUE by default so that
  the network will fill the chunk size defined by hbase.client.scanner.max.result.size
  rather than be limited by a particular number of rows since the size of rows varies
  table to table. If you know ahead of time that you will not require more than a certain
  number of rows from a scan, this configuration should be set to that row limit via
  Scan#setCaching. Higher caching values will enable faster scanners but will eat up more
  memory and some calls of next may take longer and longer times when the cache is empty.
  Do not set this value such that the time between invocations is greater than the scanner
  timeout; i.e. hbase.client.scanner.timeout.period</description>
</property>
{code}

From the description, it sounds like HBase now bounds the scanner cache by default in bytes (via hbase.client.scanner.max.result.size) rather than in number of rows. If we keep overriding hbase.client.scanner.caching to 1000, then for narrow rows we will likely fetch too few rows per call, because the row cap is reached long before the byte limit. For wide rows, the byte limit will likely kick in first and keep us from caching too much on the client, so the override buys us nothing there either.

Maybe we shouldn't be using the scanner caching override at all?

Thoughts? [~jamestaylor], [~lhofhansl]
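
To make the narrow-row versus wide-row argument concrete, here is a rough back-of-the-envelope sketch in Java. It is not HBase or Phoenix code: the class ScannerBatchEstimate and the method rowsPerFetch are hypothetical names, the 2 MB value for hbase.client.scanner.max.result.size and the row sizes are assumed for illustration, and the model is simplified (it treats each fetch as limited by whichever of the row cap or the byte cap is reached first, and ignores details such as exactly when the server checks the size limit).

```java
// Hypothetical illustration only; not part of HBase or Phoenix.
class ScannerBatchEstimate {

    /**
     * Simplified model: rows returned by one scanner fetch are limited by the
     * row cap (hbase.client.scanner.caching) or the byte cap
     * (hbase.client.scanner.max.result.size), whichever is hit first.
     */
    static long rowsPerFetch(long cachingRows, long maxResultSizeBytes, long avgRowBytes) {
        // At least one row is always returned, even if it exceeds the byte cap.
        long rowsThatFitInBytes = Math.max(1L, maxResultSizeBytes / avgRowBytes);
        return Math.min(cachingRows, rowsThatFitInBytes);
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024;   // assumed byte cap: 2 MB
        long narrowRow = 100L;                   // assumed average row size: 100 bytes
        long wideRow = 100L * 1024;              // assumed average row size: 100 KB

        long overrideCaching = 1000L;            // the override value discussed above
        long defaultCaching = Integer.MAX_VALUE; // HBase 1.1 default

        System.out.println("narrow rows, caching=1000:      "
                + rowsPerFetch(overrideCaching, maxResultSize, narrowRow));
        System.out.println("narrow rows, caching=MAX_VALUE: "
                + rowsPerFetch(defaultCaching, maxResultSize, narrowRow));
        System.out.println("wide rows,   caching=1000:      "
                + rowsPerFetch(overrideCaching, maxResultSize, wideRow));
        System.out.println("wide rows,   caching=MAX_VALUE: "
                + rowsPerFetch(defaultCaching, maxResultSize, wideRow));
    }
}
```

Under these assumed numbers, the 1000-row override cuts a narrow-row fetch to roughly a twentieth of what the byte cap alone would allow, while for wide rows both settings give the same batch size because the byte cap dominates.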