[ 
https://issues.apache.org/jira/browse/HBASE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184697#comment-14184697
 ] 

Misty Stanley-Jones commented on HBASE-11985:
---------------------------------------------

{quote}
Indicating that 50 to 100 regions are recommended for 1 to 2 CFs
would be a useful clarification. It would also be a good way
to make customers aware of the impact of increasing the number of
column families.
{quote}

Thanks [~gkamat]

{quote}
If you are storing time-based machine data or logging information, and the load 
is distributed by device id or service id + time, you can end up with a 
pattern where older data regions never receive additional writes beyond a certain 
age. This can occur when the solution involves something like HBase for new 
data (for example, the last 30 days) + Impala for older data. 
In these situations, you can end up with a small number of active regions + a 
set of older regions that are no longer being written to.

For these situations you can tolerate a greater number of regions, as your main 
resource consumption is driven by the active regions. This, of course, is very 
dependent on the type of load and query patterns.
{quote}

Thanks [~rstokes]

{quote}
If only one CF is busy with writes, only that one accumulates memory. It is 
the same with inactive (only read-from) regions for a single CF. 
{quote}

Thanks [~larsgeorge]. You also had a diagram to illustrate this, but the link I 
have no longer works. Can you point me to it?
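
The two quoted comments above (few active regions, and only the written CF accumulating memory) can be put into rough numbers. The following is only a sketch, assuming that each written CF in each actively written region may hold up to one memstore of hbase.hregion.memstore.flush.size (128 MB by default) before it is flushed. The function name and the region counts are hypothetical and chosen for illustration.

```python
# Rough estimate of write-path (memstore) heap demand on one RegionServer.
# Assumption (not from the issue): every written column family in every
# actively written region is budgeted one memstore of flush-size bytes.
# Regions and CFs that are only read from contribute nothing here.

FLUSH_SIZE_MB = 128  # default hbase.hregion.memstore.flush.size


def memstore_demand_mb(active_regions, written_cfs, flush_size_mb=FLUSH_SIZE_MB):
    """Upper-bound memstore heap, in MB, if every budgeted memstore is full."""
    return active_regions * written_cfs * flush_size_mb


# Hypothetical server hosting 500 regions.
all_active = memstore_demand_mb(active_regions=500, written_cfs=1)
mostly_idle = memstore_demand_mb(active_regions=20, written_cfs=1)
two_busy_cfs = memstore_demand_mb(active_regions=20, written_cfs=2)

print(all_active, mostly_idle, two_busy_cfs)  # 64000 2560 5120
```

Under these assumptions, 500 regions with only 20 still taking writes need far less memstore heap than 500 active regions, and a second busy CF doubles the figure. Real flush behaviour depends on version and configuration, so treat the output as an order-of-magnitude guide only.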

> Document sizing rules of thumb
> ------------------------------
>
>                 Key: HBASE-11985
>                 URL: https://issues.apache.org/jira/browse/HBASE-11985
>             Project: HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Misty Stanley-Jones
>            Assignee: Misty Stanley-Jones
>
> I'm looking for tuning/sizing rules of thumb to put in the Ref Guide.
> Info I have gleaned so far:
> * A reasonable region size is between 10 GB and 50 GB.
> * A reasonable maximum cell size is 1 MB to 10 MB. If your cells are larger 
> than 10 MB, consider storing the cell contents in HDFS and storing a 
> reference to the location in HBase. MOB work is pending for the 10 MB - 64 MB 
> window.
> * When you size your regions and cells, keep in mind that a region cannot split 
> across a row. If your row size is too large, or your region size is too 
> small, you can end up with a single row per region, which is not a good 
> pattern. It is also possible that one big column causes splits while other 
> columns are tiny, which may not be great either.
> * A large number of columns probably means you are doing it wrong.
> * Column names need to be short because they get stored for every value 
> (barring encoding). They don't need to be self-documenting as in an RDBMS.
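
The point about short column names in the issue description above can be illustrated with a small calculation. This is a sketch only, assuming the classic KeyValue layout with 20 bytes of fixed overhead per cell and no tags, compression, or data block encoding. The row key, family, and qualifier names are made up for the example.

```python
# Approximate serialized size of one cell, before compression or encoding.
# Assumption: classic KeyValue layout without tags. Fixed overhead is
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + type (1) = 20 bytes. The row key, family name, and
# qualifier are stored again with every single value.

FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_bytes(row, family, qualifier, value_len):
    """Bytes for one cell with the given names and a value of value_len bytes."""
    names = len(row.encode()) + len(family.encode()) + len(qualifier.encode())
    return FIXED_OVERHEAD + names + value_len


row = "device-000123"  # hypothetical 13-byte row key
long_names = cell_bytes(row, "measurements", "temperature_celsius", 8)
short_names = cell_bytes(row, "m", "t", 8)

print(long_names, short_names)  # 72 43
```

For an 8-byte value, the descriptive names make each cell roughly two-thirds larger than the one-letter names in this example. Data block encoding and compression narrow the gap on disk, which is the "barring encoding" caveat in the description.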


