Re: Common advices for hosting a huge table

2015-12-15 Thread
Thanks for your advice.

For option three, I think major compaction on a large region will affect the
performance of the region server. So the downtime would apply to every table
hosted on that RS, am I right?




On 12/16/15, 5:12 AM, "Ted Yu"  wrote:

>w.r.t. option #1, also consider
>http://hbase.apache.org/book.html#arch.bulk.load
>
>FYI
>
>On Tue, Dec 15, 2015 at 12:17 PM, Frank Luo  wrote:
>
>> I am in a very similar situation.
>>
>> I guess you can try one of the following options.
>>
>> Option one: Avoid online inserts by preparing the data offline, using
>> something like http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv
>>
>> Option two: If the first option doesn't work for you, it would be better
>> to reduce your region size and increase the read/write timeouts. That way
>> compactions can happen while you insert data, but since the regions are
>> smaller, each compaction/split takes less time. With this option the table
>> stays available 24/7, but overall performance tends to drop dramatically
>> once some regions start compacting.
>>
>> Option three: If you can afford some downtime, e.g. two hours every day,
>> you can manage compactions/splits during that window. What I usually do is
>> run a major compaction against all tables, then split the ones that are
>> large so that they have enough room to grow for the next day's inserts.
>>
>> I hope it helps.
>>
>> From: 林豪 [mailto:lin...@qiyi.com]
>> Sent: Monday, December 14, 2015 11:51 PM
>> To: user@hbase.apache.org
>> Subject: Common advices for hosting a huge table
>>
>> Hi, all:
>>
>> We have an HBase cluster with several hundred region servers, each
>> hosting nearly 300 regions. Currently one of our tables has grown to 16 TB
>> and some regions exceed 10 GB. Major compaction on these regions is painful
>> as it produces a lot of disk I/O and affects the performance of the RS. The
>> auto-split size of IncreasingToUpperBoundRegionSplitPolicy has increased to
>> 16 GB or more for this huge table. My solution is to set the MAX_FILESIZE
>> attribute on this table so that ConstantSizeRegionSplitPolicy
>> auto-splitting will work again.
>>
>> My question is: what is the common advice, or which configuration options
>> are recommended, for hosting such a huge table? If we decide to limit the
>> region size, how do we choose the optimal region size? If regions are too
>> large, major compaction is painful; but if regions are too small, we end up
>> with a lot of small regions, which will overwhelm the RS.
>>
>> 林豪
>> R&D Engineer, Cloud Platform
>>
>> iQIYI
>> QIYI.com, Inc.


Common advices for hosting a huge table

2015-12-14 Thread
Hi, all:

We have an HBase cluster with several hundred region servers, each hosting
nearly 300 regions. Currently one of our tables has grown to 16 TB and some
regions exceed 10 GB. Major compaction on these regions is painful as it
produces a lot of disk I/O and affects the performance of the RS. The
auto-split size of IncreasingToUpperBoundRegionSplitPolicy has increased to
16 GB or more for this huge table. My solution is to set the MAX_FILESIZE
attribute on this table so that ConstantSizeRegionSplitPolicy auto-splitting
will work again.
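
To illustrate why the split threshold keeps growing until a cap is hit, here
is a minimal Python sketch of the rule as I understand it from the HBase 1.x
javadoc for IncreasingToUpperBoundRegionSplitPolicy: the threshold is the
smaller of the maximum file size and 2 x flush size x R^3, where R is the
number of regions of this table on the same region server. The 128 MB flush
size and 10 GB maximum are assumed defaults, not values taken from our
cluster, and older releases use a different formula.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def split_size_bytes(regions_of_table_on_rs,
                     flush_size=128 * MB,      # assumed memstore flush size
                     max_filesize=10 * GB):    # assumed MAX_FILESIZE cap
    """Approximate split threshold: min(max_filesize, 2 * flush_size * R**3).

    This mirrors the documented behaviour only roughly; treat it as an
    illustration, not as the exact server-side logic.
    """
    if regions_of_table_on_rs <= 0:
        return max_filesize
    growing = 2 * flush_size * regions_of_table_on_rs ** 3
    return min(max_filesize, growing)


if __name__ == "__main__":
    for r in (1, 2, 3, 4):
        print(r, split_size_bytes(r) / GB, "GB")
```

Under these assumptions the threshold reaches the cap after only a handful of
regions per server, so on a large table the effective split size is whatever
MAX_FILESIZE (or hbase.hregion.max.filesize) allows.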

My question is: what is the common advice, or which configuration options are
recommended, for hosting such a huge table? If we decide to limit the region
size, how do we choose the optimal region size? If regions are too large,
major compaction is painful; but if regions are too small, we end up with a
lot of small regions, which will overwhelm the RS.
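
To make the trade-off concrete, the following back-of-the-envelope sketch
estimates how many regions of this one table each region server would carry
for a few candidate region size limits. The 16 TB table size is from above;
the count of 300 region servers and the fill factor (how full an average
region is relative to the limit) are placeholder assumptions, and the
function name is hypothetical.

```python
import math

GB = 1024 ** 3
TB = 1024 ** 4


def estimate_regions(table_bytes, region_servers, max_region_bytes, fill=0.75):
    """Return (total regions, regions per server) for one table.

    fill is an assumed average region size as a fraction of the limit:
    a freshly split region is about half the limit, a region about to
    split is close to the full limit.
    """
    avg_region_bytes = max_region_bytes * fill
    total = math.ceil(table_bytes / avg_region_bytes)
    return total, total / region_servers


if __name__ == "__main__":
    for limit_gb in (2, 5, 10, 20):
        total, per_rs = estimate_regions(16 * TB, 300, limit_gb * GB)
        print(f"{limit_gb:>2} GB limit: {total} regions, {per_rs:.1f} per RS")
```

The numbers only cover this single table; they have to be added to the
regions the servers already host for other tables.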

林豪
R&D Engineer, Cloud Platform

iQIYI
QIYI.com, Inc.