Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

2018-09-29 Thread Jaanai Zhang
Did you restart the cluster? You should also set 'hbase.hregion.max.filesize'
to a safeguard value that is less than what the region servers can handle.
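
For reference, a minimal sketch of that setting in hbase-site.xml on the
region servers (the value below is only an example; pick a threshold that
fits your cluster, and a rolling restart of the region servers is typically
needed for it to take effect):

   <!-- hbase-site.xml (region servers); example value only -->
   <property>
     <name>hbase.hregion.max.filesize</name>
     <!-- 10 GB; choose a value below what your region servers can handle -->
     <value>10737418240</value>
   </property>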


   Jaanai Zhang
   Best regards!



Batyrshin Alexander <0x62...@gmail.com> wrote on Saturday, September 29, 2018, at 5:28 PM:

> Meanwhile we tried to disable region splits via the per-index-table options
> 'SPLIT_POLICY' =>
> 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy' and
> hbase.hregion.max.filesize = 10737418240.
> Looks like this set of options doesn't work: some regions still split at sizes < 2GB.
>
> Then we tried to disable all splits via the hbase shell: splitormerge_switch
> 'SPLIT', false
> It seems that this doesn't work either.
>
> Any ideas why we can't disable region splits?
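
As a quick diagnostic sketch (the table name IDX_MARK_O is taken from the
logs further down; adjust to your own index table), the hbase shell can show
whether the switch and the per-table attributes actually took effect:

   # check the cluster-wide split switch (HBase 1.3+ shell)
   splitormerge_enabled 'SPLIT'
   # confirm SPLIT_POLICY and MAX_FILESIZE actually landed on the index table
   describe 'IDX_MARK_O'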
>
> On 27 Sep 2018, at 02:52, Vincent Poon  wrote:
>
> We are planning a Phoenix 4.14.1 release which will have this fix
>
> On Wed, Sep 26, 2018 at 3:36 PM Batyrshin Alexander <0x62...@gmail.com>
> wrote:
>
>> Thank you. We will try somehow...
>> Is there any chance that this fix will be included in the next release for
>> HBase 1.4 (not 2.0)?
>>
>> On 27 Sep 2018, at 01:04, Ankit Singhal  wrote:
>>
>> You might be hitting PHOENIX-4785; you can apply the patch
>> on top of 4.14 and see if it fixes your problem.
>>
>> Regards,
>> Ankit Singhal
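
If you go the patch route Ankit describes, a rough build sketch (the patch
file name here is hypothetical; take the attachment from the PHOENIX-4785
JIRA, and this assumes a checkout of the Apache Phoenix source at the
v4.14.0-HBase-1.4 release tag):

   # hypothetical patch file name; download it from the PHOENIX-4785 JIRA
   git checkout v4.14.0-HBase-1.4
   git apply PHOENIX-4785.patch
   # rebuild, then roll the phoenix server jar out to the region servers
   mvn clean package -DskipTests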
>>
>> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com>
>> wrote:
>>
>>> Any advice? Any help?
>>> I can reproduce the problem and capture more logs if needed.
>>>
>>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>
>>> Looks like the lock goes away 30 minutes after the index region split.
>>> So I can assume that this issue comes from the cache configured by this
>>> option: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs
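
If that guess is right, lowering the TTL could be tried as a workaround. A
sketch only: this is a server-side Phoenix property, so it would go into
hbase-site.xml on the region servers, and the ~30-minute figure is inferred
from the observation above rather than from the documentation:

   <!-- hbase-site.xml (region servers); example value only -->
   <property>
     <name>phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs</name>
     <!-- 3 minutes instead of the ~30 minutes observed above -->
     <value>180000</value>
   </property>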
>>>
>>>
>>>
>>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>
>>> And this is how the split looks in the Master logs:
>>>
>>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO
>>>  [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition
>>> {3e44b85ddf407da831dbb9a871496986 state=OPEN,
>>> ts=1537304859509, server=prod013,60020,1537304282885} to
>>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888,
>>> server=prod
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340,
>>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986
>>> state=SPLIT, ts=1537461905340, server=pro
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined
>>> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>>> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340,
>>> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb
>>> state=OPEN, ts=1537461905341, server=
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>>> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340,
>>> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144
>>> state=OPEN, ts=1537461905341, server=
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO
>>>  [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT
>>> event; 
>>> parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>>> daughter a=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1
>>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO
>>>  [prod001,6,1537304851459_ChoreService_2]
>>> balancer.StochasticLoadBalancer: Skipping load balancing because balanced
>>> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost
>>> which need balance is 0.05
>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO
>>>  [prod001,6,1537304851459_ChoreService_1] hbase.MetaTableAccessor:
>>> Deleted 
>>> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO
>>>  [prod001,6,1537304851459_ChoreService_1] master.CatalogJanitor:
>>> Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
>>> unreferenced parent region(s)
>>>
>>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>
>>> Looks like the problem was caused by an index region split
>>>
>>> Index region split at prod013:
>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>>>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
>>> regionserver.SplitRequest: Region split, hbase:meta
