[jira] [Commented] (HBASE-20943) Add offline/online region count into metrics

2018-08-07 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572030#comment-16572030
 ] 

Tianying Chang commented on HBASE-20943:


[~huaxiang] thanks for reviewing! 

> Add offline/online region count into metrics
> 
>
> Key: HBASE-20943
> URL: https://issues.apache.org/jira/browse/HBASE-20943
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 2.0.0, 1.2.6.1
>Reporter: Tianying Chang
>Assignee: jinghan xu
>Priority: Minor
> Attachments: HBASE-20943.patch, Screen Shot 2018-07-25 at 2.51.19 
> PM.png
>
>
> We intensively use metrics to monitor the health of our HBase production 
> cluster. We have seen some regions of a table get stuck and fail to be 
> brought online due to an AWS issue that corrupted some log files. It would 
> be good if we could catch this early. Although the WebUI has this 
> information, it is not useful for automated monitoring. By adding this 
> metric, we can easily monitor these counts with our monitoring system. 
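The proposed metric can be sketched as a pair of simple gauges maintained on region state transitions. This is an illustrative sketch only: the class and metric names (RegionCountGauges, regionCountOnline, regionCountOffline) are hypothetical, and a real implementation would register the gauges with HBase's metrics framework so they are exported like other master metrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of online/offline region-count gauges; not HBase's
// actual metrics API. Counters are bumped on each region state transition.
public class RegionCountGauges {
    private final Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();

    private AtomicLong gauge(String name) {
        return gauges.computeIfAbsent(name, k -> new AtomicLong());
    }

    // Called whenever a region transitions state on the master.
    public void regionOpened() { gauge("regionCountOnline").incrementAndGet(); }
    public void regionClosed() { gauge("regionCountOnline").decrementAndGet(); }

    public void regionOffline(boolean offline) {
        AtomicLong g = gauge("regionCountOffline");
        if (offline) g.incrementAndGet(); else g.decrementAndGet();
    }

    public long get(String name) { return gauge(name).get(); }
}
```

A monitoring system can then alert when `regionCountOffline` stays above zero, which is exactly the automated check the WebUI cannot provide.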



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20943) Add offline/online region count into metrics

2018-08-06 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570665#comment-16570665
 ] 

Tianying Chang commented on HBASE-20943:


[~yuzhih...@gmail.com] it seems I cannot assign this Jira to [~jinghanx]; is 
any permission needed? 






[jira] [Commented] (HBASE-20943) Add offline/online region count into metrics

2018-08-06 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570561#comment-16570561
 ] 

Tianying Chang commented on HBASE-20943:


[~yuzhih...@gmail.com] Thanks.  My teammate Jinghan Xu will upload the patch. 






[jira] [Updated] (HBASE-20943) Add offline/online region count into metrics

2018-07-25 Thread Tianying Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-20943:
---
Attachment: Screen Shot 2018-07-25 at 2.51.19 PM.png






[jira] [Created] (HBASE-20943) Add offline/online region count into metrics

2018-07-25 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-20943:
--

 Summary: Add offline/online region count into metrics
 Key: HBASE-20943
 URL: https://issues.apache.org/jira/browse/HBASE-20943
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 1.2.6.1, 2.0.0
Reporter: Tianying Chang


We intensively use metrics to monitor the health of our HBase production 
cluster. We have seen some regions of a table get stuck and fail to be brought 
online due to an AWS issue that corrupted some log files. It would be good if 
we could catch this early. Although the WebUI has this information, it is not 
useful for automated monitoring. By adding this metric, we can easily monitor 
these counts with our monitoring system. 





[jira] [Commented] (HBASE-15400) Use DateTieredCompactor for Date Tiered Compaction

2017-09-15 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168353#comment-16168353
 ] 

Tianying Chang commented on HBASE-15400:


Thanks [~davelatham] for confirming! We will port the other two as well. 

> Use DateTieredCompactor for Date Tiered Compaction
> --
>
> Key: HBASE-15400
> URL: https://issues.apache.org/jira/browse/HBASE-15400
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15400-0.98.patch, HBASE-15400-15389-v12.patch, 
> HBASE-15400-branch-1.patch, HBASE-15400.patch, HBASE-15400-v1.pa, 
> HBASE-15400-v3.patch, HBASE-15400-v3-v3.patch, HBASE-15400-v3-v4.patch, 
> HBASE-15400-v3-v5.patch, HBASE-15400-v6.patch, HBASE-15400-v7.patch
>
>
> When we compact, we can output multiple files along the current window 
> boundaries. There are two use cases:
> 1. Major compaction: We want to output date-tiered store files with data 
> older than max age archived in chunks of the window size on the higher tier. 
> Once a window is old enough, we don't combine windows to promote them to the 
> next tier any further, so files in these windows retain the same timespan as 
> when they were last minor-compacted, which is the window size of the highest 
> tier. Major compaction will touch these files and we want to maintain the 
> same layout. This way, TTL handling and archiving will be simpler and more 
> efficient.
> 2. Bulk-loaded files and old files generated by major compaction before 
> upgrading to DTCP.
> Pros: 
> 1. Restores locality and processes versioning, updates, and deletes while 
> maintaining the tiered layout.
> 2. The best way to fix a skewed layout.
>  
> This work is based on a prototype of DateTieredCompactor from HBASE-15389, 
> focused on meeting the needs of these two use cases while supporting 
> others. I have to call out a few design decisions:
> 1. We only want to output files along all windows for major compaction, and 
> we want to output multiple files older than max age in chunks of the 
> maximum tier window size, determined by base window size, windows per tier, 
> and max age.
> 2. For minor compaction, we don't want to output too many files, which 
> would stay around because of the current restriction of contiguous 
> compaction by seq id. I will only output two files if all the files in the 
> windows are being combined: one for the data within the window and the 
> other for the out-of-window tail. If any file in the window is excluded 
> from compaction, only one file will be output. As windows are promoted, the 
> out-of-order data situation will gradually improve. For the incoming 
> window, we need to accommodate the case of user-specified future data.
> 3. We have to pass the boundaries with the list of store files as one 
> complete time snapshot, instead of two separate calls, because the window 
> layout is determined by the time the computation is called. So we will need 
> a new type of compaction request. 
> 4. Since we will assign the same seq id to all output files, we need to 
> sort by maxTimestamp subsequently. Right now every compaction policy gets 
> the files sorted by StoreFileManager, which sorts by seq id and other 
> criteria. I will use this order for DTCP only, to avoid impacting other 
> compaction policies. 
> 5. We need some cleanup of the current design of StoreEngine and 
> CompactionPolicy.
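The "output multiple files along window boundaries" idea above can be sketched as routing each cell to the output file for the highest window lower-boundary at or below its timestamp. This is a simplified illustration, not the actual DateTieredCompactor code; the class and file names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: route a cell timestamp to an output file keyed by
// window lower boundaries, mimicking multi-file compaction output.
public class BoundaryRouter {
    private final TreeMap<Long, String> filesByLowerBoundary = new TreeMap<>();

    public BoundaryRouter(List<Long> lowerBoundaries) {
        for (Long b : lowerBoundaries) {
            filesByLowerBoundary.put(b, "output-file-for-" + b);
        }
    }

    // Each cell lands in the file for the highest boundary <= its timestamp;
    // anything older than the lowest boundary goes to the oldest file.
    public String fileFor(long cellTimestamp) {
        Map.Entry<Long, String> e = filesByLowerBoundary.floorEntry(cellTimestamp);
        return e != null ? e.getValue() : filesByLowerBoundary.firstEntry().getValue();
    }
}
```

Because the boundary set is fixed for the whole compaction request, every cell is routed consistently, which is why the description insists the boundaries be passed together with the store-file list as one snapshot.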



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-15400) Use DateTieredCompactor for Date Tiered Compaction

2017-09-14 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167075#comment-16167075
 ] 

Tianying Chang commented on HBASE-15400:


[~davelatham] Thanks for the information. We are in the process of backporting 
date-tiered compaction into our 1.2 branch now. One question: a teammate has 
backported HBASE-15181, but we are not sure whether HBASE-15400 is also 
strictly needed. It seems to be an important improvement that keeps the number 
of HFiles under control. Does this mean that if we only backport HBASE-15181, 
the number of HFiles in the older tiers will grow too high? 






[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2017-03-24 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941195#comment-15941195
 ] 

Tianying Chang commented on HBASE-17138:


> the above jira ids are listed in chronological order as I backported them, 
> and there are also some patches I didn't list, such as:
> 1. keeping the Cell API compatible with our existing code;
> 2. HFile-format-related compatibility issues;
> 3. client compatibility issues.

[~hongxi] [~carp84] Just to clarify: are the JIRAs for the items above 
included in the list? 

Also, how much benefit does HBASE-15756 contribute to the Singles' Day 
performance boost on top of this offheap feature?  

> Backport read-path offheap (HBASE-11425) to branch-1
> 
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yu Li
>Assignee: Yu Sun
> Attachments: 
> 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch,
>  0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, 
> 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch
>
>
> From the 
> [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E]
>  of sharing our experience and performance data of read-path offheap usage in 
> Alibaba search, we could see people are positive to have HBASE-11425 in 
> branch-1, so I'd like to create a JIRA and move the discussion and decision 
> making here.
> Echoing some comments from the mail thread:
> Bryan:
> Is the backported patch available anywhere? If it ends up not getting 
> officially backported to branch-1 due to 2.0 around the corner, some of us 
> who build our own deploy may want to integrate into our builds
> Andrew:
> Yes, please, the patches will be useful to the community even if we decide 
> not to backport into an official 1.x release.
> Enis:
> I don't see any reason why we cannot backport to branch-1.
> Ted:
> Opening a JIRA would be fine. This makes it easier for people to obtain the 
> patch(es)
> Nick:
> From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> support 1.x releases for some time... I would guess these will be
> maintained until 2.2 at least. Therefore, offheap patches that have seen
> production exposure seem like a reasonable candidate for backport, perhaps in 
> a 1.4 or 1.5 release timeframe.
> Anoop:
> Because of some compatibility issues, we decide that this will be done in 2.0 
> only..  Ya as Andy said, it would be great to share the 1.x backported 
> patches.
> The following is all the jira ids we have back ported:
> HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells 
> (Ram)
> HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and 
> AbstractHFileReader; ditto for Scanners and BlockReader, etc.
> HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner.
> HBASE-13450 - Purge RawBytescomparator from the writers and readers for 
> HBASE-10800 (Ram)
> HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo.
> HBASE-12048 Remove deprecated APIs from Filter.
> HBASE-10800 - Use CellComparator instead of KVComparator (Ram)
> HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], 
> int, int.
> HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with 
> byte[],int,int args in favor of taking Cell arg.
> HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int 
> length) in favor of filterRowKey(Cell firstRowCell).
> HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner.
> HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], 
> int, int.
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman)
> HBASE-12059 Create hbase-annotations module
> HBASE-12106 Move test annotations to test artifact (Enis Soztutar)
> HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers.
> HBASE-15679 Assertion on wrong variable in 
> TestReplicationThrottler#testThrottling
> HBASE-13931 Move Unsafe based operations to UnsafeAccess.
> HBASE-12345 Unsafe based ByteBuffer Comparator.
> HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int 
> llength, byte[] right, int roffset, int rlength).
> HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn
> HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport 
> this patch)
> HBASE-13448 New Cell implementation with cached component offsets/lengths.
> HBASE-13387 Add ByteBufferedCell an extension to Cell.
> HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum.
> HBASE-12650 Move ServerName to hbase-common module (partially backport this 

[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-23 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939355#comment-15939355
 ] 

Tianying Chang commented on HBASE-17453:


[~saint@gmail.com] One more question: I can see that the methods from 
Admin.proto have higher priority. But what are the criteria for deciding 
which methods should go into Admin.proto vs. Client.proto? It seems all of 
them are implemented in RSRpcServices.java anyway. 

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Fix For: 2.0.0, 1.2.5
>
> Attachments: HBASE-17453-1.2.patch, 
> HBASE-17453-master-fixWhiteSpace.patch, HBASE-17453-master.patch, 
> HBASE-17453-master-v1.patch, HBASE-17453-master-v2.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (Asynchbase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided 
> on the RS side: if there is no traffic for a given time, we send a "Ping"; 
> if there is no response to the "Ping", we assume the connection is bad and 
> reconnect. 
> Now we are upgrading our cluster from 94 to 1.2. However, 
> GetProtocolVersion() is deprecated. To support the same detect/reconnect 
> feature, we added Ping() in our internal HBase 1.2 branch, and also patched 
> Asynchbase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 
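The detect/reconnect logic described above can be sketched as a small state machine on the client side. This is a hypothetical illustration (the class name and thresholds are made up; the real feature lives in the patched AsyncHBase/RS code): after an idle period a ping becomes due, and an unanswered ping past its timeout marks the connection dead.

```java
// Hypothetical sketch of idle-detection ping/reconnect logic, driven by
// externally supplied clock values so it is deterministic and testable.
public class IdlePinger {
    private final long idleLimitMs;    // idle time before a ping is sent
    private final long pingTimeoutMs;  // unanswered-ping time before reconnect
    private long lastTrafficMs;
    private long pingSentMs = -1;      // -1 means no ping in flight

    public IdlePinger(long idleLimitMs, long pingTimeoutMs, long nowMs) {
        this.idleLimitMs = idleLimitMs;
        this.pingTimeoutMs = pingTimeoutMs;
        this.lastTrafficMs = nowMs;
    }

    // Any request or response (including a ping reply) counts as traffic.
    public void onTraffic(long nowMs) {
        lastTrafficMs = nowMs;
        pingSentMs = -1;
    }

    /** Returns true exactly when a ping should be sent now. */
    public boolean pingDue(long nowMs) {
        if (pingSentMs >= 0) return false;            // ping already in flight
        if (nowMs - lastTrafficMs < idleLimitMs) return false;
        pingSentMs = nowMs;
        return true;
    }

    /** Returns true if the in-flight ping timed out: reconnect needed. */
    public boolean dead(long nowMs) {
        return pingSentMs >= 0 && nowMs - pingSentMs > pingTimeoutMs;
    }
}
```

The key property is that a connection failure that raises no exception is still detected, because the absence of a ping reply is itself the signal.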



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-23 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938876#comment-15938876
 ] 

Tianying Chang commented on HBASE-17453:


[~saint@gmail.com] Actually, as long as the method is in RSRpcServices it 
serves my purpose, since I just need to test the connection with the specific 
RS and reconnect if needed, to make sure other operations like get or mutate 
can succeed. It is just that Asynchbase only has 
Client.proto/Cell.proto/HBase.proto/RPC.proto and no Admin.proto; that is 
simply Asynchbase's grouping/naming. I will move the Ping API into Admin.proto 
in the server patch (no Asynchbase client-side code change needed at all) and 
regenerate the patch.   






[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-20 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932869#comment-15932869
 ] 

Tianying Chang commented on HBASE-15181:


[~enis] that is a great point. [~davelatham] I am wondering whether OpenTSDB 
will benefit from it, though, since I assume it sets startRow/stopRow with the 
start/stop time encoded in them. Shouldn't that already exclude all 
unnecessary data?   

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to 
> Cassandra's, with the following benefits:
> 1. Improves date-range-based scans by structuring store files in a 
> date-based tiered layout.
> 2. Reduces compaction overhead.
> 3. Improves TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans and a focus on the most 
> recent data; 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully. Time-range overlap among store 
> files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table 
> or per-column-family level via the hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results from our production are at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#
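The "tiered layout" the description refers to can be illustrated by computing window lower boundaries from a base window size, a windows-per-tier factor, and a max age. The sketch below is a deliberately simplified stand-in for the real policy (the actual DateTieredCompactionPolicy aligns tiers differently): recent windows use the base size, and each older tier multiplies the window size by the windows-per-tier factor.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical derivation of tiered window lower boundaries;
// not the actual DateTieredCompactionPolicy algorithm.
public class TierWindows {
    public static List<Long> lowerBoundaries(long now, long baseWindowMs,
                                             int windowsPerTier, long maxAgeMs) {
        List<Long> bounds = new ArrayList<>();
        long windowMs = baseWindowMs;
        long lower = (now / baseWindowMs) * baseWindowMs; // align to base window
        long oldest = now - maxAgeMs;
        while (lower > oldest) {
            bounds.add(lower);
            // After windowsPerTier windows at this size, promote to the
            // next (coarser) tier by growing the window.
            if (bounds.size() % windowsPerTier == 0) windowMs *= windowsPerTier;
            lower -= windowMs;
        }
        return bounds; // newest boundary first
    }
}
```

With base window 10, two windows per tier, and max age 100 at time 100, this yields boundaries 100, 90, 70, 50, 10: two 10-unit windows, then two 20-unit windows, then a 40-unit window, which is the geometric growth that keeps the file count per tier bounded.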





[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-16 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: In Progress  (was: Patch Available)






[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-16 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: HBASE-17453-master-fixWhiteSpace.patch






[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-16 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: Patch Available  (was: In Progress)






[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-16 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928560#comment-15928560
 ] 

Tianying Chang commented on HBASE-15181:


[~davelatham] Good point about setting the scan time range vs. encoding the 
time information into the row key. Given the OpenTSDB schema design, my gut 
feeling is that it is unfortunately not setting the time range. I will verify 
that. 

If it is not setting the time range, then since our OpenTSDB usage is 
configured to keep only 28 days of data (TTL set to 28 days), one benefit we 
can still utilize is that whole store files can be dropped once their TTL 
expires.  
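Both points in this exchange come down to per-file time-range metadata. A rough sketch (illustrative names only, not HBase internals): a store file whose newest timestamp is older than the TTL cutoff can be deleted whole, and a time-range scan only needs to read files whose [minTs, maxTs] range overlaps the scan's range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of store-file pruning by time-range metadata.
public class TimeRangePruning {
    static class StoreFile {
        final String name; final long minTs; final long maxTs;
        StoreFile(String name, long minTs, long maxTs) {
            this.name = name; this.minTs = minTs; this.maxTs = maxTs;
        }
    }

    /** Files whose newest cell predates the TTL cutoff can be dropped whole. */
    static List<StoreFile> expired(List<StoreFile> files, long ttlCutoffTs) {
        List<StoreFile> out = new ArrayList<>();
        for (StoreFile f : files) if (f.maxTs < ttlCutoffTs) out.add(f);
        return out;
    }

    /** A time-range scan only has to open files overlapping [scanMin, scanMax]. */
    static List<StoreFile> forScan(List<StoreFile> files, long scanMin, long scanMax) {
        List<StoreFile> out = new ArrayList<>();
        for (StoreFile f : files) if (f.maxTs >= scanMin && f.minTs <= scanMax) out.add(f);
        return out;
    }
}
```

The tiered layout makes both pruning paths effective because it keeps each file's time range narrow and non-overlapping; with an ordinary layout, nearly every file overlaps every range and nothing can be skipped.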

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overridden at per-table or 
> per-column-family level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#
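
As a rough illustration of the tiered layout described above: the window containing a given timestamp grows by a factor of windowsPerTier at each tier. This is a minimal sketch with assumed parameter names (baseWindowMillis, windowsPerTier); the real DateTieredCompactionPolicy additionally handles window promotion, overlapping files, and configuration.

```java
// Sketch of date-tiered window assignment (illustrative only; not the
// actual HBase compaction-policy code).
public class DateTieredSketch {
    // Returns the start of the tier's window containing timestamp ts.
    // Tier 0 windows have size baseWindowMillis; each higher tier's window
    // is windowsPerTier times larger than the one below it.
    static long windowStart(long ts, long baseWindowMillis, int windowsPerTier, int tier) {
        long size = baseWindowMillis;
        for (int i = 0; i < tier; i++) {
            size *= windowsPerTier;
        }
        return ts - (ts % size);  // align down to the window boundary
    }

    public static void main(String[] args) {
        // With a 1-hour base window and 4 windows per tier, a timestamp at
        // 90 minutes falls in the second hourly window on tier 0 ...
        System.out.println(windowStart(5400000L, 3600000L, 4, 0)); // 3600000
        // ... and in the first 4-hour window on tier 1.
        System.out.println(windowStart(5400000L, 3600000L, 4, 1)); // 0
    }
}
```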



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-15 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927035#comment-15927035
 ] 

Tianying Chang commented on HBASE-15181:


[~davelatham] Thanks a lot for the information. I will look through the 
subtasks on HBASE-15339 as well, and will try applying those patches to 1.2. 
My feeling is that this kind of compaction algorithm should be very suitable 
for the OpenTSDB use case; do you know if anyone has concrete experience with 
the improvement this compaction algorithm brings for OpenTSDB? 



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: Patch Available  (was: In Progress)

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Fix For: 2.0.0, 1.2.5
>
> Attachments: HBASE-17453-1.2.patch, HBASE-17453-master.patch, 
> HBASE-17453-master-v1.patch, HBASE-17453-master-v2.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided 
> on the region server side: if there is no traffic for a given time, we send a 
> "Ping", and if no response comes back, we assume the connection is bad and 
> reconnect. 
> Now we are upgrading our clusters from 94 to 1.2, where GetProtocolVersion() 
> is deprecated. To keep the same detect/reconnect feature, we added Ping() in 
> our internal HBase 1.2 branch, and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for use cases in 
> AWS environments. 





[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: In Progress  (was: Patch Available)



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: HBASE-17453-master-v2.patch



[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-15 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926652#comment-15926652
 ] 

Tianying Chang commented on HBASE-17453:


[~anoop.hbase] Yes, I agree the input parameter for ping is useless; I will 
remove it. Thanks also for catching that MasterProtos.java was included in the 
patch; I will remove that as well. 

[~saint@gmail.com] Our usage is to call it from the AsyncHBase client to 
test the client/server connection, so I feel it fits better in Client.proto, 
but I am fine with putting it in Admin.proto if that achieves the same goal. 
As for the coprocessor approach, that would mean every table in the cluster 
has to deploy the no-op coprocessor, right? If so, that seems like too much 
operational overhead.   



[jira] [Commented] (HBASE-15400) Use DateTieredCompactor for Date Tiered Compaction

2017-03-14 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925063#comment-15925063
 ] 

Tianying Chang commented on HBASE-15400:


[~clarax98007] I am wondering, will this be backported to 1.2.x? 

> Use DateTieredCompactor for Date Tiered Compaction
> --
>
> Key: HBASE-15400
> URL: https://issues.apache.org/jira/browse/HBASE-15400
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15400-0.98.patch, HBASE-15400-15389-v12.patch, 
> HBASE-15400-branch-1.patch, HBASE-15400.patch, HBASE-15400-v1.pa, 
> HBASE-15400-v3.patch, HBASE-15400-v3-v3.patch, HBASE-15400-v3-v4.patch, 
> HBASE-15400-v3-v5.patch, HBASE-15400-v6.patch, HBASE-15400-v7.patch
>
>
> When we compact, we can output multiple files along the current window 
> boundaries. There are two use cases:
> 1. Major compaction: We want to output date-tiered store files with data 
> older than max age archived in chunks of the window size on the higher tier. 
> Once a window is old enough, we don't combine the windows to promote to the 
> next tier any further. So files in these windows retain the same timespan as 
> they were minor-compacted last time, which is the window size of the highest 
> tier. Major compaction will touch these files and we want to maintain the 
> same layout. This way, TTL and archiving will be simpler and more efficient.
> 2. Bulk load files and the old file generated by major compaction before 
> upgrading to DTCP.
> Pros: 
> 1. Restore locality, process versioning, updates and deletes while 
> maintaining the tiered layout.
> 2. The best way to fix a skewed layout.
>  
> This work is based on a prototype of DateTieredCompactor from HBASE-15389 and 
> focused on the part to meet needs for these two use cases while supporting 
> others. I have to call out a few design decisions:
> 1. We only want to output the files along all windows for major compaction. 
> And we want to output multiple files older than max age in chunks of the 
> maximum tier window size determined by base window size, windows per tier and 
> max age.
> 2. For minor compaction, we don't want to output too many files, which will 
> remain around because of current restriction of contiguous compaction by seq 
> id. I will only output two files if all the files in the windows are being 
> combined, one for the data within window and the other for the out-of-window 
> tail. If there is any file in the window excluded from compaction, only one 
> file will be output from compaction. When the windows are promoted, the 
> situation of out of order data will gradually improve. For the incoming 
> window, we need to accommodate the case with user-specified future data.
> 3. We have to pass the boundaries with the list of store file as a complete 
> time snapshot instead of two separate calls because window layout is 
> determined by the time the computation is called. So we will need new type of 
> compaction request. 
> 4. Since we will assign the same seq id for all output files, we need to sort 
> by maxTimestamp subsequently. Right now all compaction policy gets the files 
> sorted for StoreFileManager which sorts by seq id and other criteria. I will 
> use this order for DTCP only, to avoid impacting other compaction policies. 
> 5. We need some cleanup of current design of StoreEngine and CompactionPolicy.
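
Point 4 above (all output files share the same sequence id, so they must subsequently be sorted by maxTimestamp) can be sketched as a comparator. The field names here are illustrative, not the real StoreFile API.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of the ordering described in point 4: files compare by sequence
// id first (as StoreFileManager already does), and ties are broken by
// maxTimestamp so same-seq-id compaction outputs stay in time order.
public class FileOrderSketch {
    static final class FileMeta {
        final long seqId;
        final long maxTimestamp;
        final String name;
        FileMeta(long seqId, long maxTimestamp, String name) {
            this.seqId = seqId;
            this.maxTimestamp = maxTimestamp;
            this.name = name;
        }
    }

    // Primary key: sequence id; tie-breaker: max cell timestamp.
    static final Comparator<FileMeta> DTCP_ORDER =
        Comparator.comparingLong((FileMeta f) -> f.seqId)
                  .thenComparingLong(f -> f.maxTimestamp);

    public static void main(String[] args) {
        FileMeta[] files = {
            new FileMeta(7, 2000, "b"),  // same seq id as "a" ...
            new FileMeta(7, 1000, "a"),  // ... so maxTimestamp breaks the tie
            new FileMeta(5, 9000, "c"),  // older seq id sorts first regardless
        };
        Arrays.sort(files, DTCP_ORDER);
        for (FileMeta f : files) {
            System.out.println(f.name);  // prints c, a, b
        }
    }
}
```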





[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-14 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924776#comment-15924776
 ] 

Tianying Chang commented on HBASE-15181:


[~clarax98007] Do you have a patch for 1.2 as well?



[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2017-03-14 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924612#comment-15924612
 ] 

Tianying Chang commented on HBASE-17138:


[~carp84] I totally agree that one should spend enough time understanding all 
the JIRAs listed here, in order to have the quality and the capability to 
handle any problems in production. 

I am wondering if you could just post your patches against 1.1 here, for 
reference purposes (no need to worry about them applying cleanly). That way, 
people doing the backport can leverage your knowledge, lessons, and 
experience, and have extra information for debugging if needed, so the 
backport can be done in less time. :)   

> Backport read-path offheap (HBASE-11425) to branch-1
> 
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yu Li
>Assignee: Yu Sun
> Attachments: 
> 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch,
>  0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, 
> 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch
>
>
> From the 
> [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E]
>  of sharing our experience and performance data of read-path offheap usage in 
> Alibaba search, we could see people are positive to have HBASE-11425 in 
> branch-1, so I'd like to create a JIRA and move the discussion and decision 
> making here.
> Echoing some comments from the mail thread:
> Bryan:
> Is the backported patch available anywhere? If it ends up not getting 
> officially backported to branch-1 due to 2.0 around the corner, some of us 
> who build our own deploy may want to integrate into our builds
> Andrew:
> Yes, please, the patches will be useful to the community even if we decide 
> not to backport into an official 1.x release.
> Enis:
> I don't see any reason why we cannot backport to branch-1.
> Ted:
> Opening a JIRA would be fine. This makes it easier for people to obtain the 
> patch(es)
> Nick:
> From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> support 1.x releases for some time... I would guess these will be
> maintained until 2.2 at least. Therefore, offheap patches that have seen
> production exposure seem like a reasonable candidate for backport, perhaps in 
> a 1.4 or 1.5 release timeframe.
> Anoop:
> Because of some compatibility issues, we decide that this will be done in 2.0 
> only..  Ya as Andy said, it would be great to share the 1.x backported 
> patches.
> The following is all the jira ids we have back ported:
> HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells 
> (Ram)
> HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and 
> AbstractHFileReader; ditto for Scanners and BlockReader, etc.
> HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner.
> HBASE-13450 - Purge RawBytescomparator from the writers and readers for 
> HBASE-10800 (Ram)
> HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo.
> HBASE-12048 Remove deprecated APIs from Filter.
> HBASE-10800 - Use CellComparator instead of KVComparator (Ram)
> HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], 
> int, int.
> HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with 
> byte[],int,int args in favor of taking Cell arg.
> HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int 
> length) in favor of filterRowKey(Cell firstRowCell).
> HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner.
> HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], 
> int, int.
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman)
> HBASE-12059 Create hbase-annotations module
> HBASE-12106 Move test annotations to test artifact (Enis Soztutar)
> HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers.
> HBASE-15679 Assertion on wrong variable in 
> TestReplicationThrottler#testThrottling
> HBASE-13931 Move Unsafe based operations to UnsafeAccess.
> HBASE-12345 Unsafe based ByteBuffer Comparator.
> HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int 
> llength, byte[] right, int roffset, int rlength).
> HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn
> HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport 
> this patch)
> HBASE-13448 New Cell implementation with cached component offsets/lengths.
> HBASE-13387 Add ByteBufferedCell an extension to Cell.
> HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum.
> HBASE-12650 Move ServerName to 

[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-14 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924582#comment-15924582
 ] 

Tianying Chang commented on HBASE-17453:


It seems the test errors are not related to my change, except the whitespace 
one. Is this expected? 



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-13 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: Patch Available  (was: In Progress)



[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-13 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923560#comment-15923560
 ] 

Tianying Chang commented on HBASE-17453:


Attached the patch with the extra whitespace removed. 



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-13 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: In Progress  (was: Patch Available)



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-13 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: HBASE-17453-master-v1.patch



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-12 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Status: Patch Available  (was: In Progress)



[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906295#comment-15906295
 ] 

Tianying Chang commented on HBASE-17453:


[~ted_yu] Is "Submit Patch" the right action to take at this moment?  

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Fix For: 2.0.0, 1.2.5
>
> Attachments: HBASE-17453-1.2.patch, HBASE-17453-master.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between
> the client (Asynchbase in our case) and the server stopped working without
> throwing any exception, so traffic got stuck. We therefore added a "Ping"
> feature to AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided
> on the region server side: if there is no traffic for a given time, we send a
> "Ping", and if no response to the "Ping" comes back, we assume the connection
> is bad and reconnect.
> Now we are upgrading our clusters from 0.94 to 1.2. However,
> GetProtocolVersion() is deprecated. To keep the same detect/reconnect
> feature, we added Ping() to our internal HBase 1.2 branch and patched
> Asynchbase 1.7 accordingly.
> We would like to open-source this feature, since it is useful for
> deployments in AWS environments.
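For readers unfamiliar with the mechanism, the idle-detect/ping/reconnect loop described above can be sketched as follows. This is a minimal illustration under assumed names (PingableConnection and KeepAliveWatchdog are hypothetical, not the actual AsyncHBase or HBase APIs):

```java
/**
 * Sketch of the idle-detect/ping/reconnect idea: if the connection has been
 * idle longer than a threshold, send a ping; if the ping gets no reply
 * within a timeout, declare the connection dead and reconnect.
 * All names here are hypothetical, for illustration only.
 */
interface PingableConnection {
    /** Milliseconds since the last request or response on this connection. */
    long millisSinceLastTraffic();

    /** Sends a ping; returns true if the server replied within the timeout. */
    boolean ping(long timeoutMillis);

    /** Tears down the current socket and re-establishes the connection. */
    void reconnect();
}

class KeepAliveWatchdog {
    private final PingableConnection conn;
    private final long idleThresholdMillis;
    private final long pingTimeoutMillis;

    KeepAliveWatchdog(PingableConnection conn,
                      long idleThresholdMillis,
                      long pingTimeoutMillis) {
        this.conn = conn;
        this.idleThresholdMillis = idleThresholdMillis;
        this.pingTimeoutMillis = pingTimeoutMillis;
    }

    /**
     * Run periodically (e.g. from a scheduled executor).
     * Returns true if a reconnect was triggered.
     */
    boolean checkOnce() {
        if (conn.millisSinceLastTraffic() < idleThresholdMillis) {
            return false; // recent traffic: the connection is demonstrably alive
        }
        if (conn.ping(pingTimeoutMillis)) {
            return false; // idle, but the server still answers pings
        }
        conn.reconnect(); // silent half-dead connection: force a reconnect
        return true;
    }
}

public class KeepAliveDemo {
    public static void main(String[] args) {
        // A fake connection that has been idle for 60s and never answers pings.
        PingableConnection dead = new PingableConnection() {
            public long millisSinceLastTraffic() { return 60_000; }
            public boolean ping(long timeoutMillis) { return false; }
            public void reconnect() { }
        };
        KeepAliveWatchdog watchdog = new KeepAliveWatchdog(dead, 30_000, 1_000);
        System.out.println("reconnect triggered: " + watchdog.checkOnce()); // true
    }
}
```

The point of the server-side Ping() RPC is simply to give the client a cheap, side-effect-free call to use in the ping(...) step once GetProtocolVersion() is gone.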





[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Fix Version/s: 1.2.5
   2.0.0



[jira] [Work started] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-17453 started by Tianying Chang.
--


[jira] [Work stopped] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-17453 stopped by Tianying Chang.
--


[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906231#comment-15906231
 ] 

Tianying Chang commented on HBASE-17453:


[~ted_yu] Thanks a lot for the URL! I will read that. I found
hbase-protocol-shaded/README.txt and followed it to regenerate the patch. It
now gets past those errors on my box, and I have resubmitted the patch. Thanks
again for your fast response!



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: HBASE-17453-master.patch



[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-11 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: (was: HBASE-17453-master.patch)



[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2017-03-10 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906118#comment-15906118
 ] 

Tianying Chang commented on HBASE-17138:


Thanks [~anoop.hbase]. What do you mean by "trunk patches backport"? Is this
Alibaba's list of patches backported from trunk/2.0 to 1.1? If so, could the
patch list above, in theory, be applied one by one to the open-source 1.2.1
release?

> Backport read-path offheap (HBASE-11425) to branch-1
> 
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yu Li
>Assignee: Yu Sun
> Attachments: 
> 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch,
>  0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, 
> 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch
>
>
> From the 
> [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E]
>  of sharing our experience and performance data of read-path offheap usage in 
> Alibaba search, we could see people are positive to have HBASE-11425 in 
> branch-1, so I'd like to create a JIRA and move the discussion and decision 
> making here.
> Echoing some comments from the mail thread:
> Bryan:
> Is the backported patch available anywhere? If it ends up not getting 
> officially backported to branch-1 due to 2.0 around the corner, some of us 
> who build our own deploy may want to integrate into our builds
> Andrew:
> Yes, please, the patches will be useful to the community even if we decide 
> not to backport into an official 1.x release.
> Enis:
> I don't see any reason why we cannot backport to branch-1.
> Ted:
> Opening a JIRA would be fine. This makes it easier for people to obtain the 
> patch(es)
> Nick:
> From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> support 1.x releases for some time... I would guess these will be
> maintained until 2.2 at least. Therefore, offheap patches that have seen
> production exposure seem like a reasonable candidate for backport, perhaps in 
> a 1.4 or 1.5 release timeframe.
> Anoop:
> Because of some compatibility issues, we decide that this will be done in 2.0 
> only..  Ya as Andy said, it would be great to share the 1.x backported 
> patches.
> The following is all the jira ids we have back ported:
> HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells 
> (Ram)
> HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and 
> AbstractHFileReader; ditto for Scanners and BlockReader, etc.
> HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner.
> HBASE-13450 - Purge RawBytescomparator from the writers and readers for 
> HBASE-10800 (Ram)
> HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo.
> HBASE-12048 Remove deprecated APIs from Filter.
> HBASE-10800 - Use CellComparator instead of KVComparator (Ram)
> HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], 
> int, int.
> HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with 
> byte[],int,int args in favor of taking Cell arg.
> HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int 
> length) in favor of filterRowKey(Cell firstRowCell).
> HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner.
> HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], 
> int, int.
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman)
> HBASE-12059 Create hbase-annotations module
> HBASE-12106 Move test annotations to test artifact (Enis Soztutar)
> HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers.
> HBASE-15679 Assertion on wrong variable in 
> TestReplicationThrottler#testThrottling
> HBASE-13931 Move Unsafe based operations to UnsafeAccess.
> HBASE-12345 Unsafe based ByteBuffer Comparator.
> HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int 
> llength, byte[] right, int roffset, int rlength).
> HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn
> HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport 
> this patch)
> HBASE-13448 New Cell implementation with cached component offsets/lengths.
> HBASE-13387 Add ByteBufferedCell an extension to Cell.
> HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum.
> HBASE-12650 Move ServerName to hbase-common module (partially backport this 
> patch)
> HBASE-12296 Filters should work with ByteBufferedCell.
> HBASE-14120 ByteBufferUtils#compareTo small optimization.
> HBASE-13510 - Purge ByteBloomFilter (Ram)
> HBASE-13451 - Make the HFileBlockIndex blockKeys to Cells so that it could 

[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-10 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906112#comment-15906112
 ] 

Tianying Chang commented on HBASE-17453:


[~ted_yu] It seems that to generate the patch for the master (2.0) branch, I
also need to add the new API to
"hbase-protocol-shaded/src/main/protobuf/Client.proto", besides
"hbase-protocol/src/main/protobuf/Client.proto" as I did for 1.2.5, right?

I made the same change to "hbase-protocol-shaded/src/main/protobuf/Client.proto",
but the regenerated ClientProtos.java differs beyond my change, e.g. "return
Consistency.forNumber(number);" vs. "return Consistency.valueOf(number);". I am
wondering which version of protoc HBase master is using?



[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-10 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905900#comment-15905900
 ] 

Tianying Chang commented on HBASE-17453:


[~ted_yu] Thanks for checking. I only ran the full test suite for the 1.2.5
patch, and it passed; I did not run the tests for the master patch on my box.
Let me find out where I missed something in the port.



[jira] [Comment Edited] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2017-03-10 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905595#comment-15905595
 ] 

Tianying Chang edited comment on HBASE-17138 at 3/10/17 7:44 PM:
-

[~anoop.hbase] Yes, we are interested in getting this patch into our production
online-facing cluster, which is running 1.2.1; any help from you would be
great!

It seems the customized version at Alibaba was forked from 1.1 (although with
many of their private patches on top). If so, I would expect the effort for a
backport to 1.2 to be similar? Do you have a high-level sense of what else
needs to be done besides those 70+ patches?

Thanks


was (Author: tychang):
[~anoopamz] Yes, we are interested in getting this patch into our production
online-facing cluster, which is running 1.2.1; any help from you would be
great!

It seems the customized version at Alibaba was forked from 1.1 (although with
many of their private patches on top). If so, I would expect the effort for a
backport to 1.2 to be similar? Do you have a high-level sense of what else
needs to be done besides those 70+ patches?

Thanks


[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2017-03-10 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905595#comment-15905595
 ] 

Tianying Chang commented on HBASE-17138:


[~anoopamz] Yes, we are interested in getting this patch into our production
online-facing cluster, which is running 1.2.1; any help from you would be
great!

It seems the customized version at Alibaba was forked from 1.1 (although with
many of their private patches on top). If so, I would expect the effort for a
backport to 1.2 to be similar? Do you have a high-level sense of what else
needs to be done besides those 70+ patches?

Thanks


[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-09 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904162#comment-15904162
 ] 

Tianying Chang commented on HBASE-17453:


[~ted_yu] [~Apache9] [~saint@gmail.com] attached a patch generated from the 
master branch. 

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBASE-17453-1.2.patch, HBASE-17453-master.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-09 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: HBASE-17453-master.patch

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBASE-17453-1.2.patch, HBASE-17453-master.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-09 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: HBASE-17453-master.patch

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBASE-17453-1.2.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-09 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Attachment: (was: HBASE-17453-master.patch)

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBASE-17453-1.2.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2017-03-09 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904015#comment-15904015
 ] 

Tianying Chang commented on HBASE-17138:


[~haoran] [~carp84] This improvement looks super useful to us too. We are 
currently on HBase 1.2.1, so it would be very helpful if this could be 
backported. Is anyone currently working on it?   

> Backport read-path offheap (HBASE-11425) to branch-1
> 
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yu Li
>Assignee: Yu Sun
> Attachments: 
> 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch,
>  0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, 
> 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch
>
>
> From the 
> [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E]
>  of sharing our experience and performance data of read-path offheap usage in 
> Alibaba search, we could see people are positive to have HBASE-11425 in 
> branch-1, so I'd like to create a JIRA and move the discussion and decision 
> making here.
> Echoing some comments from the mail thread:
> Bryan:
> Is the backported patch available anywhere? If it ends up not getting 
> officially backported to branch-1 due to 2.0 around the corner, some of us 
> who build our own deploy may want to integrate into our builds
> Andrew:
> Yes, please, the patches will be useful to the community even if we decide 
> not to backport into an official 1.x release.
> Enis:
> I don't see any reason why we cannot backport to branch-1.
> Ted:
> Opening a JIRA would be fine. This makes it easier for people to obtain the 
> patch(es)
> Nick:
> From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> support 1.x releases for some time... I would guess these will be
> maintained until 2.2 at least. Therefore, offheap patches that have seen
> production exposure seem like a reasonable candidate for backport, perhaps in 
> a 1.4 or 1.5 release timeframe.
> Anoop:
> Because of some compatibility issues, we decide that this will be done in 2.0 
> only..  Ya as Andy said, it would be great to share the 1.x backported 
> patches.
> The following is all the jira ids we have back ported:
> HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells 
> (Ram)
> HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and 
> AbstractHFileReader; ditto for Scanners and BlockReader, etc.
> HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner.
> HBASE-13450 - Purge RawBytescomparator from the writers and readers for 
> HBASE-10800 (Ram)
> HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo.
> HBASE-12048 Remove deprecated APIs from Filter.
> HBASE-10800 - Use CellComparator instead of KVComparator (Ram)
> HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], 
> int, int.
> HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with 
> byte[],int,int args in favor of taking Cell arg.
> HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int 
> length) in favor of filterRowKey(Cell firstRowCell).
> HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner.
> HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], 
> int, int.
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman)
> HBASE-12059 Create hbase-annotations module
> HBASE-12106 Move test annotations to test artifact (Enis Soztutar)
> HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers.
> HBASE-15679 Assertion on wrong variable in 
> TestReplicationThrottler#testThrottling
> HBASE-13931 Move Unsafe based operations to UnsafeAccess.
> HBASE-12345 Unsafe based ByteBuffer Comparator.
> HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int 
> llength, byte[] right, int roffset, int rlength).
> HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn
> HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport 
> this patch)
> HBASE-13448 New Cell implementation with cached component offsets/lengths.
> HBASE-13387 Add ByteBufferedCell an extension to Cell.
> HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum.
> HBASE-12650 Move ServerName to hbase-common module (partially backport this 
> patch)
> HBASE-12296 Filters should work with ByteBufferedCell.
> HBASE-14120 ByteBufferUtils#compareTo small optimization.
> HBASE-13510 - Purge ByteBloomFilter (Ram)
> HBASE-13451 - Make the HFileBlockIndex blockKeys to Cells so that it could be 
> easy to use in the CellComparators (Ram)
> 

[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-03-07 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900381#comment-15900381
 ] 

Tianying Chang commented on HBASE-17453:


 [~stack] sorry for the long delay. What we want is a simple API on the RS side 
that we can poll to verify whether the connection is behaving oddly and needs a 
reconnect. Earlier, GetProtocolVersion in 94 was borrowed to achieve this goal 
as a side effect. Basically, this call should be simple and cheap. I can make a 
patch for trunk if needed. 

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBASE-17453-1.2.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-02-06 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855004#comment-15855004
 ] 

Tianying Chang commented on HBASE-17453:


Thanks [~tedyu] [~stack] for the comments. We need a way to reliably know 
whether a connection with the RS is still live. Previously, we relied on 
GetProtocolVersion(). When no traffic is received within a certain time, either 
there really is no traffic, or the connection is bad. By sending a "Ping" and 
getting a response back, we know for sure the connection is not bad and there 
is no need to reconnect. If there is no response to the "Ping", we reconnect. 
So we just need a lightweight response that tells us the communication link is 
healthy. The PingProtocol.proto you mentioned above matches what we need. If it 
is a live API hosted by the RS, we can definitely use it.  
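
The detect/reconnect cycle discussed in this thread (idle timeout, then probe 
with a Ping, then reconnect if the probe gets no reply) can be sketched as 
follows. This is an illustrative model only, assuming a hypothetical 
`transport.send_ping()` call; it is not the AsyncHBase or HBase implementation.

```python
import time

class PingingConnection:
    """Illustrative keepalive model: if a connection has been idle longer
    than idle_timeout, probe it with a ping; if the ping gets no reply
    within ping_deadline, declare the connection dead so the caller can
    reconnect."""

    def __init__(self, transport, idle_timeout=60.0, ping_deadline=5.0):
        self.transport = transport      # hypothetical transport with send_ping()
        self.idle_timeout = idle_timeout
        self.ping_deadline = ping_deadline
        self.last_traffic = time.monotonic()

    def on_traffic(self):
        # Any request/response counts as proof the link is healthy.
        self.last_traffic = time.monotonic()

    def check(self):
        """Return 'ok', or 'reconnect' if the link looks dead."""
        idle = time.monotonic() - self.last_traffic
        if idle < self.idle_timeout:
            return "ok"
        # Idle too long: probe with a cheap ping instead of assuming failure.
        if self.transport.send_ping(timeout=self.ping_deadline):
            self.on_traffic()           # pong received, link is fine
            return "ok"
        return "reconnect"              # no pong: connection is bad
```

The point of the design is that silence alone is ambiguous (no traffic vs. a 
dead link); only an unanswered probe justifies tearing the connection down.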


> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBASE-17453-1.2.patch
>
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-01-13 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822735#comment-15822735
 ] 

Tianying Chang commented on HBASE-17453:


[~Apache9] thanks. I will put the patch out next week. 

> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-01-13 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-17453:
---
Description: 
Our HBase service is hosted in AWS. We saw cases where the connection between 
the client (AsyncHBase in our case) and the server stopped working without 
throwing any exception, so traffic got stuck. We therefore added a "Ping" 
feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
the RS side: if there is no traffic for a given time we send a "Ping", and if 
no response comes back we assume the connection is bad and reconnect. 

Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
deprecated. To support the same detect/reconnect feature, we added Ping() to 
our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.

We would like to open source this feature since it is useful for this use 
case in an AWS environment. 


  was:
Our HBase service is hosted in AWS. We saw cases where the connection between 
the client (Asynchbase in our case) and server stop working but did not throw 
any exception, therefore traffic stuck. So we added a "Ping" feature in 
AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided at RS side, 
if no traffic for given time, we send the "Ping", if no response back for 
"Ping", we assume the connect is bad and reconnect. 

Now we are upgrading cluster from 94 to 1.2. However, GetProtocolVersion() is 
deprecated. To be able to support same detect/reconnect feature, we added 
Ping() in our internal HBase 1.2 branch, and also patched accordingly in 
Asynchbase 1.7.

We would like to open source this feature since it is useful for use case in 
AWS environment. 


We used GetProtocolVersion in AsyncHBase to detect unhealthy connection to RS 
since in AWS, sometimes it enters a state the connection 


> add Ping into HBase server for deprecated GetProtocolVersion
> 
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 1.2.2
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
>
> Our HBase service is hosted in AWS. We saw cases where the connection between 
> the client (AsyncHBase in our case) and the server stopped working without 
> throwing any exception, so traffic got stuck. We therefore added a "Ping" 
> feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
> the RS side: if there is no traffic for a given time we send a "Ping", and if 
> no response comes back we assume the connection is bad and reconnect. 
> Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
> deprecated. To support the same detect/reconnect feature, we added Ping() to 
> our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use 
> case in an AWS environment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-01-11 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-17453:
--

 Summary: add Ping into HBase server for deprecated 
GetProtocolVersion
 Key: HBASE-17453
 URL: https://issues.apache.org/jira/browse/HBASE-17453
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 1.2.2
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor


Our HBase service is hosted in AWS. We saw cases where the connection between 
the client (AsyncHBase in our case) and the server stopped working without 
throwing any exception, so traffic got stuck. We therefore added a "Ping" 
feature in AsyncHBase 1.5 that uses the GetProtocolVersion() API provided on 
the RS side: if there is no traffic for a given time we send a "Ping", and if 
no response comes back we assume the connection is bad and reconnect. 

Now we are upgrading our cluster from 94 to 1.2, but GetProtocolVersion() is 
deprecated. To support the same detect/reconnect feature, we added Ping() to 
our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.

We would like to open source this feature since it is useful for this use 
case in an AWS environment. 


We used GetProtocolVersion in AsyncHBase to detect unhealthy connection to RS 
since in AWS, sometimes it enters a state the connection 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-07-01 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359803#comment-15359803
 ] 

Tianying Chang commented on HBASE-16030:


Attached a new patch. It still takes my old approach, but is updated to fit 
1.2 and addresses the earlier CR comments. 

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030-v3.patch, 
> hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour for 
> all regions/RS (we use the default memstore periodic flush time of 1 hour). 
> This will happen when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With the above two conditions, all the regions will be flushed around the 
> same time, at startTime + 1 hour - delay, again and again.
> We added a flush jittering time to randomize the flush time of each region 
> so that they do not get flushed at around the same time. We had this feature 
> running in our 94.7 and 94.26 clusters. Recently we upgraded to 1.2 and 
> found this issue is still there, so we are porting it into the 1.2 branch. 
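
The jittering fix described in the issue can be sketched as follows. The 10% 
jitter fraction and the function name here are illustrative assumptions, not 
the values used in the actual patch.

```python
import random

def next_flush_delay(period_s=3600, max_jitter_fraction=0.1, rng=random):
    """Periodic-flush delay with jitter: instead of every region flushing
    exactly `period_s` after it was opened (so regions opened together keep
    flushing together forever), subtract a random jitter so flush times
    spread out across regions."""
    jitter = rng.uniform(0, period_s * max_jitter_fraction)
    return period_s - jitter

# Without jitter, N regions opened at t=0 all flush at t=3600, 7200, ...
# With a 10% jitter, their flush times spread over a ~360 s window instead
# of landing in one spike.
```

Randomizing per region (rather than adding one fixed offset) is what breaks 
the synchronization, because each region draws its own delay.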



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-07-01 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: hbase-16030-v3.patch

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030-v3.patch, 
> hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour for 
> all regions/RS (we use the default memstore periodic flush time of 1 hour). 
> This will happen when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With the above two conditions, all the regions will be flushed around the 
> same time, at startTime + 1 hour - delay, again and again.
> We added a flush jittering time to randomize the flush time of each region 
> so that they do not get flushed at around the same time. We had this feature 
> running in our 94.7 and 94.26 clusters. Recently we upgraded to 1.2 and 
> found this issue is still there, so we are porting it into the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16128) add support for p999 histogram metrics

2016-07-01 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359650#comment-15359650
 ] 

Tianying Chang commented on HBASE-16128:


Should I provide a patch for a different version? 

> add support for p999 histogram metrics
> --
>
> Key: HBASE-16128
> URL: https://issues.apache.org/jira/browse/HBASE-16128
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBase-16128.patch
>
>
> Currently there is support for p75, p90, and p99, but no support for p999. 
> We need p999 metrics to reflect p99 metrics at the client level, especially 
> when the client side makes fan-out calls. 
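
For illustration, a p999 is computed the same way as p75/p90/p99, just with a 
finer quantile. Below is a minimal nearest-rank sketch; the real HBase metrics 
histograms use sampling reservoirs, which this does not reproduce.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile over a full sample, q in (0, 100].
    For p999 pass q=99.9. The fan-out motivation: a request that touches
    many servers experiences each server's tail, so a per-server p999 is
    roughly what the client observes as its own p99."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based nearest rank: smallest index covering q percent of samples.
    rank = math.ceil(q * len(ordered) / 100.0)
    return ordered[rank - 1]
```

For example, over the latencies 1..1000 ms, `percentile(data, 99.9)` selects 
the 999th ordered sample, while `percentile(data, 99)` selects the 990th.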



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16128) add support for p999 histogram metrics

2016-06-30 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16128:
---
Status: Patch Available  (was: In Progress)

> add support for p999 histogram metrics
> --
>
> Key: HBASE-16128
> URL: https://issues.apache.org/jira/browse/HBASE-16128
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBase-16128.patch
>
>
> Currently there is support for p75, p90, and p99, but no support for p999. 
> We need p999 metrics to reflect p99 metrics at the client level, especially 
> when the client side makes fan-out calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16128) add support for p999 histogram metrics

2016-06-30 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357848#comment-15357848
 ] 

Tianying Chang commented on HBASE-16128:


We have deployed this in our production cluster and can see the p999 metrics now. 

> add support for p999 histogram metrics
> --
>
> Key: HBASE-16128
> URL: https://issues.apache.org/jira/browse/HBASE-16128
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBase-16128.patch
>
>
> Currently there is support for p75, p90, and p99, but no support for p999. 
> We need p999 metrics to reflect p99 metrics at the client level, especially 
> when the client side makes fan-out calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16128) add support for p999 histogram metrics

2016-06-29 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356086#comment-15356086
 ] 

Tianying Chang commented on HBASE-16128:


Attached a patch for 1.2.1

> add support for p999 histogram metrics
> --
>
> Key: HBASE-16128
> URL: https://issues.apache.org/jira/browse/HBASE-16128
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBase-16128.patch
>
>
> Currently there is support for p75, p90, and p99, but no support for p999. 
> We need p999 metrics to reflect p99 metrics at the client level, especially 
> when the client side makes fan-out calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HBASE-16128) add support for p999 histogram metrics

2016-06-29 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-16128 started by Tianying Chang.
--
> add support for p999 histogram metrics
> --
>
> Key: HBASE-16128
> URL: https://issues.apache.org/jira/browse/HBASE-16128
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBase-16128.patch
>
>
> Currently there is support for p75, p90, and p99, but no support for p999. 
> We need p999 metrics to reflect p99 metrics at the client level, especially 
> when the client side makes fan-out calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16128) add support for p999 histogram metrics

2016-06-29 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16128:
---
Attachment: HBase-16128.patch

> add support for p999 histogram metrics
> --
>
> Key: HBASE-16128
> URL: https://issues.apache.org/jira/browse/HBASE-16128
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>Priority: Minor
> Attachments: HBase-16128.patch
>
>
> Currently there is support for p75, p90, and p99, but not for p999. We need 
> p999 metrics to reflect p99 latency as seen at the client level, especially 
> when the client side fans out calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16128) add support for p999 histogram metrics

2016-06-27 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16128:
--

 Summary: add support for p999 histogram metrics
 Key: HBASE-16128
 URL: https://issues.apache.org/jira/browse/HBASE-16128
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor


Currently there is support for p75, p90, and p99, but not for p999. We need 
p999 metrics to reflect p99 latency as seen at the client level, especially 
when the client side fans out calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
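The p999 request above can be illustrated with a minimal nearest-rank percentile sketch. This is a hypothetical standalone example, not HBase's actual histogram implementation (which samples into a reservoir rather than sorting every recorded value); `PercentileSketch` and `percentile` are invented names for illustration only.

```java
import java.util.Arrays;

// Illustrative sketch: computing p99/p999 from recorded latencies using the
// nearest-rank method. Not the HBase MetricsHistogram API.
public class PercentileSketch {
    // Returns the value at the given quantile (0 < quantile <= 1) over a
    // sorted copy of the recorded values.
    static long percentile(long[] values, double quantile) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        // Nearest-rank index: ceil(q * n) - 1, clamped to a valid index.
        int idx = (int) Math.ceil(quantile * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    public static void main(String[] args) {
        long[] latencies = new long[1000];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1;  // synthetic latencies 1..1000 ms
        }
        // p999 isolates the worst 0.1% tail that p99 hides, which is what the
        // fan-out client case above cares about.
        System.out.println("p99  = " + percentile(latencies, 0.99));
        System.out.println("p999 = " + percentile(latencies, 0.999));
    }
}
```

The point of the sketch: with a fan-out of N calls per client request, the client-observed p99 is driven by the server-side tail near the (0.99)^(1/N) quantile, so server-side p999 is the metric that predicts client p99.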


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-20 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340691#comment-15340691
 ] 

Tianying Chang commented on HBASE-16030:


[~enis] good scenario! So with the previous hard-coded 5-minute delay, the 
flush WILL NOT happen for 5 minutes. A flush stalled for 5 minutes also sounds 
like a problem (although not as bad as 30 minutes). It feels like the right way 
to jitter is not to put the request into a blocking queue ahead of time. What 
do you think? 

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
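The jitter idea described in the issue can be sketched as follows. This is a minimal standalone illustration under assumed names (`FlushJitterSketch`, `nextFlushDelay`), not the actual HBase patch: each region adds a random offset to the periodic flush interval so that regions opened at the same time do not all flush at the same time.

```java
import java.util.Random;

// Illustrative sketch of per-region flush jitter; names are hypothetical,
// not from the HBase patch.
public class FlushJitterSketch {
    static final long FLUSH_INTERVAL_MS = 3600_000L;  // default 1-hour period

    // Per-region delay = base interval plus a random jitter in [0, maxJitterMs).
    // With maxJitterMs equal to the interval, flushes spread roughly uniformly
    // across the whole hour instead of spiking together.
    static long nextFlushDelay(long maxJitterMs, Random rng) {
        long jitter = maxJitterMs > 0 ? (long) (rng.nextDouble() * maxJitterMs) : 0;
        return FLUSH_INTERVAL_MS + jitter;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int region = 0; region < 3; region++) {
            System.out.println("region " + region + " next periodic flush in "
                + nextFlushDelay(300_000L, rng) + " ms");
        }
    }
}
```

Note the trade-off discussed in the comments below: drawing the jitter when the request is enqueued means a request can sit in the queue for up to the jitter bound, which is why the size of that bound matters.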


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-20 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339804#comment-15339804
 ] 

Tianying Chang commented on HBASE-16030:


[~enis] I feel we should check that the configurable value is non-negative to 
prevent user error. But it seems none of the places that retrieve configurable 
values do any checking. Is it the HBase convention that the user is responsible 
for making sure a reasonable value is used? 

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
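The non-negative check discussed above can be sketched as a small validation helper. This is a hypothetical illustration; the method name and the fallback-to-default policy are assumptions, not what the patch necessarily does.

```java
// Illustrative sketch of validating a configurable jitter value; names and
// the fallback policy are hypothetical.
public class ConfValidationSketch {
    // Guard against user error: a negative jitter is meaningless, so fall
    // back to the default rather than propagating a bad value.
    static long sanitizeJitter(long configured, long defaultJitterMs) {
        if (configured < 0) {
            return defaultJitterMs;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(sanitizeJitter(-5, 300_000L));      // negative -> default
        System.out.println(sanitizeJitter(60_000L, 300_000L)); // valid -> kept
    }
}
```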


[jira] [Comment Edited] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-19 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336324#comment-15336324
 ] 

Tianying Chang edited comment on HBASE-16030 at 6/19/16 6:22 AM:
-

[~enis] sure, I can make another patch. One question: by increasing the flush 
delay time, the flush request will stay in the queue for 30 minutes. Will this 
cause any issues?  


was (Author: tychang):
@enis sure, I can make another patch. One question: by increasing the flush 
delay time, the flush request will stay in the queue for 30 minutes. Will this 
cause any issues?  

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-19 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338403#comment-15338403
 ] 

Tianying Chang commented on HBASE-16030:


[~enis] attached a new patch which simply makes the jitter value configurable, 
with a larger default value. 

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-19 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: hbase-16030-v2.patch

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-17 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336324#comment-15336324
 ] 

Tianying Chang commented on HBASE-16030:


@enis sure, I can make another patch. One question: by increasing the flush 
delay time, the flush request will stay in the queue for 30 minutes. Will this 
cause any issues?  

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-16 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: Screen Shot 2016-06-15 at 11.52.38 PM.png

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-16 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333224#comment-15333224
 ] 

Tianying Chang commented on HBASE-16030:


[~enis] thanks for reviewing the patch. Yes, 5 minutes is not enough; we would 
like to see flushes uniformly distributed across the one-hour range in an 
online-facing production cluster. I am fine with making this value 
configurable, and therefore larger than 5 minutes. Will it be a problem if a 
flush request is queued and delayed for up to 1 hour? 

BTW, I attached a new graph showing the impact of the hourly spike on 
network/disk/CPU on our new 1.2RC test cluster.

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, 
> hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-16 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: Screen Shot 2016-06-15 at 11.35.42 PM.png

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, 
> hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-16 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: (was: screenshot-1.png)

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
> Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, 
> hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: screenshot-1.png

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 2.0.0, 1.3.0, 1.2.3
>
> Attachments: hbase-16030.patch, screenshot-1.png
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16028) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang resolved HBASE-16028.

Resolution: Duplicate

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16028
> URL: https://issues.apache.org/jira/browse/HBASE-16028
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase, Performance
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16029) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang resolved HBASE-16029.

Resolution: Duplicate

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16029
> URL: https://issues.apache.org/jira/browse/HBASE-16029
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase, Performance
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16027) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang resolved HBASE-16027.

Resolution: Duplicate

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16027
> URL: https://issues.apache.org/jira/browse/HBASE-16027
> Project: HBase
>  Issue Type: Bug
>  Components: hbase, Performance
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Fix Version/s: 1.2.1
   Status: Patch Available  (was: In Progress)

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 1.2.1
>
> Attachments: hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330913#comment-15330913
 ] 

Tianying Chang commented on HBASE-16030:


attached patch for 1.2.1

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 1.2.1
>
> Attachments: hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-16030:
---
Attachment: hbase-16030.patch

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 1.2.1
>
> Attachments: hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-16030 started by Tianying Chang.
--
> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1-hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With both conditions met, all the regions will be flushed around the same 
> time, at startTime + 1 hour - delay, again and again.
> We added flush jitter to randomize the flush time of each region, so that 
> regions don't all get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 
> 1.2 and found the issue is still present, so we are porting the fix into 
> the 1.2 branch. 





[jira] [Created] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16030:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16030
 URL: https://issues.apache.org/jira/browse/HBASE-16030
 Project: HBase
  Issue Type: Improvement
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


In our production cluster, we observed a memstore flush spike every hour across 
all regions/RS (we use the default memstore periodic flush interval of 1 hour). 

This happens when two conditions are met: 
1. the memstore does not accumulate enough data to be flushed before the 1-hour 
limit is reached;
2. all regions are opened around the same time (e.g. all RS are started at the 
same time when starting a cluster). 

Under these two conditions, all the regions are flushed around the same time, 
at startTime + 1 hour - delay, again and again.

We added a flush jitter time to randomize the flush time of each region, so 
that the regions don't all get flushed at around the same time. We have had 
this feature running in our 94.7 and 94.26 clusters. Recently we upgraded to 
1.2 and found the issue is still there, so we are porting the fix into the 1.2 
branch. 
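The jitter idea described above can be sketched as a toy Java class. This is an illustrative sketch, not the actual patch: the class name, `MAX_JITTER_MS` value, and method names are all assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the flush-jitter idea: instead of every region becoming eligible
// for a periodic flush exactly when the interval expires, each region adds a
// random per-region delay, so flushes spread out instead of spiking together.
public class PeriodicFlushJitter {
    static final long FLUSH_INTERVAL_MS = 3600_000L; // default: 1 hour
    static final long MAX_JITTER_MS = 300_000L;      // assumed: up to 5 minutes

    final long openTimeMs;
    final long jitterMs; // drawn once, fixed per region at open time

    PeriodicFlushJitter(long openTimeMs) {
        this.openTimeMs = openTimeMs;
        this.jitterMs = ThreadLocalRandom.current().nextLong(MAX_JITTER_MS);
    }

    // A region becomes eligible for a periodic flush once
    // interval + its own jitter has elapsed since it was opened.
    boolean shouldFlush(long nowMs) {
        return nowMs - openTimeMs >= FLUSH_INTERVAL_MS + jitterMs;
    }

    public static void main(String[] args) {
        PeriodicFlushJitter region = new PeriodicFlushJitter(0L);
        // Just before the plain interval: never eligible, regardless of jitter.
        System.out.println(region.shouldFlush(FLUSH_INTERVAL_MS - 1));            // false
        // After interval + max jitter: always eligible.
        System.out.println(region.shouldFlush(FLUSH_INTERVAL_MS + MAX_JITTER_MS)); // true
    }
}
```

Because each region draws its own jitter once at open time, regions opened together become eligible for their periodic flush at different moments, flattening the hourly spike.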





[jira] [Created] (HBASE-16029) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16029:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16029
 URL: https://issues.apache.org/jira/browse/HBASE-16029
 Project: HBase
  Issue Type: Improvement
  Components: hbase, Performance
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


 





[jira] [Created] (HBASE-16028) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16028:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16028
 URL: https://issues.apache.org/jira/browse/HBASE-16028
 Project: HBase
  Issue Type: Improvement
  Components: hbase, Performance
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


In our production cluster, we observed a memstore flush spike every hour across 
all regions/RS (we use the default memstore periodic flush interval of 1 hour). 

This happens when two conditions are met: 
1. the memstore does not accumulate enough data to be flushed before the 1-hour 
limit is reached;
2. all regions are opened around the same time (e.g. all RS are started at the 
same time when starting a cluster). 

Under these two conditions, all the regions are flushed around the same time, 
at startTime + 1 hour - delay, again and again.

We added a flush jitter time to randomize the flush time of each region, so 
that the regions don't all get flushed at around the same time. We have had 
this feature running in our 94.7 and 94.26 clusters. Recently we upgraded to 
1.2 and found the issue is still there, so we are porting the fix into the 1.2 
branch. 





[jira] [Created] (HBASE-16027) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16027:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16027
 URL: https://issues.apache.org/jira/browse/HBASE-16027
 Project: HBase
  Issue Type: Bug
  Components: hbase, Performance
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


In our production cluster, we observed a memstore flush spike every hour across 
all regions/RS (we use the default memstore periodic flush interval of 1 hour). 

This happens when two conditions are met: 
1. the memstore does not accumulate enough data to be flushed before the 1-hour 
limit is reached;
2. all regions are opened around the same time (e.g. all RS are started at the 
same time when starting a cluster). 

Under these two conditions, all the regions are flushed around the same time, 
at startTime + 1 hour - delay, again and again.

We added a flush jitter time to randomize the flush time of each region, so 
that the regions don't all get flushed at around the same time. We have had 
this feature running in our 94.7 and 94.26 clusters. Recently we upgraded to 
1.2 and found the issue is still there, so we are porting the fix into the 1.2 
branch. 





[jira] [Assigned] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)

2016-05-03 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang reassigned HBASE-7055:
-

Assignee: Tianying Chang  (was: Sergey Shelukhin)

> port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
> --
>
> Key: HBASE-7055
> URL: https://issues.apache.org/jira/browse/HBASE-7055
> Project: HBase
>  Issue Type: Task
>  Components: Compaction
>Affects Versions: 0.95.2
>Reporter: Sergey Shelukhin
>Assignee: Tianying Chang
> Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, 
> HBASE-6371-v3-refactor-only-squashed.patch, 
> HBASE-6371-v4-refactor-only-squashed.patch, 
> HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, 
> HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, 
> HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, 
> HBASE-7055-v7.patch, HBASE-7055-v7.patch, Tier Based Compaction Settings.pdf
>
>
> See HBASE-6371 for details.





[jira] [Commented] (HBASE-15155) Show All RPC handler tasks stops working after cluster is under heavy load for a while

2016-02-03 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130776#comment-15130776
 ] 

Tianying Chang commented on HBASE-15155:


We deployed this in our production cluster running 94.26; it has been working 
fine under heavy traffic for over a week now. 

> Show All RPC handler tasks stops working after cluster is under heavy load 
> for a while
> --
>
> Key: HBASE-15155
> URL: https://issues.apache.org/jira/browse/HBASE-15155
> Project: HBase
>  Issue Type: Bug
>  Components: monitoring
>Affects Versions: 0.98.0, 1.0.0, 0.94.19
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Attachments: hbase-15155.patch
>
>
> After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All 
> RPC handler status" link on the RS web UI stops working after the cluster 
> has run in production under relatively high load for several days.  
> It turns out to be a bug introduced by 
> https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer 
> causes RPC handler status entries to be overridden/removed permanently when 
> a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS 
> has experienced "high" load, RPC status monitoring is gone forever, until 
> the RS is restarted. 
> We added a unit test that reproduces this, and the fix passes the test.  





[jira] [Created] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while

2016-01-21 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-15155:
--

 Summary: Show All RPC handler tasks stop working after cluster is 
under heavy load for a while
 Key: HBASE-15155
 URL: https://issues.apache.org/jira/browse/HBASE-15155
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.94.19, 1.0.0, 0.98.0
Reporter: Tianying Chang
Assignee: Tianying Chang


After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All RPC 
handler status" link on the RS web UI stops working after the cluster has run 
in production under relatively high load for several days.  

It turns out to be a bug introduced by 
https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer causes 
RPC handler status entries to be overridden/removed permanently when a spike of 
non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS has experienced 
"high" load, RPC status monitoring is gone forever, until the RS is restarted. 

We added a unit test that reproduces this, and the fix passes the test.  
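The failure mode described above can be illustrated with a toy bounded FIFO. This is a simplified stand-in, not the actual HBase TaskMonitor/BoundedFifoBuffer code; the class and field names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A size-bounded FIFO that evicts the oldest entry on overflow will
// permanently drop long-lived RPC handler status entries once a burst of
// short-lived non-RPC task statuses exceeds the cap.
public class BoundedStatusBuffer {
    static final int MAX_SIZE = 1000;
    final Deque<String> tasks = new ArrayDeque<>();

    void add(String status) {
        if (tasks.size() >= MAX_SIZE) {
            // Evicts the oldest entry -- which may be a still-live RPC handler
            // whose status is only ever inserted once, at handler startup.
            tasks.removeFirst();
        }
        tasks.addLast(status);
    }

    public static void main(String[] args) {
        BoundedStatusBuffer buf = new BoundedStatusBuffer();
        buf.add("RPC handler 0");            // long-lived, inserted at startup
        for (int i = 0; i < MAX_SIZE; i++) {
            buf.add("short-lived task " + i); // spike of non-RPC task statuses
        }
        // The RPC handler status has been evicted and never comes back
        // until the RS restarts and re-inserts it.
        System.out.println(buf.tasks.contains("RPC handler 0")); // prints false
    }
}
```

The key asymmetry is that RPC handler statuses are inserted once at startup while non-RPC task statuses are inserted continuously, so FIFO eviction always sacrifices the handlers first.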





[jira] [Updated] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while

2016-01-21 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-15155:
---
Attachment: hbase-15155.patch

> Show All RPC handler tasks stop working after cluster is under heavy load for 
> a while
> -
>
> Key: HBASE-15155
> URL: https://issues.apache.org/jira/browse/HBASE-15155
> Project: HBase
>  Issue Type: Bug
>  Components: monitoring
>Affects Versions: 0.98.0, 1.0.0, 0.94.19
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Attachments: hbase-15155.patch
>
>
> After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All 
> RPC handler status" link on the RS web UI stops working after the cluster 
> has run in production under relatively high load for several days.  
> It turns out to be a bug introduced by 
> https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer 
> causes RPC handler status entries to be overridden/removed permanently when 
> a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS 
> has experienced "high" load, RPC status monitoring is gone forever, until 
> the RS is restarted. 
> We added a unit test that reproduces this, and the fix passes the test.  





[jira] [Work started] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while

2016-01-21 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-15155 started by Tianying Chang.
--
> Show All RPC handler tasks stop working after cluster is under heavy load for 
> a while
> -
>
> Key: HBASE-15155
> URL: https://issues.apache.org/jira/browse/HBASE-15155
> Project: HBase
>  Issue Type: Bug
>  Components: monitoring
>Affects Versions: 0.98.0, 1.0.0, 0.94.19
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
> After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All 
> RPC handler status" link on the RS web UI stops working after the cluster 
> has run in production under relatively high load for several days.  
> It turns out to be a bug introduced by 
> https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer 
> causes RPC handler status entries to be overridden/removed permanently when 
> a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS 
> has experienced "high" load, RPC status monitoring is gone forever, until 
> the RS is restarted. 
> We added a unit test that reproduces this, and the fix passes the test.  





[jira] [Commented] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while

2016-01-21 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111667#comment-15111667
 ] 

Tianying Chang commented on HBASE-15155:


Thanks [~ted_yu], I will make a new patch. 

> Show All RPC handler tasks stop working after cluster is under heavy load for 
> a while
> -
>
> Key: HBASE-15155
> URL: https://issues.apache.org/jira/browse/HBASE-15155
> Project: HBase
>  Issue Type: Bug
>  Components: monitoring
>Affects Versions: 0.98.0, 1.0.0, 0.94.19
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Attachments: hbase-15155.patch
>
>
> After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All 
> RPC handler status" link on the RS web UI stops working after the cluster 
> has run in production under relatively high load for several days.  
> It turns out to be a bug introduced by 
> https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer 
> causes RPC handler status entries to be overridden/removed permanently when 
> a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS 
> has experienced "high" load, RPC status monitoring is gone forever, until 
> the RS is restarted. 
> We added a unit test that reproduces this, and the fix passes the test.  





[jira] [Commented] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while

2016-01-21 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111992#comment-15111992
 ] 

Tianying Chang commented on HBASE-15155:


The RPC handler status is inserted when the RPC handler starts. Unlike the 
non-RPC monitored tasks, an RPC handler is a long-running thread that never 
exits, so its status only transitions between the waiting and running states 
while the RS is up and running. 

> Show All RPC handler tasks stop working after cluster is under heavy load for 
> a while
> -
>
> Key: HBASE-15155
> URL: https://issues.apache.org/jira/browse/HBASE-15155
> Project: HBase
>  Issue Type: Bug
>  Components: monitoring
>Affects Versions: 0.98.0, 1.0.0, 0.94.19
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Attachments: hbase-15155.patch
>
>
> After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All 
> RPC handler status" link on the RS web UI stops working after the cluster 
> has run in production under relatively high load for several days.  
> It turns out to be a bug introduced by 
> https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer 
> causes RPC handler status entries to be overridden/removed permanently when 
> a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS 
> has experienced "high" load, RPC status monitoring is gone forever, until 
> the RS is restarted. 
> We added a unit test that reproduces this, and the fix passes the test.  





[jira] [Commented] (HBASE-13639) SyncTable - rsync for HBase tables

2015-10-07 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947244#comment-14947244
 ] 

Tianying Chang commented on HBASE-13639:


Nice idea! I am wondering what the outcome will be if the source table has 
constant writes into it. Or I guess this feature is meant for a source table 
that is not being updated? Thanks

> SyncTable - rsync for HBase tables
> --
>
> Key: HBASE-13639
> URL: https://issues.apache.org/jira/browse/HBASE-13639
> Project: HBase
>  Issue Type: New Feature
>Reporter: Dave Latham
>Assignee: Dave Latham
> Fix For: 2.0.0, 0.98.14, 1.2.0
>
> Attachments: HBASE-13639-0.98-addendum-hadoop-1.patch, 
> HBASE-13639-0.98.patch, HBASE-13639-v1.patch, HBASE-13639-v2.patch, 
> HBASE-13639-v3-0.98.patch, HBASE-13639-v3.patch, HBASE-13639.patch
>
>
> Given HBase tables in remote clusters with similar but not identical data, 
> efficiently update a target table such that the data in question is identical 
> to a source table.  Efficiency in this context means using far less network 
> traffic than would be required to ship all the data from one cluster to the 
> other.  Takes inspiration from rsync.
> Design doc: 
> https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/





[jira] [Commented] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.

2014-08-16 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099873#comment-14099873
 ] 

Tianying Chang commented on HBASE-11765:


[~lhofhansl] Thanks for the link. It seems HBASE-8806 is trying to solve 
exactly the same problem, but with a different approach. My approach is to sort 
all the kvs from all hlog entries; that way it can guarantee that for each 
batch() call sent by the replication sink, only one Put/Delete is created per 
row, so there is no lock problem. It feels a little like the approach taken by 
HBASE-6930. My patch does not change the behavior of multi() in HRegion; it 
only affects the replication sink implementation. With this change, an hlog 
that used to take 4 min 20 sec to replay now needs only 30 sec. I will take a 
deeper look at HBASE-8806. Thanks.

 ReplicationSink should merge the Put/Delete of the same row into one Action 
 even if they are from different hlog entry.
 ---

 Key: HBASE-11765
 URL: https://issues.apache.org/jira/browse/HBASE-11765
 Project: HBase
  Issue Type: Improvement
  Components: Performance, Replication
Affects Versions: 0.94.7
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.23

 Attachments: HBASE-11765.patch


 The current ReplicationSink code makes sure only one Put/Delete action is 
 created for the KVs of the same row when they come from the same hlog entry. 
 However, when Puts/Deletes for the same row exist in different hlog entries, 
 multiple Put/Delete actions are created, which causes synchronization cost 
 during the multi batch operation. 
 In one of our applications, whose traffic pattern deletes the same row twice 
 for many rows, we saw doMiniBatchMutation() invoked many times due to the 
 row lock on the same row. The ReplicationSink side becomes very slow, and 
 the replication queue builds up. 
 We should merge the Puts/Deletes for the same row into one Put/Delete action 
 even if they come from different hlog entries. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.

2014-08-15 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-11765:
--

 Summary: ReplicationSink should merge the Put/Delete of the same 
row into one Action even if they are from different hlog entry.
 Key: HBASE-11765
 URL: https://issues.apache.org/jira/browse/HBASE-11765
 Project: HBase
  Issue Type: Improvement
  Components: Performance, Replication
Affects Versions: 0.94.7
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.7


The current ReplicationSink code makes sure only one Put/Delete action is 
created for the KVs of the same row when they come from the same hlog entry. 
However, when Puts/Deletes for the same row exist in different hlog entries, 
multiple Put/Delete actions are created, which causes synchronization cost 
during the multi batch operation. 

In one of our applications, whose traffic pattern deletes the same row twice 
for many rows, we saw doMiniBatchMutation() invoked many times due to the row 
lock on the same row. The ReplicationSink side becomes very slow, and the 
replication queue builds up. 

We should merge the Puts/Deletes for the same row into one Put/Delete action 
even if they come from different hlog entries. 
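The merging idea can be sketched roughly like this. The `Cell` record and method names below are simplified stand-ins, not the real HBase Put/WAL classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Collect the cells from *all* hlog entries in a shipment, grouped by row,
// so the sink submits at most one mutation per row per batch() call and
// avoids repeated row-lock contention in doMiniBatchMutation().
public class SinkRowMerge {
    record Cell(String row, String value) {}

    // Each inner list represents the cells of one hlog entry.
    static Map<String, List<Cell>> mergeByRow(List<List<Cell>> hlogEntries) {
        Map<String, List<Cell>> byRow = new LinkedHashMap<>();
        for (List<Cell> entry : hlogEntries) {
            for (Cell c : entry) {
                byRow.computeIfAbsent(c.row(), r -> new ArrayList<>()).add(c);
            }
        }
        return byRow; // one merged action per row instead of one per entry
    }

    public static void main(String[] args) {
        List<List<Cell>> entries = List.of(
            List.of(new Cell("row1", "a")),                         // hlog entry 1
            List.of(new Cell("row1", "b"), new Cell("row2", "c"))); // hlog entry 2
        Map<String, List<Cell>> merged = mergeByRow(entries);
        System.out.println(merged.size());             // 2 distinct rows
        System.out.println(merged.get("row1").size()); // 2 cells merged into one action
    }
}
```

With the merge, "row1" produces a single mutation covering both hlog entries, so the batch takes its row lock once instead of twice.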





[jira] [Updated] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.

2014-08-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-11765:
---

Attachment: HBASE-11765.patch

 ReplicationSink should merge the Put/Delete of the same row into one Action 
 even if they are from different hlog entry.
 ---

 Key: HBASE-11765
 URL: https://issues.apache.org/jira/browse/HBASE-11765
 Project: HBase
  Issue Type: Improvement
  Components: Performance, Replication
Affects Versions: 0.94.7
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.7

 Attachments: HBASE-11765.patch


 The current ReplicationSink code makes sure only one Put/Delete action is 
 created for the KVs of the same row when they come from the same hlog entry. 
 However, when Puts/Deletes for the same row exist in different hlog entries, 
 multiple Put/Delete actions are created, which causes synchronization cost 
 during the multi batch operation. 
 In one of our applications, whose traffic pattern deletes the same row twice 
 for many rows, we saw doMiniBatchMutation() invoked many times due to the 
 row lock on the same row. The ReplicationSink side becomes very slow, and 
 the replication queue builds up. 
 We should merge the Puts/Deletes for the same row into one Put/Delete action 
 even if they come from different hlog entries. 





[jira] [Work started] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.

2014-08-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-11765 started by Tianying Chang.

 ReplicationSink should merge the Put/Delete of the same row into one Action 
 even if they are from different hlog entry.
 ---

 Key: HBASE-11765
 URL: https://issues.apache.org/jira/browse/HBASE-11765
 Project: HBase
  Issue Type: Improvement
  Components: Performance, Replication
Affects Versions: 0.94.7
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.7

 Attachments: HBASE-11765.patch


 The current ReplicationSink code makes sure only one Put/Delete action is 
 created for the KVs of the same row when they come from the same hlog entry. 
 However, when Puts/Deletes for the same row exist in different hlog entries, 
 multiple Put/Delete actions are created, which causes synchronization cost 
 during the multi batch operation. 
 In one of our applications, whose traffic pattern deletes the same row twice 
 for many rows, we saw doMiniBatchMutation() invoked many times due to the 
 row lock on the same row. The ReplicationSink side becomes very slow, and 
 the replication queue builds up. 
 We should merge the Puts/Deletes for the same row into one Put/Delete action 
 even if they come from different hlog entries. 





[jira] [Created] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry

2014-08-05 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-11684:
--

 Summary: HBase replicationSource should support multithread to 
ship the log entry
 Key: HBASE-11684
 URL: https://issues.apache.org/jira/browse/HBASE-11684
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver, Replication
Reporter: Tianying Chang
Assignee: Tianying Chang


We found that the replication rate cannot keep up with the write rate when the 
master cluster is write heavy, and we got a huge log queue buildup because of 
that. But when we did a rolling restart of the master cluster, we found that 
appliedOpsRate doubled, thanks to the extra thread created to help recover the 
logs of the restarted RS. ReplicateLogEntries is a synchronous blocking call, 
so it becomes the bottleneck when it runs with only one thread. I think we 
should support multiple threads in the replication source to ship the data. I 
don't see any consistency problem. Are there any other concerns here? 
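One possible shape for the multi-threaded shipping is sketched below. This is a hypothetical design, not the actual ReplicationSource: it assumes per-row ordering is the property to preserve (same row always goes through the same shipper thread), and it ignores cross-row ordering, batching, and failure handling.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Partition edits by row hash across a fixed pool of single-threaded
// shippers, so edits for the same row are always shipped in order by the
// same thread, while different rows ship in parallel.
public class ParallelShipper {
    interface Sink { void replicate(String rowEdit); }

    static void ship(List<String> rowEdits, Sink sink, int threads)
            throws InterruptedException {
        ExecutorService[] shippers = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            shippers[i] = Executors.newSingleThreadExecutor();
        }
        for (String edit : rowEdits) {
            // Same row -> same shard -> same shipper thread, keeping order.
            int shard = Math.floorMod(edit.hashCode(), threads);
            shippers[shard].submit(() -> sink.replicate(edit));
        }
        for (ExecutorService s : shippers) {
            s.shutdown();
            s.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> edits = List.of("row1:put", "row2:delete", "row1:put");
        ship(edits, e -> System.out.println("replicated " + e), 4);
    }
}
```

The sharding is what keeps the parallel version safe for per-row consistency; whether cross-row ordering matters for a given replication setup is exactly the kind of concern the comment above is asking about.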





[jira] [Commented] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry

2014-08-05 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087252#comment-14087252
 ] 

Tianying Chang commented on HBASE-11684:


[~jdcryans]  Can you give some comments on this? 

 HBase replicationSource should support multithread to ship the log entry
 

 Key: HBASE-11684
 URL: https://issues.apache.org/jira/browse/HBASE-11684
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver, Replication
Reporter: Tianying Chang
Assignee: Tianying Chang

 We found that the replication rate cannot keep up with the write rate when 
 the master cluster is write heavy, and we got a huge log queue buildup 
 because of that. But when we did a rolling restart of the master cluster, we 
 found that appliedOpsRate doubled, thanks to the extra thread created to 
 help recover the logs of the restarted RS. ReplicateLogEntries is a 
 synchronous blocking call, so it becomes the bottleneck when it runs with 
 only one thread. I think we should support multiple threads in the 
 replication source to ship the data. I don't see any consistency problem. 
 Are there any other concerns here? 





[jira] [Updated] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry

2014-08-05 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang updated HBASE-11684:
---

Component/s: (was: regionserver)

 HBase replicationSource should support multithread to ship the log entry
 

 Key: HBASE-11684
 URL: https://issues.apache.org/jira/browse/HBASE-11684
 Project: HBase
  Issue Type: Improvement
  Components: Performance, Replication
Reporter: Tianying Chang
Assignee: Tianying Chang

 We found that the replication rate cannot keep up with the write rate when 
 the master cluster is write heavy, and we got a huge log queue buildup 
 because of that. But when we did a rolling restart of the master cluster, we 
 found that appliedOpsRate doubled, thanks to the extra thread created to 
 help recover the logs of the restarted RS. ReplicateLogEntries is a 
 synchronous blocking call, so it becomes the bottleneck when it runs with 
 only one thread. I think we should support multiple threads in the 
 replication source to ship the data. I don't see any consistency problem. 
 Are there any other concerns here? 





[jira] [Commented] (HBASE-10935) support snapshot policy where flush memstore can be skipped to prevent production cluster freeze

2014-06-01 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015176#comment-14015176
 ] 

Tianying Chang commented on HBASE-10935:


[~mbertozzi] Sure! Thanks!

 support snapshot policy where flush memstore can be skipped to prevent 
 production cluster freeze
 

 Key: HBASE-10935
 URL: https://issues.apache.org/jira/browse/HBASE-10935
 Project: HBase
  Issue Type: New Feature
  Components: shell, snapshots
Affects Versions: 0.94.7, 0.94.18
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor
 Fix For: 0.94.7, 0.99.0, 0.94.20

 Attachments: HBASE-10935-0.94-v1.patch, HBASE-10935-0.98-v1.patch, 
 HBASE-10935-trunk-v1.patch, hbase-10935-94.patch, hbase-10935-trunk.patch


 We are using the snapshot feature for HBase disaster recovery, taking 
 snapshots in our production cluster periodically. The current flush snapshot 
 policy requires all regions of the table to coordinate so that writes are 
 blocked and all regions flush at the same time. Since we use WALPlayer to 
 complete the data that is not in the snapshot HFiles, we don't need the 
 snapshot to do a coordinated flush; the snapshot just records all the HFiles 
 that are already there. 
 I added a parameter to the HBase shell so people can choose the no-flush 
 snapshot when they need it, as below; otherwise the default flush snapshot 
 behavior is not impacted. 
 snapshot 'TestTable', 'TestSnapshot', 'skipFlush'





[jira] [Commented] (HBASE-10935) support snapshot policy where flush memstore can be skipped to prevent production cluster freeze

2014-05-29 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012111#comment-14012111
 ] 

Tianying Chang commented on HBASE-10935:


Never mind. I figured out the cause: it is due to an inner class. I have 
tested the change and will upload the updated patch. Thanks. 

 support snapshot policy where flush memstore can be skipped to prevent 
 production cluster freeze
 

 Key: HBASE-10935
 URL: https://issues.apache.org/jira/browse/HBASE-10935
 Project: HBase
  Issue Type: New Feature
  Components: shell, snapshots
Affects Versions: 0.94.7, 0.94.18
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor
 Fix For: 0.99.0

 Attachments: jira-10935-trunk.patch, jira-10935.patch


 We are using the snapshot feature for HBase disaster recovery, taking 
 snapshots in our production cluster periodically. The current flush snapshot 
 policy requires all regions of the table to coordinate so that writes are 
 blocked and all regions flush at the same time. Since we use WALPlayer to 
 complete the data that is not in the snapshot HFiles, we don't need the 
 snapshot to do a coordinated flush; the snapshot just records all the HFiles 
 that are already there. 
 I added a parameter to the HBase shell so people can choose the no-flush 
 snapshot when they need it, as below; otherwise the default flush snapshot 
 behavior is not impacted. 
 snapshot 'TestTable', 'TestSnapshot', 'skipFlush'




