[jira] [Commented] (HBASE-20943) Add offline/online region count into metrics
[ https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572030#comment-16572030 ]

Tianying Chang commented on HBASE-20943:

[~huaxiang] thanks for reviewing!

> Add offline/online region count into metrics
>
> Key: HBASE-20943
> URL: https://issues.apache.org/jira/browse/HBASE-20943
> Project: HBase
> Issue Type: Improvement
> Components: metrics
> Affects Versions: 2.0.0, 1.2.6.1
> Reporter: Tianying Chang
> Assignee: jinghan xu
> Priority: Minor
> Attachments: HBASE-20943.patch, Screen Shot 2018-07-25 at 2.51.19 PM.png
>
> We use metrics intensively to monitor the health of our HBase production
> cluster. We have seen regions of a table get stuck and fail to come back
> online due to an AWS issue that corrupted some log files. It would be good
> to catch this early. Although the web UI has this information, it is not
> useful for automated monitoring. By adding this metric, we can easily
> monitor region state with our monitoring system.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
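The request above is to expose counts the master already tracks as scrapeable metrics. A minimal, self-contained sketch of deriving the two gauge values from region states (the class, the simplified `State` enum, and the gauge names are illustrative, not the actual HBASE-20943 patch):

```java
import java.util.List;

public class RegionCountMetrics {
    // Simplified region states; the real HBase RegionState enum is larger.
    enum State { OPEN, OFFLINE, FAILED_OPEN, CLOSED }

    // Derive the gauge value a monitoring system would scrape:
    // online = regions in OPEN state, offline = everything else.
    static long count(List<State> regions, boolean online) {
        return regions.stream()
                .filter(s -> (s == State.OPEN) == online)
                .count();
    }

    public static void main(String[] args) {
        List<State> regions = List.of(State.OPEN, State.OPEN,
                State.OFFLINE, State.FAILED_OPEN);
        System.out.println("onlineRegionCount=" + count(regions, true));   // 2
        System.out.println("offlineRegionCount=" + count(regions, false)); // 2
    }
}
```

In the real patch these values would be published through the master's metrics source so that any Hadoop metrics sink can pick them up, instead of being printed.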
[jira] [Commented] (HBASE-20943) Add offline/online region count into metrics
[ https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570665#comment-16570665 ]

Tianying Chang commented on HBASE-20943:

[~yuzhih...@gmail.com] it seems I cannot assign this JIRA to [~jinghanx]; is any permission needed?
[jira] [Commented] (HBASE-20943) Add offline/online region count into metrics
[ https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570561#comment-16570561 ]

Tianying Chang commented on HBASE-20943:

[~yuzhih...@gmail.com] Thanks. My teammate Jinghan Xu will upload the patch.
[jira] [Updated] (HBASE-20943) Add offline/online region count into metrics
[ https://issues.apache.org/jira/browse/HBASE-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-20943:

Attachment: Screen Shot 2018-07-25 at 2.51.19 PM.png
[jira] [Created] (HBASE-20943) Add offline/online region count into metrics
Tianying Chang created HBASE-20943:

Summary: Add offline/online region count into metrics
Key: HBASE-20943
URL: https://issues.apache.org/jira/browse/HBASE-20943
Project: HBase
Issue Type: Improvement
Components: metrics
Affects Versions: 1.2.6.1, 2.0.0
Reporter: Tianying Chang
[jira] [Commented] (HBASE-15400) Use DateTieredCompactor for Date Tiered Compaction
[ https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168353#comment-16168353 ]

Tianying Chang commented on HBASE-15400:

thanks [~davelatham] for confirming! We will port the other two also.

> Use DateTieredCompactor for Date Tiered Compaction
>
> Key: HBASE-15400
> URL: https://issues.apache.org/jira/browse/HBASE-15400
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
> Attachments: HBASE-15400-0.98.patch, HBASE-15400-15389-v12.patch,
> HBASE-15400-branch-1.patch, HBASE-15400.patch, HBASE-15400-v1.pa,
> HBASE-15400-v3.patch, HBASE-15400-v3-v3.patch, HBASE-15400-v3-v4.patch,
> HBASE-15400-v3-v5.patch, HBASE-15400-v6.patch, HBASE-15400-v7.patch
>
> When we compact, we can output multiple files along the current window
> boundaries. There are two use cases:
> 1. Major compaction: we want to output date-tiered store files, with data
> older than max age archived in chunks of the window size on the higher tier.
> Once a window is old enough, we don't combine windows to promote to the
> next tier any further, so files in these windows retain the same timespan as
> when they were last minor-compacted, which is the window size of the highest
> tier. Major compaction will touch these files, and we want to maintain the
> same layout. This way, TTL handling and archiving will be simpler and more
> efficient.
> 2. Bulk-load files, and old files generated by major compaction before
> upgrading to DTCP.
> Pros:
> 1. Restore locality, and process versioning, updates, and deletes while
> maintaining the tiered layout.
> 2. The best way to fix a skewed layout.
>
> This work is based on a prototype of DateTieredCompactor from HBASE-15389
> and focuses on meeting the needs of these two use cases while supporting
> others. A few design decisions worth calling out:
> 1. We only want to output files along all windows for major compaction,
> and we want to output multiple files older than max age in chunks of the
> maximum-tier window size, which is determined by the base window size,
> windows per tier, and max age.
> 2. For minor compaction, we don't want to output too many files, which
> would stay around because of the current restriction of contiguous
> compaction by sequence id. We only output two files if all the files in the
> windows are being combined: one for the data within the window and one for
> the out-of-window tail. If any file in the window is excluded from
> compaction, only one file is output. As windows are promoted, the situation
> of out-of-order data gradually improves. For the incoming window, we need
> to accommodate the case of user-specified future data.
> 3. We have to pass the boundaries with the list of store files as one
> complete time snapshot, instead of two separate calls, because the window
> layout is determined at the time the computation is invoked; so we need a
> new type of compaction request.
> 4. Since we assign the same sequence id to all output files, we need to
> sort by maxTimestamp afterwards. Right now every compaction policy gets the
> files sorted by StoreFileManager, which sorts by sequence id and other
> criteria. We use the timestamp order for DTCP only, to avoid impacting
> other compaction policies.
> 5. We need some cleanup of the current design of StoreEngine and
> CompactionPolicy.
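The "maximum-tier window size determined by base window size, windows per tier and max age" from the description can be illustrated with a simplified sketch. This is an illustration of the tiering idea only, not the actual DateTieredCompactionPolicy code (which also aligns windows to absolute time): each tier holds `windowsPerTier` windows, and each tier's window is `windowsPerTier` times wider than the previous one, until the boundaries pass max age.

```java
import java.util.ArrayList;
import java.util.List;

public class TieredWindows {
    // Compute window boundaries going back from "now", newest first.
    // Boundaries stop being subdivided once they pass maxAgeMs.
    static List<Long> boundaries(long now, long baseWindowMs,
                                 int windowsPerTier, long maxAgeMs) {
        List<Long> bounds = new ArrayList<>();
        long boundary = now;
        long size = baseWindowMs;
        bounds.add(boundary);
        while (now - boundary < maxAgeMs) {
            // Emit one tier's worth of windows at the current size.
            for (int i = 0; i < windowsPerTier && now - boundary < maxAgeMs; i++) {
                boundary -= size;
                bounds.add(boundary);
            }
            size *= windowsPerTier;  // next tier is windowsPerTier times wider
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Toy units: base window 1, 4 windows per tier, max age 24.
        System.out.println(boundaries(24L, 1L, 4, 24L));
    }
}
```

With these toy parameters the newest tier has four windows of size 1, the next tier four windows of size 4, and the last boundary overshoots max age, which matches the description's point that the oldest data stays in highest-tier-sized chunks.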
[jira] [Commented] (HBASE-15400) Use DateTieredCompactor for Date Tiered Compaction
[ https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167075#comment-16167075 ]

Tianying Chang commented on HBASE-15400:

[~davelatham] Thanks for the information. We are in the process of backporting date tiered compaction into our 1.2 branch now. One question: a teammate has backported HBASE-15181, but we are not sure whether HBASE-15400 is also strictly needed. It seems like an important improvement that keeps the number of HFiles under control. If we backport only HBASE-15181, will the number of HFiles in the older tiers grow too high?
[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941195#comment-15941195 ]

Tianying Chang commented on HBASE-17138:

> the above jira ids are listed in chronological order I have backport, and
> there are also some patches I don't list, such as:
> 1. keep Cell api compatible with our existing code.
> 2. hfile format related compatibility issues.
> 3. client compatibility issue

[~hongxi] [~carp84] Just to clarify: are the related JIRAs above included in your list? And how much did HBASE-15756 contribute to the Singles' Day performance boost on top of this offheap feature?

> Backport read-path offheap (HBASE-11425) to branch-1
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
> Issue Type: Improvement
> Reporter: Yu Li
> Assignee: Yu Sun
> Attachments:
> 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch,
> 0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch,
> 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch
>
> From the
> [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E]
> sharing our experience and performance data of read-path offheap usage in
> Alibaba search, we could see people are positive about having HBASE-11425
> in branch-1, so I'd like to create a JIRA and move the discussion and
> decision making here.
> Echoing some comments from the mail thread:
> Bryan:
> Is the backported patch available anywhere? If it ends up not getting
> officially backported to branch-1 due to 2.0 around the corner, some of us
> who build our own deploy may want to integrate it into our builds.
> Andrew:
> Yes, please, the patches will be useful to the community even if we decide
> not to backport into an official 1.x release. 
> Enis: > I don't see any reason why we cannot backport to branch-1. > Ted: > Opening a JIRA would be fine. This makes it easier for people to obtain the > patch(es) > Nick: > From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to > support 1.x releases for some time... I would guess these will be > maintained until 2.2 at least. Therefore, offheap patches that have seen > production exposure seem like a reasonable candidate for backport, perhaps in > a 1.4 or 1.5 release timeframe. > Anoop: > Because of some compatibility issues, we decide that this will be done in 2.0 > only.. Ya as Andy said, it would be great to share the 1.x backported > patches. > The following is all the jira ids we have back ported: > HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells > (Ram) > HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and > AbstractHFileReader; ditto for Scanners and BlockReader, etc. > HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner. > HBASE-13450 - Purge RawBytescomparator from the writers and readers for > HBASE-10800 (Ram) > HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo. > HBASE-12048 Remove deprecated APIs from Filter. > HBASE-10800 - Use CellComparator instead of KVComparator (Ram) > HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], > int, int. > HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with > byte[],int,int args in favor of taking Cell arg. > HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int > length) in favor of filterRowKey(Cell firstRowCell). > HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner. > HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], > int, int. 
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman) > HBASE-12059 Create hbase-annotations module > HBASE-12106 Move test annotations to test artifact (Enis Soztutar) > HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers. > HBASE-15679 Assertion on wrong variable in > TestReplicationThrottler#testThrottling > HBASE-13931 Move Unsafe based operations to UnsafeAccess. > HBASE-12345 Unsafe based ByteBuffer Comparator. > HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int > llength, byte[] right, int roffset, int rlength). > HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn > HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport > this patch) > HBASE-13448 New Cell implementation with cached component offsets/lengths. > HBASE-13387 Add ByteBufferedCell an extension to Cell. > HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum. > HBASE-12650 Move ServerName to hbase-common module (partially backport this
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939355#comment-15939355 ]

Tianying Chang commented on HBASE-17453:

[~saint@gmail.com] One more question: I can see that methods in Admin.proto get higher priority. But what are the criteria for deciding which methods go into Admin.proto vs. Client.proto? It seems all of them are implemented in RSRpcServices.java anyway.

> add Ping into HBase server for deprecated GetProtocolVersion
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 1.2.2
> Reporter: Tianying Chang
> Assignee: Tianying Chang
> Priority: Minor
> Fix For: 2.0.0, 1.2.5
> Attachments: HBASE-17453-1.2.patch,
> HBASE-17453-master-fixWhiteSpace.patch, HBASE-17453-master.patch,
> HBASE-17453-master-v1.patch, HBASE-17453-master-v2.patch
>
> Our HBase service is hosted in AWS. We saw cases where the connection
> between the client (AsyncHBase in our case) and the server stopped working
> without throwing any exception, so traffic got stuck. We therefore added a
> "Ping" feature in AsyncHBase 1.5 by utilizing the GetProtocolVersion() API
> provided on the RS side: if there is no traffic for a given time, we send a
> "Ping"; if there is no response to the "Ping", we assume the connection is
> bad and reconnect.
> Now we are upgrading clusters from 94 to 1.2, but GetProtocolVersion() is
> deprecated. To support the same detect/reconnect feature, we added Ping()
> in our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.
> We would like to open source this feature since it is useful for this use
> case in an AWS environment.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
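The detect/reconnect behavior described in the issue can be sketched as a pure decision function that a connection timer would call periodically. The names and the timer-driven framing here are illustrative, not the actual AsyncHBase or HBase patch:

```java
public class PingWatchdog {
    enum Action { NONE, SEND_PING, RECONNECT }

    // Idle-detection decision, as described in the issue: if the channel has
    // been quiet longer than idleTimeoutMs, send a ping; if a ping has been
    // outstanding longer than pingTimeoutMs with no reply, declare the
    // connection dead. pingSentMs == 0 means no ping is outstanding.
    static Action check(long nowMs, long lastTrafficMs, long pingSentMs,
                        long idleTimeoutMs, long pingTimeoutMs) {
        if (pingSentMs != 0 && nowMs - pingSentMs > pingTimeoutMs) {
            return Action.RECONNECT;   // ping went unanswered
        }
        if (pingSentMs == 0 && nowMs - lastTrafficMs > idleTimeoutMs) {
            return Action.SEND_PING;   // connection has gone quiet
        }
        return Action.NONE;            // healthy, or still awaiting a reply
    }

    public static void main(String[] args) {
        // Quiet for 70s with a 60s idle timeout: send a ping.
        System.out.println(check(70_000, 0, 0, 60_000, 10_000));
        // Ping sent at t=70s, now t=85s, 10s ping timeout: reconnect.
        System.out.println(check(85_000, 0, 70_000, 60_000, 10_000));
    }
}
```

On any traffic (including a ping reply), the caller would reset `lastTrafficMs` and clear the outstanding ping, so a healthy but idle connection cycles through SEND_PING and back to NONE without ever reconnecting.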
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938876#comment-15938876 ]

Tianying Chang commented on HBASE-17453:

[~saint@gmail.com] Actually, as long as the method is in RSRpcServices it serves my purpose, since I just need to test the connection to a specific RS and reconnect if needed, to make sure other operations like get or mutate can succeed. It is just that AsyncHBase only has Client.proto/Cell.proto/HBase.proto/RPC.proto and no Admin.proto; that is simply AsyncHBase's grouping/naming. I will move the Ping API into Admin.proto in the server patch (no AsyncHBase client-side code change needed at all) and regenerate the patch.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932869#comment-15932869 ]

Tianying Chang commented on HBASE-15181:

[~enis] that is a great point. [~davelatham] I am wondering whether OpenTSDB will benefit from it, though, since I assume it sets startRow/stopRow with the start/stop time encoded in them. Shouldn't that already exclude all unnecessary data?

> A simple implementation of date based tiered compaction
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch,
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch,
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch,
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch,
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
> This is a simple implementation of date-based tiered compaction, similar to
> Cassandra's, with the following benefits:
> 1. Improve date-range-based scans by structuring store files in a
> date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> A perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most
> recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully. Time-range overlap among store
> files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table
> or per-column-family level via the hbase shell. 
> Design spec is at > https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing > Results in our production is at > https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit# -- This message was sent by Atlassian JIRA (v6.3.15#6346)
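The per-table configuration mentioned in the description follows the pattern below. The property keys are the ones documented for HBASE-15181 in the HBase reference guide; the table name and values are illustrative only, not recommendations:

```
hbase> alter 'tsdb', CONFIGURATION => {
  'hbase.hstore.engine.class' =>
    'org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine',
  'hbase.hstore.compaction.date.tiered.max.storefile.age.millis' => '2419200000',
  'hbase.hstore.compaction.date.tiered.base.window.millis' => '21600000',
  'hbase.hstore.compaction.date.tiered.windows.per.tier' => '4'
}
```

Here `2419200000` ms is 28 days (matching a 28-day TTL) and `21600000` ms is a 6-hour base window; the same keys can instead go in hbase-site.xml to apply cluster-wide.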
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:

Status: In Progress (was: Patch Available)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:

Attachment: HBASE-17453-master-fixWhiteSpace.patch
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:

Status: Patch Available (was: In Progress)
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928560#comment-15928560 ]

Tianying Chang commented on HBASE-15181:

[~davelatham] Good point about setting a time range vs. encoding the time information into the row key. Given OpenTSDB's schema design, my gut feeling is that it unfortunately does not set a time range; I will verify that. If it does not, then since our OpenTSDB usage is configured to keep only 28 days of data with the TTL set accordingly, one benefit we can still utilize is that whole store files can be dropped once their TTL expires.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927035#comment-15927035 ] Tianying Chang commented on HBASE-15181: [~davelatham] Thanks a lot for the information. Will look through the subtasks on HBASE-15339 also. Will give it a try to apply those patches to 1.2. My feeling is this kind of compaction algorithm should be very suitable for opentsdb use case, do you know if anyone had concrete experience on the improvement from this compaction algorithm on opentsdb? > A simple implementation of date based tiered compaction > --- > > Key: HBASE-15181 > URL: https://issues.apache.org/jira/browse/HBASE-15181 > Project: HBase > Issue Type: New Feature > Components: Compaction >Reporter: Clara Xiong >Assignee: Clara Xiong > Fix For: 2.0.0, 1.3.0, 0.98.18 > > Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, > HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, > HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, > HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, > HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch > > > This is a simple implementation of date-based tiered compaction similar to > Cassandra's for the following benefits: > 1. Improve date-range-based scan by structuring store files in date-based > tiered layout. > 2. Reduce compaction overhead. > 3. Improve TTL efficiency. > Perfect fit for the use cases that: > 1. has mostly date-based date write and scan and a focus on the most recent > data. > 2. never or rarely deletes data. > Out-of-order writes are handled gracefully. Time range overlapping among > store files is tolerated and the performance impact is minimized. > Configuration can be set at hbase-site.xml or overriden at per-table or > per-column-famly level by hbase shell. 
> Design spec is at > https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing > Results in our production is at > https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit# -- This message was sent by Atlassian JIRA (v6.3.15#6346)
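As a rough illustration of the tiered windows the description above refers to, the sketch below aligns a timestamp to its containing window at a given tier, where each tier's window is `windowsPerTier` times wider than the previous one. The class, method, and parameter names are assumptions based on the design summary, not the actual HBase implementation:

```java
// Hypothetical sketch of date-tiered window assignment; not HBase's actual code.
public class TierWindowSketch {

    // Returns the start (inclusive) of the tier-`tier` window containing `ts`.
    // Tier 0 windows are `baseWindowMillis` wide; each higher tier multiplies
    // the width by `windowsPerTier`.
    public static long windowStart(long ts, long baseWindowMillis,
                                   int windowsPerTier, int tier) {
        long size = baseWindowMillis;
        for (int i = 0; i < tier; i++) {
            size *= windowsPerTier;   // widen the window once per tier
        }
        return (ts / size) * size;    // align down to the window boundary
    }
}
```

With a one-hour base window and four windows per tier, a timestamp just past hour 5 falls into the tier-0 window starting at hour 5, and into the wider tier-1 (four-hour) window starting at hour 4.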
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Status: Patch Available (was: In Progress) > add Ping into HBase server for deprecated GetProtocolVersion > > > Key: HBASE-17453 > URL: https://issues.apache.org/jira/browse/HBASE-17453 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 1.2.2 >Reporter: Tianying Chang >Assignee: Tianying Chang >Priority: Minor > Fix For: 2.0.0, 1.2.5 > > Attachments: HBASE-17453-1.2.patch, HBASE-17453-master.patch, > HBASE-17453-master-v1.patch, HBASE-17453-master-v2.patch > > > Our HBase service is hosted in AWS. We saw cases where the connection between the client (AsyncHBase in our case) and the server stopped working without throwing any exception, so traffic got stuck. We therefore added a "Ping" feature in AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided on the RS side: if there is no traffic for a given time, we send a "Ping"; if no response comes back for the "Ping", we assume the connection is bad and reconnect. > Now we are upgrading clusters from 94 to 1.2. However, GetProtocolVersion() is deprecated. To support the same detect/reconnect feature, we added Ping() to our internal HBase 1.2 branch, and patched AsyncHBase 1.7 accordingly. > We would like to open source this feature since it is useful for this use case in an AWS environment. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
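The detect/reconnect behavior the issue describes (quiet connection gets a ping; unanswered ping forces a reconnect) can be sketched as a small decision function. The class, method, and threshold names below are hypothetical illustrations, not AsyncHBase's or HBase's actual API:

```java
// Illustrative sketch of the client-side keepalive decision described in the
// issue; names and thresholds are hypothetical, not the real AsyncHBase code.
public class KeepaliveSketch {

    public enum Action { NOTHING, SEND_PING, RECONNECT }

    // idleMillis: time since the last traffic on the connection.
    // pingOutstandingMillis: time since an unanswered ping was sent, or -1 if
    // no ping is outstanding.
    public static Action nextAction(long idleMillis, long pingOutstandingMillis,
                                    long idleThreshold, long pingTimeout) {
        if (pingOutstandingMillis >= pingTimeout) {
            return Action.RECONNECT;   // ping went unanswered: connection is bad
        }
        if (pingOutstandingMillis < 0 && idleMillis >= idleThreshold) {
            return Action.SEND_PING;   // quiet for too long: probe the server
        }
        return Action.NOTHING;         // healthy traffic, or ping still pending
    }
}
```

A client timer would call `nextAction` periodically: an idle connection first gets a ping, and only a ping that stays unanswered past the timeout triggers a reconnect.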
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Status: In Progress (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Attachment: HBASE-17453-master-v2.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926652#comment-15926652 ] Tianying Chang commented on HBASE-17453: [~anoop.hbase] Yes, I agree the input parameter for ping is useless; I will remove it. Thanks also for catching that MasterProtos.java was included in the package; I will remove that as well. [~saint@gmail.com] Our usage is to call from the asynchbase client to test the client/server connection, so I feel it is better to be in Client.proto, but I am fine with putting it in Admin.proto if it achieves the same goal. As for the coprocessor, that means all the tables in the cluster have to deploy the no-op coprocessor, right? If so, that seems like too much operational overhead. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-15400) Use DateTieredCompactor for Date Tiered Compaction
[ https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925063#comment-15925063 ] Tianying Chang commented on HBASE-15400: [~clarax98007] I am wondering, will this be backported to 1.2.x? > Use DateTieredCompactor for Date Tiered Compaction > -- > > Key: HBASE-15400 > URL: https://issues.apache.org/jira/browse/HBASE-15400 > Project: HBase > Issue Type: Sub-task > Components: Compaction >Reporter: Clara Xiong >Assignee: Clara Xiong > Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0 > > Attachments: HBASE-15400-0.98.patch, HBASE-15400-15389-v12.patch, > HBASE-15400-branch-1.patch, HBASE-15400.patch, HBASE-15400-v1.pa, > HBASE-15400-v3.patch, HBASE-15400-v3-v3.patch, HBASE-15400-v3-v4.patch, > HBASE-15400-v3-v5.patch, HBASE-15400-v6.patch, HBASE-15400-v7.patch > > > When we compact, we can output multiple files along the current window boundaries. There are two use cases: > 1. Major compaction: we want to output date-tiered store files, with data older than max age archived in chunks of the window size on the higher tier. Once a window is old enough, we don't combine windows to promote them to the next tier any further, so files in these windows retain the same timespan as when they were last minor-compacted, which is the window size of the highest tier. Major compaction will touch these files and we want to maintain the same layout. This way, TTL handling and archiving will be simpler and more efficient. > 2. Bulk-loaded files and old files generated by major compaction before upgrading to DTCP. > Pros: > 1. Restores locality, and processes versioning, updates, and deletes while maintaining the tiered layout. > 2. The best way to fix a skewed layout. > This work is based on a prototype of DateTieredCompactor from HBASE-15389 and is focused on meeting the needs of these two use cases while supporting others. I have to call out a few design decisions: > 1. We only want to output files along all windows for major compaction, and we want to output multiple files older than max age in chunks of the maximum tier window size, determined by base window size, windows per tier, and max age. > 2. For minor compaction, we don't want to output too many files, which would stick around because of the current restriction of contiguous compaction by seq id. I will only output two files if all the files in a window are being combined: one for the data within the window and the other for the out-of-window tail. If any file in the window is excluded from compaction, only one file will be output. When windows are promoted, the out-of-order data situation will gradually improve. For the incoming window, we need to accommodate user-specified future data. > 3. We have to pass the boundaries with the list of store files as a complete time snapshot, instead of in two separate calls, because the window layout is determined at the time the computation is called. So we will need a new type of compaction request. > 4. Since we will assign the same seq id to all output files, we need to sort by maxTimestamp subsequently. Right now every compaction policy gets the files sorted by StoreFileManager, which sorts by seq id and other criteria. I will use this order for DTCP only, to avoid impacting other compaction policies. > 5. We need some cleanup of the current design of StoreEngine and CompactionPolicy. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
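Design decision 2 in the description above (two output files when every file in a window participates in the compaction, otherwise one) can be illustrated with a minimal sketch. The class and method names are hypothetical, not the actual DateTieredCompactor code:

```java
// Hypothetical sketch of the minor-compaction output rule described in
// HBASE-15400's design notes; not the real DateTieredCompactor logic.
public class MinorOutputSketch {

    // If every file in the window is selected for compaction, emit two files:
    // one for data inside the window and one for the out-of-window tail.
    // If any file in the window is excluded, emit a single file so the
    // contiguous-by-seq-id invariant is preserved.
    public static int outputFileCount(int filesInWindow, int filesSelected) {
        return filesSelected == filesInWindow ? 2 : 1;
    }
}
```

The point of the rule is to bound the number of files minor compaction creates, since leftover files cannot be recombined freely under the contiguous seq-id restriction.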
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924776#comment-15924776 ] Tianying Chang commented on HBASE-15181: [~clarax98007] Do you have a patch for 1.2 as well? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924612#comment-15924612 ] Tianying Chang commented on HBASE-17138: [~carp84] I totally agree that one should spend enough time understanding all the JIRAs listed here, so as to be able to handle any problems in production. I am wondering if you could just post your patches against 1.1 here, for reference purposes (no need to worry about them applying cleanly)? That way, people doing the backport can leverage your knowledge, lessons, and experience, and have extra information for debugging if needed, so the backport can be done in less time. :) > Backport read-path offheap (HBASE-11425) to branch-1 > > > Key: HBASE-17138 > URL: https://issues.apache.org/jira/browse/HBASE-17138 > Project: HBase > Issue Type: Improvement >Reporter: Yu Li >Assignee: Yu Sun > Attachments: > 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch, > 0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, > 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch > > > From the > [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E] > of sharing our experience and performance data of read-path offheap usage in > Alibaba search, we could see people are positive to have HBASE-11425 in > branch-1, so I'd like to create a JIRA and move the discussion and decision > making here. > Echoing some comments from the mail thread: > Bryan: > Is the backported patch available anywhere? If it ends up not getting > officially backported to branch-1 due to 2.0 around the corner, some of us > who build our own deploy may want to integrate into our builds > Andrew: > Yes, please, the patches will be useful to the community even if we decide > not to backport into an official 1.x release. 
> Enis: > I don't see any reason why we cannot backport to branch-1. > Ted: > Opening a JIRA would be fine. This makes it easier for people to obtain the > patch(es) > Nick: > From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to > support 1.x releases for some time... I would guess these will be > maintained until 2.2 at least. Therefore, offheap patches that have seen > production exposure seem like a reasonable candidate for backport, perhaps in > a 1.4 or 1.5 release timeframe. > Anoop: > Because of some compatibility issues, we decide that this will be done in 2.0 > only.. Ya as Andy said, it would be great to share the 1.x backported > patches. > The following is all the jira ids we have back ported: > HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells > (Ram) > HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and > AbstractHFileReader; ditto for Scanners and BlockReader, etc. > HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner. > HBASE-13450 - Purge RawBytescomparator from the writers and readers for > HBASE-10800 (Ram) > HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo. > HBASE-12048 Remove deprecated APIs from Filter. > HBASE-10800 - Use CellComparator instead of KVComparator (Ram) > HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], > int, int. > HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with > byte[],int,int args in favor of taking Cell arg. > HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int > length) in favor of filterRowKey(Cell firstRowCell). > HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner. > HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], > int, int. 
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman) > HBASE-12059 Create hbase-annotations module > HBASE-12106 Move test annotations to test artifact (Enis Soztutar) > HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers. > HBASE-15679 Assertion on wrong variable in > TestReplicationThrottler#testThrottling > HBASE-13931 Move Unsafe based operations to UnsafeAccess. > HBASE-12345 Unsafe based ByteBuffer Comparator. > HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int > llength, byte[] right, int roffset, int rlength). > HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn > HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport > this patch) > HBASE-13448 New Cell implementation with cached component offsets/lengths. > HBASE-13387 Add ByteBufferedCell an extension to Cell. > HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum. > HBASE-12650 Move ServerName to
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924582#comment-15924582 ] Tianying Chang commented on HBASE-17453: It seems the test errors are not related to my change, except the whitespace one. Is this expected? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Status: Patch Available (was: In Progress) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923560#comment-15923560 ] Tianying Chang commented on HBASE-17453: Attached the patch with the extra whitespace removed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Status: In Progress (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Attachment: HBASE-17453-master-v1.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Status: Patch Available (was: In Progress) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906295#comment-15906295 ] Tianying Chang commented on HBASE-17453: [~ted_yu] Is "Submit Patch" the right action to take at this point? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Fix Version/s: 1.2.5 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-17453 started by Tianying Chang. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work stopped] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-17453 stopped by Tianying Chang. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906231#comment-15906231 ]

Tianying Chang commented on HBASE-17453:

[~ted_yu] Thanks a lot for the URL! I will read that. I found hbase-protocol-shaded/README.txt and followed it to regenerate the patch; it now gets past those errors on my box. I have resubmitted the patch. Thanks again for your fast response!
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:
Attachment: HBASE-17453-master.patch
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:
Attachment: (was: HBASE-17453-master.patch)
[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906118#comment-15906118 ]

Tianying Chang commented on HBASE-17138:

Thanks [~anoop.hbase]. What do you mean by trunk patches backport? Is this Alibaba's patch list for backporting from trunk/2.0 to 1.1? So the patch list above can theoretically be applied one by one on the 1.2.1 open source version?

> Backport read-path offheap (HBASE-11425) to branch-1
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
> Issue Type: Improvement
> Reporter: Yu Li
> Assignee: Yu Sun
> Attachments: 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch, 0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch
>
> From the [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E] of sharing our experience and performance data of read-path offheap usage in Alibaba search, we could see people are positive to have HBASE-11425 in branch-1, so I'd like to create a JIRA and move the discussion and decision making here.
> Echoing some comments from the mail thread:
> Bryan: Is the backported patch available anywhere? If it ends up not getting officially backported to branch-1 due to 2.0 around the corner, some of us who build our own deploy may want to integrate into our builds
> Andrew: Yes, please, the patches will be useful to the community even if we decide not to backport into an official 1.x release.
> Enis: I don't see any reason why we cannot backport to branch-1.
> Ted: Opening a JIRA would be fine. This makes it easier for people to obtain the patch(es)
> Nick: From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to support 1.x releases for some time... I would guess these will be maintained until 2.2 at least. Therefore, offheap patches that have seen production exposure seem like a reasonable candidate for backport, perhaps in a 1.4 or 1.5 release timeframe.
> Anoop: Because of some compatibility issues, we decide that this will be done in 2.0 only.. Ya as Andy said, it would be great to share the 1.x backported patches.
> The following is all the jira ids we have back ported:
> HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells (Ram)
> HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and AbstractHFileReader; ditto for Scanners and BlockReader, etc.
> HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner.
> HBASE-13450 - Purge RawBytescomparator from the writers and readers for HBASE-10800 (Ram)
> HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo.
> HBASE-12048 Remove deprecated APIs from Filter.
> HBASE-10800 - Use CellComparator instead of KVComparator (Ram)
> HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], int, int.
> HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with byte[],int,int args in favor of taking Cell arg.
> HBASE-13641 Deprecate Filter#filterRowKey(byte[] buffer, int offset, int length) in favor of filterRowKey(Cell firstRowCell).
> HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner.
> HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], int, int.
> HBASE-11911 Break up tests into more fine grained categories (Alex Newman)
> HBASE-12059 Create hbase-annotations module
> HBASE-12106 Move test annotations to test artifact (Enis Soztutar)
> HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers.
> HBASE-15679 Assertion on wrong variable in TestReplicationThrottler#testThrottling
> HBASE-13931 Move Unsafe based operations to UnsafeAccess.
> HBASE-12345 Unsafe based ByteBuffer Comparator.
> HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength).
> HBASE-13998 Remove CellComparator#compareRows() - Addendum to fix javadoc warn
> HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport this patch)
> HBASE-13448 New Cell implementation with cached component offsets/lengths.
> HBASE-13387 Add ByteBufferedCell an extension to Cell.
> HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum.
> HBASE-12650 Move ServerName to hbase-common module (partially backport this patch)
> HBASE-12296 Filters should work with ByteBufferedCell.
> HBASE-14120 ByteBufferUtils#compareTo small optimization.
> HBASE-13510 - Purge ByteBloomFilter (Ram)
> HBASE-13451 - Make the HFileBlockIndex blockKeys to Cells so that it could
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906112#comment-15906112 ]

Tianying Chang commented on HBASE-17453:

[~ted_yu] It seems that to generate the patch against the master 2.0 version, I also need to add the new API to "hbase-protocol-shaded/src/main/protobuf/Client.proto", besides just "hbase-protocol/src/main/protobuf/Client.proto" as I did for 1.2.5? I made the same change to "hbase-protocol-shaded/src/main/protobuf/Client.proto", but when regenerating ClientProtos.java the generated file has differences beyond my change, e.g. "return Consistency.forNumber(number);" vs "return Consistency.valueOf(number);". I am wondering what version of protoc HBase master is using?
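The forNumber/valueOf difference above is characteristic of the protoc major version used to generate the code: protobuf-java 3.x generated enums expose forNumber(int) and keep the 2.x-style valueOf(int) only as a deprecated alias, so regenerating a 2.x-era file with a 3.x protoc produces exactly this kind of diff. A rough sketch of the pattern (hypothetical, mimicking the shape of generated code, not the actual ClientProtos output):

```java
// Hypothetical mirror of how protobuf-java generated enums differ across
// protoc major versions; not the real ClientProtos code.
public enum Consistency {
    STRONG(0),
    TIMELINE(1);

    private final int number;

    Consistency(int number) { this.number = number; }

    public int getNumber() { return number; }

    // protoc 3.x generated code: the preferred numeric lookup method.
    public static Consistency forNumber(int number) {
        switch (number) {
            case 0: return STRONG;
            case 1: return TIMELINE;
            default: return null; // unknown wire value
        }
    }

    // protoc 2.x generated valueOf(int); 3.x keeps it only as a deprecated alias.
    @Deprecated
    public static Consistency valueOf(int number) {
        return forNumber(number);
    }
}
```

So a diff of this shape usually just means the two Client.proto files were regenerated with different protoc major versions, not that the patch changed the semantics.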
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905900#comment-15905900 ]

Tianying Chang commented on HBASE-17453:

[~ted_yu] Thanks for checking. I only ran the full test suite for the 1.2.5 patch, which passed; I did not run the tests for the master patch on my box. Let me find out where I missed something in the porting.
[jira] [Comment Edited] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905595#comment-15905595 ]

Tianying Chang edited comment on HBASE-17138 at 3/10/17 7:44 PM:

[~anoop.hbase] Yes, we are interested in getting this patch into our production online-facing cluster, which is running 1.2.1; any help from you would be great! It seems the customized version at Alibaba was forked from 1.1 (although with many of their private patches on top). If so, I would think the effort for a backport to 1.2 should be similar? Do you have a high-level sense of what else needs to be done besides those 70+ patches? Thanks

was (Author: tychang):

[~anoopamz] Yes, we are interested in getting this patch into our production online-facing cluster, which is running 1.2.1; any help from you would be great! It seems the customized version at Alibaba was forked from 1.1 (although with many of their private patches on top). If so, I would think the effort for a backport to 1.2 should be similar? Do you have a high-level sense of what else needs to be done besides those 70+ patches? Thanks
[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905595#comment-15905595 ]

Tianying Chang commented on HBASE-17138:

[~anoopamz] Yes, we are interested in getting this patch into our production online-facing cluster, which is running 1.2.1; any help from you would be great! It seems the customized version at Alibaba was forked from 1.1 (although with many of their private patches on top). If so, I would think the effort for a backport to 1.2 should be similar? Do you have a high-level sense of what else needs to be done besides those 70+ patches? Thanks
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904162#comment-15904162 ]

Tianying Chang commented on HBASE-17453:

[~ted_yu] [~Apache9] [~saint@gmail.com] Attached a patch generated from the master branch.
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:
Attachment: HBASE-17453-master.patch
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-17453:
Attachment: HBASE-17453-master.patch
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Attachment: (was: HBASE-17453-master.patch) > add Ping into HBase server for deprecated GetProtocolVersion
[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904015#comment-15904015 ] Tianying Chang commented on HBASE-17138: [~haoran] [~carp84] This improvement looks super useful to us too. We are currently on HBase 1.2.1. It would be very helpful if this could be backported. Is anyone currently working on it? > Backport read-path offheap (HBASE-11425) to branch-1 > > > Key: HBASE-17138 > URL: https://issues.apache.org/jira/browse/HBASE-17138 > Project: HBase > Issue Type: Improvement >Reporter: Yu Li >Assignee: Yu Sun > Attachments: > 0001-fix-EHB-511-Resolve-client-compatibility-issue-introduced-by-offheap-change.patch, > 0001-to-EHB-446-offheap-hfile-format-should-keep-compatible-v3.patch, > 0001-to-EHB-456-Cell-should-be-compatible-with-branch-1.1.2.patch > > > From the > [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E] > of sharing our experience and performance data of read-path offheap usage in > Alibaba search, we could see people are positive to have HBASE-11425 in > branch-1, so I'd like to create a JIRA and move the discussion and decision > making here. > Echoing some comments from the mail thread: > Bryan: > Is the backported patch available anywhere? If it ends up not getting > officially backported to branch-1 due to 2.0 around the corner, some of us > who build our own deploy may want to integrate into our builds > Andrew: > Yes, please, the patches will be useful to the community even if we decide > not to backport into an official 1.x release. > Enis: > I don't see any reason why we cannot backport to branch-1. > Ted: > Opening a JIRA would be fine. This makes it easier for people to obtain the > patch(es) > Nick: > From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to > support 1.x releases for some time... I would guess these will be > maintained until 2.2 at least. 
Therefore, offheap patches that have seen > production exposure seem like a reasonable candidate for backport, perhaps in > a 1.4 or 1.5 release timeframe. > Anoop: > Because of some compatibility issues, we decide that this will be done in 2.0 > only.. Ya as Andy said, it would be great to share the 1.x backported > patches. > The following is all the jira ids we have back ported: > HBASE-10930 Change Filters and GetClosestRowBeforeTracker to work with Cells > (Ram) > HBASE-13373 Squash HFileReaderV3 together with HFileReaderV2 and > AbstractHFileReader; ditto for Scanners and BlockReader, etc. > HBASE-13429 Remove deprecated seek/reseek methods from HFileScanner. > HBASE-13450 - Purge RawBytescomparator from the writers and readers for > HBASE-10800 (Ram) > HBASE-13501 - Deprecate/Remove getComparator() in HRegionInfo. > HBASE-12048 Remove deprecated APIs from Filter. > HBASE-10800 - Use CellComparator instead of KVComparator (Ram) > HBASE-13679 Change ColumnTracker and SQM to deal with Cell instead of byte[], > int, int. > HBASE-13642 Deprecate RegionObserver#postScannerFilterRow CP hook with > byte[],int,int args in favor of taking Cell arg. > HBASE-13641 Deperecate Filter#filterRowKey(byte[] buffer, int offset, int > length) in favor of filterRowKey(Cell firstRowCell). > HBASE-13827 Delayed scanner close in KeyValueHeap and StoreScanner. > HBASE-13871 Change RegionScannerImpl to deal with Cell instead of byte[], > int, int. > HBASE-11911 Break up tests into more fine grained categories (Alex Newman) > HBASE-12059 Create hbase-annotations module > HBASE-12106 Move test annotations to test artifact (Enis Soztutar) > HBASE-13916 Create MultiByteBuffer an aggregation of ByteBuffers. > HBASE-15679 Assertion on wrong variable in > TestReplicationThrottler#testThrottling > HBASE-13931 Move Unsafe based operations to UnsafeAccess. > HBASE-12345 Unsafe based ByteBuffer Comparator. 
> HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int > llength, byte[] right, int roffset, int rlength). > HBASE-13998 Remove CellComparator#compareRows()- Addendum to fix javadoc warn > HBASE-13579 Avoid isCellTTLExpired() for NO-TAG cases (partially backport > this patch) > HBASE-13448 New Cell implementation with cached component offsets/lengths. > HBASE-13387 Add ByteBufferedCell an extension to Cell. > HBASE-13387 Add ByteBufferedCell an extension to Cell - addendum. > HBASE-12650 Move ServerName to hbase-common module (partially backport this > patch) > HBASE-12296 Filters should work with ByteBufferedCell. > HBASE-14120 ByteBufferUtils#compareTo small optimization. > HBASE-13510 - Purge ByteBloomFilter (Ram) > HBASE-13451 - Make the HFileBlockIndex blockKeys to Cells so that it could be > easy to use in the CellComparators (Ram) >
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900381#comment-15900381 ] Tianying Chang commented on HBASE-17453: [~stack] sorry for the long delay. What we want is a simple API on the RS side that the client can poll to verify whether the connection is behaving abnormally and needs to be reconnected. Earlier, GetProtocolVersion in 0.94 was borrowed to achieve this goal as a side effect. Basically, this call should be simple and cheap. I can make a patch for trunk if needed. > add Ping into HBase server for deprecated GetProtocolVersion
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855004#comment-15855004 ] Tianying Chang commented on HBASE-17453: Thanks [~tedyu] [~stack] for the comments. We need a way to reliably know whether a connection to the RS is still alive. Previously we relied on GetProtocolVersion(). When no traffic is received within a certain time, either there really is no traffic, or the connection is bad. By sending a "Ping" and getting a response back, we know for sure the connection is not bad, so there is no need to reconnect; if there is no response to the "Ping", we reconnect. So we just need a lightweight response telling us the communication link is healthy. The PingProtocol.proto you mentioned above matches what we need; if it is a live API hosted by the RS, we can definitely use that. > add Ping into HBase server for deprecated GetProtocolVersion
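The detect/reconnect policy described in this thread can be sketched roughly as follows. This is an illustrative model only: the class name, method names, and thresholds below are made up for illustration, and this is not the actual AsyncHBase or HBASE-17453 patch code.

```java
// Illustrative sketch of the idle-detect/ping/reconnect policy discussed above.
// All names and thresholds here are assumptions, not the real patch.
class ConnectionHealthChecker {
    private final long idleThresholdMs; // no-traffic window before we probe with a Ping
    private final long pingTimeoutMs;   // how long to wait for a Ping reply
    private long lastTrafficMs;         // timestamp of the last observed traffic

    ConnectionHealthChecker(long idleThresholdMs, long pingTimeoutMs, long nowMs) {
        this.idleThresholdMs = idleThresholdMs;
        this.pingTimeoutMs = pingTimeoutMs;
        this.lastTrafficMs = nowMs;
    }

    /** Record any traffic (including a Ping reply) as proof the link is alive. */
    void onTraffic(long nowMs) {
        lastTrafficMs = nowMs;
    }

    /** No traffic for longer than the idle threshold: time to send a Ping. */
    boolean shouldPing(long nowMs) {
        return nowMs - lastTrafficMs >= idleThresholdMs;
    }

    /** A Ping was sent at sentMs; replyMs is null while no reply has arrived. */
    boolean shouldReconnect(long sentMs, Long replyMs, long nowMs) {
        if (replyMs != null) {
            onTraffic(replyMs); // reply received: the link is healthy
            return false;
        }
        return nowMs - sentMs >= pingTimeoutMs; // silence past the timeout: reconnect
    }
}
```

The key property, as the comment notes, is that a silent connection is ambiguous (no traffic vs. broken link); only a round-trip on the same connection disambiguates it, which is why the probe must be an RPC hosted by the RS rather than, say, a TCP-level check.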
[jira] [Commented] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822735#comment-15822735 ] Tianying Chang commented on HBASE-17453: [~Apache9] thanks. I will put the patch out next week. > add Ping into HBase server for deprecated GetProtocolVersion
[jira] [Updated] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
[ https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-17453: --- Description: Our HBase service is hosted in AWS. We saw cases where the connection between the client (Asynchbase in our case) and server stop working but did not throw any exception, therefore traffic stuck. So we added a "Ping" feature in AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided at RS side, if no traffic for given time, we send the "Ping", if no response back for "Ping", we assume the connect is bad and reconnect. Now we are upgrading cluster from 94 to 1.2. However, GetProtocolVersion() is deprecated. To be able to support same detect/reconnect feature, we added Ping() in our internal HBase 1.2 branch, and also patched accordingly in Asynchbase 1.7. We would like to open source this feature since it is useful for use case in AWS environment. was: Our HBase service is hosted in AWS. We saw cases where the connection between the client (Asynchbase in our case) and server stop working but did not throw any exception, therefore traffic stuck. So we added a "Ping" feature in AsyncHBase 1.5 by utilizing the GetProtocolVersion() API provided at RS side, if no traffic for given time, we send the "Ping", if no response back for "Ping", we assume the connect is bad and reconnect. Now we are upgrading cluster from 94 to 1.2. However, GetProtocolVersion() is deprecated. To be able to support same detect/reconnect feature, we added Ping() in our internal HBase 1.2 branch, and also patched accordingly in Asynchbase 1.7. We would like to open source this feature since it is useful for use case in AWS environment. 
We used GetProtocolVersion in AsyncHBase to detect unhealthy connection to RS since in AWS, sometimes it enters a state the connection > add Ping into HBase server for deprecated GetProtocolVersion
[jira] [Created] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion
Tianying Chang created HBASE-17453: -- Summary: add Ping into HBase server for deprecated GetProtocolVersion Key: HBASE-17453 URL: https://issues.apache.org/jira/browse/HBASE-17453 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 1.2.2 Reporter: Tianying Chang Assignee: Tianying Chang Priority: Minor Our HBase service is hosted in AWS. We saw cases where the connection between the client (AsyncHBase in our case) and the server stopped working without throwing any exception, so traffic stalled. We therefore added a "Ping" feature in AsyncHBase 1.5 by reusing the GetProtocolVersion() API provided on the RS side: if there is no traffic for a given time, we send a "Ping"; if no response comes back for the "Ping", we assume the connection is bad and reconnect. Now we are upgrading our cluster from 0.94 to 1.2, but GetProtocolVersion() is deprecated. To support the same detect/reconnect feature, we added Ping() to our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly. We would like to open source this feature since it is useful in AWS environments. We used GetProtocolVersion in AsyncHBase to detect unhealthy connection to RS since in AWS, sometimes it enters a state the connection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359803#comment-15359803 ] Tianying Chang commented on HBASE-16030: Attached a new patch. It still uses my old approach, but is updated to fit 1.2 and addresses the earlier CR comments. > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement > Affects Versions: 1.2.1 > Reporter: Tianying Chang > Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030-v3.patch, hbase-16030.patch > > > In our production cluster, we observed memstore flush spikes every hour for all regions/RS (we use the default memstore periodic flush time of 1 hour). > This happens when two conditions are met: > 1. the memstore does not have enough data to be flushed before the 1-hour limit is reached; > 2. all regions are opened around the same time (e.g. all RS are started together when the cluster starts). > With the above two conditions, all the regions are flushed around the same time, at startTime + 1 hour - delay, again and again. > We added a flush jitter time to randomize the flush time of each region so that they don't all get flushed at around the same time. We have had this feature running in our 94.7 and 94.26 clusters. Recently we upgraded to 1.2 and found the issue is still there, so we are porting the fix into the 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
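The jitter fix this issue describes, delaying each region's periodic flush by a random offset so co-opened regions do not all flush at the same instant, can be sketched as follows. The class name, constant, and guard below are made-up illustrations, not the actual HBASE-16030 patch.

```java
import java.util.Random;

// Illustrative sketch of per-region flush jitter as described above.
// Names and default values here are assumptions, not the real patch.
class FlushJitter {
    static final long DEFAULT_FLUSH_INTERVAL_MS = 3_600_000L; // 1-hour periodic flush

    /** Delay until a region's next periodic flush, with jitter in [0, maxJitterMs). */
    static long jitteredFlushDelay(long intervalMs, long maxJitterMs, Random rng) {
        if (maxJitterMs <= 0) {
            return intervalMs; // guard against a non-positive configured jitter
        }
        return intervalMs + (long) (rng.nextDouble() * maxJitterMs);
    }
}
```

Because each region draws its own offset once, the flush times of regions opened together spread uniformly across the jitter window instead of piling up at startTime + 1 hour.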
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: hbase-16030-v3.patch > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[jira] [Commented] (HBASE-16128) add support for p999 histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359650#comment-15359650 ] Tianying Chang commented on HBASE-16128: Should I provide a patch for a different version? > add support for p999 histogram metrics > -- > > Key: HBASE-16128 > URL: https://issues.apache.org/jira/browse/HBASE-16128 > Project: HBase > Issue Type: Improvement > Components: metrics > Affects Versions: 1.2.1 > Reporter: Tianying Chang > Assignee: Tianying Chang > Priority: Minor > Attachments: HBase-16128.patch > > > Currently there is support for p75, p90, and p99, but no support for p999. We need p999 metrics to reflect p99 latency at the client level, especially when the client side fans out calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16128) add support for p999 histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16128: --- Status: Patch Available (was: In Progress) > add support for p999 histogram metrics
[jira] [Commented] (HBASE-16128) add support for p999 histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357848#comment-15357848 ] Tianying Chang commented on HBASE-16128: We have deployed this in our production cluster and can see the p999 metrics now. > add support for p999 histogram metrics
[jira] [Commented] (HBASE-16128) add support for p999 histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356086#comment-15356086 ] Tianying Chang commented on HBASE-16128: Attached a patch for 1.2.1 > add support for p999 histogram metrics
[jira] [Work started] (HBASE-16128) add support for p999 histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-16128 started by Tianying Chang. -- > add support for p999 histogram metrics
[jira] [Updated] (HBASE-16128) add support for p999 histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16128: --- Attachment: HBase-16128.patch > add support for p999 histogram metrics
[jira] [Created] (HBASE-16128) add support for p999 histogram metrics
Tianying Chang created HBASE-16128: -- Summary: add support for p999 histogram metrics Key: HBASE-16128 URL: https://issues.apache.org/jira/browse/HBASE-16128 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 1.2.1 Reporter: Tianying Chang Assignee: Tianying Chang Priority: Minor Currently there is support for p75, p90, and p99, but no support for p999. We need p999 metrics to reflect p99 latency at the client level, especially when the client side fans out calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
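The fan-out motivation is worth unpacking: when a client request waits on N servers, it is as slow as the slowest of the N, so with a 100-way fan-out roughly 1 - 0.999^100 ≈ 9.5% of requests touch at least one response beyond that server's p999 — server-side p999 drives client-side p99. A p999 readout itself is just a quantile lookup over latency samples, sketched below. This is only an illustration of the quantile math: HBase's metrics histograms compute quantiles from sampled reservoirs internally, and the `Percentiles` class here is a made-up name.

```java
import java.util.Arrays;

// Nearest-rank quantile lookup, shown only to illustrate what a p999 value
// means; not how HBase's metrics histograms are implemented internally.
class Percentiles {
    /** Nearest-rank percentile for q in (0, 1], e.g. q = 0.999 for p999. */
    static long percentile(long[] samples, double q) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length); // 1-based nearest rank
        return sorted[Math.max(0, rank - 1)];
    }
}
```

One practical caveat of any sampled histogram: p999 needs on the order of thousands of samples per reporting window before the reported value is meaningful.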
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340691#comment-15340691 ] Tianying Chang commented on HBASE-16030: [~enis] good scenario! So with the previously hard-coded 5-minute delay, the flush WILL NOT happen for 5 minutes. A flush stalled for 5 minutes also sounds like a problem (although not as bad as 30 minutes). It feels like the right way of jittering should not put the request into a blocking queue ahead of time. What do you think? > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339804#comment-15339804 ] Tianying Chang commented on HBASE-16030: [~enis] I feel we should check that the configurable value is non-negative to prevent user error, but it seems none of the places that retrieve configurable values do any such check. Is it the convention in HBase that the user is responsible for making sure a reasonable value is used? > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[jira] [Comment Edited] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336324#comment-15336324 ] Tianying Chang edited comment on HBASE-16030 at 6/19/16 6:22 AM: - [~enis] sure, I can make another patch. One question: by increasing the flush delay time, the flush request will stay in the queue for 30 minutes. Will this cause any issue? was (Author: tychang): @enis sure, I can make another patch. Question, by increasing the flush delay time, the flush request will stay in the queue for 30 minutes. Will this cause any issue? > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338403#comment-15338403 ] Tianying Chang commented on HBASE-16030: [~enis] attached a new patch that simply makes the jitter value configurable, with a larger default value. > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: hbase-16030-v2.patch > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot > 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336324#comment-15336324 ] Tianying Chang commented on HBASE-16030: @enis sure, I can make another patch. Question, by increasing the flush delay time, the flush request will stay in the queue for 30 minutes. Will this cause any issue? > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot > 2016-06-15 at 11.52.38 PM.png, hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: Screen Shot 2016-06-15 at 11.52.38 PM.png > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot > 2016-06-15 at 11.52.38 PM.png, hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333224#comment-15333224 ] Tianying Chang commented on HBASE-16030: [~enis] thanks for reviewing the patch. Yes, 5 minutes is not enough; we would like to see the flushes uniformly distributed across the full one-hour range in an online-facing production cluster. I am fine with making this value configurable, and therefore larger than 5 min. Will it be a problem if a flush request is queued and delayed for up to 1 hour? BTW, I attached a new graph showing the impact of the hourly spike on the network/disk/CPU of our new 1.2RC test cluster. > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, > hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: Screen Shot 2016-06-15 at 11.35.42 PM.png > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, > hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: (was: screenshot-1.png) > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3 > > Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, > hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: screenshot-1.png > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 2.0.0, 1.3.0, 1.2.3 > > Attachments: hbase-16030.patch, screenshot-1.png > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16028) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang resolved HBASE-16028. Resolution: Duplicate > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16028 > URL: https://issues.apache.org/jira/browse/HBASE-16028 > Project: HBase > Issue Type: Improvement > Components: hbase, Performance >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16029) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang resolved HBASE-16029. Resolution: Duplicate > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16029 > URL: https://issues.apache.org/jira/browse/HBASE-16029 > Project: HBase > Issue Type: Improvement > Components: hbase, Performance >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16027) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang resolved HBASE-16027. Resolution: Duplicate > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16027 > URL: https://issues.apache.org/jira/browse/HBASE-16027 > Project: HBase > Issue Type: Bug > Components: hbase, Performance >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Fix Version/s: 1.2.1 Status: Patch Available (was: In Progress) > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 1.2.1 > > Attachments: hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330913#comment-15330913 ] Tianying Chang commented on HBASE-16030: Attached a patch for 1.2.1. > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 1.2.1 > > Attachments: hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-16030: --- Attachment: hbase-16030.patch > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > Fix For: 1.2.1 > > Attachments: hbase-16030.patch > > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
[ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-16030 started by Tianying Chang. -- > All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is > on, causing flush spike > -- > > Key: HBASE-16030 > URL: https://issues.apache.org/jira/browse/HBASE-16030 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.1 >Reporter: Tianying Chang >Assignee: Tianying Chang > > In our production cluster, we observed that memstore flush spike every hour > for all regions/RS. (we use the default memstore periodic flush time of 1 > hour). > This will happend when two conditions are met: > 1. the memstore does not have enough data to be flushed before 1 hour limit > reached; > 2. all regions are opened around the same time, (e.g. all RS are started at > the same time when start a cluster). > With above two conditions, all the regions will be flushed around the same > time at: startTime+1hour-delay again and again. > We added a flush jittering time to randomize the flush time of each region, > so that they don't get flushed at around the same time. We had this feature > running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found > this issue still there in 1.2. So we are porting this into 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
Tianying Chang created HBASE-16030: -- Summary: All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike Key: HBASE-16030 URL: https://issues.apache.org/jira/browse/HBASE-16030 Project: HBase Issue Type: Improvement Affects Versions: 1.2.1 Reporter: Tianying Chang Assignee: Tianying Chang In our production cluster, we observed that memstore flushes spike every hour for all regions/RS (we use the default memstore periodic flush time of 1 hour). This happens when two conditions are met: 1. the memstore does not have enough data to be flushed before the 1 hour limit is reached; 2. all regions are opened around the same time (e.g. all RS are started at the same time when starting a cluster). With the above two conditions, all the regions will be flushed around the same time, at startTime+1hour-delay, again and again. We added a flush jittering time to randomize the flush time of each region, so that they don't all get flushed at around the same time. We had this feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 1.2 and found this issue is still present, so we are porting the fix into the 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
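The jitter idea described above can be sketched as follows. This is a minimal, self-contained illustration of the technique, not the actual HBase patch: the class, method, and constant names (FlushJitterSketch, jitteredFlushDeadline, FLUSH_INTERVAL_MS, maxJitterMs) are hypothetical.

```java
import java.util.Random;

// Sketch of flush jittering: instead of every region's periodic flush firing
// at exactly openTime + 1 hour, each region's deadline is pushed out by a
// random offset in [0, maxJitterMs), spreading flushes across a window.
public class FlushJitterSketch {
    static final long FLUSH_INTERVAL_MS = 3_600_000L; // default 1-hour periodic flush

    // Compute a per-region flush deadline with a random jitter added to the
    // base interval. With maxJitterMs == 0 this degenerates to the old
    // behavior where all regions opened together flush together.
    static long jitteredFlushDeadline(long openTimeMs, long maxJitterMs, Random rng) {
        long jitter = maxJitterMs > 0 ? (long) (rng.nextDouble() * maxJitterMs) : 0L;
        return openTimeMs + FLUSH_INTERVAL_MS + jitter;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        long maxJitterMs = 300_000L; // e.g. a configurable 5-minute jitter window
        // Two regions opened at the same instant now get different deadlines.
        long d1 = jitteredFlushDeadline(0L, maxJitterMs, rng);
        long d2 = jitteredFlushDeadline(0L, maxJitterMs, rng);
        System.out.println(d1 + " vs " + d2);
    }
}
```

Making maxJitterMs configurable (as the later patch in this thread does) lets operators widen the window toward the full flush interval so flushes are spread uniformly rather than spiking hourly.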
[jira] [Created] (HBASE-16029) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
Tianying Chang created HBASE-16029: -- Summary: All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike Key: HBASE-16029 URL: https://issues.apache.org/jira/browse/HBASE-16029 Project: HBase Issue Type: Improvement Components: hbase, Performance Affects Versions: 1.2.1 Reporter: Tianying Chang Assignee: Tianying Chang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16028) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
Tianying Chang created HBASE-16028: -- Summary: All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike Key: HBASE-16028 URL: https://issues.apache.org/jira/browse/HBASE-16028 Project: HBase Issue Type: Improvement Components: hbase, Performance Affects Versions: 1.2.1 Reporter: Tianying Chang Assignee: Tianying Chang In our production cluster, we observed that memstore flushes spike every hour for all regions/RS (we use the default memstore periodic flush time of 1 hour). This happens when two conditions are met: 1. the memstore does not have enough data to be flushed before the 1 hour limit is reached; 2. all regions are opened around the same time (e.g. all RS are started at the same time when starting a cluster). With the above two conditions, all the regions will be flushed around the same time, at startTime+1hour-delay, again and again. We added a flush jittering time to randomize the flush time of each region, so that they don't all get flushed at around the same time. We had this feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 1.2 and found this issue is still present, so we are porting the fix into the 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16027) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
Tianying Chang created HBASE-16027: -- Summary: All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike Key: HBASE-16027 URL: https://issues.apache.org/jira/browse/HBASE-16027 Project: HBase Issue Type: Bug Components: hbase, Performance Affects Versions: 1.2.1 Reporter: Tianying Chang Assignee: Tianying Chang In our production cluster, we observed that memstore flushes spike every hour for all regions/RS (we use the default memstore periodic flush time of 1 hour). This happens when two conditions are met: 1. the memstore does not have enough data to be flushed before the 1 hour limit is reached; 2. all regions are opened around the same time (e.g. all RS are started at the same time when starting a cluster). With the above two conditions, all the regions will be flushed around the same time, at startTime+1hour-delay, again and again. We added a flush jittering time to randomize the flush time of each region, so that they don't all get flushed at around the same time. We had this feature running in our 94.7 and 94.26 clusters. Recently, we upgraded to 1.2 and found this issue is still present, so we are porting the fix into the 1.2 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang reassigned HBASE-7055: - Assignee: Tianying Chang (was: Sergey Shelukhin) > port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes) > -- > > Key: HBASE-7055 > URL: https://issues.apache.org/jira/browse/HBASE-7055 > Project: HBase > Issue Type: Task > Components: Compaction >Affects Versions: 0.95.2 >Reporter: Sergey Shelukhin >Assignee: Tianying Chang > Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, > HBASE-6371-v3-refactor-only-squashed.patch, > HBASE-6371-v4-refactor-only-squashed.patch, > HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, > HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, > HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, > HBASE-7055-v7.patch, HBASE-7055-v7.patch, Tier Based Compaction Settings.pdf > > > See HBASE-6371 for details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15155) Show All RPC handler tasks stops working after cluster is under heavy load for a while
[ https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130776#comment-15130776 ] Tianying Chang commented on HBASE-15155: We deployed this to our production cluster running 94.26; it has been working fine under heavy traffic for over a week now. > Show All RPC handler tasks stops working after cluster is under heavy load > for a while > -- > > Key: HBASE-15155 > URL: https://issues.apache.org/jira/browse/HBASE-15155 > Project: HBase > Issue Type: Bug > Components: monitoring >Affects Versions: 0.98.0, 1.0.0, 0.94.19 >Reporter: Tianying Chang >Assignee: Tianying Chang > Attachments: hbase-15155.patch > > > After we upgrade from 94.7 to 94.26 and 1.0, we found that "Show All RPC > handler status" link on RS webUI stops working after running in production > cluster with relatively high load for several days. > Turn out to be it is a bug introduced by > https://issues.apache.org/jira/browse/HBASE-10312 The BoundedFIFOBuffer cause > RPCHandler Status overriden/removed permanently when there is a spike of > non-RPC tasks status that is over the MAX_SIZE (1000). So as long as the RS > experienced "high" load once, the RPC status monitoring is gone forever, > until RS is restarted. > We added a unit test that can repro this. And the fix can pass the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while
Tianying Chang created HBASE-15155: -- Summary: Show All RPC handler tasks stop working after cluster is under heavy load for a while Key: HBASE-15155 URL: https://issues.apache.org/jira/browse/HBASE-15155 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.94.19, 1.0.0, 0.98.0 Reporter: Tianying Chang Assignee: Tianying Chang After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All RPC handler status" link on the RS web UI stops working after the server has run in a production cluster with relatively high load for several days. It turns out to be a bug introduced by https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer causes RPC handler statuses to be overridden/removed permanently when there is a spike of non-RPC task statuses exceeding MAX_SIZE (1000). So once the RS has experienced "high" load, the RPC status monitoring is gone for good until the RS is restarted. We added a unit test that reproduces this, and the fix passes that test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
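The eviction behavior behind this bug can be illustrated with a toy bounded FIFO. This is a sketch of the failure mode only, not the actual HBase TaskMonitor/BoundedFIFOBuffer code; the class name, MAX_SIZE value, and task labels are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Why a bounded FIFO of task statuses loses the RPC handler entries: the
// handler statuses are registered once at startup and never re-added, so a
// burst of short-lived non-RPC tasks pushes them out of the buffer for good.
public class BoundedTaskBufferSketch {
    static final int MAX_SIZE = 3; // the real buffer held 1000; small here for clarity

    final Deque<String> tasks = new ArrayDeque<>();

    // Add a task status, evicting the oldest entry when the buffer is full --
    // even if that entry is a still-live RPC handler status.
    void add(String task) {
        if (tasks.size() >= MAX_SIZE) {
            tasks.removeFirst();
        }
        tasks.addLast(task);
    }

    public static void main(String[] args) {
        BoundedTaskBufferSketch buf = new BoundedTaskBufferSketch();
        buf.add("RPC handler 0"); // registered once when the RS starts
        for (int i = 0; i < MAX_SIZE; i++) {
            buf.add("compaction-" + i); // spike of non-RPC task statuses
        }
        // The handler status was evicted and is never re-registered.
        System.out.println(buf.tasks.contains("RPC handler 0")); // prints "false"
    }
}
```

This is why the web UI link stays broken until restart: only a restart re-registers the handler statuses into the buffer.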
[jira] [Updated] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while
[ https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-15155: --- Attachment: hbase-15155.patch > Show All RPC handler tasks stop working after cluster is under heavy load for > a while > - > > Key: HBASE-15155 > URL: https://issues.apache.org/jira/browse/HBASE-15155 > Project: HBase > Issue Type: Bug > Components: monitoring >Affects Versions: 0.98.0, 1.0.0, 0.94.19 >Reporter: Tianying Chang >Assignee: Tianying Chang > Attachments: hbase-15155.patch > > > After we upgrade from 94.7 to 94.26 and 1.0, we found that "Show All RPC > handler status" link on RS webUI stops working after running in production > cluster with relatively high load for several days. > Turn out to be it is a bug introduced by > https://issues.apache.org/jira/browse/HBASE-10312 The BoundedFIFOBuffer cause > RPCHandler Status overriden/removed permanently when there is a spike of > non-RPC tasks status that is over the MAX_SIZE (1000). So as long as the RS > experienced "high" load once, the RPC status monitoring is gone forever, > until RS is restarted. > We added a unit test that can repro this. And the fix can pass the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while
[ https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-15155 started by Tianying Chang. -- > Show All RPC handler tasks stop working after cluster is under heavy load for a while > - > > Key: HBASE-15155 > URL: https://issues.apache.org/jira/browse/HBASE-15155 > Project: HBase > Issue Type: Bug > Components: monitoring > Affects Versions: 0.98.0, 1.0.0, 0.94.19 > Reporter: Tianying Chang > Assignee: Tianying Chang > > After we upgraded from 0.94.7 to 0.94.26 and 1.0, we found that the "Show All RPC handler status" link on the RS web UI stops working after the cluster runs in production under relatively high load for several days. > It turns out to be a bug introduced by https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer causes RPC handler statuses to be overridden/removed permanently when a spike of non-RPC task statuses exceeds MAX_SIZE (1000). > So once the RS has experienced "high" load, the RPC status monitoring is gone until the RS is restarted. > We added a unit test that reproduces this, and the fix passes the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while
[ https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111667#comment-15111667 ] Tianying Chang commented on HBASE-15155: Thanks [~ted_yu], I will make a new patch. > Show All RPC handler tasks stop working after cluster is under heavy load for a while > - > > Key: HBASE-15155 > URL: https://issues.apache.org/jira/browse/HBASE-15155 > Project: HBase > Issue Type: Bug > Components: monitoring > Affects Versions: 0.98.0, 1.0.0, 0.94.19 > Reporter: Tianying Chang > Assignee: Tianying Chang > Attachments: hbase-15155.patch > > > After we upgraded from 0.94.7 to 0.94.26 and 1.0, we found that the "Show All RPC handler status" link on the RS web UI stops working after the cluster runs in production under relatively high load for several days. > It turns out to be a bug introduced by https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer causes RPC handler statuses to be overridden/removed permanently when a spike of non-RPC task statuses exceeds MAX_SIZE (1000). > So once the RS has experienced "high" load, the RPC status monitoring is gone until the RS is restarted. > We added a unit test that reproduces this, and the fix passes the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while
[ https://issues.apache.org/jira/browse/HBASE-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111992#comment-15111992 ] Tianying Chang commented on HBASE-15155: RPC handler statuses are inserted when the RPC handlers start. Unlike the non-RPC monitored tasks, the handlers are long-running threads that never exit; they only transition between the waiting and running states while the RS is up and running. > Show All RPC handler tasks stop working after cluster is under heavy load for a while > - > > Key: HBASE-15155 > URL: https://issues.apache.org/jira/browse/HBASE-15155 > Project: HBase > Issue Type: Bug > Components: monitoring > Affects Versions: 0.98.0, 1.0.0, 0.94.19 > Reporter: Tianying Chang > Assignee: Tianying Chang > Attachments: hbase-15155.patch > > > After we upgraded from 0.94.7 to 0.94.26 and 1.0, we found that the "Show All RPC handler status" link on the RS web UI stops working after the cluster runs in production under relatively high load for several days. > It turns out to be a bug introduced by https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer causes RPC handler statuses to be overridden/removed permanently when a spike of non-RPC task statuses exceeds MAX_SIZE (1000). > So once the RS has experienced "high" load, the RPC status monitoring is gone until the RS is restarted. > We added a unit test that reproduces this, and the fix passes the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
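Since the handler statuses are registered only once at handler startup, one way to keep them from being evicted is to track long-lived RPC handler statuses outside the bounded buffer. This is a hypothetical sketch of the idea, not the actual attached patch; the class and method names are invented for illustration:

```python
from collections import deque


class TaskMonitorSketch:
    """Illustrative only: long-lived RPC handler statuses live in a
    separate unbounded list, so the bounded FIFO can only ever evict
    short-lived non-RPC task statuses."""

    MAX_SIZE = 4  # shrunk from 1000 for the example

    def __init__(self):
        self.tasks = deque(maxlen=self.MAX_SIZE)  # short-lived, bounded
        self.rpc_tasks = []                       # handler threads, never evicted

    def create_status(self, desc, is_rpc=False):
        (self.rpc_tasks if is_rpc else self.tasks).append(desc)

    def get_all(self):
        return self.rpc_tasks + list(self.tasks)


m = TaskMonitorSketch()
m.create_status("RPC handler 0", is_rpc=True)
for i in range(100):               # spike far beyond MAX_SIZE
    m.create_status(f"task {i}")
print("RPC handler 0" in m.get_all())  # True: the handler status survives
```

With this separation, a load spike can still drop old short-lived task statuses, but the handler view on the web UI remains intact.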
[jira] [Commented] (HBASE-13639) SyncTable - rsync for HBase tables
[ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947244#comment-14947244 ] Tianying Chang commented on HBASE-13639: Nice idea! I am wondering: if the source table has constant writes into it, what will the outcome be? Or is this feature meant for a source table that is not being updated? Thanks > SyncTable - rsync for HBase tables > -- > > Key: HBASE-13639 > URL: https://issues.apache.org/jira/browse/HBASE-13639 > Project: HBase > Issue Type: New Feature > Reporter: Dave Latham > Assignee: Dave Latham > Fix For: 2.0.0, 0.98.14, 1.2.0 > > Attachments: HBASE-13639-0.98-addendum-hadoop-1.patch, > HBASE-13639-0.98.patch, HBASE-13639-v1.patch, HBASE-13639-v2.patch, > HBASE-13639-v3-0.98.patch, HBASE-13639-v3.patch, HBASE-13639.patch > > > Given HBase tables in remote clusters with similar but not identical data, efficiently update a target table such that the data in question is identical to a source table. Efficiency in this context means using far less network traffic than would be required to ship all the data from one cluster to the other. Takes inspiration from rsync. > Design doc: > https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.
[ https://issues.apache.org/jira/browse/HBASE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099873#comment-14099873 ] Tianying Chang commented on HBASE-11765: [~lhofhansl] Thanks for the link. It seems HBASE-8806 is trying to solve the exactly same problem, but using a different approach. My way is to sort all the kvs from all hlog entries. That way, it is able to guarantee for each batch() call sent by replication sink, only one Put/Delete is created for a row, so no lock problem. It fees a little like the approach taken by HBASE-6930. My patch does not change the behavior of multi() in HRegion, only effect replication sink implementation. With this change, a hlog that used to take 4min 20sec to replay only need 30 sec. I will take a deeper look at HBASE-8806. Thanks. ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry. --- Key: HBASE-11765 URL: https://issues.apache.org/jira/browse/HBASE-11765 Project: HBase Issue Type: Improvement Components: Performance, Replication Affects Versions: 0.94.7 Reporter: Tianying Chang Assignee: Tianying Chang Fix For: 0.94.23 Attachments: HBASE-11765.patch The current replicationSink code make sure it will only create one Put/Delete action of the kv of same row if it is from same hlog entry. However, when the same row of Put/Delete exist in different hlog entry, multiple Put/Delete action will be created, this will cause synchronization cost during the multi batch operation. In one of our application traffic pattern which has delete for same row twice for many rows, we saw doMiniBatchMutation() is invoked many times due to the row lock for the same row. ReplicationSink side is super slow, and replication queue build up. We should put the put/delete for the same row into one Put/Delete action even if they are from different hlog entry. -- This message was sent by Atlassian JIRA (v6.2#6252)
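The merging idea can be sketched as grouping all KVs drawn from the batched hlog entries by (table, row, operation) before building actions, so each batch carries at most one Put and one Delete per row. This is an illustrative Python sketch with hypothetical tuple inputs, not the actual ReplicationSink code:

```python
from collections import OrderedDict


def group_edits(entries):
    """entries: iterable of (table, row, op, cell) tuples, possibly drawn
    from many different hlog entries. Returns one action key per
    (table, row, op), with all its cells merged together."""
    merged = OrderedDict()  # preserve arrival order of first occurrence
    for table, row, op, cell in entries:
        merged.setdefault((table, row, op), []).append(cell)
    return merged


edits = [
    ("t1", "row1", "delete", "cf:a"),
    ("t1", "row1", "delete", "cf:b"),  # same row, from a different hlog entry
    ("t1", "row2", "put", "cf:a=1"),
]
actions = group_edits(edits)
print(len(actions))  # 2 actions instead of 3: one Delete for row1, one Put for row2
```

Because each row appears in at most one action per operation type, doMiniBatchMutation() on the sink side no longer serializes repeatedly on the same row lock within a batch.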
[jira] [Created] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.
Tianying Chang created HBASE-11765: -- Summary: ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry. Key: HBASE-11765 URL: https://issues.apache.org/jira/browse/HBASE-11765 Project: HBase Issue Type: Improvement Components: Performance, Replication Affects Versions: 0.94.7 Reporter: Tianying Chang Assignee: Tianying Chang Fix For: 0.94.7 The current ReplicationSink code makes sure it creates only one Put/Delete action for the KVs of the same row if they come from the same hlog entry. However, when Puts/Deletes for the same row exist in different hlog entries, multiple Put/Delete actions are created, which incurs synchronization cost during the multi batch operation. In one of our application traffic patterns, which deletes the same row twice for many rows, we saw doMiniBatchMutation() invoked many times due to the row lock on the same row. The ReplicationSink side is super slow, and the replication queue builds up. We should merge the Puts/Deletes for the same row into one Put/Delete action even if they come from different hlog entries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.
[ https://issues.apache.org/jira/browse/HBASE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-11765: --- Attachment: HBASE-11765.patch ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry. --- Key: HBASE-11765 URL: https://issues.apache.org/jira/browse/HBASE-11765 Project: HBase Issue Type: Improvement Components: Performance, Replication Affects Versions: 0.94.7 Reporter: Tianying Chang Assignee: Tianying Chang Fix For: 0.94.7 Attachments: HBASE-11765.patch The current ReplicationSink code makes sure it creates only one Put/Delete action for the KVs of the same row if they come from the same hlog entry. However, when Puts/Deletes for the same row exist in different hlog entries, multiple Put/Delete actions are created, which incurs synchronization cost during the multi batch operation. In one of our application traffic patterns, which deletes the same row twice for many rows, we saw doMiniBatchMutation() invoked many times due to the row lock on the same row. The ReplicationSink side is super slow, and the replication queue builds up. We should merge the Puts/Deletes for the same row into one Put/Delete action even if they come from different hlog entries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.
[ https://issues.apache.org/jira/browse/HBASE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-11765 started by Tianying Chang. ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry. --- Key: HBASE-11765 URL: https://issues.apache.org/jira/browse/HBASE-11765 Project: HBase Issue Type: Improvement Components: Performance, Replication Affects Versions: 0.94.7 Reporter: Tianying Chang Assignee: Tianying Chang Fix For: 0.94.7 Attachments: HBASE-11765.patch The current ReplicationSink code makes sure it creates only one Put/Delete action for the KVs of the same row if they come from the same hlog entry. However, when Puts/Deletes for the same row exist in different hlog entries, multiple Put/Delete actions are created, which incurs synchronization cost during the multi batch operation. In one of our application traffic patterns, which deletes the same row twice for many rows, we saw doMiniBatchMutation() invoked many times due to the row lock on the same row. The ReplicationSink side is super slow, and the replication queue builds up. We should merge the Puts/Deletes for the same row into one Put/Delete action even if they come from different hlog entries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry
Tianying Chang created HBASE-11684: -- Summary: HBase replicationSource should support multithread to ship the log entry Key: HBASE-11684 URL: https://issues.apache.org/jira/browse/HBASE-11684 Project: HBase Issue Type: Improvement Components: Performance, regionserver, Replication Reporter: Tianying Chang Assignee: Tianying Chang We found that the replication rate cannot keep up with the write rate when the master cluster is write heavy, and we got a huge log queue buildup because of that. But when we did a rolling restart of the master cluster, we found that the appliedOpsRate doubled due to the extra thread created to help recover the logs of the restarted RS. ReplicateLogEntries is a synchronous blocking call, and it becomes the bottleneck when it runs with only one thread. I think we should support multiple threads in the replication source to ship the data. I don't see any consistency problem. Any other concerns here? -- This message was sent by Atlassian JIRA (v6.2#6252)
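The proposal amounts to fanning batches of log entries out to a small pool of shipper threads instead of one synchronous loop. This is an illustrative sketch only: `ship` is a hypothetical stand-in for the blocking ReplicateLogEntries RPC, and a real implementation would still need to preserve any ordering guarantees the sink requires (e.g. per-region ordering):

```python
from concurrent.futures import ThreadPoolExecutor


def ship(batch):
    # hypothetical stand-in for the synchronous, blocking
    # ReplicateLogEntries RPC to the sink cluster; returns entries shipped
    return len(batch)


# 8 batches of 10 log entries each, as a single source might accumulate
batches = [[f"entry-{i}-{j}" for j in range(10)] for i in range(8)]

# a single shipper thread serializes the blocking RPCs; a small pool
# overlaps them, which is the effect seen during the rolling restart
# when an extra recovery thread doubled appliedOpsRate
with ThreadPoolExecutor(max_workers=4) as pool:
    shipped = sum(pool.map(ship, batches))
print(shipped)  # 80
```

The design trade-off is throughput versus ordering: overlapping RPCs helps only if edits for the same region (or whatever unit must stay ordered) are routed to the same thread.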
[jira] [Commented] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry
[ https://issues.apache.org/jira/browse/HBASE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087252#comment-14087252 ] Tianying Chang commented on HBASE-11684: [~jdcryans] Can you give some comments on this? HBase replicationSource should support multithread to ship the log entry Key: HBASE-11684 URL: https://issues.apache.org/jira/browse/HBASE-11684 Project: HBase Issue Type: Improvement Components: Performance, regionserver, Replication Reporter: Tianying Chang Assignee: Tianying Chang We found the replication rate cannot keep up with the write rate when the master cluster is write heavy. We got huge log queue build up due to that. But when we do a rolling restart of master cluster, we found that the appliedOpsRate doubled due to the extra thread created to help recover the log of the restarted RS. ReplicateLogEntries is a synchronous blocking call, it becomes the bottleneck when is only runs with one thread. I think we should support multi-thread for the replication source to ship the data. I don't see any consistency problem. Any other concern here? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry
[ https://issues.apache.org/jira/browse/HBASE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-11684: --- Component/s: (was: regionserver) HBase replicationSource should support multithread to ship the log entry Key: HBASE-11684 URL: https://issues.apache.org/jira/browse/HBASE-11684 Project: HBase Issue Type: Improvement Components: Performance, Replication Reporter: Tianying Chang Assignee: Tianying Chang We found that the replication rate cannot keep up with the write rate when the master cluster is write heavy, and we got a huge log queue buildup because of that. But when we did a rolling restart of the master cluster, we found that the appliedOpsRate doubled due to the extra thread created to help recover the logs of the restarted RS. ReplicateLogEntries is a synchronous blocking call, and it becomes the bottleneck when it runs with only one thread. I think we should support multiple threads in the replication source to ship the data. I don't see any consistency problem. Any other concerns here? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10935) support snapshot policy where flush memstore can be skipped to prevent production cluster freeze
[ https://issues.apache.org/jira/browse/HBASE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015176#comment-14015176 ] Tianying Chang commented on HBASE-10935: [~mbertozzi] Sure! Thanks! support snapshot policy where flush memstore can be skipped to prevent production cluster freeze Key: HBASE-10935 URL: https://issues.apache.org/jira/browse/HBASE-10935 Project: HBase Issue Type: New Feature Components: shell, snapshots Affects Versions: 0.94.7, 0.94.18 Reporter: Tianying Chang Assignee: Tianying Chang Priority: Minor Fix For: 0.94.7, 0.99.0, 0.94.20 Attachments: HBASE-10935-0.94-v1.patch, HBASE-10935-0.98-v1.patch, HBASE-10935-trunk-v1.patch, hbase-10935-94.patch, hbase-10935-trunk.patch We are using snapshot feature to do HBase disaster recovery. We will do snapshot in our production cluster periodically. The current flush snapshot policy require all regions of the table to coordinate to prevent write and do flush at the same time. Since we use WALPlayer to complete the data that is not in the snapshot HFile, we don't need the snapshot to do coordinated flush. The snapshot just recored all the HFile that are already there. I added the parameter in the HBase shell. So people can choose to use the NoFlush snapshot when they need, like below. Otherwise, the default flush snpahot support is not impacted. snaphot 'TestTable', 'TestSnapshot', 'skipFlush' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10935) support snapshot policy where flush memstore can be skipped to prevent production cluster freeze
[ https://issues.apache.org/jira/browse/HBASE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012111#comment-14012111 ] Tianying Chang commented on HBASE-10935: Never mind. I figure out the cause. It is due to inner class. I have tested the change, and will upload the updated patch. Thanks. support snapshot policy where flush memstore can be skipped to prevent production cluster freeze Key: HBASE-10935 URL: https://issues.apache.org/jira/browse/HBASE-10935 Project: HBase Issue Type: New Feature Components: shell, snapshots Affects Versions: 0.94.7, 0.94.18 Reporter: Tianying Chang Assignee: Tianying Chang Priority: Minor Fix For: 0.99.0 Attachments: jira-10935-trunk.patch, jira-10935.patch We are using snapshot feature to do HBase disaster recovery. We will do snapshot in our production cluster periodically. The current flush snapshot policy require all regions of the table to coordinate to prevent write and do flush at the same time. Since we use WALPlayer to complete the data that is not in the snapshot HFile, we don't need the snapshot to do coordinated flush. The snapshot just recored all the HFile that are already there. I added the parameter in the HBase shell. So people can choose to use the NoFlush snapshot when they need, like below. Otherwise, the default flush snpahot support is not impacted. snaphot 'TestTable', 'TestSnapshot', 'skipFlush' -- This message was sent by Atlassian JIRA (v6.2#6252)