[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317823#comment-14317823
 ] 

Ashish Singhi commented on HBASE-9531:
--------------------------------------

Thanks [~andrew.purt...@gmail.com] for the quick response and review.
bq. let us shake out any issues others might have after the fact, if any, with 
improvements
Yeah sure.
bq. One minor issue is the new shell command is missing coverage in TestShell
Addressed in v1 patch. Since the status command does not return anything, I could not 
assert anything in the test case. Please let me know if there is anything else I can 
do here.
bq. Another minor issue is the patch proposed here for master isn't using the 
new APIs
Addressed in v1 patch.
Please review v1 and share your comments.

> a command line (hbase shell) interface to retrieve the replication metrics 
> and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-master-v3.patch, HBASE-9531-master-v4.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-v1.patch, 
> HBASE-9531.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve 
> the replication metrics info such as: ageOfLastShippedOp, 
> timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and 
> timeStampsOfLastAppliedOp. It also provides point-in-time info on the 
> replication lag (source only).
> I understand that hbase is using Hadoop 
> metrics (http://hbase.apache.org/metrics.html), which is a common way to 
> monitor metric info. This Jira is to serve as a light-weight client 
> interface, compared to a complete (certainly better, but heavier) GUI 
> monitoring package. I have the code working on 0.94.9 now, and would like to 
> use this jira to get opinions about whether the feature is valuable to other 
> users/workshops. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase 
> shell command 'status', and add a new module, called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and 
> then simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions through the dev email list.)
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>       if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - 
> timeStampsOfLastShippedOp)) // err on the large side
>       else if (current time - timeStampsOfLastShippedOp) < 2 * 
> ageOfLastShippedOp then lag = ageOfLastShippedOp // last ship happened 
> recently 
>         else lag = 0 // last ship may have happened last night, so NO real lag 
> although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):004:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 14
>     hdtest015.svl.ibm.com: lag = 0
> {code}
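
The lag rule quoted in the description above can be sketched in Java roughly as follows. This is only an illustration of the three branches, not the code in any of the attached patches: the class name ReplicationLagSketch, the method name lag, and its parameter names are hypothetical, and it assumes all ages and timestamps are in milliseconds.

```java
public final class ReplicationLagSketch {

    private ReplicationLagSketch() {
    }

    /**
     * Hypothetical helper mirroring the "Replication lag" formula from the
     * issue description. All values are assumed to be in milliseconds.
     */
    public static long lag(long sizeOfLogQueue,
                           long ageOfLastShippedOp,
                           long timeStampOfLastShippedOp,
                           long currentTime) {
        long sinceLastShipped = currentTime - timeStampOfLastShippedOp;

        if (sizeOfLogQueue != 0) {
            // Logs are still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipped);
        }
        if (sinceLastShipped < 2 * ageOfLastShippedOp) {
            // Queue is empty and the last ship happened recently.
            return ageOfLastShippedOp;
        }
        // Queue is empty and the last ship is old: no real lag, even though
        // ageOfLastShippedOp is non-zero.
        return 0;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(lag(3, 14, now - 500, now));    // non-empty queue
        System.out.println(lag(0, 14, now - 10, now));     // recent ship
        System.out.println(lag(0, 14, now - 60_000, now)); // stale ship
    }
}
```

Under these assumptions, a source with an empty queue reports a non-zero lag only while the time since the last shipment is less than twice ageOfLastShippedOp, which matches the "last night" comment in the formula.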



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
