[jira] [Commented] (HBASE-16077) Replication status doesnt show failed RS metrics in CLI
[ https://issues.apache.org/jira/browse/HBASE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596532#comment-15596532 ]

Demai Ni commented on HBASE-16077:
----------------------------------

[~bibinchundatt], could you please describe in a bit more detail what you expect from the output of the CLI command? Thanks.

> Replication status doesnt show failed RS metrics in CLI
> -------------------------------------------------------
>
>                 Key: HBASE-16077
>                 URL: https://issues.apache.org/jira/browse/HBASE-16077
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>
> Steps to reproduce
>
> # Create 2 clusters and configure replication
> # Create TABLE 1 and enable table replication
> # Shut down Cluster 2 for a short period.
> # Load data to TABLE 1
> # Shut down the Region Server where a Region of TABLE 1 is available
> # Check metrics using CLI
> {noformat}
> hbase(main):003:0* status 'replication'
> 2016-06-14 00:58:04,664 INFO [main] ipc.AbstractRpcClient: RPC Server Kerberos principal name for service=MasterService is hbase/hadoop.hadoop@hadoop.com
> version 1.0.2
> 3 live servers
>     host-10-19-92-200:
>        SOURCE: PeerID=11, SizeOfLogQueue=0, ShippedBatches=30, ShippedOps=1351, ShippedBytes=1513127672, LogReadInBytes=662648911, LogEditsRead=1546, LogEditsFiltered=1409, SizeOfLogToReplicate=0, TimeWillBeTakenForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Tue Jun 14 00:58:01 IST 2016, Replication Lag=0
>        SINK : AppliedBatches=2, AppliedOps=5, AppliedHFiles=3, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Jun 13 02:18:06 IST 2016
>     host-10-19-92-187:
>        SOURCE: PeerID=11, SizeOfLogQueue=0, ShippedBatches=0, ShippedOps=0, ShippedBytes=0, LogReadInBytes=65719, LogEditsRead=112, LogEditsFiltered=112, SizeOfLogToReplicate=0, TimeWillBeTakenForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Tue Jun 14 00:58:01 IST 2016, Replication Lag=0
>        SINK : AppliedBatches=0, AppliedOps=0, AppliedHFiles=0, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Jun 13 09:07:20 IST 2016
>     host-10-19-92-188:
>        SOURCE: PeerID=11, SizeOfLogQueue=0, ShippedBatches=39, ShippedOps=1730, ShippedBytes=1937609744, LogReadInBytes=848439638, LogEditsRead=1671, LogEditsFiltered=1497, SizeOfLogToReplicate=0, TimeWillBeTakenForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Tue Jun 14 00:58:03 IST 2016, Replication Lag=0
>        SINK : AppliedBatches=1, AppliedOps=1, AppliedHFiles=0, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Jun 13 01:53:53 IST 2016
> {noformat}
> *JMX output*
> {noformat}
> {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>     "modelerType" : "RegionServer,sub=Replication",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "host-10-19-92-200",
>     "source.11.sizeOfLogToReplicate" : 537,
>     "source.11-host-10-19-92-187,21302,1465787242095.sizeOfLogToReplicate" : 282766680,
>     "source.shippedHFiles" : 0,
>     "source.ageOfLastShippedOp" : 0,
>     "source.11.shippedHFiles" : 0,
>     "source.11-host-10-19-92-187,21302,1465787242095.ageOfLastShippedOp" : 0,
>     "source.shippedKBs" : 1477663,
>     "source.sizeOfHFileRefsQueue" : 0,
>     "source.logReadInBytes" : 691148656,
>     "source.11-host-10-19-92-187,21302,1465787242095.logEditsRead" : 39,
>     "source.11-host-10-19-92-187,21302,1465787242095.shippedOps" : 0,
>     "source.11.logEditsFiltered" : 1244,
>     "source.sizeOfLogQueue" : 4,
>     "source.timeWillBeTakenForLogToReplicate" : 1,
>     "sink.ageOfLastAppliedOp" : 0,
>     "source.11-host-10-19-92-187,21302,1465787242095.timeWillBeTakenForLogToReplicate" : 0,
>     "source.logEditsRead" : 1420,
>     "source.11.sizeOfLogQueue" : 0,
>     "source.11-host-10-19-92-187,21302,1465787242095.logEditsFiltered" : 32,
>     "source.11-host-10-19-92-187,21302,1465787242095.shippedHFiles" : 0,
>     "source.shippedOps" : 1351,
>     "source.11.shippedKBs" : 1477663,
>     "source.11.logReadInBytes" : 662562515,
>     "sink.appliedHFiles" : 3,
>     "source.11.sizeOfHFileRefsQueue" : 0,
>     "source.logEditsFiltered" : 1276,
>     "source.shippedBytes" : 1513127672,
>     "source.11-host-10-19-92-187,21302,1465787242095.shippedBatches" : 0,
>     "source.11.shippedBytes" : 1513127672,
>     "sink.appliedOps" : 5,
>     "source.11-host-10-19-92-187,21302,1465787242095.sizeOfLogQueue" : 4,
>     "source.11.shippedBatches" : 30,
>     "source.11-host-10-19-92-187,21302,1465787242095.sizeOfHFileRefsQueue" : 0,
>     "source.11.timeWillBeTakenForLogToReplicate" : 1,
>     "source.11-host-10-19-92-187,21302,1465787242095.logReadInBytes" :
[jira] [Commented] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598650#comment-14598650 ]

Demai Ni commented on HBASE-11085:
----------------------------------

[~vrodionov], Thanks. :-)

> Incremental Backup Restore support
> ----------------------------------
>
>                 Key: HBASE-11085
>                 URL: https://issues.apache.org/jira/browse/HBASE-11085
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Demai Ni
>            Assignee: Vladimir Rodionov
>         Attachments: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v2.patch, HLogPlayer.java
>
>
> h2. Feature Description
> This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks the request and then kicks off a global procedure via HBaseAdmin so that all active region servers roll their logs. Each region server records its log number in ZooKeeper. We then determine which logs need to be included in this incremental backup and use DistCp to copy them to the target location. At the same time, the dependency of the backup image is recorded and later saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The replay will occur after the full backup.
> An incremental backup image depends on the prior full backup image and on earlier incremental images, if any exist. The manifest file stores this dependency lineage during backup and is used at restore time for point-in-time (PIT) restore.
> h2. Use case (i.e., example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1: FULL backup from sourcecluster to targetcluster */
> /* if no table name specified, all tables from source cluster will be backed up */
> /***/
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
> /***/
> /* STEP2: In HBase Shell, put a few rows */
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3: Take the 1st incremental backup */
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
> /***/
> /* STEP4: In HBase Shell, put a few more rows. */
> /* update 'row100', and create new 'row101' */
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5: Take the 2nd incremental backup */
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup requ
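
The log-selection step in the quoted description (each region server records its log number in ZooKeeper, then the backup decides which logs to copy) can be sketched roughly as follows. This is only an illustration in Python, not code from the attached patches. The `(server, log_number)` representation, the function name `select_wals_for_incremental`, and the rule that a log is included when its number is above the value recorded by the previous backup and at or below the value recorded after the current roll are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of a WAL file: (region server name, log number).
Wal = Tuple[str, int]


def select_wals_for_incremental(
    wals: Iterable[Wal],
    previous: Dict[str, int],
    current: Dict[str, int],
) -> List[Wal]:
    """Pick the WAL files that belong to one incremental backup.

    previous: log number per server recorded by the last backup (assumed layout).
    current:  log number per server recorded after this backup's log roll.
    """
    selected = []
    for server, number in wals:
        if server not in current:
            continue  # server did not take part in this log roll
        # A server unknown to earlier backups contributes all of its logs.
        low = previous.get(server, -1)
        if low < number <= current[server]:
            selected.append((server, number))
    return sorted(selected)


wals = [("rs1", 10), ("rs1", 11), ("rs1", 12), ("rs2", 7), ("rs2", 8), ("rs3", 3)]
previous = {"rs1": 10, "rs2": 7}
current = {"rs1": 12, "rs2": 7, "rs3": 3}
chosen = select_wals_for_incremental(wals, previous, current)
print(chosen)
```

In this sketch the selected files would then be handed to the copy step (DistCp in the description); `rs2` contributes nothing because its recorded log number did not advance.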
[jira] [Commented] (HBASE-10900) FULL table backup and restore
[ https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327973#comment-14327973 ]

Demai Ni commented on HBASE-10900:
----------------------------------

For personal reasons, I can't contribute directly to the open source community at the moment. I have set this jira to 'unassigned' and removed the fix version; hopefully someone can pick it up. Or my situation may change and I can continue to work on this. Thanks... Demai

> FULL table backup and restore
> -----------------------------
>
>                 Key: HBASE-10900
>                 URL: https://issues.apache.org/jira/browse/HBASE-10900
>             Project: HBase
>          Issue Type: Task
>            Reporter: Demai Ni
>         Attachments: HBASE-10900-fullbackup-trunk-v1.patch, HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL backup/restore, and will complete the following function:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name specified, all tables from source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster, this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to source cluster, this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
> h2. Detailed layout and framework for the next jiras
> The patch is a wrapper around the existing snapshot and exportSnapshot, and will be used as the base framework for the overall solution of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described below:
> * *bin/hbase* : end-user command line interface to invoke BackupClient and RestoreClient
> * *BackupClient.java* : 'main' entry for backup operations. This patch will only support 'full' backup. Future jiras will support:
> ** *create* incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show *history* of all successful backups
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on-the-fly during create or after create
> ** *merge* backup image
> ** *stop* backing up a table of an existing backup image
> ** *show* tables of a backup image
> * *BackupCommands.java* : a place to keep all the command usages and options
> * *BackupManager.java* : handles backup requests on the server side and creates BACKUP ZOOKEEPER nodes to keep track of backups. The timestamps kept in ZooKeeper will be used for future incremental backup (not included in this jira). Creates BackupContext and DispatchRequest.
> * *BackupHandler.java* : in this patch, it is a wrapper of snapshot and exportsnapshot. In future jiras,
> ** *timestamps* info will be recorded in ZK
> ** carry out *incremental* backup.
> ** update backup *progress*
> ** set flags of *status*
> ** build up the *backupManifest* file (in this jira only limited info for full backup. Later on, timestamps and dependencies of multiple backup images are also recorded here)
> ** clean up after *failed* backup
> ** clean up after *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup
> * *BackupContext.java* : encapsulates backup information like backup ID, table names, directory info, phase, timestamps of backup progress, size of data, ancestor info, etc.
> * *BackupCopier.java* : the copying operation. Later on, it will support progress reporting and mapper estimation, and will extend DistCp to report progress to ZK during backup.
> * *BackupExcpetion.java*: to handle exceptions from backup/restore
> * *BackupManifest.java* : encapsulates all the backup image information. The manifest info will be bundled as a manifest file together with the data, so that each backup image contains all the info needed for restore.
> * *BackupStatus.java* : encapsulates backup status at table level during backup progress
> * *BackupUtil.j
[jira] [Commented] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327975#comment-14327975 ]

Demai Ni commented on HBASE-11085:
----------------------------------

Due to personal reason, I can't work directly to contribute back to open source community at this moment. Put this jira as 'unassigned', and remove fixed version hopefully, someone can pick it up. Or my situation may change and then I will continue to work on this. Thanks... Demai

> Incremental Backup Restore support
> ----------------------------------
>
>                 Key: HBASE-11085
>                 URL: https://issues.apache.org/jira/browse/HBASE-11085
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Demai Ni
>         Attachments: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v2.patch, HLogPlayer.java
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
-----------------------------
    Fix Version/s:     (was: 1.1.0)
         Assignee:     (was: Demai Ni)

> Incremental Backup Restore support
> ----------------------------------
>
>                 Key: HBASE-11085
>                 URL: https://issues.apache.org/jira/browse/HBASE-11085
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Demai Ni
>         Attachments: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v2.patch, HLogPlayer.java
>
>
> h2. Feature Description
> This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks the request and then kicks off a global procedure via HBaseAdmin so that all active region servers roll their logs. Each region server records its log number in ZooKeeper. We then determine which logs need to be included in this incremental backup and use DistCp to copy them to the target location. At the same time, the dependency of the backup image is recorded and later saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The replay will occur after the full backup.
> An incremental backup image depends on the prior full backup image and on earlier incremental images, if any exist. The manifest file stores this dependency lineage during backup and is used at restore time for point-in-time (PIT) restore.
> h2. Use case (i.e., example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1: FULL backup from sourcecluster to targetcluster */
> /* if no table name specified, all tables from source cluster will be backed up */
> /***/
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
> /***/
> /* STEP2: In HBase Shell, put a few rows */
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3: Take the 1st incremental backup */
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
> /***/
> /* STEP4: In HBase Shell, put a few more rows. */
> /* update 'row100', and create new 'row101' */
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5: Take the 2nd incremental backup */
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
> /*
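
The description says the manifest stores the dependency lineage and that it is used at restore time for PIT restore. One way to picture that is the small Python sketch below, which walks from a chosen image back to its full backup and returns the images in the order they would be applied. The `ImageInfo` record, its `parent` field, and the function `restore_order` are hypothetical; the real manifest format is not shown in this thread. Only the three backup IDs come from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ImageInfo:
    """Hypothetical manifest entry; field names are assumptions."""
    backup_id: str
    kind: str                # "full" or "incremental"
    parent: Optional[str]    # image this one depends on; None for a full backup


def restore_order(images: Dict[str, ImageInfo], target: str) -> List[str]:
    """Return the images to apply, oldest first, to reach `target`."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = target
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in backup lineage at {current}")
        seen.add(current)
        info = images[current]  # KeyError if an ancestor image is missing
        chain.append(current)
        if info.kind == "full":
            break
        current = info.parent
    else:
        raise ValueError("lineage does not reach a full backup")
    chain.reverse()
    return chain


images = {
    "backup_1399667695966": ImageInfo("backup_1399667695966", "full", None),
    "backup_1399667851020": ImageInfo(
        "backup_1399667851020", "incremental", "backup_1399667695966"),
    "backup_1399667959165": ImageInfo(
        "backup_1399667959165", "incremental", "backup_1399667851020"),
}
order = restore_order(images, "backup_1399667959165")
print(order)
```

Under these assumptions, restoring to the second incremental image means restoring the full image first and then replaying the two incremental images in sequence.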
[jira] [Updated] (HBASE-10900) FULL table backup and restore
[ https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-10900:
-----------------------------
    Fix Version/s:     (was: 1.1.0)
         Assignee:     (was: Demai Ni)

> FULL table backup and restore
> -----------------------------
>
>                 Key: HBASE-10900
>                 URL: https://issues.apache.org/jira/browse/HBASE-10900
>             Project: HBase
>          Issue Type: Task
>            Reporter: Demai Ni
>         Attachments: HBASE-10900-fullbackup-trunk-v1.patch, HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL backup/restore, and will complete the following function:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name specified, all tables from source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster, this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to source cluster, this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
> h2. Detailed layout and framework for the next jiras
> The patch is a wrapper around the existing snapshot and exportSnapshot, and will be used as the base framework for the overall solution of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described below:
> * *bin/hbase* : end-user command line interface to invoke BackupClient and RestoreClient
> * *BackupClient.java* : 'main' entry for backup operations. This patch will only support 'full' backup. Future jiras will support:
> ** *create* incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show *history* of all successful backups
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on-the-fly during create or after create
> ** *merge* backup image
> ** *stop* backing up a table of an existing backup image
> ** *show* tables of a backup image
> * *BackupCommands.java* : a place to keep all the command usages and options
> * *BackupManager.java* : handles backup requests on the server side and creates BACKUP ZOOKEEPER nodes to keep track of backups. The timestamps kept in ZooKeeper will be used for future incremental backup (not included in this jira). Creates BackupContext and DispatchRequest.
> * *BackupHandler.java* : in this patch, it is a wrapper of snapshot and exportsnapshot. In future jiras,
> ** *timestamps* info will be recorded in ZK
> ** carry out *incremental* backup.
> ** update backup *progress*
> ** set flags of *status*
> ** build up the *backupManifest* file (in this jira only limited info for full backup. Later on, timestamps and dependencies of multiple backup images are also recorded here)
> ** clean up after *failed* backup
> ** clean up after *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup
> * *BackupContext.java* : encapsulates backup information like backup ID, table names, directory info, phase, timestamps of backup progress, size of data, ancestor info, etc.
> * *BackupCopier.java* : the copying operation. Later on, it will support progress reporting and mapper estimation, and will extend DistCp to report progress to ZK during backup.
> * *BackupExcpetion.java*: to handle exceptions from backup/restore
> * *BackupManifest.java* : encapsulates all the backup image information. The manifest info will be bundled as a manifest file together with the data, so that each backup image contains all the info needed for restore.
> * *BackupStatus.java* : encapsulates backup status at table level during backup progress
> * *BackupUtil.java* : utility methods during backup process
> * *RestoreClient.java* : 'main' entry for restore operations. This patch will only support 'full' backup.
> * *RestoreUtil.java*: utility methods during restore process
> * *ExportSnapshot.java* : remove 'fin
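
The comments in the quoted restore example describe two defaults: all tables in the image are restored when no table list is given, and the original table names are reused when no restored names are given. A minimal Python sketch of that argument handling follows. It is not taken from RestoreClient; the function `plan_restore` and its error handling are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def plan_restore(
    image_tables: List[str],
    tables_arg: Optional[str] = None,
    targets_arg: Optional[str] = None,
) -> Dict[str, str]:
    """Map each table in the backup image to the table it is restored into.

    tables_arg / targets_arg are comma-separated lists as in the example
    command line; either may be omitted.
    """
    # No table list: restore every table contained in the image.
    sources = tables_arg.split(",") if tables_arg else list(image_tables)
    missing = [t for t in sources if t not in image_tables]
    if missing:
        raise ValueError(f"tables not in backup image: {missing}")
    # No target list: restore under the original table names.
    targets = targets_arg.split(",") if targets_arg else list(sources)
    if len(targets) != len(sources):
        raise ValueError("table list and restored-table list differ in length")
    return dict(zip(sources, targets))


image = ["t1_dn", "t2_dn", "t3_dn"]
renamed = plan_restore(
    image, "t1_dn,t2_dn,t3_dn", "t1_dn_restore,t2_dn_restore,t3_dn_restore")
defaults = plan_restore(image)
print(renamed)
print(defaults)
```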
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319278#comment-14319278 ]

Demai Ni commented on HBASE-9531:
---------------------------------

[~ashish singhi], thank you so much for picking this up and completing the jira.
[~apurtell], thanks a lot for pushing the feature in. Glad the code can be used by more users.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, HBASE-9531-master-v3.patch, HBASE-9531-master-v4.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-v1.patch, HBASE-9531-v2.patch, HBASE-9531-v3-0.98.patch, HBASE-9531-v3-branch-1.patch, HBASE-9531-v3.patch, HBASE-9531.patch
>
>
> This jira provides a command line (hbase shell) interface to retrieve replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also provides point-in-time information about the replication lag (source only).
> I understand that HBase uses Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is meant to serve as a lightweight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have the code working on 0.94.9 now and would like to use this jira to get opinions on whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.
> All input is greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell command 'status' and to add a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), the local replication service objects are used to get their loads, which are wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions on the dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> Externally, it will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>        SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>        SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>        SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>        SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>        SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>        SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>        SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>        SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>        SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>        SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp
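
The replication lag formula quoted in the description can be written out as a small function. The Python sketch below follows the three branches of the formula as stated; the function and parameter names are illustrative, and treating all times as milliseconds is an assumption (the thread does not state the unit).

```python
def replication_lag(
    size_of_log_queue: int,
    age_of_last_shipped_op: int,
    timestamp_of_last_shipped_op: int,
    now: int,
) -> int:
    """Source-side replication lag, following the formula in the description.

    All time values are assumed to be in the same unit (e.g. milliseconds).
    """
    since_last_ship = now - timestamp_of_last_shipped_op
    if size_of_log_queue != 0:
        # Logs are still queued: err on the large side.
        return max(age_of_last_shipped_op, since_last_ship)
    if since_last_ship < 2 * age_of_last_shipped_op:
        # The last shipment happened recently, so its age is still meaningful.
        return age_of_last_shipped_op
    # The last shipment was long ago and nothing is queued: no real lag.
    return 0


recent = replication_lag(0, 14, 1_000, 1_010)
stale = replication_lag(0, 14, 1_000, 100_000)
queued = replication_lag(2, 14, 1_000, 5_000)
print(recent, stale, queued)
```

The second branch is what keeps an idle source from reporting a lag forever: once the time since the last shipment exceeds twice `ageOfLastShippedOp` and the queue is empty, the lag drops to 0.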
[jira] [Commented] (HBASE-10900) FULL table backup and restore
[ https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303736#comment-14303736 ]

Demai Ni commented on HBASE-10900:
----------------------------------

[~apurtell], that sounds like the right way to go. [~jerryhe], any objections? If not, I will go ahead and resolve the jiras under my name as not to be fixed.

> FULL table backup and restore
> -----------------------------
>
>                 Key: HBASE-10900
>                 URL: https://issues.apache.org/jira/browse/HBASE-10900
>             Project: HBase
>          Issue Type: Task
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 1.1.0
>
>         Attachments: HBASE-10900-fullbackup-trunk-v1.patch, HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL backup/restore, and will complete the following function:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name is specified, all tables from the source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster, this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to source cluster, this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
> h2. Detailed layout and framework for the next jiras
> The patch is a wrapper of the existing snapshot and exportSnapshot, and will be used as the base framework for the overall solution of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] as described below:
> * *bin/hbase* : end-user command line interface to invoke BackupClient and RestoreClient
> * *BackupClient.java* : 'main' entry for backup operations. This patch will only support 'full' backup. Future jiras will support:
> ** *create* incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show *history* of all successful backups
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on-the-fly during create or after create
> ** *merge* backup images
> ** *stop* backing up a table of an existing backup image
> ** *show* tables of a backup image
> * *BackupCommands.java* : a place to keep all the command usages and options
> * *BackupManager.java* : handle backup requests on the server side, create BACKUP ZOOKEEPER nodes to keep track of backups. The timestamps kept in zookeeper will be used for future incremental backups (not included in this jira). Create BackupContext and DispatchRequest.
> * *BackupHandler.java* : in this patch, it is a wrapper of snapshot and exportsnapshot. In future jiras, it will:
> ** record *timestamps* info in ZK
> ** carry out *incremental* backup
> ** update backup *progress*
> ** set flags of *status*
> ** build up the *backupManifest* file (in this jira only limited info for full backup; later on, timestamps and dependencies of multiple backup images are also recorded here)
> ** clean up after a *failed* backup
> ** clean up after a *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup
> * *BackupContext.java* : encapsulate backup information like backup ID, table names, directory info, phase, timestamps of backup progress, size of data, ancestor info, etc.
> * *BackupCopier.java* : the copying operation. Later on, it will support progress reporting and mapper estimation, and extend DistCp to update progress in ZK during backup.
> * *BackupExcpetion.java* : handle exceptions from backup/restore
> * *BackupManifest.java* : encapsulate all the backup image information. The manifest info will be bundled as a manifest file together with the data, so that each backup image contains all the info needed for restore.
> * *BackupStatus.java* : encapsulate backup status at table level during backup progress
> * *BackupUtil.java* : utility methods used during the backup process
> * *RestoreClient.java* : 'main
[jira] [Commented] (HBASE-10900) FULL table backup and restore
[ https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302677#comment-14302677 ] Demai Ni commented on HBASE-10900: -- [~ram_krish], thanks for the ping. After chatted with several folks in hbase community, the plan is to build a stand-alone utility in github(or some place similar) instead of pushing the large portion of code into hbase core code. I think [~jinghe]is still planning to get it done.
[jira] [Commented] (HBASE-12073) Shell command user_permission fails on the table created by user if he is not global admin.
[ https://issues.apache.org/jira/browse/HBASE-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146581#comment-14146581 ]

Demai Ni commented on HBASE-12073:
----------------------------------

[~esteban], Matteo is right. [HBASE-11452|https://issues.apache.org/jira/browse/HBASE-11452] was meant to provide a Java client API, and it also changed the Ruby script so that the external behavior would stay consistent. There was no intention to change the existing logic. Certainly, if we decide to change the logic as [~apurtell] mentioned, the client code in AccessControlClient.java is the place to start.

Demai

> Shell command user_permission fails on the table created by user if he is not global admin.
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-12073
>                 URL: https://issues.apache.org/jira/browse/HBASE-12073
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Srikanth Srungarapu
>            Assignee: Srikanth Srungarapu
>            Priority: Minor
>
> The command fails because the changes introduced by HBASE-10892 require the user (because of the newly introduced call to getTableDescriptors) to have global admin permission.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108278#comment-14108278 ]

Demai Ni commented on HBASE-11617:
----------------------------------

[~apurtell] and [~lhofhansl], many thanks for reviewing and committing the patch.

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11617
>                 URL: https://issues.apache.org/jira/browse/HBASE-11617
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.98.2
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>            Priority: Minor
>             Fix For: 0.99.0, 2.0.0, 0.98.6
>
>         Attachments: HBASE-11617-master-v1.patch
>
>
> AgeOfLastAppliedOp in MetricsSink.java indicates the time an edit sat in the 'replication queue' before it got replicated (aka applied).
> {code}
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
> {code}
> Consider the following scenario:
> 1) at 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to, for example, 100ms;
> 2) then NO new sink op occurs;
> 3) refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of returning 100ms, AgeOfLastAppliedOp becomes 1 hour + 100ms.
> This happens because refreshAgeOfLastAppliedOp() is invoked periodically by getStats().
> proposed fix:
> {code}
> --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> @@ -35,6 +35,7 @@ public class MetricsSink {
>  
>    private MetricsReplicationSource rms;
>    private long lastTimestampForAge = System.currentTimeMillis();
> +  private long age = 0;
>  
>    public MetricsSink() {
>      rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
> @@ -47,8 +48,12 @@ public class MetricsSink {
>     * @return the age that was set
>     */
>    public long setAgeOfLastAppliedOp(long timestamp) {
> -    lastTimestampForAge = timestamp;
> -    long age = System.currentTimeMillis() - lastTimestampForAge;
> +    if (lastTimestampForAge != timestamp) {
> +      lastTimestampForAge = timestamp;
> +      this.age = System.currentTimeMillis() - lastTimestampForAge;
> +    } else {
> +      this.age = 0;
> +    }
>      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>      return age;
>    }
> {code}
> detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]

--
This message was sent by Atlassian JIRA
(v6.2#6252)
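The behavior of the proposed fix quoted above can be tried out in isolation. What follows is only a minimal sketch, not the HBase class: `SinkAgeSketch` and its injectable clock are hypothetical names introduced for illustration, the gauge update (`rms.setGauge`) is left out, and the assumption that the periodic refresh simply re-submits the last seen timestamp is inferred from the scenario in the description.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the age bookkeeping in MetricsSink; not the HBase class. */
final class SinkAgeSketch {
    private final LongSupplier clock;   // injectable clock (assumption) so the demo is deterministic
    private long lastTimestampForAge;
    private long age = 0;

    SinkAgeSketch(LongSupplier clock) {
        this.clock = clock;
        this.lastTimestampForAge = clock.getAsLong();
    }

    /** Mirrors the patched logic: compute a fresh age only when the timestamp changes. */
    long setAgeOfLastAppliedOp(long timestamp) {
        if (lastTimestampForAge != timestamp) {
            lastTimestampForAge = timestamp;
            age = clock.getAsLong() - lastTimestampForAge;
        } else {
            age = 0; // no new op since the last call, so no age to report
        }
        return age;
    }

    /** Assumed shape of the periodic refresh: re-submit the last seen timestamp. */
    long refreshAgeOfLastAppliedOp() {
        return setAgeOfLastAppliedOp(lastTimestampForAge);
    }

    public static void main(String[] args) {
        long[] now = {7L * 3_600_000L};                 // fake clock at "7:00am", in ms
        SinkAgeSketch sink = new SinkAgeSketch(() -> now[0]);
        long appliedTs = now[0] - 100L;                 // the op was written 100ms earlier
        System.out.println(sink.setAgeOfLastAppliedOp(appliedTs)); // 100
        now[0] += 3_600_000L;                           // one hour later, no new sink ops
        System.out.println(sink.refreshAgeOfLastAppliedOp());      // 0
    }
}
```

With the unpatched logic the second call would report one hour plus 100ms, which is the symptom described in the issue.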
[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107741#comment-14107741 ] Demai Ni commented on HBASE-11617: -- [~andrew.purt...@gmail.com], thanks for the ping [~lhofhansl], what's your take on this? maybe we can commit the current fix, and consider to remove .refreshAgeOfLastAppliedOp in a later refactoring effort?
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094323#comment-14094323 ]

Demai Ni commented on HBASE-9531:
---------------------------------

Again, the '-1 lineLengths' results are for the generated protobuf code and the jruby script, so they should be OK.

[~apurtell], [~enis], does the new patch match your suggestions? Thanks.

Demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.6
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, HBASE-9531-master-v3.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also provides point-in-time info on the replication lag (source only).
> I understand that HBase uses Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is meant to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have the code working on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.
> All input is greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell command 'status', and to add a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, wrap them in a ReplicationLoad object, and then simply pass it to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions on the dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
>   else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.
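The replication lag formula quoted in the description above can be written out as a small Java method. This is a sketch under stated assumptions rather than the code from any attached patch: the class name `ReplicationLagSketch`, the method name, and the parameter names are hypothetical, all values are assumed to be in milliseconds, and the current time is passed in explicitly so the three branches are easy to exercise.

```java
/** Hypothetical sketch of the source-side lag formula from the issue description. */
final class ReplicationLagSketch {

    /**
     * @param sizeOfLogQueue           number of logs still queued for shipping
     * @param ageOfLastShippedOp       age of the last shipped op, in ms (assumed unit)
     * @param timeStampOfLastShippedOp wall-clock time of the last shipment, in ms
     * @param currentTime              wall-clock time of the query, in ms
     */
    static long replicationLag(long sizeOfLogQueue, long ageOfLastShippedOp,
                               long timeStampOfLastShippedOp, long currentTime) {
        long sinceLastShipped = currentTime - timeStampOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // Work is still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipped);
        } else if (sinceLastShipped < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still meaningful.
            return ageOfLastShippedOp;
        }
        // Queue is empty and the last shipment is old: no real lag.
        return 0;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(replicationLag(2, 14, now - 60_000, now)); // 60000
        System.out.println(replicationLag(0, 14, now - 20, now));     // 14
        System.out.println(replicationLag(0, 14, now - 60_000, now)); // 0
    }
}
```

The third call corresponds to the "last shipped last night" case in the formula's comments: ageOfLastShippedOp is non-zero, but the reported lag is 0.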
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-9531: Attachment: HBASE-9531-master-v3.patch upload v3 patch. [~apurtell], can you please take a look again? Originally, I added the ReplicationLoadSink and ReplicationLoadSource inside ReplicationLoad.java. However, it turns out that ProtobufUtil.java can't import ReplicationLoad directly as it is under hbase-server. So I just put the two new files under hbase-client.
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081731#comment-14081731 ] Demai Ni commented on HBASE-9531: - [~apurtell], thanks for the tip. I will be out of town for a week and no access to the enviroment. So change the target to 98.6 due to the delay on the new patch and miss the 98.5 cut time.. Demai > a command line (hbase shell) interface to retreive the replication metrics > and show replication lag > --- > > Key: HBASE-9531 > URL: https://issues.apache.org/jira/browse/HBASE-9531 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 0.99.0 >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0, 0.98.6 > > Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, > HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, > HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch > > > This jira is to provide a command line (hbase shell) interface to retreive > the replication metrics info such as:ageOfLastShippedOp, > timeStampsOfLastShippedOp, sizeOfLogQueue ageOfLastAppliedOp, and > timeStampsOfLastAppliedOp. And also to provide a point of time info of the > lag of replication(source only) > Understand that hbase is using Hadoop > metrics(http://hbase.apache.org/metrics.html), which is a common way to > monitor metric info. This Jira is to serve as a light-weight client > interface, comparing to a completed(certainly better, but heavier)GUI > monitoring package. I made the code works on 0.94.9 now, and like to use this > jira to get opinions about whether the feature is valuable to other > users/workshop. If so, I will build a trunk patch. > All inputs are greatly appreciated. Thank you! > The overall design is to reuse the existing logic which supports hbase shell > command 'status', and invent a new module, called ReplicationLoad. 
In > HRegionServer.buildServerLoad() , use the local replication service objects > to get their loads which could be wrapped in a ReplicationLoad object and > then simply pass it to the ServerLoad. In ReplicationSourceMetrics and > ReplicationSinkMetrics, a few getters and setters will be created, and ask > Replication to build a "ReplicationLoad". (many thanks to Jean-Daniel for > his kindly suggestions through dev email list) > the replication lag will be calculated for source only, and use this formula: > {code:title=Replication lag|borderStyle=solid} > if sizeOfLogQueue != 0 then max(ageOfLastShippedOp, (current time - > timeStampsOfLastShippedOp)) //err on the large side > else if (current time - timeStampsOfLastShippedOp) < 2* > ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipped happen > recently > else lag = 0 // last shipped may happens last night, so NO real lag > although ageOfLastShippedOp is non-zero > {code} > External will look something like: > {code:title=status 'replication'|borderStyle=solid} > hbase(main):001:0> status 'replication' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hdtest018.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:50:59 PDT 2013 > hdtest015.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hbase(main):002:0> status 'replication','source' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, > 
timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 > hdtest018.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > hdtest015.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > hbase(main):003:0> status 'replication','sink' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hdtest018.svl.ibm.com: > SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:50:59 PDT 2013 > hdtest015.svl.ibm.com: >
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-9531: Fix Version/s: (was: 0.98.5) 0.98.6 > a command line (hbase shell) interface to retreive the replication metrics > and show replication lag > --- > > Key: HBASE-9531 > URL: https://issues.apache.org/jira/browse/HBASE-9531 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 0.99.0 >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0, 0.98.6 > > Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, > HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, > HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch > > > This jira is to provide a command line (hbase shell) interface to retreive > the replication metrics info such as:ageOfLastShippedOp, > timeStampsOfLastShippedOp, sizeOfLogQueue ageOfLastAppliedOp, and > timeStampsOfLastAppliedOp. And also to provide a point of time info of the > lag of replication(source only) > Understand that hbase is using Hadoop > metrics(http://hbase.apache.org/metrics.html), which is a common way to > monitor metric info. This Jira is to serve as a light-weight client > interface, comparing to a completed(certainly better, but heavier)GUI > monitoring package. I made the code works on 0.94.9 now, and like to use this > jira to get opinions about whether the feature is valuable to other > users/workshop. If so, I will build a trunk patch. > All inputs are greatly appreciated. Thank you! > The overall design is to reuse the existing logic which supports hbase shell > command 'status', and invent a new module, called ReplicationLoad. In > HRegionServer.buildServerLoad() , use the local replication service objects > to get their loads which could be wrapped in a ReplicationLoad object and > then simply pass it to the ServerLoad. 
In ReplicationSourceMetrics and
> ReplicationSinkMetrics, a few getters and setters will be created, and
> Replication will be asked to build a "ReplicationLoad". (Many thanks to
> Jean-Daniel for his kind suggestions through the dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> External will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.
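The lag rule quoted above can be read as a small pure function. The following is a minimal sketch and not the patch's code: the class and method names are made up for illustration, the current time is passed in as a parameter so the three branches can be exercised deterministically, and millisecond units are an assumption because the description does not state them.

```java
// Illustrative only: ReplicationLagSketch is a hypothetical name, not an HBase class.
final class ReplicationLagSketch {
    private ReplicationLagSketch() {}

    /**
     * Applies the three-branch lag rule from the issue description.
     *
     * @param sizeOfLogQueue            number of logs still queued on the source
     * @param ageOfLastShippedOp        age of the last shipped edit (assumed ms)
     * @param timeStampsOfLastShippedOp wall-clock time of the last shipment (assumed ms)
     * @param now                       current wall-clock time (assumed ms)
     */
    static long lag(int sizeOfLogQueue, long ageOfLastShippedOp,
                    long timeStampsOfLastShippedOp, long now) {
        long sinceLastShipment = now - timeStampsOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // Work is still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipment);
        }
        if (sinceLastShipment < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still meaningful.
            return ageOfLastShippedOp;
        }
        // Queue is empty and the last shipment is old: no real lag.
        return 0L;
    }
}
```

Passing `now` explicitly instead of calling `System.currentTimeMillis()` inside the method is a design choice made for this sketch only; it keeps the rule side-effect free.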
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081485#comment-14081485 ]

Demai Ni commented on HBASE-9531:
---------------------------------

[~apurtell] so we shouldn't expose protobuf objects to a client (such as the hbase shell) directly? I was considering implementing getters for each individual value in *ReplicationLoad*, such as
* sink.getAgeOfLastAppliedOp()
* sink.getTimeStampsOfLastAppliedOp()

However, when I implemented the source side, it became too complex: because the source is a list, each value would become something like
* HashMap getAgeOfLastShippedOp() {}
* HashMap getSizeOfLogQueue(){}
* HashMap getTimeStampsOfLastShippedOp(){}
* HashMap getReplicationLag(){}

I feel that is over-engineered. Hence the current implementation.

Demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.5
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp,
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also
> to provide point-in-time information about the replication lag (source only).
> I understand that HBase uses Hadoop metrics (http://hbase.apache.org/metrics.html),
> which is a common way to monitor metric info. This Jira is meant to serve as a
> light-weight client interface, compared to a complete (certainly better, but
> heavier) GUI monitoring package. I have the code working on 0.94.9 now, and
> would like to use this jira to get opinions about whether the feature is
> valuable to other users/workshop. If so, I will build a trunk patch.
> All input is greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase shell
> command 'status', and to introduce a new module called ReplicationLoad. In
> HRegionServer.buildServerLoad(), use the local replication service objects
> to get their loads, which can be wrapped in a ReplicationLoad object and
> then simply passed to the ServerLoad. In ReplicationSourceMetrics and
> ReplicationSinkMetrics, a few getters and setters will be created, and
> Replication will be asked to build a "ReplicationLoad". (Many thanks to
> Jean-Daniel for his kind suggestions through the dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> External will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastSh
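The trade-off raised in the comment at the top of this message (one getter per metric versus handing back per-peer objects) may be easier to judge side by side. The sketch below is purely hypothetical: none of these types are HBase classes, and keying the maps by peer ID is an assumption, since the comment only says each getter would return a HashMap.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shapes for comparison; not taken from any HBase patch.
final class ReplicationLoadShapes {
    private ReplicationLoadShapes() {}

    /** Per-peer shape: one plain value object per replication source. */
    static final class SourceLoad {
        final String peerId;
        final long ageOfLastShippedOp;
        final int sizeOfLogQueue;

        SourceLoad(String peerId, long ageOfLastShippedOp, int sizeOfLogQueue) {
            this.peerId = peerId;
            this.ageOfLastShippedOp = ageOfLastShippedOp;
            this.sizeOfLogQueue = sizeOfLogQueue;
        }
    }

    /** Per-metric shape: every metric needs its own map-returning getter. */
    static Map<String, Long> ageOfLastShippedOpByPeer(List<SourceLoad> sources) {
        Map<String, Long> byPeer = new HashMap<>();
        for (SourceLoad s : sources) {
            byPeer.put(s.peerId, s.ageOfLastShippedOp);
        }
        return byPeer;
    }

    /** A second metric repeats the same boilerplate, which is the cost the comment points at. */
    static Map<String, Integer> sizeOfLogQueueByPeer(List<SourceLoad> sources) {
        Map<String, Integer> byPeer = new HashMap<>();
        for (SourceLoad s : sources) {
            byPeer.put(s.peerId, s.sizeOfLogQueue);
        }
        return byPeer;
    }
}
```

With the per-peer shape a caller iterates one list and reads fields; with the per-metric shape every new metric adds another map-building method.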
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080992#comment-14080992 ]

Demai Ni commented on HBASE-9531:
---------------------------------

bq. -1 lineLengths. The patch introduces the following lines longer than 100:

The long lines come from either the generated protobuf code or the JRuby hbase shell code; both already contain long lines.

The failed test cases have recently shown up in other JIRAs as well, and are not related to this patch.

[~apurtell] and [~enis], would you please take another look at this new patch? Thanks.

BTW, I am working on another small fix, [HBASE-11617 | https://issues.apache.org/jira/browse/HBASE-11617]. The two patches conflict with each other. I will resolve the conflict after one of the two is committed.

Demai
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
----------------------------
    Attachment: HBASE-9531-master-v2.patch

Uploaded the v2 patch for master, which is changed according to [~enis]'s suggestion.
[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080269#comment-14080269 ]

Demai Ni commented on HBASE-11617:
----------------------------------

I don't think the failed test cases are related to this patch. The same failures also show up in other JIRAs from recent testing.

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11617
>                 URL: https://issues.apache.org/jira/browse/HBASE-11617
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.98.2
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>            Priority: Minor
>             Fix For: 0.99.0, 0.98.5, 2.0.0
>
>         Attachments: HBASE-11617-master-v1.patch
>
>
> AgeOfLastAppliedOp in MetricsSink.java is meant to indicate the time an edit sat in
> the 'replication queue' before it got replicated (aka applied):
> {code}
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
> {code}
> Consider the following scenario:
> 1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to, for example, 100ms.
> 2) After that, NO new sink op occurs.
> 3) refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of returning 100ms, AgeOfLastAppliedOp becomes 1 hour + 100ms.
> This happens because refreshAgeOfLastAppliedOp() is invoked periodically by getStats().
> proposed fix:
> {code}
> --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> @@ -35,6 +35,7 @@ public class MetricsSink {
>  
>    private MetricsReplicationSource rms;
>    private long lastTimestampForAge = System.currentTimeMillis();
> +  private long age = 0;
>  
>    public MetricsSink() {
>      rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
> @@ -47,8 +48,12 @@ public class MetricsSink {
>     * @return the age that was set
>     */
>    public long setAgeOfLastAppliedOp(long timestamp) {
> -    lastTimestampForAge = timestamp;
> -    long age = System.currentTimeMillis() - lastTimestampForAge;
> +    if (lastTimestampForAge != timestamp) {
> +      lastTimestampForAge = timestamp;
> +      this.age = System.currentTimeMillis() - lastTimestampForAge;
> +    } else {
> +      this.age = 0;
> +    }
>      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>      return age;
>    }
> {code}
> Detailed discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ]

--
This message was sent by Atlassian JIRA
(v6.2#6252)
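The 7:00am / 8:00am scenario from the description can be replayed in a small standalone model. This is a sketch under stated assumptions, not HBase code: `SinkAgeModel` is a made-up class, the clock is passed in explicitly instead of calling `System.currentTimeMillis()`, the gauge update is left out, and `refreshAgeOfLastAppliedOp` is assumed to re-run the setter with the stored timestamp, which is consistent with the behaviour the description reports.

```java
// Hypothetical model of the original and the proposed setter; no HBase dependency.
final class SinkAgeModel {
    private final boolean patched;      // true = behave like the proposed fix
    private long lastTimestampForAge;
    private long age;

    SinkAgeModel(boolean patched, long startMs) {
        this.patched = patched;
        this.lastTimestampForAge = startMs;
    }

    /** Mirrors setAgeOfLastAppliedOp, with the current time injected as nowMs. */
    long setAgeOfLastAppliedOp(long timestamp, long nowMs) {
        if (!patched || lastTimestampForAge != timestamp) {
            // Original behaviour, and the patched behaviour for a genuinely new op.
            lastTimestampForAge = timestamp;
            age = nowMs - lastTimestampForAge;
        } else {
            // Patched behaviour when the timestamp has not changed: report no age.
            age = 0;
        }
        return age;
    }

    /** Assumed to re-apply the stored timestamp, as a periodic stats refresh would. */
    long refreshAgeOfLastAppliedOp(long nowMs) {
        return setAgeOfLastAppliedOp(lastTimestampForAge, nowMs);
    }
}
```

With an op written 100ms before 7:00am and applied at 7:00am, both variants report 100ms at apply time. A refresh at 8:00am reports 1 hour + 100ms in the original variant and 0 in the patched one.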
[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080026#comment-14080026 ]

Demai Ni commented on HBASE-11617:
----------------------------------

BTW, with this patch I am not sure what purpose MetricsSink.refreshAgeOfLastAppliedOp() still serves, as the refresh will effectively be ignored and always return age = 0.
[jira] [Commented] (HBASE-11143) Improve replication metrics
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080018#comment-14080018 ]

Demai Ni commented on HBASE-11143:
----------------------------------

Thanks to [~lhofhansl]'s suggestion, the patch is uploaded in [HBASE-11617 | https://issues.apache.org/jira/browse/HBASE-11617].

> Improve replication metrics
> ---------------------------
>
>                 Key: HBASE-11143
>                 URL: https://issues.apache.org/jira/browse/HBASE-11143
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.99.0, 0.94.20, 0.98.3
>
>         Attachments: 11143-0.94-v2.txt, 11143-0.94-v3.txt, 11143-0.94.txt, 11143-trunk.txt
>
>
> We are trying to report on replication lag and find that there is no good
> single metric to do that.
> ageOfLastShippedOp is close, but unfortunately it is increased even when
> there is nothing to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric: replicationQueueTime (or something) with the above meaning.
> I.e. if we have something to ship, we set the age of that last shipped edit;
> if we fail, we increment that last time (just like we do now). But if there is
> nothing to replicate, we set it to the current time (and hence that metric is
> reported as close to 0).
> Alternatively we could change the meaning of ageOfLastShippedOp to mean
> that. That might lead to surprises, but the current behavior is clearly weird
> when there is nothing to replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.
> Edit: Also adds a new shippedKBs metric to track the amount of data that is
> shipped via replication.
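One way to read the proposed replicationQueueTime metric is as a timestamp that is moved forward on success or on an empty queue and left alone on failure. The sketch below follows that reading only; the class and method names are invented for illustration, the clock is injected, and nothing here is taken from the attached patches.

```java
// Hypothetical sketch of the proposed metric's semantics; not HBase code.
final class ReplicationQueueTimeSketch {
    private long referenceTimestampMs;

    ReplicationQueueTimeSketch(long startMs) {
        this.referenceTimestampMs = startMs;
    }

    /** Something was shipped: remember the write time of the last shipped edit. */
    void onEditShipped(long editWriteTimeMs) {
        referenceTimestampMs = editWriteTimeMs;
    }

    /** Nothing to replicate: set the reference to the current time so the metric reads about 0. */
    void onNothingToReplicate(long nowMs) {
        referenceTimestampMs = nowMs;
    }

    /**
     * Reported value. After a failed shipment nothing is updated,
     * so the value keeps growing with the clock until a shipment succeeds.
     */
    long queueTime(long nowMs) {
        return nowMs - referenceTimestampMs;
    }
}
```

The design point is that an idle RegionServer and a stuck one no longer look alike: the idle one stays near zero, while the stuck one keeps climbing.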
[jira] [Updated] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11617:
-----------------------------
    Status: Patch Available  (was: In Progress)

[~lhofhansl], would you please take a look at the patch and check whether it matches your take? Thanks.
[jira] [Updated] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11617:
-----------------------------
    Summary: incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP  (was: AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP )
[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11617:
-----------------------------
    Attachment: HBASE-11617-master-v1.patch

Uploaded the patch for both AgeOfLastAppliedOp and AgeOfLastShippedOp (the latter from [HBASE-11143 | https://issues.apache.org/jira/browse/HBASE-11143]).
[jira] [Work started] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-11617 started by Demai Ni.

> AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
>
>                 Key: HBASE-11617
>                 URL: https://issues.apache.org/jira/browse/HBASE-11617
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.98.2
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>            Priority: Minor
>             Fix For: 0.99.0, 0.98.5, 2.0.0
>
> AgeOfLastAppliedOp in MetricsSink.java indicates the time an edit sat in the 'replication queue' before it was replicated (i.e. applied).
> {code}
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
> {code}
> Consider the following scenario:
> 1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to, for example, 100ms.
> 2) No new sink op occurs after that.
> 3) refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of returning 100ms, AgeOfLastAppliedOp becomes 1 hour + 100ms.
> This happens because refreshAgeOfLastAppliedOp() is invoked periodically by getStats().
> proposed fix:
> {code}
> --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> @@ -35,6 +35,7 @@ public class MetricsSink {
>
>    private MetricsReplicationSource rms;
>    private long lastTimestampForAge = System.currentTimeMillis();
> +  private long age = 0;
>
>    public MetricsSink() {
>      rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
> @@ -47,8 +48,12 @@ public class MetricsSink {
>     * @return the age that was set
>     */
>    public long setAgeOfLastAppliedOp(long timestamp) {
> -    lastTimestampForAge = timestamp;
> -    long age = System.currentTimeMillis() - lastTimestampForAge;
> +    if (lastTimestampForAge != timestamp) {
> +      lastTimestampForAge = timestamp;
> +      this.age = System.currentTimeMillis() - lastTimestampForAge;
> +    } else {
> +      this.age = 0;
> +    }
>      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>      return age;
>    }
> {code}
> Detailed discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ]
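For readers following the HBASE-11617 thread, the behavior of the proposed fix can be illustrated with a small self-contained sketch. It is not the HBase MetricsSink class: the name SinkAgeTracker, the injected clock, and the assumption that refreshAgeOfLastAppliedOp() simply re-submits the last timestamp are illustrative choices, and the gauge update (rms.setGauge) is left out.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for MetricsSink; only the age bookkeeping is modeled. */
final class SinkAgeTracker {
  private final LongSupplier clock;   // injected so the example is deterministic
  private long lastTimestampForAge;
  private long age = 0;

  SinkAgeTracker(LongSupplier clock) {
    this.clock = clock;
    this.lastTimestampForAge = clock.getAsLong();
  }

  /** Mirrors the patched method: recompute only when a new timestamp arrives. */
  long setAgeOfLastAppliedOp(long timestamp) {
    if (lastTimestampForAge != timestamp) {
      lastTimestampForAge = timestamp;
      age = clock.getAsLong() - lastTimestampForAge;
    } else {
      age = 0; // no new sink op; the last one was already applied
    }
    return age;
  }

  /** Assumption: a refresh re-submits the last seen timestamp. */
  long refreshAgeOfLastAppliedOp() {
    return setAgeOfLastAppliedOp(lastTimestampForAge);
  }
}
```

Under these assumptions, a refresh one hour after the last applied op reports 0 rather than 1 hour + 100ms.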
[jira] [Commented] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079818#comment-14079818 ]

Demai Ni commented on HBASE-11617:

Actually, putting the check in MetricsSink.refreshAgeOfLastAppliedOp() may be better?
[jira] [Commented] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079784#comment-14079784 ]

Demai Ni commented on HBASE-11617:

[~lhofhansl], thanks for confirming the problem.

bq. Can we just not refresh from getStats? That way the metric retains the value it was last set to by ReplicationSink.

I am not sure how to stop getStats() from refreshing. It is a public method, which can be invoked by other applications, and it is also invoked by ReplicationStatisticsThread. Also, the invocation does not pass in a parameter to indicate whether a refresh is needed. Suggestions?

Demai
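The alternative raised in the quoted suggestion (let the metric retain the value last set by ReplicationSink) might look like the sketch below. This is only one possible reading of that suggestion: RetainingSinkAge is a hypothetical class, the clock is injected for illustration, and how HBase actually wires getStats() to the refresh call is not shown in this thread.

```java
import java.util.function.LongSupplier;

/** Hypothetical variant: refresh reports the stored age instead of recomputing it. */
final class RetainingSinkAge {
  private final LongSupplier clock;   // injected so the example is deterministic
  private long lastTimestampForAge;
  private long age = 0;

  RetainingSinkAge(LongSupplier clock) {
    this.clock = clock;
    this.lastTimestampForAge = clock.getAsLong();
  }

  /** Called when a sink op is applied; this is the only place the age changes. */
  long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    age = clock.getAsLong() - lastTimestampForAge;
    return age;
  }

  /** Assumption: periodic stats calls read the retained value without touching the clock. */
  long refreshAgeOfLastAppliedOp() {
    return age;
  }
}
```

Compared with resetting the age to 0 when no new op arrives, this variant keeps reporting the 100ms from the scenario in the issue description; which of the two is preferable is the open question in the comment above.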
[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11617: - Description: AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in the 'replication queue' before it got replicated(aka applied) {code} /** * Set the age of the last applied operation * * @param timestamp The timestamp of the last operation applied. * @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { lastTimestampForAge = timestamp; long age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} In the following scenario: 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is set for example 100ms; 2) and then NO new Sink op occur. 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, It was because that refreshAgeOfLastAppliedOp() get invoked periodically by getStats(). 
proposed fix: {code} --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java @@ -35,6 +35,7 @@ public class MetricsSink { private MetricsReplicationSource rms; private long lastTimestampForAge = System.currentTimeMillis(); + private long age = 0; public MetricsSink() { rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class); @@ -47,8 +48,12 @@ public class MetricsSink { * @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { -lastTimestampForAge = timestamp; -long age = System.currentTimeMillis() - lastTimestampForAge; +if (lastTimestampForAge != timestamp) { + lastTimestampForAge = timestamp; + this.age = System.currentTimeMillis() - lastTimestampForAge; +} else { + this.age = 0; +} rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} detail discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ] was: AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in the 'replication queue' before it got replicated(aka applied) {code} /** * Set the age of the last applied operation * * @param timestamp The timestamp of the last operation applied. * @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { lastTimestampForAge = timestamp; long age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} In the following scenario: 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is set for example 100ms; 2) and then NO new Sink op occur. 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. 
Instead of return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, It was because that refreshAgeOfLastAppliedOp() get invoked periodically by getStats(). proposed fix: {code} // a new value + private long age; public long setAgeOfLastAppliedOp(long timestamp) { + if (lastTimestampForAge != timestamp) { lastTimestampForAge = timestamp; - long age = System.currentTimeMillis() - lastTimestampForAge; +this.age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); + } else { + this.age = 0; // no new Sink OP coming. the last one already applied + } return age; } {code} detail discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ] > AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink > OP > > > Key: HBASE-11617 > URL: https://issues.apache.org/jira/browse/HBASE-11617 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.98.2 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Minor > Fix For: 0.99.0, 0.98.5, 2.0.0 > > > AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in > the 'replication queue' before it got replicated(aka applied) > {code} > /** >* Set the age of the last applied operation >* >* @param timestamp The timestamp of the last operation applied. >* @return the age that was set >*/ > public long setAgeOfLastAppliedOp(long timestamp) { > lastTimestampForAge = timestamp; > long age = System.currentTimeMi
[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11617: - Description: AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in the 'replication queue' before it got replicated(aka applied) {code} /** * Set the age of the last applied operation * * @param timestamp The timestamp of the last operation applied. * @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { lastTimestampForAge = timestamp; long age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} In the following scenario: 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is set for example 100ms; 2) and then NO new Sink op occur. 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, It was because that refreshAgeOfLastAppliedOp() get invoked periodically by getStats(). proposed fix: {code} // a new value + private long age; public long setAgeOfLastAppliedOp(long timestamp) { + if (lastTimestampForAge != timestamp) { lastTimestampForAge = timestamp; - long age = System.currentTimeMillis() - lastTimestampForAge; +this.age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); + } else { + this.age = 0; // no new Sink OP coming. the last one already applied + } return age; } {code} detail discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ] was: AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in the 'replication queue' before it got replicated(aka applied) {code} /** * Set the age of the last applied operation * * @param timestamp The timestamp of the last operation applied. 
* @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { lastTimestampForAge = timestamp; long age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} In the following scenario: 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is set for example 100ms; 2) and then NO new Sink op occur. 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, It was because that refreshAgeOfLastAppliedOp() get invoked periodically by getStats(). proposed fix: {code} // a new value + private long age; public long setAgeOfLastAppliedOp(long timestamp) { + if (lastTimestampForAge != timestamp) { lastTimestampForAge = timestamp; - long age = System.currentTimeMillis() - lastTimestampForAge; +this.age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); } return age; } {code} detail discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ] > AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink > OP > > > Key: HBASE-11617 > URL: https://issues.apache.org/jira/browse/HBASE-11617 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.98.2 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Minor > Fix For: 0.99.0, 0.98.5, 2.0.0 > > > AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in > the 'replication queue' before it got replicated(aka applied) > {code} > /** >* Set the age of the last applied operation >* >* @param timestamp The timestamp of the last operation applied. 
>* @return the age that was set >*/ > public long setAgeOfLastAppliedOp(long timestamp) { > lastTimestampForAge = timestamp; > long age = System.currentTimeMillis() - lastTimestampForAge; > rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); > return age; > } > {code} > In the following scenario: > 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is > set for example 100ms; > 2) and then NO new Sink op occur. > 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of > return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, > It was because that refreshAgeOfLastAppliedOp() get invoked periodically by > getStats(). > proposed fix: > {code} > > // a new value > + private long age; > > public l
[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11617: - Environment: (was: AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in the 'replication queue' before it got replicated(aka applied) {code} /** * Set the age of the last applied operation * * @param timestamp The timestamp of the last operation applied. * @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { lastTimestampForAge = timestamp; long age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} In the following scenario: 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is set for example 100ms; 2) and then NO new Sink op occur. 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, It was because that refreshAgeOfLastAppliedOp() get invoked periodically by getStats(). 
proposed fix: {code} // a new value + private long age; public long setAgeOfLastAppliedOp(long timestamp) { + if (lastTimestampForAge != timestamp) { lastTimestampForAge = timestamp; - long age = System.currentTimeMillis() - lastTimestampForAge; +this.age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); } return age; } {code} detail discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ]) > AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink > OP > > > Key: HBASE-11617 > URL: https://issues.apache.org/jira/browse/HBASE-11617 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.98.2 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Minor > Fix For: 0.99.0, 0.98.5, 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
Demai Ni created HBASE-11617:

             Summary: AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
                 Key: HBASE-11617
                 URL: https://issues.apache.org/jira/browse/HBASE-11617
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 0.98.2
         Environment: AgeOfLastAppliedOp in MetricsSink.java indicates the time an edit sat in the 'replication queue' before it was replicated (i.e. applied).
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to, for example, 100ms.
2) No new sink op occurs after that.
3) refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of returning 100ms, AgeOfLastAppliedOp becomes 1 hour + 100ms.
This happens because refreshAgeOfLastAppliedOp() is invoked periodically by getStats().
proposed fix:
{code}
  // a new value
+ private long age;

  public long setAgeOfLastAppliedOp(long timestamp) {
+   if (lastTimestampForAge != timestamp) {
      lastTimestampForAge = timestamp;
-     long age = System.currentTimeMillis() - lastTimestampForAge;
+     this.age = System.currentTimeMillis() - lastTimestampForAge;
      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+   }
    return age;
  }
{code}
Detailed discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ]
            Reporter: Demai Ni
            Assignee: Demai Ni
            Priority: Minor
             Fix For: 0.99.0, 0.98.5, 2.0.0
[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP
[ https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11617: - Description: AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in the 'replication queue' before it got replicated(aka applied) {code} /** * Set the age of the last applied operation * * @param timestamp The timestamp of the last operation applied. * @return the age that was set */ public long setAgeOfLastAppliedOp(long timestamp) { lastTimestampForAge = timestamp; long age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); return age; } {code} In the following scenario: 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is set for example 100ms; 2) and then NO new Sink op occur. 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, It was because that refreshAgeOfLastAppliedOp() get invoked periodically by getStats(). 
proposed fix: {code} // a new value + private long age; public long setAgeOfLastAppliedOp(long timestamp) { + if (lastTimestampForAge != timestamp) { lastTimestampForAge = timestamp; - long age = System.currentTimeMillis() - lastTimestampForAge; +this.age = System.currentTimeMillis() - lastTimestampForAge; rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); } return age; } {code} detail discussion in [dev@hbase | http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E ] > AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink > OP > > > Key: HBASE-11617 > URL: https://issues.apache.org/jira/browse/HBASE-11617 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.98.2 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Minor > Fix For: 0.99.0, 0.98.5, 2.0.0 > > > AgeOfLastAppliedOp in MetricsSink.java is to indicate the time an edit sat in > the 'replication queue' before it got replicated(aka applied) > {code} > /** >* Set the age of the last applied operation >* >* @param timestamp The timestamp of the last operation applied. >* @return the age that was set >*/ > public long setAgeOfLastAppliedOp(long timestamp) { > lastTimestampForAge = timestamp; > long age = System.currentTimeMillis() - lastTimestampForAge; > rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); > return age; > } > {code} > In the following scenario: > 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is > set for example 100ms; > 2) and then NO new Sink op occur. > 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of > return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, > It was because that refreshAgeOfLastAppliedOp() get invoked periodically by > getStats(). 
> proposed fix: > {code} > > // a new value > + private long age; > > public long setAgeOfLastAppliedOp(long timestamp) { > + if (lastTimestampForAge != timestamp) { > lastTimestampForAge = timestamp; > - long age = System.currentTimeMillis() - lastTimestampForAge; > +this.age = System.currentTimeMillis() - lastTimestampForAge; > rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age); > } > return age; > } > {code} > detail discussion in [dev@hbase | > http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E > ] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073817#comment-14073817 ]

Demai Ni commented on HBASE-9531:

[~enis], good point. I will provide a revised patch accordingly.

demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.5
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
> This jira is to provide a command line (hbase shell) interface to retrieve replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also to provide point-in-time info on the replication lag (source only).
> I understand that hbase uses Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is meant to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have the code working on 0.94.9 now, and would like to use this jira to get opinions on whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase shell command 'status', and to add a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, which can be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions through the dev email list.)
> The replication lag will be calculated for source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipped happened recently
>   else lag = 0 // last shipped may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag'
> version 0.94
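The replication lag formula in the HBASE-9531 description quoted above can be sketched in Java as follows. This is a minimal illustration, not code from the patch: the class name ReplicationLagSketch, the method signature, and the choice to pass the current time as a parameter are assumptions made so the three branches can be exercised directly.

```java
/** Hypothetical helper mirroring the three-branch lag formula; all values in milliseconds. */
final class ReplicationLagSketch {
  private ReplicationLagSketch() {}

  static long lag(int sizeOfLogQueue,
                  long ageOfLastShippedOp,
                  long timeStampOfLastShippedOp,
                  long now) {
    long sinceLastShipped = now - timeStampOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // Logs are still queued: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShipped);
    }
    if (sinceLastShipped < 2 * ageOfLastShippedOp) {
      // The last shipment happened recently, so its age is a fair estimate.
      return ageOfLastShippedOp;
    }
    // Nothing queued and the last shipment is old: no real lag.
    return 0;
  }
}
```

For example, with an empty queue, ageOfLastShippedOp=14 and a last shipment 4 seconds ago, the sketch reports a lag of 0; with a non-empty queue and the same inputs it reports 4000.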
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073454#comment-14073454 ]

Demai Ni commented on HBASE-9531:
---------------------------------

The failure in "org.apache.hadoop.hbase.io.hfile.TestCacheConfig" also shows up in a few other patch test runs, so it seems unrelated to this jira/patch.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.5
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also provides point-in-time info on the replication lag (source only).
> Understand that hbase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This Jira is to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I made the code work on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshop. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase shell command 'status', and invent a new module, called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, which could be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and ask Replication to build a "ReplicationLoad". (many thanks to Jean-Daniel for his kind suggestions through the dev email list)
> The replication lag will be calculated for source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
>   else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 201
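
The replication lag rule quoted in the HBASE-9531 description can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the class name `ReplicationLagSketch`, the method name `replicationLag`, and the explicit `now` parameter are hypothetical, and all ages and timestamps are assumed to be in milliseconds.

```java
// Hypothetical sketch of the lag rule from the issue description (not the patch itself).
final class ReplicationLagSketch {

    /**
     * @param sizeOfLogQueue           number of logs still queued on the source
     * @param ageOfLastShippedOp       age of the last shipped edit (assumed ms)
     * @param timeStampOfLastShippedOp wall-clock time of the last shipment (assumed epoch ms)
     * @param now                      current wall-clock time (assumed epoch ms)
     */
    static long replicationLag(long sizeOfLogQueue, long ageOfLastShippedOp,
                               long timeStampOfLastShippedOp, long now) {
        long sinceLastShipped = now - timeStampOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // Work is still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipped);
        }
        if (sinceLastShipped < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still meaningful.
            return ageOfLastShippedOp;
        }
        // The last shipment is old and nothing is queued: no real lag.
        return 0;
    }
}
```

With an empty queue and ageOfLastShippedOp=14, as in the hdtest017 sample, this sketch reports 14 only if the last shipment happened less than 28 ms before `now`, and 0 otherwise.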
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
----------------------------
    Attachment: HBASE-9531-master-v1.patch

[~apurtell], thanks a lot for the help. I just tried the patch again, and it applies to both the 0.98 and master (trunk) branches, so I am resubmitting it to HadoopQA.
[~enis], how about branch-1? Thanks.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.5
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also provides point-in-time info on the replication lag (source only).
> Understand that hbase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This Jira is to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I made the code work on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshop. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase shell command 'status', and invent a new module, called ReplicationLoad.
[jira] [Commented] (HBASE-11566) make ExportSnapshot extendable by removing 'final'
[ https://issues.apache.org/jira/browse/HBASE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070520#comment-14070520 ]

Demai Ni commented on HBASE-11566:
----------------------------------

[~mbertozzi] and [~apurtell], thank you so much.

Demai

> make ExportSnapshot extendable by removing 'final'
> --------------------------------------------------
>
>                 Key: HBASE-11566
>                 URL: https://issues.apache.org/jira/browse/HBASE-11566
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 0.98.4
>            Reporter: Demai Ni
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.99.0, 0.98.5, 2.0.0
>
>         Attachments: HBASE-11566.patch
>
>
> Currently ExportSnapshot is defined as a final class. This jira proposes removing 'final' to make the class extendable, so that we can leverage the existing snapshot logic for the backup/restore solution discussed in [HBASE-7912 | https://issues.apache.org/jira/browse/HBASE-7912]

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11566) make ExportSnapshot extendable by removing 'final'
Demai Ni created HBASE-11566:
-----------------------------

             Summary: make ExportSnapshot extendable by removing 'final'
                 Key: HBASE-11566
                 URL: https://issues.apache.org/jira/browse/HBASE-11566
             Project: HBase
          Issue Type: Improvement
          Components: snapshots
    Affects Versions: 0.98.4, 0.98.3
            Reporter: Demai Ni
            Assignee: Demai Ni
            Priority: Minor
             Fix For: 0.99.0, 1.0.0, 0.98.5, 2.0.0

Currently ExportSnapshot is defined as a final class. This jira proposes removing 'final' to make the class extendable, so that we can leverage the existing snapshot logic for the backup/restore solution discussed in [HBASE-7912 | https://issues.apache.org/jira/browse/HBASE-7912]
[jira] [Commented] (HBASE-11542) Unit Test KeyStoreTestUtil.java compilation failure in IBM JDK
[ https://issues.apache.org/jira/browse/HBASE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066527#comment-14066527 ]

Demai Ni commented on HBASE-11542:
----------------------------------

[~stack], thanks for following this issue. Linsey and I work in the same group and have just begun to get involved in the hbase community. She is looking into jvm.java as a reference, already has a fix in our local repo, and will provide a patch a bit later.

Demai

> Unit Test KeyStoreTestUtil.java compilation failure in IBM JDK
> ---------------------------------------------------------------
>
>                 Key: HBASE-11542
>                 URL: https://issues.apache.org/jira/browse/HBASE-11542
>             Project: HBase
>          Issue Type: Improvement
>          Components: build, test
>    Affects Versions: 0.99.0
>         Environment: RHEL 6.3, IBM JDK 6
>            Reporter: LinseyPang
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> In trunk, jira HBase-10336 added a test utility, KeyStoreTestUtil.java, which uses the following sun classes:
>  import sun.security.x509.AlgorithmId;
>  import sun.security.x509.CertificateAlgorithmId;
>
> This causes an hbase compilation failure when using the IBM JDK.
> There are similar classes in the IBM JDK:
>  import com.ibm.security.x509.AlgorithmId;
>  import com.ibm.security.x509.CertificateAlgorithmId;
> This jira is to add handling of the x509 references.
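
One common way to avoid a compile-time dependency on vendor-specific classes like the ones listed in the HBASE-11542 description is to resolve them by name at run time. The sketch below is not the fix mentioned in the comment (which is not shown in this thread); `VendorClassResolver`, `firstAvailable`, and `algorithmIdClass` are hypothetical names, and the only class names assumed are the two `AlgorithmId` variants quoted in the issue.

```java
// Hypothetical sketch: pick whichever vendor class is present on the running JDK.
final class VendorClassResolver {

    /** Returns the first candidate class that can be loaded, in the order given. */
    static Class<?> firstAvailable(String... candidates) throws ClassNotFoundException {
        for (String name : candidates) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not available on this JDK; try the next candidate.
            }
        }
        throw new ClassNotFoundException("none of: " + String.join(", ", candidates));
    }

    /** Resolves AlgorithmId from either the Sun/Oracle or the IBM package named in the issue. */
    static Class<?> algorithmIdClass() throws ClassNotFoundException {
        return firstAvailable(
                "sun.security.x509.AlgorithmId",
                "com.ibm.security.x509.AlgorithmId");
    }
}
```

Code that obtains the class this way then has to call its constructors and methods reflectively, which is the trade-off for compiling cleanly on both JDKs.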
[jira] [Commented] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052777#comment-14052777 ]

Demai Ni commented on HBASE-11452:
----------------------------------

[~stack], [~apurtell] and [~enis], many thanks for reviewing and committing the fix. Sorry for the late response. I appreciate the help.

Demai

> add getUserPermission feature in AccessControlClient as client API
> -------------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v2.patch, HBASE-11452-master-v3.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature with a new method called 'getUserPermission'.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
> The test result is
> {code}
> hbase(main):001:0> user_permission
> User                 Table,Family,Qualifier:Permission
>  hbase               dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin             etest,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  hive                t1_dn,,: [Permission: actions=READ,WRITE]
>  biadmin             table1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin             table2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin             test_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User                 Table,Family,Qualifier:Permission
>  hive                t1_dn,,: [Permission: actions=READ,WRITE]
>  biadmin             table1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin             table2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin             test_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User                 Table,Family,Qualifier:Permission
>  hbase               dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
> 1 row(s) in 0.0790 seconds
> {code}
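
The `user_permission 't.*'` result in the test output is consistent with the table argument being treated as a regular expression that must match the whole table name. The sketch below only illustrates that matching behaviour in plain Java; it does not call HBase, `TableRegexFilterSketch` and `matchingTables` are hypothetical names, and the table names are assumed to be the six that appear in the test output once the user and table columns are separated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: whole-name regex filtering of table names (no HBase calls).
final class TableRegexFilterSketch {

    /** Returns the table names that match tableRegex in full, preserving input order. */
    static List<String> matchingTables(List<String> tableNames, String tableRegex) {
        Pattern pattern = Pattern.compile(tableRegex);
        List<String> matches = new ArrayList<>();
        for (String name : tableNames) {
            // matches() requires the entire name to match, unlike find().
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Under this assumption, 't.*' selects t1_dn, table1, table2 and test_dn but not etest or dn:t1, which agrees with the four rows shown for the second command.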
[jira] [Commented] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052499#comment-14052499 ]

Demai Ni commented on HBASE-11452:
----------------------------------

bq. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

I didn't find any of them related to this patch, and I also saw the same warnings show up in other recent HadoopQA runs. The unit test failures don't look related either.

Demai

> add getUserPermission feature in AccessControlClient as client API
> -------------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v2.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature with a new method called 'getUserPermission'.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Commented] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052494#comment-14052494 ]

Demai Ni commented on HBASE-11452:
----------------------------------

[~enis], thanks. The method name is 'getUserPermission' in the patch now. Let me change the jira description.

Demai

> add userPermission feature in AccessControlClient as client API
> ----------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v2.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Updated] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11452:
-----------------------------
    Description:
Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature with a new method called 'getUserPermission'.
To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.

The test result is
{code}
hbase(main):001:0> user_permission
User                 Table,Family,Qualifier:Permission
 hbase               dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 biadmin             etest,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 hive                t1_dn,,: [Permission: actions=READ,WRITE]
 biadmin             table1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 biadmin             table2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 biadmin             test_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
6 row(s) in 1.6220 seconds
hbase(main):002:0> user_permission 't.*'
User                 Table,Family,Qualifier:Permission
 hive                t1_dn,,: [Permission: actions=READ,WRITE]
 biadmin             table1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 biadmin             table2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 biadmin             test_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
4 row(s) in 0.2130 seconds
hbase(main):003:0> user_permission 'dn:t1'
User                 Table,Family,Qualifier:Permission
 hbase               dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
1 row(s) in 0.0790 seconds
{code}

  was:
Currently user can 'grant','revoke' and show 'user_permission' through hbase shell. And there are client api implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
To keep interface consistant, this jira will also update user_permission.rb to use this API directly.

The test result is
{code}
hbase(main):001:0> user_permission
User                 Table,Family,Qualifier:Permission
 hbase               dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 biadmin             etest,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
 hive                t1_dn,,: [Permission: actions=READ,WRITE]
 biadmin             table1,,: [Perm
[jira] [Updated] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11452:
-----------------------------
    Summary: add getUserPermission feature in AccessControlClient as client API  (was: add userPermission feature in AccessControlClient as client API)

> add getUserPermission feature in AccessControlClient as client API
> -------------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v2.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11452:
-----------------------------
    Attachment: HBASE-11452-master-v2.patch

There was a minor conflict due to a new commit, so I uploaded the v2 patch.

> add userPermission feature in AccessControlClient as client API
> ----------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v2.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11452:
-----------------------------
    Attachment: HBASE-11452-master-v1.patch

[~apurtell], thanks. Since Stack fixed the findbugs link, let me try the patch one more time.

Demai

> add userPermission feature in AccessControlClient as client API
> ----------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Commented] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051027#comment-14051027 ]

Demai Ni commented on HBASE-11452:
----------------------------------

[~apurtell], thanks for the review. The patch can be applied to 0.98 directly.

The '11 new Findbugs' result is still a puzzle to me, as I can't access the 'Findbugs warnings' files from HadoopQA and couldn't generate them locally. I re-examined the code and don't think the patch can cause so many Findbugs warnings. Well, I certainly could be wrong here.

On a side note, [~stack] mentioned that trunk is probably broken, so it is probably better to rerun the QA after that is fixed.

Demai

> add userPermission feature in AccessControlClient as client API
> ----------------------------------------------------------------
>
>                 Key: HBASE-11452
>                 URL: https://issues.apache.org/jira/browse/HBASE-11452
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, security
>    Affects Versions: 0.98.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch
>
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell, and there is a client API implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050649#comment-14050649 ]

Demai Ni commented on HBASE-9531:
---------------------------------

Three issues from the hadoopQA report:
* "-1 lineLengths. The patch introduces the following lines longer than 100:". The code is generated by protobuf, and I saw that it already contains other lines longer than 100, so this should be OK.
* The failed UT: org.apache.hadoop.hbase.regionserver.wal.TestLogRolling. The assert failure is at lines 381-382:
{code}
assertTrue(pipeline.length ==
    fs.getDefaultReplication(TEST_UTIL.getDataTestDirOnTestFS()));
{code}
I couldn't find an immediate relationship with this patch and will look into it more. This particular testcase seems to have been unstable in the past, so the failure could be unrelated.
* "-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings." Again, this is the 2nd time this week that the links to the findbugs warnings do not work, and I can't find them through the test report. I will send a note to dev@hbase for help.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.5
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp.
[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11452:
-----------------------------
    Attachment: HBASE-11452-master-v1.patch

Attached the v1 patch with the method name changed to getUserPermission(). Also using it to get another round with HadoopQA for the failed test cases and the FindBugs warning.

> add userPermission feature in AccessControlClient as client API
> ---------------------------------------------------------------
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
> Issue Type: Improvement
> Components: Client, security
> Affects Versions: 0.98.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 2.0.0
> Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch
>
> Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell. There are client APIs implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
> To keep the interface consistent, this jira will also update user_permission.rb to use this API directly. The test result is
> {code}
> hbase(main):001:0> user_permission
> User      Table,Family,Qualifier:Permission
>  hbase    dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin  etest,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  hive     t1_dn,,: [Permission: actions=READ,WRITE]
>  biadmin  table1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin  table2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin  test_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User      Table,Family,Qualifier:Permission
>  hive     t1_dn,,: [Permission: actions=READ,WRITE]
>  biadmin  table1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin  table2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
>  biadmin  test_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User      Table,Family,Qualifier:Permission
>  hbase    dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
> 1 row(s) in 0.0790 seconds
> {code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050487#comment-14050487 ]

Demai Ni commented on HBASE-9531:
---------------------------------

[~stack], thanks a lot.

Demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. And also to provide point-in-time info on the replication lag (source only).
> Understand that hbase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This Jira is to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I made the code work on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase shell command 'status', and invent a new module, called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, which could be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and ask Replication to build a "ReplicationLoad". (many thanks to Jean-Daniel for his kind suggestions through the dev email list)
> The replication lag will be calculated for source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
> hdtest017.svl.ibm.com:
> SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
> SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hdtest018.svl.ibm.com:
> SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
> hdtest015.svl.ibm.com:
> SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
> hdtest017.svl.ibm.com:
> SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
> hdtest018.svl.ibm.com:
> SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hdtest015.svl.ibm.com:
> SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
> hdtest017.svl.ibm.com:
> SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hdtest018.svl.ibm.com:
> SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
> hdtest015.svl.ibm.com:
> SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag'
> version 0.94.9
> 3 live servers
> hdtest017.svl.ibm.com: lag = 0
> hdtest018
[jira] [Commented] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049615#comment-14049615 ]

Demai Ni commented on HBASE-11452:
----------------------------------

The failed replication test cases should not be related to this patch. I also can't find the artifact for the FindBugs warnings; the link hits a 404 error, which is odd.

[~apurtell], thanks for the comments.

bq. can we consider calling this getUserPermission?

Sure. I will change the method name.

bq. Do I have a Java variant of Stockholm Syndrome?

Would you please elaborate a little bit? Do you feel that we have too many ways to retrieve the same userPermission information? Or should we use 'tableName' directly instead of 'tableRegex'?

Thanks

Demai

> add userPermission feature in AccessControlClient as client API
> ---------------------------------------------------------------
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
> Issue Type: Improvement
> Components: Client, security
> Affects Versions: 0.98.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 2.0.0
> Attachments: HBASE-11452-master-v0.patch
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049475#comment-14049475 ]

Demai Ni commented on HBASE-9531:
---------------------------------

Strange that the patch submitted a few days ago didn't trigger HadoopQA. I must have missed a procedure step. Can someone give me a hand? Thanks a lot...

Demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11452:
-----------------------------
    Status: Patch Available (was: Open)

patch is attached for master branch.

> add userPermission feature in AccessControlClient as client API
> ---------------------------------------------------------------
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
> Issue Type: Improvement
> Components: Client, security
> Affects Versions: 0.98.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 2.0.0
> Attachments: HBASE-11452-master-v0.patch
[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11452: - Description: Currently user can 'grant','revoke' and show 'user_permission' through hbase shell. And there are client api implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. To keep interface consistant, this jira will also update user_permission.rb to use this API directly. The test result is {code} hbase(main):001:0> user_permission User Table,Family,Qualifier:Permission hbase dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] biadminetest,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] hive t1_dn,,: [Permission: actions=READ,WRITE] biadmintable1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] biadmintable2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] biadmintest_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] 6 row(s) in 1.6220 seconds hbase(main):002:0> user_permission 't.*' User Table,Family,Qualifier:Permission hive t1_dn,,: [Permission: actions=READ,WRITE] biadmintable1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] biadmintable2,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] biadmintest_dn,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] 4 row(s) in 0.2130 seconds hbase(main):003:0> user_permission 'dn:t1' User Table,Family,Qualifier:Permission hbase dn:t1,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN] 1 row(s) in 0.0790 seconds {code} was: Currently user can 'grant','revoke' and show 'user_permission' through hbase shell. And there are client api implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
To keep interface consistant, this jira will also update user_permission.rb to use this API directly > add userPermission feature in AccessControlClient as client API > > > Key: HBASE-11452 > URL: https://issues.apache.org/jira/browse/HBASE-11452 > Project: HBase > Issue Type: Improvement > Components: Client, security >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 2.0.0 > > Attachments: HBASE-11452-master-v0.patch > > > Currently user can 'grant','revoke' and show 'user_permission' through hbase > shell. And there are client api implemented in AccessControlClient.java for > 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. > To keep interface consistant, this jira will also update user_permission.rb > to use this API directly. The test result is > {code} > hbase(main):001:0> user_permission > User > Tab
[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API
[ https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11452: - Attachment: HBASE-11452-master-v0.patch > add userPermission feature in AccessControlClient as client API > > > Key: HBASE-11452 > URL: https://issues.apache.org/jira/browse/HBASE-11452 > Project: HBase > Issue Type: Improvement > Components: Client, security >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 2.0.0 > > Attachments: HBASE-11452-master-v0.patch > > > Currently user can 'grant','revoke' and show 'user_permission' through hbase > shell. And there are client api implemented in AccessControlClient.java for > 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. > To keep interface consistant, this jira will also update user_permission.rb > to use this API directly -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11452) add userPermission feature in AccessControlClient as client API
Demai Ni created HBASE-11452:
--------------------------------

Summary: add userPermission feature in AccessControlClient as client API
Key: HBASE-11452
URL: https://issues.apache.org/jira/browse/HBASE-11452
Project: HBase
Issue Type: Improvement
Components: security
Affects Versions: 0.99.0
Reporter: Demai Ni
Assignee: Demai Ni
Fix For: 0.99.0

Currently a user can 'grant', 'revoke' and show 'user_permission' through the hbase shell. There are client APIs implemented in AccessControlClient.java for 'grant' and 'revoke'. This jira is to add the 'user_permission' feature.
To keep the interface consistent, this jira will also update user_permission.rb to use this API directly.
[jira] [Commented] (HBASE-10851) Wait for regionservers to join the cluster
[ https://issues.apache.org/jira/browse/HBASE-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047965#comment-14047965 ]

Demai Ni commented on HBASE-10851:
----------------------------------

[~jxiang], many thanks for your response. Sorry for the delay, I got occupied during the weekend.

My cluster is a single-node cluster, but I set the property hbase.cluster.distributed to true to mimic a distributed one. I guess that 'fools' the logic in
{code:title="LocalHBaseCluster"}
  /**
   * @param c Configuration to check.
   * @return True if a 'local' address in hbase.master value.
   */
  public static boolean isLocal(final Configuration c) {
    boolean mode = c.getBoolean(HConstants.CLUSTER_DISTRIBUTED, HConstants.DEFAULT_CLUSTER_DISTRIBUTED);
    return(mode == HConstants.CLUSTER_IS_LOCAL);
  }
{code}
{code:title="HMasterCommandLine"}
...
      if (LocalHBaseCluster.isLocal(conf)) {
        DefaultMetricsSystem.setMiniClusterMode(true);
+       conf.setInt(ServerManager.WAIT_ON_REGIONSERVERS_MINTOSTART, 1);
...
      }
{code}

> Wait for regionservers to join the cluster
> ------------------------------------------
>
> Key: HBASE-10851
> URL: https://issues.apache.org/jira/browse/HBASE-10851
> Project: HBase
> Issue Type: Bug
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
> Priority: Critical
> Fix For: 0.99.0
> Attachments: hbase-10851.patch, hbase-10851_v2.patch
>
> With HBASE-10569, if regionservers are started a while after the master, all regions will be assigned to the master. That may not be what users expect.
> A work-around is to always start regionservers before masters.
> I was wondering if the master can wait a little for other regionservers to join.
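To make the interaction in the comment above concrete, here is a minimal sketch of how the two settings could combine. It is an assumption-laden illustration, not HBase code: the class is hypothetical and uses java.util.Properties instead of Hadoop's Configuration so it runs without HBase on the classpath. The default of 2 and the standalone value of 1 are taken from the discussion on this issue.

```java
import java.util.Properties;

// Hypothetical stand-in for the startup logic discussed in HBASE-10851.
final class MinToStartSketch {
    static final String CLUSTER_DISTRIBUTED = "hbase.cluster.distributed";
    static final String MIN_TO_START = "hbase.master.wait.on.regionservers.mintostart";

    private MinToStartSketch() {}

    /** Local (standalone) mode is assumed whenever the distributed flag is not true. */
    static boolean isLocal(Properties conf) {
        return !Boolean.parseBoolean(conf.getProperty(CLUSTER_DISTRIBUTED, "false"));
    }

    /** Minimum number of region servers the master waits for before assigning regions. */
    static int minRegionServersToStart(Properties conf) {
        if (isLocal(conf)) {
            // Mirrors the quoted conf.setInt(..., 1) for standalone mode.
            return 1;
        }
        // Assumed default of 2 (so the active master is included) unless overridden.
        return Integer.parseInt(conf.getProperty(MIN_TO_START, "2").trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(CLUSTER_DISTRIBUTED, "true"); // single node mimicking distributed
        System.out.println(minRegionServersToStart(conf)); // master waits for 2 servers
        conf.setProperty(MIN_TO_START, "1");               // explicit override
        System.out.println(minRegionServersToStart(conf)); // master waits for 1 server
    }
}
```

Under these assumptions, a single-node cluster with hbase.cluster.distributed set to true is not treated as local, so it falls through to the default of 2 unless the property is set explicitly.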
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046562#comment-14046562 ]

Demai Ni commented on HBASE-9531:
---------------------------------

[~apurtell], can you please take a look at the new patch?

About this:
bq. Might be good to have a summary option for replication by default but not necessary

For the default of 'replication', I am including both 'sink' and 'source' info, like below. It is more of a 'detailed' option than a 'summary'.
{code}
hbase(main):002:0> status
1 servers, 0 dead, 19. average load
hbase(main):003:0> status 'replication'
version 0.99.0-SNAPSHOT
1 live servers
    hdtest014.svl.ibm.com:
    SOURCE:PeerID=15, AgeOfLastShippedOp=307, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Fri Jun 27 17:00:44 PDT 2014, Replication Lag=0
    SINK  :AgeOfLastAppliedOp=1129746, TimeStampsOfLastAppliedOp=Fri Jun 27 17:10:18 PDT 2014
{code}

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-9531: Attachment: HBASE-9531-master-v1.patch updated patch to address Andrew's comments > a command line (hbase shell) interface to retreive the replication metrics > and show replication lag > --- > > Key: HBASE-9531 > URL: https://issues.apache.org/jira/browse/HBASE-9531 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 0.99.0 >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0, 0.98.5 > > Attachments: HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, > HBASE-9531-trunk-v0.patch > > > This jira is to provide a command line (hbase shell) interface to retreive > the replication metrics info such as:ageOfLastShippedOp, > timeStampsOfLastShippedOp, sizeOfLogQueue ageOfLastAppliedOp, and > timeStampsOfLastAppliedOp. And also to provide a point of time info of the > lag of replication(source only) > Understand that hbase is using Hadoop > metrics(http://hbase.apache.org/metrics.html), which is a common way to > monitor metric info. This Jira is to serve as a light-weight client > interface, comparing to a completed(certainly better, but heavier)GUI > monitoring package. I made the code works on 0.94.9 now, and like to use this > jira to get opinions about whether the feature is valuable to other > users/workshop. If so, I will build a trunk patch. > All inputs are greatly appreciated. Thank you! > The overall design is to reuse the existing logic which supports hbase shell > command 'status', and invent a new module, called ReplicationLoad. In > HRegionServer.buildServerLoad() , use the local replication service objects > to get their loads which could be wrapped in a ReplicationLoad object and > then simply pass it to the ServerLoad. 
In ReplicationSourceMetrics and > ReplicationSinkMetrics, a few getters and setters will be created, and ask > Replication to build a "ReplicationLoad". (many thanks to Jean-Daniel for > his kindly suggestions through dev email list) > the replication lag will be calculated for source only, and use this formula: > {code:title=Replication lag|borderStyle=solid} > if sizeOfLogQueue != 0 then max(ageOfLastShippedOp, (current time - > timeStampsOfLastShippedOp)) //err on the large side > else if (current time - timeStampsOfLastShippedOp) < 2* > ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipped happen > recently > else lag = 0 // last shipped may happens last night, so NO real lag > although ageOfLastShippedOp is non-zero > {code} > External will look something like: > {code:title=status 'replication'|borderStyle=solid} > hbase(main):001:0> status 'replication' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hdtest018.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:50:59 PDT 2013 > hdtest015.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hbase(main):002:0> status 'replication','source' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 > hdtest018.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > hdtest015.svl.ibm.com: > 
SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > hbase(main):003:0> status 'replication','sink' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hdtest018.svl.ibm.com: > SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:50:59 PDT 2013 > hdtest015.svl.ibm.com: > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hbase(main):003:0> status 'replication','lag' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: lag = 0 > hdtest018.svl.ibm.com: lag = 14 > hd
[jira] [Commented] (HBASE-11431) Add support of running from command line for 'hbase shell'
[ https://issues.apache.org/jira/browse/HBASE-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046547#comment-14046547 ]

Demai Ni commented on HBASE-11431:
----------------------------------

Currently, I usually do the following to get the same functionality:
1) Put the hbase shell commands into a text file. For example, I put this line
put 't1_dn15','row5','cf1:q1','row5_from15'
into the file cmd.txt.
2) Then run
$ hbase shell < cmd.txt

It works well, like this:
{code}
$ hbase shell < cmd.txt
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.99.0-SNAPSHOT, redfe6592dfcab57f6f2a78f73d4fc788e62707e9, Fri Jun 27 15:36:44 PDT 2014

put 't1_dn15','row5','cf1:q1','row5_from15'
0 row(s) in 0.4750 seconds
{code}
A user can certainly put more than one command in the text file. Will that serve the requirement of this jira?

> Add support of running from command line for 'hbase shell'
> ----------------------------------------------------------
>
>                 Key: HBASE-11431
>                 URL: https://issues.apache.org/jira/browse/HBASE-11431
>             Project: HBase
>          Issue Type: New Feature
>          Components: Admin
>    Affects Versions: 0.89-fb
>            Reporter: Yi Deng
>            Priority: Minor
>              Labels: shell
>             Fix For: 0.89-fb
>
>
> Add support of running from command line for 'hbase shell'.
> Now you can execute shell command from the bash like this:
> bin/hbase shell --exec='scan ".META"'
> The result can be piped to grep or other command.

--
This message was sent by Atlassian JIRA (v6.2#6252)
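The stdin redirection described in this comment can also be driven from a script. The following is only a minimal sketch: the helper names `build_shell_script` and `run_hbase_shell` are hypothetical, and it assumes an `hbase` launcher is on the PATH. It does nothing more than feed text to `hbase shell` on standard input, as `hbase shell < cmd.txt` does.

```python
import subprocess


def build_shell_script(commands):
    """Join HBase shell commands into script text, one command per line."""
    lines = [c.strip() for c in commands if c.strip()]
    return "\n".join(lines) + "\n"


def run_hbase_shell(commands, hbase_bin="hbase"):
    """Feed the commands to `hbase shell` on stdin (assumes hbase_bin is installed)."""
    script = build_shell_script(commands)
    return subprocess.run(
        [hbase_bin, "shell"],
        input=script,
        capture_output=True,
        text=True,
        check=False,
    )


# Same content as the cmd.txt example above, plus a second command.
script_text = build_shell_script([
    "put 't1_dn15','row5','cf1:q1','row5_from15'",
    "scan 't1_dn15'",
])
```

Calling `run_hbase_shell([...]).stdout` would then return the shell transcript, which can be filtered much like the piped output mentioned in the issue description.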
[jira] [Commented] (HBASE-10851) Wait for regionservers to join the cluster
[ https://issues.apache.org/jira/browse/HBASE-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046522#comment-14046522 ]

Demai Ni commented on HBASE-10851:
----------------------------------

[~jxiang],

I encountered this message in the master log on a single-node cluster using a trunk build:
"INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 978938 ms, expecting *minimum of 2*, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms, selfCheckedIn true"

This jira changes the default value to 2, and I saw your comment:
"... The minimum regionservers to wait is changed from 1 to 2 so the active master is included. For standalone server, the minimum regionservers to wait is set to 1."

After I added the property 'hbase.master.wait.on.regionservers.mintostart' to hbase-site.xml, my cluster was up and running again.

My question is from a migration/configuration perspective. When an existing single-node cluster is migrated to 1.0 (in my case, the cluster was on 0.98.2; I stopped hbase, replaced the hbase jars, restarted hbase, and hit the problem), is there a way to add such a configuration to hbase-site.xml automatically? I examined hbase-default.xml and couldn't find the property, and I also couldn't figure out whether we can use two different default values for single-node vs. multi-node clusters.

thanks

Demai

> Wait for regionservers to join the cluster
> ------------------------------------------
>
>                 Key: HBASE-10851
>                 URL: https://issues.apache.org/jira/browse/HBASE-10851
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: hbase-10851.patch, hbase-10851_v2.patch
>
>
> With HBASE-10569, if regionservers are started a while after the master, all
> regions will be assigned to the master. That may not be what users expect.
> A work-around is to always start regionservers before masters.
> I was wondering if the master can wait a little for other regionservers to
> join.
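As an illustration of the migration question in this comment, the sketch below adds the property to the text of an hbase-site.xml only when it is not already present. It is not part of HBase: the helper `ensure_property` is hypothetical, and the value 1 is an assumption taken from the quoted remark about standalone servers. Note that the stdlib parser used here drops XML comments from the file.

```python
import xml.etree.ElementTree as ET

# Property name as quoted in the comment above.
MIN_TO_START = "hbase.master.wait.on.regionservers.mintostart"


def ensure_property(site_xml, name, value):
    """Return site_xml with <property> name=value added, unless name is already set."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return site_xml  # leave an existing setting untouched
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Value 1 is an assumption for a single-node cluster.
updated = ensure_property("<configuration></configuration>", MIN_TO_START, 1)
print(updated)
```

Running the helper a second time leaves the file text unchanged, so a migration script could call it on every upgrade.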
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043004#comment-14043004 ]

Demai Ni commented on HBASE-9531:
---------------------------------

[~apurtell], thanks for the review and comments.

bq. Can we just not set the new fields in ClusterStatus if replication is not active?
Let me check the code. It should be OK not to set them when replication is disabled.

bq. In ReplicationLoad.java, please don't start method names with capital letters
I will update the patch and correct them.

bq. The default status command is 'summary', so we shouldn't dump all of the source and sink information as default, that's not a summary by definition.
The code logic doesn't change the existing behavior of 'status' or the output of 'status summary'. The code in admin.rb only checks the second argument if the first argument is 'replication'; otherwise, the code flow goes to the existing 'summary'/default logic, and the output won't contain the replication information. I will double-check it.

Again, I appreciate the help. I will put up an updated patch later this week.

Demai

> a command line (hbase shell) interface to retreive the replication metrics > and show replication lag > --- > > Key: HBASE-9531 > URL: https://issues.apache.org/jira/browse/HBASE-9531 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 0.99.0 >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0, 0.98.5 > > Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch > > > This jira is to provide a command line (hbase shell) interface to retrieve > the replication metrics info such as: ageOfLastShippedOp, > timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and > timeStampsOfLastAppliedOp. And also to provide a point of time info of the > lag of replication (source only) > Understand that hbase is using Hadoop > metrics(http://hbase.apache.org/metrics.html), which is a common way to > monitor metric info.
This Jira is to serve as a light-weight client > interface, comparing to a completed(certainly better, but heavier)GUI > monitoring package. I made the code works on 0.94.9 now, and like to use this > jira to get opinions about whether the feature is valuable to other > users/workshop. If so, I will build a trunk patch. > All inputs are greatly appreciated. Thank you! > The overall design is to reuse the existing logic which supports hbase shell > command 'status', and invent a new module, called ReplicationLoad. In > HRegionServer.buildServerLoad() , use the local replication service objects > to get their loads which could be wrapped in a ReplicationLoad object and > then simply pass it to the ServerLoad. In ReplicationSourceMetrics and > ReplicationSinkMetrics, a few getters and setters will be created, and ask > Replication to build a "ReplicationLoad". (many thanks to Jean-Daniel for > his kindly suggestions through dev email list) > the replication lag will be calculated for source only, and use this formula: > {code:title=Replication lag|borderStyle=solid} > if sizeOfLogQueue != 0 then max(ageOfLastShippedOp, (current time - > timeStampsOfLastShippedOp)) //err on the large side > else if (current time - timeStampsOfLastShippedOp) < 2* > ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipped happen > recently > else lag = 0 // last shipped may happens last night, so NO real lag > although ageOfLastShippedOp is non-zero > {code} > External will look something like: > {code:title=status 'replication'|borderStyle=solid} > hbase(main):001:0> status 'replication' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hdtest018.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 
2013 > SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:50:59 PDT 2013 > hdtest015.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 > SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 > 14:48:48 PDT 2013 > hbase(main):002:0> status 'replication','source' > version 0.94.9 > 3 live servers > hdtest017.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, > timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 > hdtest018.svl.ibm.com: > SOURCE:PeerID=1, ageOfLastShippedOp=0, siz
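The replication lag formula quoted in the description can be written out as a small function. This is a sketch of the quoted pseudocode only, not the code from the patch: the function and parameter names are illustrative, and it assumes that all times share one unit (for example milliseconds) and that the first branch also assigns its result to the lag.

```python
def replication_lag(size_of_log_queue, age_of_last_shipped_op,
                    ts_of_last_shipped_op, now):
    """Source-side replication lag, following the three branches quoted above."""
    since_last_ship = now - ts_of_last_shipped_op
    if size_of_log_queue != 0:
        # Logs are still queued: err on the large side.
        return max(age_of_last_shipped_op, since_last_ship)
    if since_last_ship < 2 * age_of_last_shipped_op:
        # The last shipment happened recently, so its age is still meaningful.
        return age_of_last_shipped_op
    # The last shipment is old and nothing is queued: no real lag.
    return 0


# Empty queue, last shipment 10 time units ago with age 14.
recent = replication_lag(0, 14, 1000, 1010)
# Empty queue, last shipment long ago.
stale = replication_lag(0, 14, 1000, 2000)
# Non-empty queue.
queued = replication_lag(3, 14, 1000, 2000)
```

With these inputs the three calls exercise the "recent", "no real lag", and "queued" branches respectively.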
[jira] [Updated] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri
[ https://issues.apache.org/jira/browse/HBASE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11327: - Attachment: HBASE-11327-0.98-v0.patch > ExportSnapshot hit stackoverflow error when target snapshotDir doesn't > contain uri > -- > > Key: HBASE-11327 > URL: https://issues.apache.org/jira/browse/HBASE-11327 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.98.2 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Minor > Fix For: 0.99.0, 0.98.4 > > Attachments: HBASE-11327-0.98-v0.patch, HBASE-11327-trunk-v0.patch > > > {code} > $hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot > snapshotT1_dn -copy-to /user/demai/backup1 > Exception in thread "main" java.lang.StackOverflowError > at java.util.regex.Pattern$Slice.match(Pattern.java:3490) > at java.util.regex.Pattern$Start.match(Pattern.java:3066) > at java.util.regex.Matcher.search(Matcher.java:1116) > at java.util.regex.Matcher.find(Matcher.java:546) > at > org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:681) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:893) > at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360) > {code} > the following command will work with uri > {code} > hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn > -copy-to hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2 > {code} > The bug is the same as > [Hadoop-9069|https://issues.apache.org/jira/browse/HADOOP-9069]. 
Since the > hadoop jira has been sitting there for more than a year, use this jira for a > local hbase fix for now. > Many thanks for [~mbertozzi] help on this one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri
[ https://issues.apache.org/jira/browse/HBASE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11327: - Attachment: HBASE-11327-trunk-v0.patch
[jira] [Updated] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri
[ https://issues.apache.org/jira/browse/HBASE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11327: - Status: Patch Available (was: Open)
[jira] [Created] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri
Demai Ni created HBASE-11327: Summary: ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri Key: HBASE-11327 URL: https://issues.apache.org/jira/browse/HBASE-11327 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.98.2 Reporter: Demai Ni Assignee: Demai Ni Priority: Minor Fix For: 0.99.0, 0.98.4 {code} $hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn -copy-to /user/demai/backup1 Exception in thread "main" java.lang.StackOverflowError at java.util.regex.Pattern$Slice.match(Pattern.java:3490) at java.util.regex.Pattern$Start.match(Pattern.java:3066) at java.util.regex.Matcher.search(Matcher.java:1116) at java.util.regex.Matcher.find(Matcher.java:546) at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:681) at org.apache.hadoop.conf.Configuration.get(Configuration.java:893) at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360) {code} the following command will work with uri {code} hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn -copy-to hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2 {code} The bug is the same as [Hadoop-9069|https://issues.apache.org/jira/browse/HADOOP-9069]. Since the hadoop jira has been sitting there for more than a year, use this jira for a local hbase fix for now. Many thanks for [~mbertozzi] help on this one. -- This message was sent by Atlassian JIRA (v6.2#6252)
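The workaround in this report is to pass a fully qualified URI as the -copy-to target. The sketch below shows that qualification step in isolation. It is not the fix from the attached patches: the helper `qualify` is hypothetical, and the default filesystem value is an assumption borrowed from the working command above.

```python
from urllib.parse import urlparse


def qualify(path, default_fs):
    """Return path unchanged if it already has a scheme, else prefix default_fs."""
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Assumed default filesystem, taken from the working command in the report.
DEFAULT_FS = "hdfs://hdtest014.svl.ibm.com:9000"

bare = qualify("/user/demai/backup1", DEFAULT_FS)
already_qualified = qualify(
    "hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2", DEFAULT_FS)
print(bare)
```

A wrapper script could apply such a helper to the target directory before invoking ExportSnapshot, so that the failing form of the command is never issued.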
[jira] [Updated] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-7912: Attachment: HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf uploaded designDoc V2 with a few minor changes, and listed limitations > HBase Backup/Restore Based on HBase Snapshot > > > Key: HBASE-7912 > URL: https://issues.apache.org/jira/browse/HBASE-7912 > Project: HBase > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, > HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, > HBase_BackupRestore-Jira-7912-CLI-v1.pdf > > > Finally, we completed the implementation of our backup/restore solution, and > would like to share with community through this jira. > We are leveraging existing hbase snapshot feature, and provide a general > solution to common users. Our full backup is using snapshot to capture > metadata locally and using exportsnapshot to move data to another cluster; > the incremental backup is using offline-WALplayer to backup HLogs; we also > leverage global distribution rolllog and flush to improve performance; other > added-on values such as convert, merge, progress report, and CLI commands. So > that a common user can backup hbase data without in-depth knowledge of hbase. > Our solution also contains some usability features for enterprise users. > The detail design document and CLI command will be attached in this jira. 
We > plan to use 10~12 subtasks to share each of the following features, and > document the detail implement in the subtasks: > * *Full Backup* : provide local and remote back/restore for a list of tables > * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental > backup) > * *distributed* Logroll and distributed flush > * Backup *Manifest* and history > * *Incremental* backup: to build on top of full backup as daily/weekly backup > * *Convert* incremental backup WAL files into hfiles > * *Merge* several backup images into one(like merge weekly into monthly) > * *add and remove* table to and from Backup image > * *Cancel* a backup process > * backup progress *status* > * full backup based on *existing snapshot* > *-* > *Below is the original description, to keep here as the history for the > design and discussion back in 2013* > There have been attempts in the past to come up with a viable HBase > backup/restore solution (e.g., HBASE-4618). Recently, there are many > advancements and new features in HBase, for example, FileLink, Snapshot, and > Distributed Barrier Procedure. This is a proposal for a backup/restore > solution that utilizes these new features to achieve better performance and > consistency. > > A common practice of backup and restore in database is to first take full > baseline backup, and then periodically take incremental backup that capture > the changes since the full baseline backup. HBase cluster can store massive > amount data. Combination of full backups with incremental backups has > tremendous benefit for HBase as well. The following is a typical scenario > for full and incremental backup. > # The user takes a full backup of a table or a set of tables in HBase. > # The user schedules periodical incremental backups to capture the changes > from the full backup, or from last incremental backup. > # The user needs to restore table data to a past point of time. 
> # The full backup is restored to the table(s) or to different table name(s). > Then the incremental backups that are up to the desired point in time are > applied on top of the full backup. > We would support the following key features and capabilities. > * Full backup uses HBase snapshot to capture HFiles. > * Use HBase WALs to capture incremental changes, but we use bulk load of > HFiles for fast incremental restore. > * Support single table or a set of tables, and column family level backup and > restore. > * Restore to different table names. > * Support adding additional tables or CF to backup set without interruption > of incremental backup schedule. > * Support rollup/combining of incremental backups into longer period and > bigger incremental backups. > * Unified command line interface for all the above. > The solution will support HBase backup to FileSystem, either on the same > cluster or across clusters. It has the flexibility to support backup to > other devices and servers in the future. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017973#comment-14017973 ]

Demai Ni commented on HBASE-7912:
---------------------------------

[~fenghh], many thanks for the comments.

{quote}
"Use case example 1" in page 3: The full backup doesn't contain data of table3 and table4, so when restoring table3 and table4, their data are all restored from the incremental backups, right? Sounds it's not a typical scenario(full-backup + incremental backups) for backup/restore.
{quote}
During step c, ".. user adds other table..", an implicit full backup is actually triggered for table 3 and table 4. So when they are restored in the future, the data will come from both the full and the incremental backups.

{quote}
"4. Full Backup": Does log roll take place after taking (full) snapshot? What if new writes arrive after taking snapshot but before log roll?
{quote}
The logic is to roll the logs first and then take the snapshot. If new writes arrive in between, they will be saved in the full backup image, and the same writes will be saved again in the next incremental backup. The approach is to ensure no data loss by allowing duplicate puts during restore.

{quote}
"5. Incremental Backup": What if some RS fails during the log roll procedure so that not all current log number are recorded onto ZooKeeper?
{quote}
In such a case, the backup process will abort, and the cleanup logic is the same as in [HBASE-11172 cancel a backup process | https://issues.apache.org/jira/browse/HBASE-11172]. The code will remove the incomplete backup image and roll the zookeeper state back to the previous backup.

{quote}
What if some log files are archived/deleted between two incremental backups and are not included in any incremental backup? Is it possible?
{quote}
Good point (also thanks to [~mbertozzi], who pointed out the same problem earlier). There is a log cleaner that hasn't been included in the patch yet. It is called BackupLogCleaner, extends BaseLogCleanerDelegate, and is registered as part of hbase.master.logcleaner.plugins. It keeps the logs. The side effect is that, if the user does not take incremental backups often, too many log files are left behind. We have a 'stop -all' feature that removes all backup tables, which also frees up the logs.

Thanks for pointing out the typos. I will fix them in the doc.

> HBase Backup/Restore Based on HBase Snapshot > > > Key: HBASE-7912 > URL: https://issues.apache.org/jira/browse/HBASE-7912 > Project: HBase > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, > HBase_BackupRestore-Jira-7912-CLI-v1.pdf > > > Finally, we completed the implementation of our backup/restore solution, and > would like to share with community through this jira. > We are leveraging existing hbase snapshot feature, and provide a general > solution to common users. Our full backup is using snapshot to capture > metadata locally and using exportsnapshot to move data to another cluster; > the incremental backup is using offline-WALplayer to backup HLogs; we also > leverage global distribution rolllog and flush to improve performance; other > added-on values such as convert, merge, progress report, and CLI commands. So > that a common user can backup hbase data without in-depth knowledge of hbase. > Our solution also contains some usability features for enterprise users. > The detail design document and CLI command will be attached in this jira.
We > plan to use 10~12 subtasks to share each of the following features, and > document the detail implement in the subtasks: > * *Full Backup* : provide local and remote back/restore for a list of tables > * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental > backup) > * *distributed* Logroll and distributed flush > * Backup *Manifest* and history > * *Incremental* backup: to build on top of full backup as daily/weekly backup > * *Convert* incremental backup WAL files into hfiles > * *Merge* several backup images into one(like merge weekly into monthly) > * *add and remove* table to and from Backup image > * *Cancel* a backup process > * backup progress *status* > * full backup based on *existing snapshot* > *-* > *Below is the original description, to keep here as the history for the > design and discussion back in 2013* > There have been attempts in the past to come up with a viable HBase > backup/restore solution (e.g., HBASE-4618). Recently, there are many > advancements and new features in HBase, for example, FileLink, Snapshot, and > Distributed Barrier Procedure. This is a proposal for a backup/restore > solution that utilizes these new features to achie
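The comment above relies on duplicate puts being harmless at restore time. The toy model below illustrates why that can hold. It is not HBase code: the table is a plain dict keyed by (row, column, timestamp), and it assumes that the restore replays each write with its original timestamp, so a write captured in both the full image and the next incremental image lands on the same cell.

```python
def apply_puts(table, puts):
    """Apply (row, column, timestamp, value) puts to a dict-based table model."""
    for row, column, ts, value in puts:
        table[(row, column, ts)] = value
    return table


# The write at ts=7 arrived between the log roll and the snapshot,
# so it appears in the full image and again in the incremental image.
full_image = [("row100", "cf1:q1", 5, "v1"), ("row101", "cf1:q1", 7, "v2")]
incremental = [("row101", "cf1:q1", 7, "v2"), ("row102", "cf1:q1", 9, "v3")]

restored = apply_puts(apply_puts({}, full_image), incremental)
# Replaying the incremental image a second time changes nothing.
replayed = apply_puts(dict(restored), incremental)
```

In this model the restored table holds three cells, and the duplicated write is stored once.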
[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server
[ https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016977#comment-14016977 ]

Demai Ni commented on HBASE-10289:
----------------------------------

[~andrew.purt...@gmail.com], Qiang mentioned offline last night that the 0.98 code may be slightly different from trunk. He is looking into that and will provide a patch if it is indeed different. He is in another time zone, so I am responding for him.

Demai

> Avoid random port usage by default JMX Server. Create Custome JMX server
> ------------------------------------------------------------------------
>
>                 Key: HBASE-10289
>                 URL: https://issues.apache.org/jira/browse/HBASE-10289
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: nijel
>            Assignee: Qiang Tian
>            Priority: Minor
>              Labels: stack
>             Fix For: 0.99.0
>
>         Attachments: HBASE-10289-v4.patch, HBASE-10289.patch,
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch,
> HBase10289-master.patch, hbase10289-master-v1.patch,
> hbase10289-master-v2.patch
>
>
> If we enable JMX MBean server for HMaster or Region server through VM
> arguments, the process will use one random port, which we cannot configure.
> This can be a problem if that random port is configured for some other
> service.
> This issue can be avoided by supporting a custom JMX Server.
> The ports can be configured. If no ports are configured, it will
> continue the same way as now.
[jira] [Commented] (HBASE-11274) More general single-row Condition Mutation
[ https://issues.apache.org/jira/browse/HBASE-11274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013871#comment-14013871 ]

Demai Ni commented on HBASE-11274:
----------------------------------

[~lshmouse], definitely a valid use case here. I think the other SQL layer components already provide a similar function, so would pushing it into the HBase core engine give us better performance? Can you please share the design a bit? It would also be great if it could decide the order of the filters. Thanks.

Demai

> More general single-row Condition Mutation
> ------------------------------------------
>
>                 Key: HBASE-11274
>                 URL: https://issues.apache.org/jira/browse/HBASE-11274
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liu Shaohui
>            Priority: Minor
>
> Currently, the checkAndDelete and checkAndPut interface only support atomic
> mutation with single condition. But in actual apps, we need more general
> condition-mutation that support multi conditions and logical expression with
> those conditions.
> For example, to support the following sql
> {quote}
> insert row where (column A == 'X' and column B == 'Y') or (column C == 'z')
> {quote}
> Suggestions are welcomed.
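To make the request concrete, here is a toy model of a single-row mutation guarded by a compound condition. It is not the HBase API: the `Row` class and its `check_and_mutate` method are hypothetical, and the per-row lock only stands in for whatever row-level atomicity the server would provide. The condition mirrors the example expression in the issue description.

```python
import threading


class Row:
    """In-memory stand-in for one row; cells map column name to value."""

    def __init__(self, cells=None):
        self.cells = dict(cells or {})
        self._lock = threading.Lock()

    def check_and_mutate(self, condition, mutation):
        """Apply mutation only if condition(cells) holds; check and write share one lock."""
        with self._lock:
            if not condition(self.cells):
                return False
            self.cells.update(mutation)
            return True


def example_condition(cells):
    # (column A == 'X' and column B == 'Y') or (column C == 'z')
    return (cells.get("A") == "X" and cells.get("B") == "Y") or cells.get("C") == "z"


matching = Row({"A": "X", "B": "Y"})
applied = matching.check_and_mutate(example_condition, {"D": "new"})

non_matching = Row({"A": "X"})
rejected = non_matching.check_and_mutate(example_condition, {"D": "new"})
```

Passing the condition as a callable keeps the evaluation order of the sub-conditions in the caller's hands, which relates to the question about deciding the order of filters.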
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
-----------------------------
    Attachment: HLogPlayer.java

Attached a customized HLogPlayer, which is used in incremental backup for the 'convert' feature, that is, converting from HLog to HFile. The code is very similar to WALPlayer except that it works offline (without a live hbase cluster). Currently the code is kind of messy and only serves the backup code, so it is posted here only as a prototype. We are working on general-purpose code in [HBASE-11170|https://issues.apache.org/jira/browse/HBASE-11170], so the real patch will be provided there.

> Incremental Backup Restore support > -- > > Key: HBASE-11085 > URL: https://issues.apache.org/jira/browse/HBASE-11085 > Project: HBase > Issue Type: New Feature >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0 > > Attachments: > HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, > HBASE-11085-trunk-v1.patch, > HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, > HBASE-11085-trunk-v2.patch, HLogPlayer.java > > > h2. Feature Description > the jira is part of > [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depend on > full backup [HBASE-10900| https://issues.apache.org/jira/browse/HBASE-10900]. > for the detail layout and frame work, please reference to [HBASE-10900| > https://issues.apache.org/jira/browse/HBASE-10900]. > When client issues an incremental backup request, BackupManager will check > the request and then kicks off a global procedure via HBaseAdmin for all the > active regionServer to roll log. Each Region server will record their log > number into zookeeper. Then we determine which log need to be included in > this incremental backup, and use DistCp to copy them to target location. At > the same time, a dependency of backup image will be recorded, and later on > saved in Backup Manifest file. > Restore is to replay the backuped WAL logs on target HBase instance. 
The > replay will occur after full backup. > As incremental backup image depends on prior full backup image and > incremental images if exists. Manifest file will be used to store the > dependency lineage during backup, and used during restore time for PIT > restore. > h2. Use case(i.e example) > {code:title=Incremental Backup Restore example|borderStyle=solid} > /***/ > /* STEP1: FULL backup from sourcecluster to targetcluster > /* if no table name specified, all tables from source cluster will be > backuped > /***/ > [sourcecluster]$ hbase backup create full > hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn > ... > 14/05/09 13:35:46 INFO backup.BackupManager: Backup request > backup_1399667695966 has been executed. > /***/ > /* STEP2: In HBase Shell, put a few rows > > /***/ > hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1' > hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1' > hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1' > /***/ > /* STEP3: Take the 1st incremental backup > > /***/ > [sourcecluster]$ hbase backup create incremental > hdfs://hostname.targetcluster.org:9000/userid/backupdir > ... > 14/05/09 13:37:45 INFO backup.BackupManager: Backup request > backup_1399667851020 has been executed. > /***/ > /* STEP4: In HBase Shell, put a few more rows. > > /* update 'row100', and create new 'row101' > > /***/ > hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2' > hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2' > hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2' > hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2' > hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2' > hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2' > /*
[jira] [Commented] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011284#comment-14011284 ]

Demai Ni commented on HBASE-11085:
----------------------------------

uploaded v2 patch to review board: https://reviews.apache.org/r/21492/
also put a combined review (both full and incremental backup) here: https://reviews.apache.org/r/21981/.

> Incremental Backup Restore support
> ----------------------------------
>
>                 Key: HBASE-11085
>                 URL: https://issues.apache.org/jira/browse/HBASE-11085
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0
>
>         Attachments: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v2.patch
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011276#comment-14011276 ]

Demai Ni commented on HBASE-7912:
---------------------------------

Hi guys,

I opened a review for the framework (patches for both full and incremental backup) here: https://reviews.apache.org/r/21981/. Thanks for your suggestions and comments.

Demai

> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> We have finally completed the implementation of our backup/restore solution, and would like to share it with the community through this jira.
> We leverage the existing HBase snapshot feature and provide a general solution for common users. Our full backup uses a snapshot to capture metadata locally and ExportSnapshot to move data to another cluster; the incremental backup uses an offline WALPlayer to back up HLogs; we also leverage globally distributed log roll and flush to improve performance. Other added value includes convert, merge, progress report, and CLI commands, so that a common user can back up HBase data without in-depth knowledge of HBase. Our solution also contains some usability features for enterprise users.
> The detailed design document and CLI commands will be attached to this jira. We plan to use 10~12 subtasks to share each of the following features, and to document the detailed implementation in the subtasks:
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental backup)
> * *distributed* log roll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, kept here as the history of the design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency.
>
> A common practice for backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data, so combining full backups with incremental backups has tremendous benefit for HBase as well. The following is a typical scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups that are up to the desired point in time are applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk load of HFiles for fast incremental restore.
> * Support a single table or a set of tables, and column family level backup and restore.
> * Restore to different table names.
> * Support adding additional tables or CFs to the backup set without interrupting the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer-period, bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
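The restore scenario in the description above (restore the full backup, then apply the incremental backups up to the desired point in time) can be sketched roughly as follows. This is only an illustration under stated assumptions: `RestoreChainSketch`, `BackupImage`, `BackupType`, and `restoreChain` are hypothetical names that do not come from the patches, and each image is assumed to carry a creation timestamp in epoch milliseconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from the HBase backup patches.
final class RestoreChainSketch {

    enum BackupType { FULL, INCREMENTAL }

    /** One backup image; timestampMs is assumed to be its creation time in epoch millis. */
    static final class BackupImage {
        final String id;
        final BackupType type;
        final long timestampMs;

        BackupImage(String id, BackupType type, long timestampMs) {
            this.id = id;
            this.type = type;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Returns the images to restore, in order, to reach targetTimeMs: the latest
     * full backup taken at or before the target, followed by every incremental
     * backup taken after that full backup and at or before the target.
     */
    static List<BackupImage> restoreChain(List<BackupImage> history, long targetTimeMs) {
        List<BackupImage> sorted = new ArrayList<>(history);
        sorted.sort(Comparator.comparingLong(image -> image.timestampMs));

        // Find the most recent full backup that is not newer than the target time.
        int baseIndex = -1;
        for (int i = 0; i < sorted.size(); i++) {
            BackupImage image = sorted.get(i);
            if (image.timestampMs > targetTimeMs) {
                break;
            }
            if (image.type == BackupType.FULL) {
                baseIndex = i;
            }
        }
        if (baseIndex < 0) {
            throw new IllegalArgumentException("no full backup at or before " + targetTimeMs);
        }

        // Everything after the base and up to the target is an incremental to replay.
        List<BackupImage> chain = new ArrayList<>();
        for (int i = baseIndex; i < sorted.size(); i++) {
            BackupImage image = sorted.get(i);
            if (image.timestampMs > targetTimeMs) {
                break;
            }
            chain.add(image);
        }
        return chain;
    }

    public static void main(String[] args) {
        List<BackupImage> history = new ArrayList<>();
        history.add(new BackupImage("full-1", BackupType.FULL, 1_000L));
        history.add(new BackupImage("incr-1", BackupType.INCREMENTAL, 2_000L));
        history.add(new BackupImage("incr-2", BackupType.INCREMENTAL, 3_000L));
        for (BackupImage image : restoreChain(history, 2_500L)) {
            System.out.println(image.type + " " + image.id);
        }
    }
}
```

Picking the latest full backup as the base keeps the chain as short as possible; a real implementation would presumably read the chain from the backup manifest rather than from timestamps alone.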
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
-----------------------------
    Description:
h2. Feature Description
This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

When a client issues an incremental backup request, BackupManager checks the request and then kicks off a global procedure via HBaseAdmin for all the active region servers to roll their logs. Each region server records its log number in ZooKeeper. Then we determine which logs need to be included in this incremental backup, and use DistCp to copy them to the target location. At the same time, the dependency of the backup image is recorded, and later saved in the Backup Manifest file.

Restore replays the backed-up WAL logs on the target HBase instance. The replay occurs after the full backup.

An incremental backup image depends on the prior full backup image and on prior incremental images, if any exist. The Manifest file stores this dependency lineage during backup, and it is used at restore time for PIT restore.

h2. Use case (i.e. example)
{code:title=Incremental Backup Restore example|borderStyle=solid}
/***/
/* STEP1: FULL backup from sourcecluster to targetcluster
/* if no table name specified, all tables from source cluster will be backed up
/***/
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
...
14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
/***/
/* STEP2: In HBase Shell, put a few rows
/***/
hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
/***/
/* STEP3: Take the 1st incremental backup
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
/***/
/* STEP4: In HBase Shell, put a few more rows.
/* update 'row100', and create new 'row101'
/***/
hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
/***/
/* STEP5: Take the 2nd incremental backup
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
/***/
/* STEP7: Restore from PIT of the 1st incremental backup
/* specify the backup ID of the 1st incremental
/* option -automatic will trigger the restore of the full backup first, then the 1st
/* incremental backup image
/* t1_dn, etc. are the original table names. All tables will be restored if not specified
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used
/***
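The log-selection step in the feature description (each region server records its log number, then the logs for this incremental backup are determined) could look roughly like the sketch below. It is not taken from the patch: `IncrementalLogSelectionSketch`, `WalFile`, and `selectLogs` are hypothetical names, and the boundary rule (strictly newer than the previous backup's recorded number, and no newer than the number recorded after this roll) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the boundary rule are assumptions, not the patch's code.
final class IncrementalLogSelectionSketch {

    /** A WAL file identified by the region server that wrote it and its log number. */
    static final class WalFile {
        final String regionServer;
        final long logNumber;

        WalFile(String regionServer, long logNumber) {
            this.regionServer = regionServer;
            this.logNumber = logNumber;
        }
    }

    /**
     * Selects the WAL files that belong to this incremental backup.
     *
     * previousLogNumbers: per region server, the log number recorded by the prior backup.
     * currentLogNumbers:  per region server, the log number recorded after this log roll.
     */
    static List<WalFile> selectLogs(List<WalFile> candidates,
                                    Map<String, Long> previousLogNumbers,
                                    Map<String, Long> currentLogNumbers) {
        List<WalFile> selected = new ArrayList<>();
        for (WalFile wal : candidates) {
            Long upper = currentLogNumbers.get(wal.regionServer);
            if (upper == null) {
                continue; // server did not take part in this roll
            }
            // A server with no earlier record contributes all of its logs up to the roll.
            long lower = previousLogNumbers.getOrDefault(wal.regionServer, Long.MIN_VALUE);
            if (wal.logNumber > lower && wal.logNumber <= upper) {
                selected.add(wal);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new HashMap<>();
        previous.put("rs1", 100L);
        Map<String, Long> current = new HashMap<>();
        current.put("rs1", 300L);

        List<WalFile> candidates = new ArrayList<>();
        candidates.add(new WalFile("rs1", 100L)); // already covered by the prior backup
        candidates.add(new WalFile("rs1", 200L)); // selected
        candidates.add(new WalFile("rs1", 400L)); // written after the roll, left for next time

        for (WalFile wal : selectLogs(candidates, previous, current)) {
            System.out.println(wal.regionServer + " " + wal.logNumber);
        }
    }
}
```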
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
-----------------------------
    Attachment: HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch

attached V2 patch, which contains:
1) a unit testcase (thanks to [~tianq], who implemented it)
2) address comments from [~tedyu]
3) fix a few long line warnings, and java doc warnings

> Incremental Backup Restore support
> ----------------------------------
>
>                 Key: HBASE-11085
>                 URL: https://issues.apache.org/jira/browse/HBASE-11085
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0
>
>         Attachments: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v2.patch
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
-----------------------------
    Attachment: HBASE-11085-trunk-v2.patch

> Incremental Backup Restore support
> ----------------------------------
>
>                 Key: HBASE-11085
>                 URL: https://issues.apache.org/jira/browse/HBASE-11085
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0
>
>         Attachments: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2.patch
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010262#comment-14010262 ]

Demai Ni commented on HBASE-9531:
---------------------------------

[~jdcryans], may I know your take on this feature? Thanks... Demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4
>
>         Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also to provide point-in-time info about the lag of replication (source only).
> Understand that hbase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I made the code work on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic which supports the hbase shell command 'status', and to introduce a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, which can be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions through the dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 1
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
----------------------------
    Description:
This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also to provide point-in-time info about the lag of replication (source only).

Understand that hbase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I made the code work on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.

All inputs are greatly appreciated. Thank you!

The overall design is to reuse the existing logic which supports the hbase shell command 'status', and to introduce a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, which can be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions through the dev email list.)

The replication lag will be calculated for the source only, using this formula:
{code:title=Replication lag|borderStyle=solid}
if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
{code}

The external output will look something like:
{code:title=status 'replication'|borderStyle=solid}
hbase(main):001:0> status 'replication'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
    SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
    hdtest018.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
    SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
    hdtest015.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
    SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
hbase(main):002:0> status 'replication','source'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
    hdtest018.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
    hdtest015.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
hbase(main):003:0> status 'replication','sink'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com:
    SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
    hdtest018.svl.ibm.com:
    SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
    hdtest015.svl.ibm.com:
    SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
hbase(main):003:0> status 'replication','lag'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com: lag = 0
    hdtest018.svl.ibm.com: lag = 14
    hdtest015.svl.ibm.com: lag = 0
{code}

  was:
This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also to provide point-in-time info about the lag of replication (source only).

Understand that hbase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I made the code work on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.

All inputs are greatly appreciated. Thank you!

The overall design is to
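The replication lag formula from the description above can be written out as a small Java method. This is a minimal sketch rather than the patch's code: `ReplicationLagSketch` and `replicationLag` are hypothetical names, and it assumes that the age, the timestamp, and the current time are all expressed in milliseconds.

```java
// Hypothetical sketch of the lag formula; not the code from the HBASE-9531 patch.
final class ReplicationLagSketch {

    /**
     * Computes the source-side replication lag.
     *
     * sizeOfLogQueue:           number of logs still queued for shipping
     * ageOfLastShippedOp:       age of the last shipped edit, in ms (assumed unit)
     * timeStampOfLastShippedOp: wall-clock time of the last shipment, epoch ms (assumed unit)
     * nowMs:                    current wall-clock time, epoch ms
     */
    static long replicationLag(int sizeOfLogQueue,
                               long ageOfLastShippedOp,
                               long timeStampOfLastShippedOp,
                               long nowMs) {
        long sinceLastShipment = nowMs - timeStampOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // Logs are still waiting: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipment);
        }
        if (sinceLastShipment < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still meaningful.
            return ageOfLastShippedOp;
        }
        // The last shipment is old and nothing is queued: no real lag.
        return 0L;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Queue is empty and the last shipment was 10 ms ago with an age of 14 ms.
        System.out.println(replicationLag(0, 14L, now - 10L, now));
    }
}
```

Passing the current time as a parameter, instead of reading the clock inside the method, keeps the three branches easy to exercise in a unit test.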
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010084#comment-14010084 ]

Demai Ni commented on HBASE-9531:

About the release audit warnings (31 of them): this file (https://builds.apache.org/job/PreCommit-HBASE-Build/9602//artifact/patchprocess/patchReleaseAuditWarnings.txt) points to problems in all the files under "patchprocess", such as:
{quote}
31 Unknown Licenses
***
Unapproved licenses:
patchprocess/newPatchFindbugsWarningshbase-examples.xml
patchprocess/newPatchFindbugsWarningshbase-thrift.xml
patchprocess/patchFindbugsWarningshbase-prefix-tree.xml
patchprocess/newPatchFindbugsWarningshbase-server.xml
patchprocess/patchFindbugsWarningshbase-hadoop-compat.xml
patchprocess/patchFindbugsWarningshbase-common.xml
patchprocess/newPatchFindbugsWarningshbase-thrift.html
{quote}
Probably a side effect of switching to git?

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
> This jira is to provide a command line (hbase shell) interface to retrieve replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also to provide point-in-time info on the replication lag (source only).
>
> I understand that hbase uses Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is meant to serve as a lightweight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have the code working on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshop. If so, I will build a trunk patch.
>
> All inputs are greatly appreciated. Thank you!
>
> The overall design is to reuse the existing logic that supports the hbase shell command 'status', and to add a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, which can be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. A few getters and setters will be created in ReplicationSourceMetrics and ReplicationSinkMetrics, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions on the dev mailing list.)
>
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then
>     lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then
>     lag = ageOfLastShippedOp // the last shipment happened recently
> else
>     lag = 0 // the last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
> {code}
>
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1
[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010064#comment-14010064 ]

Demai Ni commented on HBASE-9531:

The warning 'The patch introduces the following lines longer than 100' comes from two places. The first two lines are from ClusterStatusProtos.java, which is generated. The second two are from admin.rb, where there are already a lot of lines longer than 100, probably because of the Ruby coding style(?).

Do we need to keep the lines under 100 in both cases?

Demai

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
    Attachment: HBASE-9531-trunk-v0.patch

For some reason, the patch didn't trigger HadoopQA the first time, so I attached it again with fingers crossed.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
    Status: Patch Available (was: In Progress)

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
> Attachments: HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
    Status: In Progress (was: Patch Available)

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
> Attachments: HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
    Fix Version/s: 0.98.4
                   0.99.0
    Affects Version/s: 0.99.0
    Status: Patch Available (was: Open)

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 0.99.0
> Reporter: Demai Ni
> Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
> Attachments: HBASE-9531-trunk-v0.patch
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-9531:
    Attachment: HBASE-9531-trunk-v0.patch

Finally got back to building a trunk patch for this one. I can build a 0.98 patch if needed.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Reporter: Demai Ni
> Assignee: Demai Ni
> Attachments: HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve the replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also provides point-in-time info on the replication lag (source only).
> I understand that HBase uses Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is meant to serve as a lightweight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have the code working on 0.94.9 now, and would like to use this jira to get opinions on whether the feature is valuable to other users/workshops. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell command 'status', and to add a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), the local replication service objects are used to get their loads, which are wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions on the dev email list.)
> The replication lag is calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> else lag = 0 // last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 14
>     hdtest015.svl.ibm.com: lag = 0
> {code}

-- This message was sent by Atlas
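The replication lag formula quoted in the HBASE-9531 description above can be sketched in Python as follows. This is only an illustration of the three branches, not the code from the patch: the function name, the parameter names, and the assumption that all values are in milliseconds are hypothetical and do not come from the jira.

```python
def replication_lag(size_of_log_queue, age_of_last_shipped_op,
                    ts_of_last_shipped_op, now):
    """Estimate the source-side replication lag.

    Assumption (not stated in the jira): ages are in milliseconds and the
    two timestamps are epoch milliseconds.
    """
    since_last_ship = now - ts_of_last_shipped_op
    if size_of_log_queue != 0:
        # Logs are still queued: err on the large side.
        return max(age_of_last_shipped_op, since_last_ship)
    if since_last_ship < 2 * age_of_last_shipped_op:
        # The last shipment happened recently, so its age is still meaningful.
        return age_of_last_shipped_op
    # Nothing is queued and the last shipment is old: no real lag.
    return 0
```

For example, with an empty queue, an age of 14 and a last shipment 10 ms ago, the second branch applies and the lag is reported as 14; the same age with a last shipment many hours ago falls through to 0.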
[jira] [Commented] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998969#comment-13998969 ] Demai Ni commented on HBASE-11085: -- open review board: https://reviews.apache.org/r/21492/ [~stack], [~tedyu], [~mbertozzi], and other folks, looking forward to you takes. Many thanks... Demai > Incremental Backup Restore support > -- > > Key: HBASE-11085 > URL: https://issues.apache.org/jira/browse/HBASE-11085 > Project: HBase > Issue Type: New Feature >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0 > > Attachments: > HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, > HBASE-11085-trunk-v1.patch
[jira] [Updated] (HBASE-10900) FULL table backup and restore
[ https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10900: - Attachment: HBASE-10900-trunk-v4.patch Attached v4 version for trunk, which includes 1) UT (thanks to [~enhs8920]) 2) small interface change to use Snapshot Manifest 3) plugged in a customized 'global log roll' to record info into ZK. A general 'global log roll' jira will be opened a bit later. and Fullbackup code will depend on it Demai > FULL table backup and restore > - > > Key: HBASE-10900 > URL: https://issues.apache.org/jira/browse/HBASE-10900 > Project: HBase > Issue Type: Task >Reporter: Demai Ni >Assignee: Demai Ni > Fix For: 0.99.0 > > Attachments: HBASE-10900-fullbackup-trunk-v1.patch, > HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, > HBASE-10900-trunk-v4.patch
[jira] [Created] (HBASE-11172) Cancal a backup process
Demai Ni created HBASE-11172:

Summary: Cancal a backup process
Key: HBASE-11172
URL: https://issues.apache.org/jira/browse/HBASE-11172
Project: HBase
Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
Fix For: 0.99.0

h2. Feature Description
This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] and incremental backup [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

A backup operation may need to move hundreds or thousands of GB of data and can take hours. Sometimes the operation takes longer than the maintenance window originally planned by the administrators. It is therefore necessary to be able to cancel the operation and reset all the history/manifest info whenever needed, so that we can have a clean backup in the next time window.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11085: - Status: Patch Available (was: Open) > Incremental Backup Restore support
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-11085: - Attachment: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch HBASE-11085-trunk-v1.patch > Incremental Backup Restore support
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
-
Description:
h2. Feature Description
This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

When a client issues an incremental backup request, BackupManager checks the request and then kicks off a global procedure via HBaseAdmin for all the active region servers to roll their logs. Each region server records its log number in ZooKeeper. We then determine which logs need to be included in this incremental backup, and use DistCp to copy them to the target location. At the same time, the dependency of the backup image is recorded, and later saved in the Backup Manifest file.

Restore replays the backed-up WAL logs on the target HBase instance. The replay occurs after the full backup has been restored.

Because an incremental backup image depends on the prior full backup image and on earlier incremental images, if any exist, the manifest file is used to store the dependency lineage during backup, and is used at restore time for point-in-time (PIT) restore.

h2. Use case (i.e. example)
{code:title=Incremental Backup Restore example|borderStyle=solid}
/***/
/* STEP1: FULL backup from sourcecluster to targetcluster
/* if no table name is specified, all tables from the source cluster will be backed up
/***/
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
...
14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
/***/
/* STEP2: In HBase Shell, put a few rows
/***/
hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
/***/
/* STEP3: Take the 1st incremental backup
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
/***/
/* STEP4: In HBase Shell, put a few more rows.
/* update 'row100', and create new 'row101'
/***/
hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
/***/
/* STEP5: Take the 2nd incremental backup
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
/***/
/* STEP7: Restore from PIT of the 1st incremental backup
/* specify the backup ID of the 1st incremental
/* option -automatic will trigger the restore of the full backup first, then the 1st
/* incremental backup image
/* t1_dn, etc. are the original table names. All tables will be restored if not specified
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used
/***
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994060#comment-13994060 ] Demai Ni commented on HBASE-7912: - hi, folks, We have patches for both full backup (v4) and incremental backup (v1) uploaded today. With that, it is easy to apply this [HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch| https://issues.apache.org/jira/secure/attachment/12644215/HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch] directly on trunk, and give it a try. Please see the example in both incremental backup jira : [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085] and fullbackup jira : [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. We will open more jiras and patches for other features (just as, merge, convert, delete, history, progress) in the coming weeks. Also, thanks for the review comments from [~tedyu], [~mbertozzi], [~stack], and others. We will have a few follow-up improvements about zookeeper, protobuff, and leveraging the new snapshot manifest Demai > HBase Backup/Restore Based on HBase Snapshot > > > Key: HBASE-7912 > URL: https://issues.apache.org/jira/browse/HBASE-7912 > Project: HBase > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, > HBase_BackupRestore-Jira-7912-CLI-v1.pdf
[jira] [Created] (HBASE-11175) improve Backup/Restore framework by abstracting out zookeeper
Demai Ni created HBASE-11175:

Summary: improve Backup/Restore framework by abstracting out zookeeper
Key: HBASE-11175
URL: https://issues.apache.org/jira/browse/HBASE-11175
Project: HBase
Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
Fix For: 0.99.0

This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912].

h2. Feature Description
The current backup/restore patches use ZooKeeper to keep the history and the dependency info on the source cluster. This jira is to abstract out the ZooKeeper usage. It is a follow-up of sorts to [HBASE-10909|https://issues.apache.org/jira/browse/HBASE-10909] and [HBASE-10296|https://issues.apache.org/jira/browse/HBASE-10296].

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995387#comment-13995387 ]

Demai Ni commented on HBASE-7912:
-

We need HBase-11148 to roll the logs for [full backup|https://issues.apache.org/jira/browse/HBASE-10900] and [incremental backup|https://issues.apache.org/jira/browse/HBASE-11085], and also to mark a timestamp for the next incremental backup.

> HBase Backup/Restore Based on HBase Snapshot
>
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
> Issue Type: Sub-task
> Reporter: Richard Ding
> Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira.
> We leverage the existing HBase snapshot feature to provide a general solution for common users. Our full backup uses snapshot to capture metadata locally and exportsnapshot to move data to another cluster; the incremental backup uses offline-WALPlayer to back up HLogs; we also leverage a globally distributed log roll and flush to improve performance. Other add-on features include convert, merge, progress report, and CLI commands, so that a common user can back up HBase data without in-depth knowledge of HBase. Our solution also contains some usability features for enterprise users.
> The detailed design document and CLI commands will be attached to this jira. We plan to use 10~12 subtasks to share each of the following features, and to document the detailed implementation in the subtasks:
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup)
> * *distributed* Logroll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, kept here as the history of the design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency.
>
> A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data. The combination of full backups with incremental backups has tremendous benefit for HBase as well. The following is a typical scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups up to the desired point in time are applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk load of HFiles for fast incremental restore.
> * Support a single table or a set of tables, and column-family-level backup and restore.
> * Restore to different table names.
> * Support adding additional tables or CFs to a backup set without interrupting the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer-period and bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future.

-- This message was sent by Atlassian JIRA (v6.2#6252)
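The restore scenario in the HBASE-7912 description above (restore the full backup, then apply the incremental backups up to the desired point in time) can be sketched as a small selection routine. This is a hedged illustration only: `BackupImage`, `restore_chain`, and the idea of ordering images by a completion timestamp are hypothetical names and assumptions made for the example, and the real patches record the dependency lineage in a manifest file rather than computing it this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupImage:
    backup_id: str   # e.g. "backup_1399667695966"
    kind: str        # "full" or "incremental"
    timestamp: int   # completion time in epoch milliseconds (assumed)


def restore_chain(images, point_in_time):
    """Return the images to apply, in order, to restore to point_in_time."""
    # Only images taken at or before the requested point in time qualify.
    eligible = sorted((i for i in images if i.timestamp <= point_in_time),
                      key=lambda i: i.timestamp)
    fulls = [i for i in eligible if i.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the requested point in time")
    base = fulls[-1]  # the most recent full backup is the baseline
    # Incrementals taken after the baseline are replayed on top of it.
    increments = [i for i in eligible
                  if i.kind == "incremental" and i.timestamp > base.timestamp]
    return [base] + increments
```

Given one full image followed by two incremental images, asking for the time of the first incremental returns the full image and that one incremental, in that order; asking for a time before any full backup raises an error instead of returning a partial chain.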
[jira] [Updated] (HBASE-10900) FULL table backup and restore
[ https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10900: - Description: h2. Feature Description This is a subtask of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL backup/restore, and will complete the following function: {code:title=Backup Restore example|borderStyle=solid} /* backup from sourcecluster to targetcluster */ /* if no table name specified, all tables from source cluster will be backuped */ [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn /* restore on targetcluser, this is a local restore */ /* backup_1396650096738 - backup image name */ /* t1_dn,etc are the original table names. All tables will be restored if not specified */ /* t1_dn_restore, etc. are the restored table. if not specified, orginal table name will be used*/ [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore /* restore from targetcluster back to source cluster, this is a remote restore [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore {code} h2. Detail layout and frame work for the next jiras The patch is a wrapper of the existing snapshot and exportSnapshot, and will use as the base framework for the over-all solution of [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] as described below: * *bin/hbase* : end-user command line interface to invoke BackupClient and RestoreClient * *BackupClient.java* : 'main' entry for backup operations. This patch will only support 'full' backup. 
In future jiras, it will support:
** *create* incremental backup
** *cancel* an ongoing backup
** *delete* an existing backup image
** *describe* the detailed information of a backup image
** show *history* of all successful backups
** show the *status* of the latest backup request
** *convert* incremental backup WAL files into HFiles, either on-the-fly during create or after create
** *merge* backup images
** *stop* backing up a table of an existing backup image
** *show* tables of a backup image
* *BackupCommands.java* : a place to keep all the command usages and options
* *BackupManager.java* : handles backup requests on the server side, and creates BACKUP ZOOKEEPER nodes to keep track of backups. The timestamps kept in ZooKeeper will be used for future incremental backups (not included in this jira). Creates BackupContext and DispatchRequest.
* *BackupHandler.java* : in this patch, it is a wrapper of snapshot and exportsnapshot. In future jiras, it will:
** record *timestamps* info in ZK
** carry out *incremental* backup
** update backup *progress*
** set flags of *status*
** build up the *backupManifest* file (in this jira, only limited info for full backup; later on, timestamps and dependencies of multiple backup images are also recorded here)
** clean up after a *failed* backup
** clean up after a *cancelled* backup
** allow on-the-fly *convert* during incremental backup
* *BackupContext.java* : encapsulates backup information like backup ID, table names, directory info, phase, timestamps of backup progress, size of data, ancestor info, etc.
* *BackupCopier.java* : the copying operation. Later on, it will support progress reports and mapper estimation, and extend DistCp for progress updates to ZK during backup.
* *BackupExcpetion.java*: to handle exceptions from backup/restore
* *BackupManifest.java* : encapsulates all the backup image information. The manifest info will be bundled as a manifest file together with the data, so that each backup image contains all the info needed for restore.
* *BackupStatus.java* : encapsulates backup status at table level during backup progress
* *BackupUtil.java* : utility methods used during the backup process
* *RestoreClient.java* : 'main' entry for restore operations. This patch will only support restoring from a 'full' backup.
* *RestoreUtil.java*: utility methods used during the restore process
* *ExportSnapshot.java* : remove 'final' so that another class, SnapshotCopy.java, can extend it
* *SnapshotCopy.java* : only a wrapper at this moment, but it will be extended to keep track of progress (maybe this should be implemented in ExportSnapshot directly?)
* *BackupRestoreConstants.java* : adds the constants used by the backup/restore code.
* *HBackupFilesystem.java* : the filesystem-related API used by BackupClient and RestoreClient.

h2. Global log roll
Currently a customized one under *org.apache.hadoop.hbase.backup.master* and *org.apache.hadoop.hbase.backup.regionserver*. [HBASE-11148|https://issues.apache.org/jira/browse/HBASE-11148] is opened to
[jira] [Created] (HBASE-11174) show backup/restore progress
Demai Ni created HBASE-11174:

Summary: show backup/restore progress
Key: HBASE-11174
URL: https://issues.apache.org/jira/browse/HBASE-11174
Project: HBase
Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
Fix For: 0.99.0

h2. Feature Description
This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] and incremental backup [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

A backup/restore operation may take a while to complete, sometimes hours. It would be helpful to show the estimated progress to the user as a percentage. This jira will provide such functionality.
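The ticket does not say how the estimate would be computed. One plausible approach, shown here purely as a sketch, is to derive the percentage from bytes copied so far against the total bytes to copy. The function name `progress_percent` and both of its parameters are assumptions made for illustration, not anything taken from the patch.

```python
def progress_percent(bytes_copied: int, total_bytes: int) -> int:
    """Estimated progress as a whole percentage in the range 0..100."""
    if total_bytes <= 0:
        # Total size unknown or empty: report 0 rather than divide by zero.
        return 0
    # Clamp so a stale or over-counted byte total never shows more than 100%.
    copied = min(max(bytes_copied, 0), total_bytes)
    return copied * 100 // total_bytes


if __name__ == "__main__":
    print(progress_percent(50, 200))  # 25
```

Integer division rounds down, so the value only reaches 100 once every byte has been accounted for.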
[jira] [Created] (HBASE-11173) Show Backup History
Demai Ni created HBASE-11173:

Summary: Show Backup History
Key: HBASE-11173
URL: https://issues.apache.org/jira/browse/HBASE-11173
Project: HBase
Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
Fix For: 0.99.0

h2. Feature Description
This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] and incremental backup [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

After several backup operations have been executed, a user may want to know which tables were backed up and when, so that a restore or future backup operation can be performed accordingly.
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support
[ https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demai Ni updated HBASE-11085:
    Description:
h2. Feature Description
This jira is part of [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depends on full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. For the detailed layout and framework, please refer to [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

When a client issues an incremental backup request, BackupManager will check the request and then kick off a global procedure via HBaseAdmin for all the active region servers to roll their logs. Each region server will record its log number in ZooKeeper. Then we determine which logs need to be included in this incremental backup, and use DistCp to copy them to the target location. At the same time, the dependency of the backup image will be recorded, and later saved in the Backup Manifest file.

Restore replays the backed-up WAL logs on the target HBase instance. The replay occurs after the full backup is restored, because an incremental backup image depends on the prior full backup image and on any earlier incremental images. The manifest file stores the dependency lineage during backup, and is used at restore time for point-in-time (PIT) restore.

h2. Use case (i.e., example)
{code:title=Incremental Backup Restore example|borderStyle=solid}
/***/
/* STEP1: FULL backup from sourcecluster to targetcluster
/* if no table name is specified, all tables from the source cluster will be backed up
/***/
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
...
14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
/***/
/* STEP2: In HBase Shell, put a few rows
/***/
hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
/***/
/* STEP3: Take the 1st incremental backup
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
/***/
/* STEP4: In HBase Shell, put a few more rows.
/* update 'row100', and create new 'row101'
/***/
hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
/***/
/* STEP5: Take the 2nd incremental backup
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
/***/
/* STEP7: Restore from PIT of the 1st incremental backup
/* specify the backup ID of the 1st incremental
/* option -automatic will trigger the restore of the full backup first, then the 1st
/* incremental backup image
/* t1_dn, etc. are the original table names. All tables will be restored if not specified
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used
/***
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996599#comment-13996599 ]

Demai Ni commented on HBASE-7912:

[~stack], thanks for the comments.

bq. This doc. with perhaps a little more commentary like it could go into the hbase refguide when this feature is committed?

In addition to the CLI PDF I attached in this jira, more complete documentation can be found here: [IBM BigInsights 2.1.2|http://www-01.ibm.com/support/knowledgecenter/SSPT3X_2.1.2/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_hbase_bkuprestore_overview.html], which was officially released in March 2014. We will open source all the features related to Backup/Restore from IBM BigInsights. We can move the documents to the 'backup' section of the HBase ref book as you suggested, certainly after incorporating the comments/suggestions from the community.

About testing, thanks to [~jinghe]'s comment. We already did functional and stress testing internally before release. For the current patches, since we made some changes per suggestions from the community, additional dev testing is being carried out.

{quote}
bq. We’ll convert/replay the backed-up Hlogs into HFiles for fast incremental restore.
This is interesting. It is done against a cluster or it is just a MR job/tool?
{quote}
~70% of the code logic is from WalPlayer, an MR job against the target cluster. The difference is that we don't rely on a live HBase cluster when converting the HLogs to HFiles, as the code can access the table info offline. Currently the code is only useful for the backup/restore solution. We'd like to open another jira for the logic as a general tool/improvement of WalPlayer, and the new jira will have a dependency on [HBASE-8083 | https://issues.apache.org/jira/browse/HBASE-8073].

bq.What needs to go in first? What should we review first?
Actually, I need suggestions from you and other folks here.
From the dependency perspective, I'd like to have [Full backup HBase-10900|https://issues.apache.org/jira/browse/HBASE-10900] in first, and then [incremental backup HBase-11085|https://issues.apache.org/jira/browse/HBASE-11085]. Once Jerry's [global log roll HBase-11148|https://issues.apache.org/jira/browse/HBASE-11148] gets accepted, I will put up a patch to update full and incremental to use it immediately. Then, I would like to improve it with protobuf and abstract out ZooKeeper.

If the community accepts the solution of the general framework provided by [Full backup HBase-10900|https://issues.apache.org/jira/browse/HBASE-10900] and [incremental backup HBase-11085|https://issues.apache.org/jira/browse/HBASE-11085], we will build the patches of other features on top of the framework.

At this moment, I am thinking about opening another review board for the combined patches of [both incremental and full backup|https://issues.apache.org/jira/secure/attachment/12644215/HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch]. I understand a lot of code is involved here, and I am open to any suggestion to make the review easier for everyone. :-)

Demai

> HBase Backup/Restore Based on HBase Snapshot
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
> Issue Type: Sub-task
> Reporter: Richard Ding
> Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf,
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
> Finally, we completed the implementation of our backup/restore solution, and
> would like to share with community through this jira.
> We are leveraging existing hbase snapshot feature, and provide a general
> solution to common users.
> Our full backup is using snapshot to capture
> metadata locally and using exportsnapshot to move data to another cluster;
> the incremental backup is using offline-WALplayer to backup HLogs; we also
> leverage global distribution rolllog and flush to improve performance; other
> added-on values such as convert, merge, progress report, and CLI commands. So
> that a common user can backup hbase data without in-depth knowledge of hbase.
> Our solution also contains some usability features for enterprise users.
> The detail design document and CLI command will be attached in this jira. We
> plan to use 10~12 subtasks to share each of the following features, and
> document the detail implement in the subtasks:
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental
> backup)
> * *distributed* Logroll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files int