[jira] [Commented] (HBASE-16077) Replication status doesnt show failed RS metrics in CLI

2016-10-21 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596532#comment-15596532]

Demai Ni commented on HBASE-16077:
--

[~bibinchundatt], would you please elaborate a bit on what you expect 
from the output of the CLI command? Thanks.

> Replication status doesnt show failed RS metrics in CLI 
> 
>
> Key: HBASE-16077
> URL: https://issues.apache.org/jira/browse/HBASE-16077
> Project: HBase
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>
> Steps to reproduce
> 
> # Create 2 clusters and configure replication.
> # Create TABLE 1 and enable replication for it.
> # Shut down Cluster 2 for a short period.
> # Load data into TABLE 1.
> # Shut down the Region Server hosting the region of TABLE 1.
> # Check the metrics using the CLI.
> {noformat}
> hbase(main):003:0* status 'replication'
> 2016-06-14 00:58:04,664 INFO  [main] ipc.AbstractRpcClient: RPC Server 
> Kerberos principal name for service=MasterService is 
> hbase/hadoop.hadoop@hadoop.com
> version 1.0.2
> 3 live servers
> host-10-19-92-200:
>SOURCE: PeerID=11, SizeOfLogQueue=0, ShippedBatches=30, 
> ShippedOps=1351, ShippedBytes=1513127672, LogReadInBytes=662648911, 
> LogEditsRead=1546, LogEditsFiltered=1409, SizeOfLogToReplicate=0, 
> TimeWillBeTakenForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, 
> AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Tue Jun 14 00:58:01 IST 2016, 
> Replication Lag=0
>SINK  : AppliedBatches=2, AppliedOps=5, AppliedHFiles=3, 
> AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Jun 13 02:18:06 IST 2016
> host-10-19-92-187:
>SOURCE: PeerID=11, SizeOfLogQueue=0, ShippedBatches=0, ShippedOps=0, 
> ShippedBytes=0, LogReadInBytes=65719, LogEditsRead=112, LogEditsFiltered=112, 
> SizeOfLogToReplicate=0, TimeWillBeTakenForLogToReplicate=0, ShippedHFiles=0, 
> SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Tue 
> Jun 14 00:58:01 IST 2016, Replication Lag=0
>SINK  : AppliedBatches=0, AppliedOps=0, AppliedHFiles=0, 
> AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Jun 13 09:07:20 IST 2016
> host-10-19-92-188:
>SOURCE: PeerID=11, SizeOfLogQueue=0, ShippedBatches=39, 
> ShippedOps=1730, ShippedBytes=1937609744, LogReadInBytes=848439638, 
> LogEditsRead=1671, LogEditsFiltered=1497, SizeOfLogToReplicate=0, 
> TimeWillBeTakenForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, 
> AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Tue Jun 14 00:58:03 IST 2016, 
> Replication Lag=0
>SINK  : AppliedBatches=1, AppliedOps=1, AppliedHFiles=0, 
> AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Jun 13 01:53:53 IST 2016
> {noformat}
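The JMX dump in this report exposes the metrics of a queue recovered from a failed region server under keys such as `source.11-host-10-19-92-187,21302,1465787242095.sizeOfLogQueue`, while the CLI output has no line for that queue. As a minimal sketch of how such entries could be picked out of a flat JMX map, the Java class below groups `source.<queueId>.<metric>` keys by queue ID. It assumes, from the key names in this report only, that a queue ID of the form `<peerId>-<deadServerName>` marks a recovered queue and that peer IDs contain no `-`; `RecoveredQueueMetrics` and its methods are hypothetical, not HBase APIs.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: groups replication-source JMX metrics by queue ID and
 * keeps only recovered queues. Assumption taken from the JMX key names in this
 * report: "<peerId>-<deadServerName>" is a queue taken over from a failed
 * region server, while a bare "<peerId>" is the normal live queue.
 */
final class RecoveredQueueMetrics {

  private static final String PREFIX = "source.";

  /** Returns queueId -> (metric name -> value) for recovered queues only. */
  static Map<String, Map<String, Long>> recoveredQueues(Map<String, Long> jmx) {
    Map<String, Map<String, Long>> byQueue = new TreeMap<>();
    for (Map.Entry<String, Long> e : jmx.entrySet()) {
      String key = e.getKey();
      if (!key.startsWith(PREFIX)) {
        continue; // sink metrics and other attributes are ignored
      }
      String rest = key.substring(PREFIX.length());
      // Metric names contain no '.', so the last '.' separates queue ID and metric.
      int lastDot = rest.lastIndexOf('.');
      if (lastDot < 0) {
        continue; // aggregate metric such as "source.shippedOps"
      }
      String queueId = rest.substring(0, lastDot);
      if (queueId.indexOf('-') < 0) {
        continue; // live queue for a peer, e.g. "11"
      }
      byQueue.computeIfAbsent(queueId, q -> new TreeMap<>())
          .put(rest.substring(lastDot + 1), e.getValue());
    }
    return byQueue;
  }

  /** Peer ID: the part before the first '-' (host names may contain '-'). */
  static String peerId(String queueId) {
    return queueId.substring(0, queueId.indexOf('-'));
  }

  /** Server name of the failed region server that originally owned the queue. */
  static String deadServer(String queueId) {
    return queueId.substring(queueId.indexOf('-') + 1);
  }
}
```

Splitting on the first `-` rather than the last matters here because host names such as `host-10-19-92-187` contain dashes themselves.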
> *JMX output*
> {noformat}
> {
> "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
> "modelerType" : "RegionServer,sub=Replication",
> "tag.Context" : "regionserver",
> "tag.Hostname" : "host-10-19-92-200",
> "source.11.sizeOfLogToReplicate" : 537,
> "source.11-host-10-19-92-187,21302,1465787242095.sizeOfLogToReplicate" : 282766680,
> "source.shippedHFiles" : 0,
> "source.ageOfLastShippedOp" : 0,
> "source.11.shippedHFiles" : 0,
> "source.11-host-10-19-92-187,21302,1465787242095.ageOfLastShippedOp" : 0,
> "source.shippedKBs" : 1477663,
> "source.sizeOfHFileRefsQueue" : 0,
> "source.logReadInBytes" : 691148656,
> "source.11-host-10-19-92-187,21302,1465787242095.logEditsRead" : 39,
> "source.11-host-10-19-92-187,21302,1465787242095.shippedOps" : 0,
> "source.11.logEditsFiltered" : 1244,
> "source.sizeOfLogQueue" : 4,
> "source.timeWillBeTakenForLogToReplicate" : 1,
> "sink.ageOfLastAppliedOp" : 0,
> "source.11-host-10-19-92-187,21302,1465787242095.timeWillBeTakenForLogToReplicate" : 0,
> "source.logEditsRead" : 1420,
> "source.11.sizeOfLogQueue" : 0,
> "source.11-host-10-19-92-187,21302,1465787242095.logEditsFiltered" : 32,
> "source.11-host-10-19-92-187,21302,1465787242095.shippedHFiles" : 0,
> "source.shippedOps" : 1351,
> "source.11.shippedKBs" : 1477663,
> "source.11.logReadInBytes" : 662562515,
> "sink.appliedHFiles" : 3,
> "source.11.sizeOfHFileRefsQueue" : 0,
> "source.logEditsFiltered" : 1276,
> "source.shippedBytes" : 1513127672,
> "source.11-host-10-19-92-187,21302,1465787242095.shippedBatches" : 0,
> "source.11.shippedBytes" : 1513127672,
> "sink.appliedOps" : 5,
> "source.11-host-10-19-92-187,21302,1465787242095.sizeOfLogQueue" : 4,
> "source.11.shippedBatches" : 30,
> "source.11-host-10-19-92-187,21302,1465787242095.sizeOfHFileRefsQueue" : 0,
> "source.11.timeWillBeTakenForLogToReplicate" : 1,
> "source.11-host-10-19-92-187,21302,1465787242095.logReadInBytes" : 

[jira] [Commented] (HBASE-11085) Incremental Backup Restore support

2015-06-23 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598650#comment-14598650]

Demai Ni commented on HBASE-11085:
--

[~vrodionov], Thanks. :-) 

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Vladimir Rodionov
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, 
> HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v2.patch, HLogPlayer.java
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
> full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to 
> [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks 
> the request and then kicks off a global procedure via HBaseAdmin that makes all 
> active region servers roll their logs. Each region server records its log 
> number in ZooKeeper. We then determine which logs need to be included in 
> this incremental backup and use DistCp to copy them to the target location. At 
> the same time, the dependency of the backup image is recorded and later 
> saved in the Backup Manifest file.
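The log-selection step described above can be sketched as follows. This is a minimal illustration, not the patch's code: it assumes each WAL is identified by its region server and the timestamp at which a log roll created it, and that `prevRoll` and `currRoll` hold the per-server roll timestamps recorded in ZooKeeper by the previous backup and by the roll this backup triggered. `IncrementalWalSelector` and `WalFile` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: pick the WALs that belong to one incremental backup. */
final class IncrementalWalSelector {

  /** A WAL identified by its region server and its creation (roll) timestamp. */
  static final class WalFile {
    final String server;
    final long createdTs;

    WalFile(String server, long createdTs) {
      this.server = server;
      this.createdTs = createdTs;
    }
  }

  static List<WalFile> select(List<WalFile> wals,
                              Map<String, Long> prevRoll,
                              Map<String, Long> currRoll) {
    List<WalFile> selected = new ArrayList<>();
    for (WalFile wal : wals) {
      Long curr = currRoll.get(wal.server);
      if (curr == null) {
        continue; // server did not take part in this roll; skipped in this sketch
      }
      // A server that is new since the last backup has no lower bound.
      long prev = prevRoll.getOrDefault(wal.server, Long.MIN_VALUE);
      // The WAL opened by the previous roll holds the first new edits;
      // the WAL opened by the current roll belongs to the next backup.
      if (wal.createdTs >= prev && wal.createdTs < curr) {
        selected.add(wal);
      }
    }
    return selected;
  }
}
```

Servers missing from `currRoll` are skipped for brevity; a real implementation would also have to account for WALs of servers that died between two backups.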
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup has been restored.
> Because an incremental backup image depends on the prior full backup image and 
> on any earlier incremental images, the manifest file stores this 
> dependency lineage during backup and is used at restore time for 
> point-in-time (PIT) restore.
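A restore has to apply images in dependency order: the full image first, then each incremental image on top of it. The sketch below walks a hypothetical `parentOf` map (image ID to the ID of the image it depends on, `null` for a full image), which stands in for the lineage the manifest is said to store; the real manifest format is not shown here, and `RestoreChain` is an illustrative name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derive the restore order from recorded lineage. */
final class RestoreChain {

  /** Returns image IDs ordered from the full image up to targetId. */
  static List<String> restoreOrder(String targetId, Map<String, String> parentOf) {
    Deque<String> chain = new ArrayDeque<>();
    String id = targetId;
    while (id != null) {
      if (!parentOf.containsKey(id)) {
        throw new IllegalArgumentException("image missing from lineage: " + id);
      }
      if (chain.contains(id)) {
        throw new IllegalStateException("cycle in lineage at " + id);
      }
      chain.addFirst(id);    // ancestors end up at the front
      id = parentOf.get(id); // null once the full image is reached
    }
    return new ArrayList<>(chain);
  }
}
```

With the backup IDs from the example below, and assuming each incremental image records the previous image as its parent, restoring `backup_1399667959165` would apply `backup_1399667695966`, then `backup_1399667851020`, then `backup_1399667959165`.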
> h2. Use case (i.e. example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1:  FULL backup from sourcecluster to targetcluster
> /* if no table name is specified, all tables from the source cluster will be backed up
> /***/
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
> /***/
> /* STEP2:   In HBase Shell, put a few rows
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3:   Take the 1st incremental backup
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
> /***/
> /* STEP4:   In HBase Shell, put a few more rows:
> /*   update 'row100', and create new 'row101'
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5:   Take the 2nd incremental backup
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup requ

[jira] [Commented] (HBASE-10900) FULL table backup and restore

2015-02-19 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327973#comment-14327973]

Demai Ni commented on HBASE-10900:
--

Due to personal reasons, I can't contribute directly to the open source 
community at the moment. I have set this JIRA to 'unassigned' and removed the 
fix version.

Hopefully someone can pick it up, or my situation may change and I can 
continue to work on this. 

Thanks... Demai

> FULL table backup and restore
> -
>
> Key: HBASE-10900
> URL: https://issues.apache.org/jira/browse/HBASE-10900
> Project: HBase
>  Issue Type: Task
>Reporter: Demai Ni
> Attachments: HBASE-10900-fullbackup-trunk-v1.patch, 
> HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, 
> HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL 
> backup/restore, and will implement the following functionality:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name is specified, all tables from the source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster; this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to sourcecluster; this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
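The two optional table arguments of the restore commands above pair up positionally: the i-th original table is restored under the i-th restore name, and omitted arguments fall back to all tables in the image and to the original names. A minimal Java sketch of that mapping follows; `RestoreTableMapping` is hypothetical, and `imageTables` stands in for the table list the real RestoreClient would read from the backup image.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: map original table names to restore table names. */
final class RestoreTableMapping {

  /**
   * @param imageTables tables contained in the backup image (assumed input)
   * @param tablesArg   comma-separated original tables, or null for all tables
   * @param targetsArg  comma-separated restore tables, or null to keep names
   */
  static Map<String, String> map(List<String> imageTables, String tablesArg, String targetsArg) {
    List<String> tables = tablesArg == null ? imageTables : Arrays.asList(tablesArg.split(","));
    Map<String, String> result = new LinkedHashMap<>();
    if (targetsArg == null) {
      for (String t : tables) {
        result.put(t, t); // restore under the original name
      }
      return result;
    }
    String[] targets = targetsArg.split(",");
    if (targets.length != tables.size()) {
      throw new IllegalArgumentException("table and restore-table lists differ in length");
    }
    for (int i = 0; i < targets.length; i++) {
      result.put(tables.get(i), targets[i]); // positional pairing
    }
    return result;
  }
}
```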
> h2. Detailed layout and framework for the next JIRAs
> The patch is a wrapper around the existing snapshot and exportSnapshot, and will 
> be used as the base framework for the overall solution of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described 
> below:
> * *bin/hbase*  : end-user command line interface to invoke 
> BackupClient and RestoreClient
> * *BackupClient.java*  : 'main' entry for backup operations. This patch will 
> only support 'full' backup. Future JIRAs will support:
> ** *create* incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show the *history* of all successful backups 
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on-the-fly 
> during create or after create
> ** *merge* backup images
> ** *stop* backing up a table of an existing backup image
> ** *show* the tables of a backup image 
> * *BackupCommands.java* : a place to keep all the command usages and options
> * *BackupManager.java*  : handles backup requests on the server side and creates 
> BACKUP ZOOKEEPER nodes to keep track of backups. The timestamps kept in ZooKeeper 
> will be used for future incremental backup (not included in this JIRA). 
> Creates BackupContext and DispatchRequest. 
> * *BackupHandler.java*  : in this patch, a wrapper around snapshot and 
> exportsnapshot. In future JIRAs, it will:
> ** record *timestamps* info in ZK
> ** carry out *incremental* backup
> ** update backup *progress*
> ** set *status* flags
> ** build up the *backupManifest* file (in this JIRA only limited info for 
> full backup; later on, timestamps and dependencies of multiple backup images are 
> also recorded here)
> ** clean up after a *failed* backup 
> ** clean up after a *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup 
> * *BackupContext.java* : encapsulates backup information like backup ID, table 
> names, directory info, phase, timestamps of backup progress, size of data, 
> ancestor info, etc. 
> * *BackupCopier.java*  : the copying operation. Later on, it will support 
> progress reporting and mapper estimation, and extend DistCp to report progress 
> to ZK during backup. 
> * *BackupException.java*: handles exceptions from backup/restore
> * *BackupManifest.java* : encapsulates all the backup image information. The 
> manifest info will be bundled as a manifest file together with the data, so that 
> each backup image contains all the info needed for restore. 
> * *BackupStatus.java*   : encapsulates backup status at table level during 
> backup progress
> * *BackupUtil.j

[jira] [Commented] (HBASE-11085) Incremental Backup Restore support

2015-02-19 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327975#comment-14327975]

Demai Ni commented on HBASE-11085:
--

Due to personal reasons, I can't contribute directly to the open source 
community at the moment. I have set this JIRA to 'unassigned' and removed the 
fix version.

Hopefully someone can pick it up, or my situation may change and then I will 
continue to work on this. 

Thanks... Demai

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, 
> HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v2.patch, HLogPlayer.java
>
>

[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2015-02-19 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Demai Ni updated HBASE-11085:
-
Fix Version/s: (was: 1.1.0)
 Assignee: (was: Demai Ni)

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, 
> HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v2.patch, HLogPlayer.java
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
> full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to 
> [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks 
> the request and then kicks off a global procedure via HBaseAdmin that makes all 
> active region servers roll their logs. Each region server records its log 
> number in ZooKeeper. We then determine which logs need to be included in 
> this incremental backup and use DistCp to copy them to the target location. At 
> the same time, the dependency of the backup image is recorded and later 
> saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup has been restored.
> Because an incremental backup image depends on the prior full backup image and 
> on any earlier incremental images, the manifest file stores this 
> dependency lineage during backup and is used at restore time for 
> point-in-time (PIT) restore.
> h2. Use case (i.e. example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1:  FULL backup from sourcecluster to targetcluster
> /* if no table name is specified, all tables from the source cluster will be backed up
> /***/
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
> /***/
> /* STEP2:   In HBase Shell, put a few rows
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3:   Take the 1st incremental backup
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
> /***/
> /* STEP4:   In HBase Shell, put a few more rows:
> /*   update 'row100', and create new 'row101'
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5:   Take the 2nd incremental backup
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
> /*

[jira] [Updated] (HBASE-10900) FULL table backup and restore

2015-02-19 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Demai Ni updated HBASE-10900:
-
Fix Version/s: (was: 1.1.0)
 Assignee: (was: Demai Ni)

> FULL table backup and restore
> -
>
> Key: HBASE-10900
> URL: https://issues.apache.org/jira/browse/HBASE-10900
> Project: HBase
>  Issue Type: Task
>Reporter: Demai Ni
> Attachments: HBASE-10900-fullbackup-trunk-v1.patch, 
> HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, 
> HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL 
> backup/restore, and will implement the following functionality:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name is specified, all tables from the source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster; this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to sourcecluster; this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
> h2. Detailed layout and framework for the next JIRAs
> The patch is a wrapper around the existing snapshot and exportSnapshot, and will 
> be used as the base framework for the overall solution of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described 
> below:
> * *bin/hbase*  : end-user command line interface to invoke 
> BackupClient and RestoreClient
> * *BackupClient.java*  : 'main' entry for backup operations. This patch will 
> only support 'full' backup. Future JIRAs will support:
> ** *create* incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show the *history* of all successful backups 
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on-the-fly 
> during create or after create
> ** *merge* backup images
> ** *stop* backing up a table of an existing backup image
> ** *show* the tables of a backup image 
> * *BackupCommands.java* : a place to keep all the command usages and options
> * *BackupManager.java*  : handles backup requests on the server side and creates 
> BACKUP ZOOKEEPER nodes to keep track of backups. The timestamps kept in ZooKeeper 
> will be used for future incremental backup (not included in this JIRA). 
> Creates BackupContext and DispatchRequest. 
> * *BackupHandler.java*  : in this patch, a wrapper around snapshot and 
> exportsnapshot. In future JIRAs, it will:
> ** record *timestamps* info in ZK
> ** carry out *incremental* backup
> ** update backup *progress*
> ** set *status* flags
> ** build up the *backupManifest* file (in this JIRA only limited info for 
> full backup; later on, timestamps and dependencies of multiple backup images are 
> also recorded here)
> ** clean up after a *failed* backup 
> ** clean up after a *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup 
> * *BackupContext.java* : encapsulates backup information like backup ID, table 
> names, directory info, phase, timestamps of backup progress, size of data, 
> ancestor info, etc. 
> * *BackupCopier.java*  : the copying operation. Later on, it will support 
> progress reporting and mapper estimation, and extend DistCp to report progress 
> to ZK during backup. 
> * *BackupException.java*: handles exceptions from backup/restore
> * *BackupManifest.java* : encapsulates all the backup image information. The 
> manifest info will be bundled as a manifest file together with the data, so that 
> each backup image contains all the info needed for restore. 
> * *BackupStatus.java*   : encapsulates backup status at table level during 
> backup progress
> * *BackupUtil.java* : utility methods used during the backup process
> * *RestoreClient.java*  : 'main' entry for restore operations. This patch 
> will only support restoring 'full' backups. 
> * *RestoreUtil.java*: utility methods used during the restore process
> * *ExportSnapshot.java* : remove 'fin

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2015-02-12 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319278#comment-14319278]

Demai Ni commented on HBASE-9531:
-

[~ashish singhi], thank you so much for picking it up and completing the JIRA.
[~apurtell], thanks a lot for pushing the feature in. Glad the code can be used 
by more users. 


> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Ashish Singhi
> Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-master-v3.patch, HBASE-9531-master-v4.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-v1.patch, 
> HBASE-9531-v2.patch, HBASE-9531-v3-0.98.patch, HBASE-9531-v3-branch-1.patch, 
> HBASE-9531-v3.patch, HBASE-9531.patch
>
>
> This JIRA is to provide a command line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, 
> timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and 
> timeStampsOfLastAppliedOp, and also to provide point-in-time information about 
> the replication lag (source only).
> I understand that HBase uses Hadoop 
> metrics (http://hbase.apache.org/metrics.html), which is a common way to 
> monitor metrics. This JIRA is meant to serve as a lightweight client 
> interface, compared to a complete (certainly better, but heavier) GUI 
> monitoring package. I have the code working on 0.94.9 now, and would like to use 
> this JIRA to get opinions about whether the feature is valuable to other 
> users/shops. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell 
> command 'status', and add a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and 
> then simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions on the dev mailing list.)
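The design above amounts to each region server packaging its source and sink metrics into one load object that the shell then prints. Below is a minimal sketch of such a holder and of rendering lines in the style of the `status 'replication'` example in this description; the class and field names are illustrative only and are not the API of the ReplicationLoad classes that were eventually committed.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical per-server replication load, following the design above. */
final class ReplicationLoadSketch {

  /** Source-side metrics for one peer. */
  static final class SourceLoad {
    final String peerId;
    final long ageOfLastShippedOp;
    final int sizeOfLogQueue;
    final long timeStampOfLastShippedOp; // epoch millis

    SourceLoad(String peerId, long age, int queueSize, long ts) {
      this.peerId = peerId;
      this.ageOfLastShippedOp = age;
      this.sizeOfLogQueue = queueSize;
      this.timeStampOfLastShippedOp = ts;
    }
  }

  final List<SourceLoad> sources = new ArrayList<>();
  long ageOfLastAppliedOp;       // sink side
  long timeStampOfLastAppliedOp; // sink side, epoch millis

  /** One header line, one SOURCE line per peer, and one SINK line. */
  List<String> render(String serverName) {
    List<String> lines = new ArrayList<>();
    lines.add(serverName + ":");
    for (SourceLoad s : sources) {
      lines.add("SOURCE:PeerID=" + s.peerId
          + ", ageOfLastShippedOp=" + s.ageOfLastShippedOp
          + ", sizeOfLogQueue=" + s.sizeOfLogQueue
          + ", timeStampsOfLastShippedOp=" + new Date(s.timeStampOfLastShippedOp));
    }
    lines.add("SINK  :AgeOfLastAppliedOp=" + ageOfLastAppliedOp
        + ", TimeStampsOfLastAppliedOp=" + new Date(timeStampOfLastAppliedOp));
    return lines;
  }
}
```

`Date.toString()` produces the `Wed Sep 04 14:49:48 PDT 2013` style seen in the example output, in the JVM's default time zone.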
> The replication lag will be calculated for source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
>   else lag = 0 // last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
> {code}
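The lag formula translates directly into Java. This sketch assumes all times are epoch milliseconds and that the caller supplies `now` (for example `System.currentTimeMillis()`); `ReplicationLag.lag` is an illustrative name, not an HBase method.

```java
/** Hypothetical sketch of the source-side replication lag formula. */
final class ReplicationLag {

  static long lag(int sizeOfLogQueue, long ageOfLastShippedOp,
                  long timeStampOfLastShippedOp, long now) {
    long sinceLastShipped = now - timeStampOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // Edits are still queued: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShipped);
    } else if (sinceLastShipped < 2 * ageOfLastShippedOp) {
      // Last shipment happened recently, so its age is still meaningful.
      return ageOfLastShippedOp;
    }
    // Queue is empty and the last shipment is old: no real lag.
    return 0;
  }
}
```

Keeping `now` as a parameter instead of reading the clock inside the method makes the formula deterministic and easy to check.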
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp

[jira] [Commented] (HBASE-10900) FULL table backup and restore

2015-02-03 Thread Demai Ni (JIRA)

[https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303736#comment-14303736]

Demai Ni commented on HBASE-10900:
--

[~apurtell], that sounds like the right way to go.

[~jerryhe], any objections? If not, I will go ahead and resolve the JIRAs under my 
name as 'Won't Fix'. 

> FULL table backup and restore
> -
>
> Key: HBASE-10900
> URL: https://issues.apache.org/jira/browse/HBASE-10900
> Project: HBase
>  Issue Type: Task
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 1.1.0
>
> Attachments: HBASE-10900-fullbackup-trunk-v1.patch, 
> HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, 
> HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL 
> backup/restore, and will complete the following function:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name is specified, all tables from the source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster; this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to the source cluster; this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
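The restore commands above pair each original table with an optional restored-table name. When the second list is omitted, they fall back to the original name. The following is a minimal Java sketch of that pairing rule. It assumes comma-separated lists as in the example. `RestoreTableMapping` and `mapTables` are hypothetical names, not classes from the HBASE-10900 patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: pairs original table names with their restore targets. */
public class RestoreTableMapping {

  /**
   * @param tables  comma-separated original table names, e.g. "t1_dn,t2_dn"
   * @param targets comma-separated restore names, or null/empty to reuse the originals
   * @return insertion-ordered map of original name to restore name
   */
  public static Map<String, String> mapTables(String tables, String targets) {
    List<String> from = Arrays.asList(tables.split(","));
    // No target list given: each table is restored under its original name.
    List<String> to = (targets == null || targets.isEmpty())
        ? from
        : Arrays.asList(targets.split(","));
    if (from.size() != to.size()) {
      throw new IllegalArgumentException(
          "table list and restore-table list must have the same length");
    }
    Map<String, String> mapping = new LinkedHashMap<>();
    for (int i = 0; i < from.size(); i++) {
      mapping.put(from.get(i).trim(), to.get(i).trim());
    }
    return mapping;
  }

  public static void main(String[] args) {
    // prints {t1_dn=t1_dn_restore, t2_dn=t2_dn_restore, t3_dn=t3_dn_restore}
    System.out.println(mapTables("t1_dn,t2_dn,t3_dn",
        "t1_dn_restore,t2_dn_restore,t3_dn_restore"));
    // prints {t1_dn=t1_dn, t2_dn=t2_dn}
    System.out.println(mapTables("t1_dn,t2_dn", null));
  }
}
```

The sketch rejects lists of mismatched length. That is a choice made here for illustration; the description does not say how the actual restore client handles that case.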
> h2. Detailed layout and framework for the next jiras
> The patch is a wrapper around the existing snapshot and exportSnapshot, and will 
> be used as the base framework for the overall solution of  
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described 
> below:
> * *bin/hbase*  : end-user command line interface to invoke 
> BackupClient and RestoreClient
> * *BackupClient.java*  : 'main' entry for backup operations. This patch only 
> supports 'full' backup. Future jiras will add support to:
> ** *create* an incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show the *history* of all successful backups 
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on-the-fly 
> during create or after create
> ** *merge* backup images
> ** *stop* backing up a table of an existing backup image
> ** *show* the tables of a backup image 
> * *BackupCommands.java* : a place to keep all the command usages and options
> * *BackupManager.java*  : handle backup requests on the server side; create 
> BACKUP ZooKeeper nodes to keep track of backups. The timestamps kept in ZooKeeper 
> will be used for future incremental backup (not included in this jira). 
> Create BackupContext and DispatchRequest. 
> * *BackupHandler.java*  : in this patch, it is a wrapper around snapshot and 
> exportSnapshot. In future jiras, it will:
> ** record *timestamps* info in ZK
> ** carry out *incremental* backup
> ** update backup *progress*
> ** set flags of *status*
> ** build up the *backupManifest* file (in this jira, only limited info for 
> full backup; later on, timestamps and dependencies of multiple backup images are 
> also recorded here)
> ** clean up after a *failed* backup 
> ** clean up after a *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup 
> * *BackupContext.java* : encapsulate backup information like backup ID, table 
> names, directory info, phase, TimeStamps of backup progress, size of data, 
> ancestor info, etc. 
> * *BackupCopier.java*  : the copying operation. Later on, it will support 
> progress reporting and mapper estimation, and extend DistCp to report progress 
> to ZK during backup. 
> * *BackupExcpetion.java*: to handle exception from backup/restore
> * *BackupManifest.java* : encapsulate all the backup image information. The 
> manifest info will be bundled as a manifest file together with the data, so that 
> each backup image contains all the info needed for restore. 
> * *BackupStatus.java*   : encapsulate backup status at table level during 
> backup progress
> * *BackupUtil.java* : utility methods during backup process
> * *RestoreClient.java*  : 'main

[jira] [Commented] (HBASE-10900) FULL table backup and restore

2015-02-02 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302677#comment-14302677
 ] 

Demai Ni commented on HBASE-10900:
--

[~ram_krish], thanks for the ping. After chatting with several folks in the HBase 
community, the plan is to build a stand-alone utility on GitHub (or somewhere 
similar) instead of pushing a large portion of code into the HBase core. I 
think [~jinghe] is still planning to get it done. 


[jira] [Commented] (HBASE-12073) Shell command user_permission fails on the table created by user if he is not global admin.

2014-09-24 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146581#comment-14146581
 ] 

Demai Ni commented on HBASE-12073:
--

[~esteban], Matteo is right. [HBASE-11452 | 
https://issues.apache.org/jira/browse/HBASE-11452] was to provide a Java client 
API, and it also changed the Ruby script so that the external behavior would stay 
consistent. There was no intention to change the existing logic. Certainly, if we 
decide to change the logic as [~apurtell] mentioned, the client code in 
AccessControlClient.java is the place to start... Demai

> Shell command user_permission fails on the table created by user if he is not 
> global admin.   
> --
>
> Key: HBASE-12073
> URL: https://issues.apache.org/jira/browse/HBASE-12073
> Project: HBase
>  Issue Type: Bug
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
>Priority: Minor
>
> The command fails because the changes introduced by HBASE-10892 (specifically, 
> the newly introduced call to getTableDescriptors) require the user to have global 
> admin permission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP

2014-08-23 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108278#comment-14108278
 ] 

Demai Ni commented on HBASE-11617:
--

[~apurtell] and [~lhofhansl], many thanks for reviewing and committing the 
patch.

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics 
> when no new replication OP 
> --
>
> Key: HBASE-11617
> URL: https://issues.apache.org/jira/browse/HBASE-11617
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 0.98.2
>Reporter: Demai Ni
>Assignee: Demai Ni
>Priority: Minor
> Fix For: 0.99.0, 2.0.0, 0.98.6
>
> Attachments: HBASE-11617-master-v1.patch
>
>
> AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in 
> the 'replication queue' before it got replicated (aka applied).
> {code}
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
> {code}
> Consider the following scenario:
> 1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is 
> set to, for example, 100ms;
> 2) then NO new sink op occurs;
> 3) when refreshAgeOfLastAppliedOp() is invoked at 8:00am, instead of 
> returning 100ms, AgeOfLastAppliedOp becomes 1 hour + 100ms.
> This happens because refreshAgeOfLastAppliedOp() gets invoked periodically by 
> getStats(). 
> Proposed fix: 
> {code}
> --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> @@ -35,6 +35,7 @@ public class MetricsSink {
>  
>    private MetricsReplicationSource rms;
>    private long lastTimestampForAge = System.currentTimeMillis();
> +  private long age = 0;
>  
>    public MetricsSink() {
>      rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
> @@ -47,8 +48,12 @@ public class MetricsSink {
>     * @return the age that was set
>     */
>    public long setAgeOfLastAppliedOp(long timestamp) {
> -    lastTimestampForAge = timestamp;
> -    long age = System.currentTimeMillis() - lastTimestampForAge;
> +    if (lastTimestampForAge != timestamp) {
> +      lastTimestampForAge = timestamp;
> +      this.age = System.currentTimeMillis() - lastTimestampForAge;
> +    } else {
> +      this.age = 0;
> +    }
>      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>      return age;
>    }
> {code}
> Detailed discussion in [dev@hbase  | 
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E
>  ]
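The 7:00am/8:00am timeline above can be replayed deterministically with the small stand-alone Java sketch below. It mirrors the original and the proposed `setAgeOfLastAppliedOp` logic, but it is not HBase's `MetricsSink`:

- The class name `AgeGaugeSketch` is hypothetical.
- The `rms.setGauge(...)` call is replaced by returning the value.
- `System.currentTimeMillis()` is swapped for an injectable clock.
- The periodic refresh is modeled, as an assumption, as re-reporting the last applied timestamp.

```java
import java.util.function.LongSupplier;

/** Illustrative stand-in for MetricsSink's age bookkeeping (not the HBase class). */
public class AgeGaugeSketch {
  private final LongSupplier clock; // injectable "current time" in millis
  private long lastTimestampForAge;
  private long age = 0;

  public AgeGaugeSketch(LongSupplier clock) {
    this.clock = clock;
    this.lastTimestampForAge = clock.getAsLong();
  }

  /** Original logic: re-reporting the same timestamp makes the age keep growing. */
  public long setAgeOriginal(long timestamp) {
    lastTimestampForAge = timestamp;
    return clock.getAsLong() - lastTimestampForAge;
  }

  /** Proposed logic: an unchanged timestamp means no new op, so report 0. */
  public long setAgePatched(long timestamp) {
    if (lastTimestampForAge != timestamp) {
      lastTimestampForAge = timestamp;
      age = clock.getAsLong() - lastTimestampForAge;
    } else {
      age = 0;
    }
    return age;
  }

  public static void main(String[] args) {
    long[] now = {7L * 60 * 60 * 1000};   // 7:00am as millis since midnight
    long opTimestamp = now[0] - 100;      // edit written 100ms before it was applied
    AgeGaugeSketch original = new AgeGaugeSketch(() -> now[0]);
    AgeGaugeSketch patched = new AgeGaugeSketch(() -> now[0]);
    System.out.println(original.setAgeOriginal(opTimestamp)); // 100
    System.out.println(patched.setAgePatched(opTimestamp));   // 100
    now[0] += 60 * 60 * 1000;             // 8:00am, no new sink op in between
    System.out.println(original.setAgeOriginal(opTimestamp)); // 3600100
    System.out.println(patched.setAgePatched(opTimestamp));   // 0
  }
}
```

Note that the patched variant reports 0 whenever the same timestamp arrives twice. That is the behavior the proposed diff encodes; whether 0 or the last real age is the better value is the question discussed on the dev list.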



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP

2014-08-22 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107741#comment-14107741
 ] 

Demai Ni commented on HBASE-11617:
--

[~andrew.purt...@gmail.com], thanks for the ping.
[~lhofhansl], what's your take on this? Maybe we can commit the current fix 
and consider removing refreshAgeOfLastAppliedOp() in a later refactoring 
effort? 






[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-08-12 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094323#comment-14094323
 ] 

Demai Ni commented on HBASE-9531:
-

Again, the '-1 lineLengths' results are for the generated protobuf code and the 
JRuby script, so they should be OK. 

[~apurtell], [~enis], does the new patch match your suggestions? Thanks... Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.6
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-master-v3.patch, HBASE-9531-trunk-v0.patch, 
> HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve 
> replication metrics info such as: ageOfLastShippedOp, 
> timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and 
> timeStampsOfLastAppliedOp, and also to provide point-in-time info about the 
> replication lag (source only).
> I understand that HBase uses Hadoop 
> metrics (http://hbase.apache.org/metrics.html), which is a common way to 
> monitor metric info. This jira is meant to serve as a light-weight client 
> interface, compared to a complete (certainly better, but heavier) GUI 
> monitoring package. I have the code working on 0.94.9 now, and would like to use this 
> jira to get opinions about whether the feature is valuable to other 
> users/shops. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell 
> command 'status', and to introduce a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and 
> then simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for 
> his kind suggestions through the dev email list.)
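The ReplicationLoad design above amounts to a per-region-server value object that holds one source entry per peer plus one sink entry. Below is a minimal Java sketch under that reading. `ReplicationLoadSketch`, `SourceLoad`, and `SinkLoad` are hypothetical names, not the classes in the patch. `toShellLines()` only imitates the `status 'replication'` output shown in the description, and it leaves timestamps as raw epoch milliseconds rather than formatted dates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-region-server replication load: one entry per source peer, one sink entry. */
public class ReplicationLoadSketch {

  /** Source-side metrics for a single replication peer. */
  public static final class SourceLoad {
    final String peerId;
    final long ageOfLastShippedOp;       // ms
    final int sizeOfLogQueue;            // WAL files waiting to be shipped
    final long timeStampOfLastShippedOp; // epoch ms

    public SourceLoad(String peerId, long age, int queueSize, long ts) {
      this.peerId = peerId;
      this.ageOfLastShippedOp = age;
      this.sizeOfLogQueue = queueSize;
      this.timeStampOfLastShippedOp = ts;
    }
  }

  /** Sink-side metrics for the region server. */
  public static final class SinkLoad {
    final long ageOfLastAppliedOp;       // ms
    final long timeStampOfLastAppliedOp; // epoch ms

    public SinkLoad(long age, long ts) {
      this.ageOfLastAppliedOp = age;
      this.timeStampOfLastAppliedOp = ts;
    }
  }

  private final List<SourceLoad> sources = new ArrayList<>();
  private final SinkLoad sink; // may be null if the server is not a sink

  public ReplicationLoadSketch(List<SourceLoad> sources, SinkLoad sink) {
    this.sources.addAll(sources);
    this.sink = sink;
  }

  /** Renders one line per source and one for the sink, in the spirit of the shell output. */
  public List<String> toShellLines() {
    List<String> lines = new ArrayList<>();
    for (SourceLoad s : sources) {
      lines.add("SOURCE:PeerID=" + s.peerId
          + ", ageOfLastShippedOp=" + s.ageOfLastShippedOp
          + ", sizeOfLogQueue=" + s.sizeOfLogQueue
          + ", timeStampsOfLastShippedOp=" + s.timeStampOfLastShippedOp);
    }
    if (sink != null) {
      lines.add("SINK  :AgeOfLastAppliedOp=" + sink.ageOfLastAppliedOp
          + ", TimeStampsOfLastAppliedOp=" + sink.timeStampOfLastAppliedOp);
    }
    return lines;
  }

  public static void main(String[] args) {
    ReplicationLoadSketch load = new ReplicationLoadSketch(
        Collections.singletonList(new SourceLoad("1", 14, 0, 1000L)),
        new SinkLoad(0, 2000L));
    load.toShellLines().forEach(System.out::println);
  }
}
```

Keeping source entries in a list reflects that one region server can replicate to several peers, while the sink side is a single aggregate per server, as in the sample output.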
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipped op happened recently
>   else lag = 0 // last shipped op may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
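The lag formula can be written as a small, testable Java method. This is a sketch of the heuristic exactly as stated in the description, not the code that was eventually committed. The class name `ReplicationLagSketch` is hypothetical. The current time is passed in as a parameter so that all three branches can be exercised deterministically.

```java
/** Sketch of the source-side replication-lag heuristic; all times in milliseconds. */
public final class ReplicationLagSketch {

  private ReplicationLagSketch() {}

  public static long lag(int sizeOfLogQueue, long ageOfLastShippedOp,
      long timeStampOfLastShippedOp, long now) {
    long sinceLastShipped = now - timeStampOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // Edits are still queued: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShipped);
    } else if (sinceLastShipped < 2 * ageOfLastShippedOp) {
      // The last shipment was recent, so its age is still a meaningful lag.
      return ageOfLastShippedOp;
    } else {
      // Nothing queued and the last shipment is old: no real lag,
      // even though ageOfLastShippedOp may be non-zero.
      return 0;
    }
  }

  public static void main(String[] args) {
    long now = 1_000_000L;
    System.out.println(lag(3, 500, now - 2_000, now));  // 2000: queue non-empty, take the larger value
    System.out.println(lag(0, 500, now - 600, now));    // 500: shipped recently
    System.out.println(lag(0, 500, now - 60_000, now)); // 0: idle source
  }
}
```

The `2 * ageOfLastShippedOp` window is the description's own cut-off for "recent". At exactly twice the age, this sketch reports 0 because it uses a strict comparison.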
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-08-11 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Attachment: HBASE-9531-master-v3.patch

Uploaded the v3 patch. 
[~apurtell], can you please take another look? Originally, I added 
ReplicationLoadSink and ReplicationLoadSource inside ReplicationLoad.java. 
However, it turns out that ProtobufUtil.java can't import ReplicationLoad 
directly, as it is under hbase-server. So I just put the two new files under 
hbase-client. 


[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-31 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081731#comment-14081731
 ] 

Demai Ni commented on HBASE-9531:
-

[~apurtell], thanks for the tip. I will be out of town for a week with no access 
to the environment, so I am changing the target to 0.98.6: the new patch is delayed 
and will miss the 0.98.5 cut time. Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.6
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-31 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Fix Version/s: (was: 0.98.5)
   0.98.6

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.6
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This JIRA is to provide a command-line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also 
> to provide a point-in-time view of the replication lag (source only).
> I understand that HBase already uses Hadoop metrics 
> (http://hbase.apache.org/metrics.html), which is a common way to monitor 
> metrics. This JIRA is meant to serve as a lightweight client interface, as 
> opposed to a complete (certainly better, but heavier) GUI monitoring package. 
> I have the code working on 0.94.9 now, and would like to use this JIRA to get 
> opinions on whether the feature is valuable to other users/workshops. If so, 
> I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase 
> shell command 'status', and to add a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and 
> then simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions on the dev mailing list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then
>     lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp))  // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then
>     lag = ageOfLastShippedOp  // last shipment happened recently
>   else
>     lag = 0  // last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
> {code}
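The lag formula above can be sketched in Java as follows. This is a minimal illustration that assumes all values are in milliseconds. The class name `ReplicationLagSketch` and the method `lag` are hypothetical and not part of HBase. The inputs are passed explicitly rather than read from the replication metrics objects.

```java
// Hypothetical helper (not an HBase class) that evaluates the proposed
// source-side replication lag formula. All times are in milliseconds.
public final class ReplicationLagSketch {

  /**
   * @param sizeOfLogQueue            number of logs still queued for this source
   * @param ageOfLastShippedOp        age of the last shipped edit
   * @param timeStampsOfLastShippedOp wall-clock time the last edit was shipped
   * @param now                       current wall-clock time
   */
  public static long lag(int sizeOfLogQueue, long ageOfLastShippedOp,
                         long timeStampsOfLastShippedOp, long now) {
    long sinceLastShip = now - timeStampsOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // Work is still queued: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShip);
    } else if (sinceLastShip < 2 * ageOfLastShippedOp) {
      // Queue is empty and the last shipment happened recently.
      return ageOfLastShippedOp;
    } else {
      // Queue is empty and the last shipment is old: no real lag.
      return 0L;
    }
  }

  public static void main(String[] args) {
    long now = 1_000_000L;
    System.out.println(lag(2, 500L, now - 2_000L, now));     // 2000
    System.out.println(lag(0, 500L, now - 800L, now));       // 500
    System.out.println(lag(0, 500L, now - 3_600_000L, now)); // 0
  }
}
```

The "2 * ageOfLastShippedOp" window is taken directly from the formula. It is a heuristic for "shipped recently", not a tuned constant.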
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-31 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081485#comment-14081485
 ] 

Demai Ni commented on HBASE-9531:
-

[~apurtell]

So we shouldn't expose protobuf objects to the client (such as the hbase shell) 
directly? 

I was considering implementing getters for each individual value in 
*ReplicationLoad*, such as:
* sink.getAgeOfLastAppliedOp()
* sink.getTimeStampsOfLastAppliedOp()

However, when I implemented the source side, it became too complex: because the 
source metrics are a list (one entry per replication source), each value would 
become something like:
* HashMap getAgeOfLastShippedOp() {}
* HashMap getSizeOfLogQueue() {}
* HashMap getTimeStampsOfLastShippedOp() {}
* HashMap getReplicationLag() {}

I feel that is kind of over-engineered, hence the current implementation. 

Demai
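The hypothetical sketch below shows one alternative to keeping one HashMap per metric. It uses a list of immutable per-source value objects, so a client such as the shell reads plain Java getters rather than protobuf objects. None of these class or method names are HBase APIs; they are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not an HBase API: expose source metrics as a list of
// plain Java value objects (one per replication source) instead of one
// HashMap per metric, so clients never see protobuf types.
public final class ReplicationLoadViewSketch {

  /** Immutable snapshot of one replication source's metrics. */
  public static final class SourceLoad {
    private final String peerId;
    private final long ageOfLastShippedOp;
    private final int sizeOfLogQueue;
    private final long timeStampOfLastShippedOp;
    private final long replicationLag;

    public SourceLoad(String peerId, long ageOfLastShippedOp, int sizeOfLogQueue,
        long timeStampOfLastShippedOp, long replicationLag) {
      this.peerId = peerId;
      this.ageOfLastShippedOp = ageOfLastShippedOp;
      this.sizeOfLogQueue = sizeOfLogQueue;
      this.timeStampOfLastShippedOp = timeStampOfLastShippedOp;
      this.replicationLag = replicationLag;
    }

    public String getPeerId() { return peerId; }
    public long getAgeOfLastShippedOp() { return ageOfLastShippedOp; }
    public int getSizeOfLogQueue() { return sizeOfLogQueue; }
    public long getTimeStampOfLastShippedOp() { return timeStampOfLastShippedOp; }
    public long getReplicationLag() { return replicationLag; }
  }

  private final List<SourceLoad> sources = new ArrayList<>();

  public void addSource(SourceLoad source) { sources.add(source); }

  /** One getter returns every source's metrics together, as a read-only view. */
  public List<SourceLoad> getSourceLoads() { return Collections.unmodifiableList(sources); }

  public static void main(String[] args) {
    ReplicationLoadViewSketch load = new ReplicationLoadViewSketch();
    // Arbitrary illustrative values, not taken from a real cluster.
    load.addSource(new SourceLoad("1", 14L, 0, 1_000L, 0L));
    for (SourceLoad s : load.getSourceLoads()) {
      // Prints: PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0
      System.out.println("PeerID=" + s.getPeerId()
          + ", ageOfLastShippedOp=" + s.getAgeOfLastShippedOp()
          + ", sizeOfLogQueue=" + s.getSizeOfLogQueue());
    }
  }
}
```

The design trade-off is that each new metric costs one field and one getter, rather than one more map-valued method.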



> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-31 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080992#comment-14080992
 ] 

Demai Ni commented on HBASE-9531:
-

bq. -1 lineLengths. The patch introduces the following lines longer than 100:
The long lines come from either the generated protobuf code or the JRuby hbase 
shell code, both of which already contain long lines.

The failed test cases have shown up in other JIRAs recently and are not related 
to this patch.

[~apurtell] and [~enis], would you please take another look at this new patch? 
Thanks. 

BTW, I am working on another small fix, [HBASE-11617 | 
https://issues.apache.org/jira/browse/HBASE-11617]. The two patches conflict with 
each other; I will resolve the conflict after one of them is committed. 

Demai


> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Attachment: HBASE-9531-master-v2.patch

Uploaded the v2 patch for master, changed according to [~enis]'s suggestion.

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP

2014-07-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080269#comment-14080269
 ] 

Demai Ni commented on HBASE-11617:
--

I don't think the failed test cases are related to this patch; the same failures 
also show up in other JIRAs in recent test runs.

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics 
> when no new replication OP 
> --
>
> Key: HBASE-11617
> URL: https://issues.apache.org/jira/browse/HBASE-11617
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 0.98.2
>Reporter: Demai Ni
>Assignee: Demai Ni
>Priority: Minor
> Fix For: 0.99.0, 0.98.5, 2.0.0
>
> Attachments: HBASE-11617-master-v1.patch
>
>
> AgeOfLastAppliedOp in MetricsSink.java is meant to indicate how long an edit 
> sat in the 'replication queue' before it got replicated (aka applied).
> {code}
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
> {code}
> In the following scenario:
> 1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to, 
> for example, 100ms;
> 2) then NO new sink op occurs;
> 3) when refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp 
> becomes 1 hour + 100ms instead of staying at 100ms.
> This happens because refreshAgeOfLastAppliedOp() gets invoked periodically by 
> getStats().
> Proposed fix:
> {code}
> --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> @@ -35,6 +35,7 @@ public class MetricsSink {
>  
>    private MetricsReplicationSource rms;
>    private long lastTimestampForAge = System.currentTimeMillis();
> +  private long age = 0;
>  
>    public MetricsSink() {
>      rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
> @@ -47,8 +48,12 @@ public class MetricsSink {
>     * @return the age that was set
>     */
>    public long setAgeOfLastAppliedOp(long timestamp) {
> -    lastTimestampForAge = timestamp;
> -    long age = System.currentTimeMillis() - lastTimestampForAge;
> +    if (lastTimestampForAge != timestamp) {
> +      lastTimestampForAge = timestamp;
> +      this.age = System.currentTimeMillis() - lastTimestampForAge;
> +    } else {
> +      this.age = 0;
> +    }
>      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>      return age;
>    }
> {code}
> Detailed discussion in [dev@hbase | 
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]
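The simplified model below replays the 7:00am/8:00am scenario and the proposed fix. It is a sketch, not the real `MetricsSink`:

- the gauge is a plain field instead of `rms.setGauge(...)`;
- the clock is injectable;
- the class name `SinkAgeSketch` and its `patched` flag are hypothetical;
- it assumes `refreshAgeOfLastAppliedOp()` simply re-submits `lastTimestampForAge` to `setAgeOfLastAppliedOp()`. That delegation is not shown in the quoted code.

```java
import java.util.function.LongSupplier;

// Simplified, hypothetical model of MetricsSink's age gauge (not the real class).
// Assumption: refreshAgeOfLastAppliedOp() re-submits the last seen timestamp.
public final class SinkAgeSketch {
  private final LongSupplier clock; // injectable "current time" in ms
  private final boolean patched;    // false = original logic, true = proposed fix
  private long lastTimestampForAge;
  private long age = 0;

  public SinkAgeSketch(LongSupplier clock, boolean patched) {
    this.clock = clock;
    this.patched = patched;
    this.lastTimestampForAge = clock.getAsLong();
  }

  public long setAgeOfLastAppliedOp(long timestamp) {
    if (!patched) {
      // Original: age is always re-measured against the wall clock.
      lastTimestampForAge = timestamp;
      age = clock.getAsLong() - lastTimestampForAge;
    } else if (lastTimestampForAge != timestamp) {
      // Proposed fix: only a new timestamp produces a new age.
      lastTimestampForAge = timestamp;
      age = clock.getAsLong() - lastTimestampForAge;
    } else {
      // Same timestamp again, i.e. no new op: report 0.
      age = 0;
    }
    return age;
  }

  public long refreshAgeOfLastAppliedOp() {
    return setAgeOfLastAppliedOp(lastTimestampForAge);
  }

  public static void main(String[] args) {
    long hour = 3_600_000L;
    long[] now = {7 * hour}; // mutable clock value, starting at 7:00am
    for (boolean patched : new boolean[] {false, true}) {
      now[0] = 7 * hour;
      SinkAgeSketch sink = new SinkAgeSketch(() -> now[0], patched);
      long applied = sink.setAgeOfLastAppliedOp(now[0] - 100); // edit written 100ms ago
      now[0] = 8 * hour; // one hour later, with no new sink op
      long refreshed = sink.refreshAgeOfLastAppliedOp();
      System.out.println((patched ? "patched" : "original")
          + ": applied=" + applied + "ms, refreshed=" + refreshed + "ms");
    }
  }
}
```

Running `main` prints `original: applied=100ms, refreshed=3600100ms` and then `patched: applied=100ms, refreshed=0ms`. The patched branch also shows the point raised later in this thread: once the timestamp stops changing, a refresh always reports 0. A genuinely new op that happens to carry the same timestamp as the previous one would also be reported as 0.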



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP

2014-07-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080026#comment-14080026
 ] 

Demai Ni commented on HBASE-11617:
--

BTW, with this patch, I am not sure what the purpose of 
MetricsSink.refreshAgeOfLastAppliedOp() is, as it will effectively be ignored 
and always return age = 0.

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics 
> when no new replication OP 
> --
>
> Key: HBASE-11617
> URL: https://issues.apache.org/jira/browse/HBASE-11617
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 0.98.2
>Reporter: Demai Ni
>Assignee: Demai Ni
>Priority: Minor
> Fix For: 0.99.0, 0.98.5, 2.0.0
>
> Attachments: HBASE-11617-master-v1.patch
>
>





[jira] [Commented] (HBASE-11143) Improve replication metrics

2014-07-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080018#comment-14080018
 ] 

Demai Ni commented on HBASE-11143:
--

Thanks to [~lhofhansl]'s suggestion, the patch has been uploaded to [HBASE-11617 | 
https://issues.apache.org/jira/browse/HBASE-11617].

> Improve replication metrics
> ---
>
> Key: HBASE-11143
> URL: https://issues.apache.org/jira/browse/HBASE-11143
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.99.0, 0.94.20, 0.98.3
>
> Attachments: 11143-0.94-v2.txt, 11143-0.94-v3.txt, 11143-0.94.txt, 
> 11143-trunk.txt
>
>
> We are trying to report on replication lag and find that there is no good 
> single metric to do that.
> ageOfLastShippedOp is close, but unfortunately it is increased even when 
> there is nothing to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric, replicationQueueTime (or something similar), with the above 
> meaning. I.e. if we have something to ship, we set the age of the last shipped 
> edit; if we fail, we increment that last time (just like we do now). But if 
> there is nothing to replicate, we set it to the current time (and hence that 
> metric is reported as close to 0).
> Alternatively, we could change the meaning of ageOfLastShippedOp to do that. 
> That might lead to surprises, but the current behavior is clearly weird when 
> there is nothing to replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.
> Edit: Also adds a new shippedKBs metric to track the amount of data that is 
> shipped via replication.
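The Java sketch below shows one reading of the proposed replicationQueueTime semantics. The class `QueueTimeSketch` and its method names are hypothetical; only the metric name comes from the proposal. Modelling a failed shipment as "leave the reference timestamp alone, so the age keeps growing with the clock" is this sketch's interpretation of "increment that last time".

```java
import java.util.function.LongSupplier;

// Hypothetical model of the proposed replicationQueueTime metric
// (not an existing HBase class). All times are in milliseconds.
public final class QueueTimeSketch {
  private final LongSupplier clock;
  private long referenceTimestamp; // age is measured from this point

  public QueueTimeSketch(LongSupplier clock) {
    this.clock = clock;
    this.referenceTimestamp = clock.getAsLong();
  }

  /** Something was shipped: measure age from that edit's write time. */
  public void shipped(long editTimestamp) { referenceTimestamp = editTimestamp; }

  /** Shipping failed: keep the old reference, so the age keeps growing. */
  public void shipFailed() { /* reference intentionally unchanged */ }

  /** Nothing to replicate: reset the reference to now, so the age reads ~0. */
  public void nothingToShip() { referenceTimestamp = clock.getAsLong(); }

  public long replicationQueueTime() { return clock.getAsLong() - referenceTimestamp; }

  public static void main(String[] args) {
    long[] now = {10_000L};
    QueueTimeSketch q = new QueueTimeSketch(() -> now[0]);
    q.shipped(9_500L);
    System.out.println(q.replicationQueueTime()); // 500
    now[0] = 12_000L;
    q.shipFailed();
    System.out.println(q.replicationQueueTime()); // 2500
    q.nothingToShip();
    System.out.println(q.replicationQueueTime()); // 0
  }
}
```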





[jira] [Updated] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Status: Patch Available  (was: In Progress)

[~lhofhansl], would you please take a look at the patch and check whether it 
matches your take? Thanks.

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics 
> when no new replication OP 
> --
>
> Key: HBASE-11617
> URL: https://issues.apache.org/jira/browse/HBASE-11617
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 0.98.2
>Reporter: Demai Ni
>Assignee: Demai Ni
>Priority: Minor
> Fix For: 0.99.0, 0.98.5, 2.0.0
>
> Attachments: HBASE-11617-master-v1.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11617) incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics when no new replication OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Summary: incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication 
Metrics when no new replication OP   (was: AgeOfLastAppliedOp in MetricsSink 
got increased when no new replication sink OP )

> incorrect AgeOfLastAppliedOp and AgeOfLastShippedOp in replication Metrics 
> when no new replication OP 
> --
>
> Key: HBASE-11617
> URL: https://issues.apache.org/jira/browse/HBASE-11617
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 0.98.2
>Reporter: Demai Ni
>Assignee: Demai Ni
>Priority: Minor
> Fix For: 0.99.0, 0.98.5, 2.0.0
>
> Attachments: HBASE-11617-master-v1.patch
>
>
> AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in the
> 'replication queue' before it was replicated (aka applied):
> {code}
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
> {code}
> Consider the following scenario:
> 1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
> for example, 100ms.
> 2) After that, NO new sink op occurs.
> 3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
> becomes 1 hour + 100ms instead of staying at 100ms.
> This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
> getStats().
> Proposed fix:
> {code}
> --- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> +++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
> @@ -35,6 +35,7 @@ public class MetricsSink {
>  
>    private MetricsReplicationSource rms;
>    private long lastTimestampForAge = System.currentTimeMillis();
> +  private long age = 0;
>  
>    public MetricsSink() {
>      rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
> @@ -47,8 +48,12 @@ public class MetricsSink {
>     * @return the age that was set
>     */
>    public long setAgeOfLastAppliedOp(long timestamp) {
> -    lastTimestampForAge = timestamp;
> -    long age = System.currentTimeMillis() - lastTimestampForAge;
> +    if (lastTimestampForAge != timestamp) {
> +      lastTimestampForAge = timestamp;
> +      this.age = System.currentTimeMillis() - lastTimestampForAge;
> +    } else {
> +      this.age = 0;
> +    }
>      rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>      return age;
>    }
> {code}
> Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]
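The scenario above can be replayed with a small Java sketch of the original logic. This is an illustration under stated assumptions, not HBase code: `OriginalSinkAge` is a hypothetical class, an injectable `LongSupplier` clock stands in for `System.currentTimeMillis()`, a plain `gauge` field stands in for `rms.setGauge(...)`, and `refreshAgeOfLastAppliedOp()` is assumed to re-apply the last stored timestamp, as the scenario implies.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the pre-fix MetricsSink age logic (not the HBase class).
class OriginalSinkAge {
  private final LongSupplier clock; // assumption: replaces System.currentTimeMillis()
  private long lastTimestampForAge;
  long gauge;                       // assumption: replaces rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, ...)

  OriginalSinkAge(LongSupplier clock) {
    this.clock = clock;
    this.lastTimestampForAge = clock.getAsLong();
  }

  // Same formula as the original setAgeOfLastAppliedOp: always measured against "now".
  long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = clock.getAsLong() - lastTimestampForAge;
    gauge = age;
    return age;
  }

  // Assumption: the periodic refresh re-applies the last stored timestamp.
  long refreshAgeOfLastAppliedOp() {
    return setAgeOfLastAppliedOp(lastTimestampForAge);
  }
}

class OriginalSinkAgeDemo {
  public static void main(String[] args) {
    final long sevenAm = 7L * 3_600_000L; // 7:00am as millis since midnight
    final long[] now = {sevenAm};          // mutable fake clock
    OriginalSinkAge sink = new OriginalSinkAge(() -> now[0]);

    // 1) At 7:00am an edit written 100 ms earlier is applied.
    System.out.println(sink.setAgeOfLastAppliedOp(sevenAm - 100)); // 100

    // 2) No new sink ops. 3) At 8:00am the periodic refresh runs.
    now[0] = sevenAm + 3_600_000L;
    System.out.println(sink.refreshAgeOfLastAppliedOp()); // 3600100, i.e. 1 hour + 100 ms
  }
}
```

Nothing new was applied between the two calls, yet the reported age grew by exactly the idle hour.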



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Attachment: HBASE-11617-master-v1.patch

Uploaded the patch, which covers both AgeOfLastAppliedOp and AgeOfLastShippedOp (the latter from
[HBASE-11143|https://issues.apache.org/jira/browse/HBASE-11143]).



[jira] [Work started] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-11617 started by Demai Ni.



[jira] [Commented] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079818#comment-14079818
 ] 

Demai Ni commented on HBASE-11617:
--

Actually, would it be better to put the check in
MetricsSink.refreshAgeOfLastAppliedOp()?
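One possible reading of this suggestion, sketched in Java under assumptions: the setter keeps the original formula (so each applied op reports its real age), and the refresh path performs the check by simply re-publishing the last computed age instead of measuring it against the current time. `RefreshCheckedSinkAge`, the `LongSupplier` clock, and the `gauge` field are hypothetical stand-ins, not HBase APIs.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: the setter keeps the original formula, and the refresh path
// re-publishes the last computed age instead of recomputing it against "now".
class RefreshCheckedSinkAge {
  private final LongSupplier clock; // assumption: replaces System.currentTimeMillis()
  private long age;                 // age computed when the last sink op was applied
  long gauge;                       // assumption: replaces the SINK_AGE_OF_LAST_APPLIED_OP gauge

  RefreshCheckedSinkAge(LongSupplier clock) {
    this.clock = clock;
  }

  // Called when a sink op is actually applied: measure its age now.
  long setAgeOfLastAppliedOp(long timestamp) {
    age = clock.getAsLong() - timestamp;
    gauge = age;
    return age;
  }

  // Called periodically (e.g. from getStats()): no new op means nothing to recompute.
  long refreshAgeOfLastAppliedOp() {
    gauge = age;
    return age;
  }
}
```

In this variant the idle 8:00am refresh keeps reporting 100ms, which is the value the description expects; the trade-off is that the gauge no longer shows how long the sink has been idle.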



[jira] [Commented] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079784#comment-14079784
 ] 

Demai Ni commented on HBASE-11617:
--

[~lhofhansl], thanks for confirming the problem.

bq. Can we just not refresh from getStats? That way the metric retains the 
value it was last set to by ReplicationSink.

I am not sure how to stop getStats() from refreshing: it is a public method that can
be invoked by other applications, and it is also invoked by
ReplicationStatisticsThread. Also, the invocation does not pass in a parameter to indicate
whether a refresh is needed. Suggestions?

 Demai



[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Description: 
AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in the
'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
--- hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSink.java
@@ -35,6 +35,7 @@ public class MetricsSink {
 
   private MetricsReplicationSource rms;
   private long lastTimestampForAge = System.currentTimeMillis();
+  private long age = 0;
 
   public MetricsSink() {
     rms = CompatibilitySingletonFactory.getInstance(MetricsReplicationSource.class);
@@ -47,8 +48,12 @@ public class MetricsSink {
    * @return the age that was set
    */
   public long setAgeOfLastAppliedOp(long timestamp) {
-    lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+    if (lastTimestampForAge != timestamp) {
+      lastTimestampForAge = timestamp;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
+    } else {
+      this.age = 0;
+    }
     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]
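To make the patched behavior concrete, here is a hypothetical Java stand-in that mirrors the patched setAgeOfLastAppliedOp. As in the earlier sketches, `PatchedSinkAge`, the `LongSupplier` clock, the `gauge` field, and a refresh that re-applies the stored timestamp are assumptions made for illustration only.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in mirroring the patched setAgeOfLastAppliedOp (not the HBase class).
class PatchedSinkAge {
  private final LongSupplier clock; // assumption: replaces System.currentTimeMillis()
  private long lastTimestampForAge;
  private long age = 0;
  long gauge;                       // assumption: replaces rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, ...)

  PatchedSinkAge(LongSupplier clock) {
    this.clock = clock;
    this.lastTimestampForAge = clock.getAsLong();
  }

  long setAgeOfLastAppliedOp(long timestamp) {
    if (lastTimestampForAge != timestamp) {
      // A new sink op: measure its age against the current time.
      lastTimestampForAge = timestamp;
      this.age = clock.getAsLong() - lastTimestampForAge;
    } else {
      // Same timestamp as last time (e.g. a periodic refresh): nothing new was applied.
      this.age = 0;
    }
    gauge = age;
    return age;
  }

  // Assumption: refresh re-applies the stored timestamp, so it now takes the else branch.
  long refreshAgeOfLastAppliedOp() {
    return setAgeOfLastAppliedOp(lastTimestampForAge);
  }
}
```

Under these assumptions the idle refresh at 8:00am reports 0 rather than 1 hour + 100ms. Note that it reports 0, not the original 100ms, and a genuinely new op whose timestamp happens to equal the stored one would also be reported as 0.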

  was:
AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in the
'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
// a new value
+  private long age;

   public long setAgeOfLastAppliedOp(long timestamp) {
+    if (lastTimestampForAge != timestamp) {
       lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
       rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+    } else {
+      this.age = 0; // no new sink op arrived; the last one was already applied
+    }
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]



[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Description: 
AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in the
'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
// a new value
+  private long age;

   public long setAgeOfLastAppliedOp(long timestamp) {
+    if (lastTimestampForAge != timestamp) {
       lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
       rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+    } else {
+      this.age = 0; // no new sink op arrived; the last one was already applied
+    }
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]

  was:
AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in the
'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
// a new value
+  private long age;

   public long setAgeOfLastAppliedOp(long timestamp) {
+    if (lastTimestampForAge != timestamp) {
       lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
       rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+    }
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]



[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Environment: (was: AgeOfLastAppliedOp in MetricsSink.java indicates how long
an edit sat in the 'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
// a new value
+  private long age;

   public long setAgeOfLastAppliedOp(long timestamp) {
+    if (lastTimestampForAge != timestamp) {
       lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
       rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+    }
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E])



[jira] [Created] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11617:


 Summary: AgeOfLastAppliedOp in MetricsSink got increased when no 
new replication sink OP 
 Key: HBASE-11617
 URL: https://issues.apache.org/jira/browse/HBASE-11617
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.98.2
 Environment: AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit
sat in the 'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
// a new value
+  private long age;

   public long setAgeOfLastAppliedOp(long timestamp) {
+    if (lastTimestampForAge != timestamp) {
       lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
       rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+    }
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]
Reporter: Demai Ni
Assignee: Demai Ni
Priority: Minor
 Fix For: 0.99.0, 0.98.5, 2.0.0








[jira] [Updated] (HBASE-11617) AgeOfLastAppliedOp in MetricsSink got increased when no new replication sink OP

2014-07-30 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11617:
-

Description: 
AgeOfLastAppliedOp in MetricsSink.java indicates how long an edit sat in the
'replication queue' before it was replicated (aka applied):
{code}
  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }
{code}
Consider the following scenario:
1) At 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is set to,
for example, 100ms.
2) After that, NO new sink op occurs.
3) When refreshAgeOfLastAppliedOp() is invoked at 8:00am, AgeOfLastAppliedOp
becomes 1 hour + 100ms instead of staying at 100ms.

This happens because refreshAgeOfLastAppliedOp() is invoked periodically by
getStats().

Proposed fix:
{code}
// a new value
+  private long age;

   public long setAgeOfLastAppliedOp(long timestamp) {
+    if (lastTimestampForAge != timestamp) {
       lastTimestampForAge = timestamp;
-    long age = System.currentTimeMillis() - lastTimestampForAge;
+      this.age = System.currentTimeMillis() - lastTimestampForAge;
       rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
+    }
     return age;
   }
{code}

Detailed discussion in [dev@hbase|http://mail-archives.apache.org/mod_mbox/hbase-dev/201407.mbox/%3CCAOEq2C5BKMXAM2Fv4LGVb_Ktek-Pm%3DhjOi33gSHX-2qHqAou6w%40mail.gmail.com%3E]



[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-24 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073817#comment-14073817
 ] 

Demai Ni commented on HBASE-9531:
-

[~enis], good point. I will provide a revised patch accordingly. Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, 
> HBASE-9531-trunk-v0.patch
>
>
> This JIRA provides a command-line (hbase shell) interface to retrieve
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp,
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also
> provides point-in-time information about replication lag (source only).
> I understand that HBase uses Hadoop metrics
> (http://hbase.apache.org/metrics.html), which is a common way to monitor
> metrics. This JIRA is meant to serve as a lightweight client interface,
> compared to a complete (certainly better, but heavier) GUI monitoring
> package. I have the code working on 0.94.9 now, and would like to use this
> JIRA to get opinions on whether the feature is valuable to other
> users/workshops. If so, I will build a trunk patch.
> All input is greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase 
> shell command 'status', and to add a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), the local replication service objects are 
> used to get their loads, which are wrapped in a ReplicationLoad object and 
> then simply passed to the ServerLoad. A few getters and setters will be added 
> to ReplicationSourceMetrics and ReplicationSinkMetrics, and Replication will 
> be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his 
> kind suggestions on the dev mailing list.)
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - 
> timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * 
> ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened 
> recently 
>   else lag = 0 // last shipment may have happened last night, so there is NO 
> real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94
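The source-side lag formula from the HBASE-9531 description above can be sketched in Java as a pure function. The class name, the method name `replicationLag`, and the millisecond units are assumptions for illustration; this is not the code in the patch. The current time is passed in explicitly so the three branches are easy to test.

```java
/** Sketch of the source-side replication-lag formula; names are illustrative, not HBase API. */
public final class ReplicationLagSketch {

    private ReplicationLagSketch() {}

    /**
     * @param sizeOfLogQueue           number of logs still queued for this source
     * @param ageOfLastShippedOp       age (ms) of the last shipped edit, assumed measured at ship time
     * @param timeStampOfLastShippedOp wall-clock time (ms) of the last shipment
     * @param nowMillis                current wall-clock time (ms)
     */
    static long replicationLag(long sizeOfLogQueue, long ageOfLastShippedOp,
                               long timeStampOfLastShippedOp, long nowMillis) {
        long sinceLastShip = nowMillis - timeStampOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // Logs are still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShip);
        } else if (sinceLastShip < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still representative.
            return ageOfLastShippedOp;
        } else {
            // Nothing queued and the last shipment is old: no real lag.
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(replicationLag(2, 100, 1_000, 1_500)); // 500
        System.out.println(replicationLag(0, 100, 1_000, 1_150)); // 100
        System.out.println(replicationLag(0, 100, 1_000, 9_000)); // 0
    }
}
```

For example, with nothing queued and a last shipment 150 ms ago whose age was 100 ms, the second branch applies (150 < 2 * 100) and the lag is reported as 100 ms.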

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-24 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073454#comment-14073454
 ] 

Demai Ni commented on HBASE-9531:
-

The failure in "org.apache.hadoop.hbase.io.hfile.TestCacheConfig" shows up in a few 
other patches' test runs too, so it seems unrelated to this JIRA/patch.


[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-24 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Attachment: HBASE-9531-master-v1.patch

[~apurtell], thanks a lot for the help. I just tried the patch again; it is valid 
for both the 0.98 and master (trunk) branches, so I am resubmitting it to HadoopQA. 

[~enis], how about branch-1? Thanks.


[jira] [Commented] (HBASE-11566) make ExportSnapshot extendable by removing 'final'

2014-07-22 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070520#comment-14070520
 ] 

Demai Ni commented on HBASE-11566:
--

[~mbertozzi] and [~apurtell], thank you so much. Demai

> make ExportSnapshot extendable by removing 'final' 
> ---
>
> Key: HBASE-11566
> URL: https://issues.apache.org/jira/browse/HBASE-11566
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 0.98.4
>Reporter: Demai Ni
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.99.0, 0.98.5, 2.0.0
>
> Attachments: HBASE-11566.patch
>
>
> Currently ExportSnapshot is defined as a final class. This JIRA proposes 
> removing 'final' to make the class extendable, so that we can leverage the 
> existing snapshot logic for the backup/restore solution discussed in 
> [HBASE-7912 | https://issues.apache.org/jira/browse/HBASE-7912]
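As a generic Java illustration of why dropping `final` matters here: once a class is not final, a subclass can reuse its logic and override only a hook. The classes below are hypothetical stand-ins; they do not reproduce the real ExportSnapshot API or the HBASE-7912 backup design.

```java
/** Hypothetical export tool standing in for a non-final ExportSnapshot; not the HBase class. */
class SnapshotExporter {
    /** Hook a subclass may override; possible only because the class is not final. */
    protected String targetDir(String snapshotName) {
        return "/exports/" + snapshotName;
    }

    /** Shared export logic (the real copy work is omitted in this sketch). */
    String export(String snapshotName) {
        return "exported " + snapshotName + " to " + targetDir(snapshotName);
    }
}

/** Hypothetical backup tool that reuses the export logic and changes only the destination. */
class BackupExporter extends SnapshotExporter {
    @Override
    protected String targetDir(String snapshotName) {
        return "/backup/" + snapshotName;
    }
}

public class ExtendExportDemo {
    public static void main(String[] args) {
        System.out.println(new SnapshotExporter().export("snap1")); // exported snap1 to /exports/snap1
        System.out.println(new BackupExporter().export("snap1"));   // exported snap1 to /backup/snap1
    }
}
```

If `SnapshotExporter` were declared `final`, the `extends` clause in `BackupExporter` would not compile, which is the restriction this JIRA removes for ExportSnapshot.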



[jira] [Created] (HBASE-11566) make ExportSnapshot extendable by removing 'final'

2014-07-22 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11566:


 Summary: make ExportSnapshot extendable by removing 'final' 
 Key: HBASE-11566
 URL: https://issues.apache.org/jira/browse/HBASE-11566
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.98.4, 0.98.3
Reporter: Demai Ni
Assignee: Demai Ni
Priority: Minor
 Fix For: 0.99.0, 1.0.0, 0.98.5, 2.0.0


Currently ExportSnapshot is defined as a final class. This JIRA proposes removing 
'final' to make the class extendable, so that we can leverage the existing 
snapshot logic for the backup/restore solution discussed in 
[HBASE-7912 | https://issues.apache.org/jira/browse/HBASE-7912]



[jira] [Commented] (HBASE-11542) Unit Test KeyStoreTestUtil.java compilation failure in IBM JDK

2014-07-18 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066527#comment-14066527
 ] 

Demai Ni commented on HBASE-11542:
--

[~stack], thanks for following this issue. Linsey and I work in the same 
group and have just begun to get involved in the HBase community. She is 
looking into jvm.java as a reference, already has a fix in our local repo, 
and will provide a patch a bit later. Demai



> Unit Test  KeyStoreTestUtil.java compilation failure in IBM JDK 
> 
>
> Key: HBASE-11542
> URL: https://issues.apache.org/jira/browse/HBASE-11542
> Project: HBase
>  Issue Type: Improvement
>  Components: build, test
>Affects Versions: 0.99.0
> Environment: RHEL 6.3 ,IBM JDK 6
>Reporter: LinseyPang
>Priority: Minor
> Fix For: 2.0.0
>
>
> In trunk, HBASE-10336 added a test utility, KeyStoreTestUtil.java, which 
> uses the following Sun classes:
>    import sun.security.x509.AlgorithmId;
>    import sun.security.x509.CertificateAlgorithmId;
> This causes an HBase compilation failure when using the IBM JDK. 
> The IBM JDK has similar classes:
>    import com.ibm.security.x509.AlgorithmId;
>    import com.ibm.security.x509.CertificateAlgorithmId;
> This JIRA is to add handling for these x509 references.
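One possible way to handle the vendor-specific x509 classes, sketched under assumptions: pick the class name from the `java.vendor` system property (assuming IBM JVMs report a vendor string containing "IBM") and load it reflectively, so the source no longer imports either package at compile time. The class and method names are hypothetical, and this is not necessarily the approach taken in the eventual HBASE-11542 patch.

```java
/** Hypothetical helper that selects the x509 AlgorithmId class by JVM vendor. */
public final class X509ClassPicker {

    private X509ClassPicker() {}

    /** Returns the vendor-specific class name; assumes IBM vendor strings contain "IBM". */
    static String algorithmIdClassName(String javaVendor) {
        boolean ibm = javaVendor != null && javaVendor.contains("IBM");
        return ibm ? "com.ibm.security.x509.AlgorithmId" : "sun.security.x509.AlgorithmId";
    }

    /** Loads the class reflectively, so the source compiles without either vendor import. */
    static Class<?> loadAlgorithmIdClass() throws ClassNotFoundException {
        return Class.forName(algorithmIdClassName(System.getProperty("java.vendor")));
    }

    public static void main(String[] args) {
        System.out.println(algorithmIdClassName("IBM Corporation"));    // com.ibm.security.x509.AlgorithmId
        System.out.println(algorithmIdClassName("Oracle Corporation")); // sun.security.x509.AlgorithmId
    }
}
```

Any method calls on the loaded class would also have to go through reflection, and on newer JDKs the module system may restrict access to these internal packages, so this sketch covers only the class-selection step.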



[jira] [Commented] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API

2014-07-04 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052777#comment-14052777
 ] 

Demai Ni commented on HBASE-11452:
--

[~stack], [~apurtell] and [~enis], many thanks for reviewing and committing the 
fix. Sorry for the late response. I appreciate the help.

Demai

> add getUserPermission feature in AccessControlClient as client API 
> ---
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4, 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch, 
> HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, 
> HBASE-11452-master-v2.patch, HBASE-11452-master-v3.patch
>
>
> Currently a user can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs for 'grant' and 'revoke' are implemented in 
> AccessControlClient.java. This JIRA is to add the 'user_permission' feature 
> with a new method called 'getUserPermission'.
> To keep the interface consistent, this JIRA will also update 
> user_permission.rb to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  biadminetest,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User
> Table,Family,Qualifier:Permission 
>  
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
> 1 row(s) in 0.0790 seconds
> {code}
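The `user_permission 't.*'` example above returns `t1_dn`, `table1`, `table2` and `test_dn` but not `dn:t1`, which is consistent with the regex having to match the whole table name. The sketch below shows that behavior with `java.util.regex.Matcher.matches()`. It does not call `AccessControlClient`; the class and method names are hypothetical, and whether the real client filters exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical table-name filter illustrating full-match regex semantics. */
public final class TableRegexFilter {

    private TableRegexFilter() {}

    static List<String> filter(List<String> tableNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String name : tableNames) {
            // matches() requires the whole name to match, so "t.*" rejects "dn:t1".
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("dn:t1", "t1_dn", "table1", "table2", "test_dn");
        System.out.println(filter(tables, "t.*"));   // [t1_dn, table1, table2, test_dn]
        System.out.println(filter(tables, "dn:t1")); // [dn:t1]
    }
}
```

Using `find()` instead of `matches()` would also accept `dn:t1` for `t.*`, since that name contains a `t`, which the sample output does not show.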



[jira] [Commented] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API

2014-07-04 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052499#comment-14052499
 ] 

Demai Ni commented on HBASE-11452:
--

bq. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) 
warnings.
None of them appear to be related to this patch, and the same warnings also 
show up in other recent HadoopQA runs.

The unit test failures don't look related either.

Demai


[jira] [Commented] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-04 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052494#comment-14052494
 ] 

Demai Ni commented on HBASE-11452:
--

[~enis], 

Thanks. The method name is now 'getUserPermission' in the patch. Let me change 
the JIRA description.

Demai

> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4, 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch, 
> HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch, 
> HBASE-11452-master-v2.patch
>
>
> Currently a user can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs for 'grant' and 'revoke' are implemented in 
> AccessControlClient.java. This JIRA is to add the 'user_permission' feature.


[jira] [Updated] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API

2014-07-04 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Description: 
Currently a user can 'grant', 'revoke', and show 'user_permission' through the 
hbase shell, and client APIs for 'grant' and 'revoke' are implemented in 
AccessControlClient.java. This JIRA is to add the 'user_permission' feature with 
a new method called 'getUserPermission'.

To keep the interface consistent, this JIRA will also update user_permission.rb 
to use this API directly. The test result is:
{code}
hbase(main):001:0> user_permission
User
Table,Family,Qualifier:Permission   
   
 hbase  dn:t1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
 biadminetest,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
 hive   t1_dn,,: [Permission: 
actions=READ,WRITE] 
 
 biadmintable1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintable2,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintest_dn,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
   
6 row(s) in 1.6220 seconds

hbase(main):002:0> user_permission 't.*'
User
Table,Family,Qualifier:Permission   
   
 hive   t1_dn,,: [Permission: 
actions=READ,WRITE] 
 
 biadmintable1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintable2,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintest_dn,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
   
4 row(s) in 0.2130 seconds

hbase(main):003:0> user_permission 'dn:t1'
User
Table,Family,Qualifier:Permission   
   
 hbase  dn:t1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
1 row(s) in 0.0790 seconds
{code}

  was:
Currently a user can 'grant', 'revoke', and show 'user_permission' through the 
hbase shell, and client APIs for 'grant' and 'revoke' are implemented in 
AccessControlClient.java. This JIRA is to add the 'user_permission' feature.

To keep the interface consistent, this JIRA will also update user_permission.rb 
to use this API directly. The test result is:
{code}
hbase(main):001:0> user_permission
User
Table,Family,Qualifier:Permission   
   
 hbase  dn:t1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
 biadminetest,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
 hive   t1_dn,,: [Permission: 
actions=READ,WRITE] 
 
 biadmintable1,,: [Perm

[jira] [Updated] (HBASE-11452) add getUserPermission feature in AccessControlClient as client API

2014-07-04 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Summary: add getUserPermission feature in AccessControlClient as client API 
  (was: add userPermission feature in AccessControlClient as client API )



[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-03 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Attachment: HBASE-11452-master-v2.patch

There was a minor conflict due to a new commit, so I am uploading the v2 patch.



[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-03 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Attachment: HBASE-11452-master-v1.patch

[~apurtell], 

Thanks. Since Stack fixed the Findbugs link, let me try the patch one more 
time.

Demai

> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4, 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch, 
> HBASE-11452-master-v1.patch, HBASE-11452-master-v1.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  biadminetest,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User
> Table,Family,Qualifier:Permission 
>  
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
> 1 row(s) in 0.0790 seconds
> {code}





[jira] [Commented] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-02 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051027#comment-14051027
 ] 

Demai Ni commented on HBASE-11452:
--

[~apurtell], 

Thanks for the review. The patch can be applied to 0.98 directly.

The '11 new Findbugs' warnings are still a puzzle to me, as I can't access the 
'Findbugs warnings' files from HadoopQA and couldn't generate them locally. I 
re-examined the code and don't think the patch can cause so many Findbugs 
warnings. Well, I certainly could be wrong here.

On a side note, [~stack] mentioned that trunk is probably broken, so it is 
probably better to rerun QA after trunk is fixed.

Demai


> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4, 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  biadminetest,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User
> Table,Family,Qualifier:Permission 
>  
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
> 1 row(s) in 0.0790 seconds
> {code}





[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-02 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050649#comment-14050649
 ] 

Demai Ni commented on HBASE-9531:
-

Three issues from the HadoopQA report:
* "-1 lineLengths. The patch introduces the following lines longer than 100:". 
The code is generated by protobuf, and the generated files already contain 
other lines longer than 100 characters, so this should be OK.
* The failed unit test: org.apache.hadoop.hbase.regionserver.wal.TestLogRolling. 
The assertion fails at lines 381-382:
{code}
assertTrue(pipeline.length ==
    fs.getDefaultReplication(TEST_UTIL.getDataTestDirOnTestFS()));
{code}
I couldn't find an immediate relationship with this patch and will look into 
it more. This particular test case appears to have been unstable in the past, 
so the failure could be unrelated.
* "-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) 
warnings." 
Again, this is the second time this week that the links to the Findbugs 
warnings do not work, and I can't find them through the test report. I will 
send a note to dev@hbase for help.

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also 
> to provide point-in-time information about replication lag (source only).
> I understand that HBase uses Hadoop metrics 
> (http://hbase.apache.org/metrics.html), which is a common way to monitor 
> metric info. This jira is meant to serve as a lightweight client interface, 
> compared with a complete (certainly better, but heavier) GUI monitoring 
> package. I have the code working on 0.94.9 now, and would like to use this 
> jira to get opinions about whether the feature is valuable to other 
> users/workshops. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase 
> shell command 'status', and to introduce a new module called ReplicationLoad. 
> In HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and then 
> simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions on the dev mailing list.)
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then
>     lag = max(ageOfLastShippedOp, currentTime - timeStampsOfLastShippedOp) // err on the large side
>   else if (currentTime - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then
>     lag = ageOfLastShippedOp // last shipment happened recently
>   else
>     lag = 0 // last shipment may have happened last night, so there is no real lag although ageOfLastShippedOp is non-zero
> {code}
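The replication lag formula above can be expressed as a small Java method. This is a hypothetical sketch of the formula as written in the description, not HBase's ReplicationLoad code: the class and method names are made up, `currentTimeMs` is passed in explicitly so the result is deterministic, and all values are assumed to be milliseconds (timestamps as epoch milliseconds).

```java
// Hypothetical sketch of the source-side lag formula; not HBase's implementation.
public final class ReplicationLagSketch {

  /**
   * @param sizeOfLogQueue           number of WALs still queued for this peer
   * @param ageOfLastShippedOp       age (ms) of the last shipped edit
   * @param timeStampOfLastShippedOp when (epoch ms) the last shipment happened
   * @param currentTimeMs            "now" in epoch ms
   * @return estimated replication lag in ms
   */
  static long computeLag(long sizeOfLogQueue, long ageOfLastShippedOp,
                         long timeStampOfLastShippedOp, long currentTimeMs) {
    long sinceLastShip = currentTimeMs - timeStampOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // WALs are still queued: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShip);
    } else if (sinceLastShip < 2 * ageOfLastShippedOp) {
      // Nothing queued, but the last shipment was recent, so its age still counts.
      return ageOfLastShippedOp;
    } else {
      // Nothing queued and the last shipment is old: no real lag.
      return 0;
    }
  }

  public static void main(String[] args) {
    long now = 1_000_000L;
    System.out.println(computeLag(1, 14, now - 60_000, now));    // 60000
    System.out.println(computeLag(0, 14, now - 10, now));        // 14
    System.out.println(computeLag(0, 14, now - 3_600_000, now)); // 0
  }
}
```

The "2 * ageOfLastShippedOp" threshold is taken directly from the description; it is a heuristic for "recent", not a measured bound.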
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:P

[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-02 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Attachment: HBASE-11452-master-v1.patch

Attached the v1 patch with the method name changed to getUserPermission().

This also gets another HadoopQA run for the failed test cases and the 
Findbugs warnings.

> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch, HBASE-11452-master-v1.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  biadminetest,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User
> Table,Family,Qualifier:Permission 
>  
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
> 1 row(s) in 0.0790 seconds
> {code}





[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-02 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050487#comment-14050487
 ] 

Demai Ni commented on HBASE-9531:
-

[~stack], thanks a lot. Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, 
> HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also 
> to provide point-in-time information about replication lag (source only).
> I understand that HBase uses Hadoop metrics 
> (http://hbase.apache.org/metrics.html), which is a common way to monitor 
> metric info. This jira is meant to serve as a lightweight client interface, 
> compared with a complete (certainly better, but heavier) GUI monitoring 
> package. I have the code working on 0.94.9 now, and would like to use this 
> jira to get opinions about whether the feature is valuable to other 
> users/workshops. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase 
> shell command 'status', and to introduce a new module called ReplicationLoad. 
> In HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and then 
> simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions on the dev mailing list.)
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then
>     lag = max(ageOfLastShippedOp, currentTime - timeStampsOfLastShippedOp) // err on the large side
>   else if (currentTime - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then
>     lag = ageOfLastShippedOp // last shipment happened recently
>   else
>     lag = 0 // last shipment may have happened last night, so there is no real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018

[jira] [Commented] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-01 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049615#comment-14049615
 ] 

Demai Ni commented on HBASE-11452:
--

The failed replication test cases should not be related to this patch. I also 
can't find the Findbugs warnings artifact; the link returns a 404 error, 
which is odd.

[~apurtell], thanks for the comments.

bq. can we consider calling this getUserPermission? 
Sure. I will change the method name.

bq. Do I have a Java variant of Stockholm Syndrome? 
Would you please elaborate a little? Do you feel that we have too many ways 
to retrieve the same userPermission information, or that we should use 
'tableName' directly instead of 'tableRegex'? Thanks

Demai



> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  biadminetest,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User
> Table,Family,Qualifier:Permission 
>  
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
> 1 row(s) in 0.0790 seconds
> {code}





[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-07-01 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049475#comment-14049475
 ] 

Demai Ni commented on HBASE-9531:
-

Strange that the patch submitted a few days ago didn't trigger HadoopQA. I must 
have missed a step in the procedure. Can someone give me a hand? Thanks a lot... Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, 
> HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also 
> to provide point-in-time information about replication lag (source only).
> I understand that HBase uses Hadoop metrics 
> (http://hbase.apache.org/metrics.html), which is a common way to monitor 
> metric info. This jira is meant to serve as a lightweight client interface, 
> compared with a complete (certainly better, but heavier) GUI monitoring 
> package. I have the code working on 0.94.9 now, and would like to use this 
> jira to get opinions about whether the feature is valuable to other 
> users/workshops. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase 
> shell command 'status', and to introduce a new module called ReplicationLoad. 
> In HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and then 
> simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions on the dev mailing list.)
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then
>     lag = max(ageOfLastShippedOp, currentTime - timeStampsOfLastShippedOp) // err on the large side
>   else if (currentTime - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then
>     lag = ageOfLastShippedOp // last shipment happened recently
>   else
>     lag = 0 // last shipment may have happened last night, so there is no real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication',

[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-01 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Status: Patch Available  (was: Open)

Patch is attached for the master branch.

> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  biadminetest,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 6 row(s) in 1.6220 seconds
> hbase(main):002:0> user_permission 't.*'
> User
> Table,Family,Qualifier:Permission 
>  
>  hive   t1_dn,,: [Permission: 
> actions=READ,WRITE]   
>
>  biadmintable1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintable2,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>   
>  biadmintest_dn,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>  
> 4 row(s) in 0.2130 seconds
> hbase(main):003:0> user_permission 'dn:t1'
> User
> Table,Family,Qualifier:Permission 
>  
>  hbase  dn:t1,,: [Permission: 
> actions=READ,WRITE,EXEC,CREATE,ADMIN] 
>
> 1 row(s) in 0.0790 seconds
> {code}





[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-01 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Description: 
Currently users can 'grant', 'revoke', and show 'user_permission' through the 
hbase shell, and client APIs are implemented in AccessControlClient.java for 
'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 

To keep the interface consistent, this jira will also update user_permission.rb to 
use this API directly. The test result is:
{code}
hbase(main):001:0> user_permission
User
Table,Family,Qualifier:Permission   
   
 hbase  dn:t1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
 biadminetest,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
 hive   t1_dn,,: [Permission: 
actions=READ,WRITE] 
 
 biadmintable1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintable2,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintest_dn,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
   
6 row(s) in 1.6220 seconds

hbase(main):002:0> user_permission 't.*'
User
Table,Family,Qualifier:Permission   
   
 hive   t1_dn,,: [Permission: 
actions=READ,WRITE] 
 
 biadmintable1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintable2,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   

 biadmintest_dn,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
   
4 row(s) in 0.2130 seconds

hbase(main):003:0> user_permission 'dn:t1'
User
Table,Family,Qualifier:Permission   
   
 hbase  dn:t1,,: [Permission: 
actions=READ,WRITE,EXEC,CREATE,ADMIN]   
 
1 row(s) in 0.0790 seconds
{code}

  was:
Currently users can 'grant', 'revoke', and show 'user_permission' through the 
hbase shell, and client APIs are implemented in AccessControlClient.java for 
'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 

To keep the interface consistent, this jira will also update user_permission.rb to 
use this API directly.


> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly. The test result is:
> {code}
> hbase(main):001:0> user_permission
> User
> Tab

[jira] [Updated] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-01 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11452:
-

Attachment: HBASE-11452-master-v0.patch

> add userPermission feature in AccessControlClient as client API 
> 
>
> Key: HBASE-11452
> URL: https://issues.apache.org/jira/browse/HBASE-11452
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, security
>Affects Versions: 0.98.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 2.0.0
>
> Attachments: HBASE-11452-master-v0.patch
>
>
> Currently users can 'grant', 'revoke', and show 'user_permission' through the 
> hbase shell, and client APIs are implemented in AccessControlClient.java for 
> 'grant' and 'revoke'. This jira is to add the 'user_permission' feature. 
> To keep the interface consistent, this jira will also update user_permission.rb 
> to use this API directly.





[jira] [Created] (HBASE-11452) add userPermission feature in AccessControlClient as client API

2014-07-01 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11452:


 Summary: add userPermission feature in AccessControlClient as 
client API 
 Key: HBASE-11452
 URL: https://issues.apache.org/jira/browse/HBASE-11452
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.99.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.99.0


Currently, users can 'grant', 'revoke', and show 'user_permission' through the hbase 
shell, and there are client APIs in AccessControlClient.java for 'grant' and 
'revoke'. This jira is to add the 'user_permission' feature to that client API. 

To keep the interface consistent, this jira will also update user_permission.rb to 
use this API directly.
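The shell already lists permissions, and this issue proposes exposing the same listing through the client API. Below is a minimal, self-contained model of such a listing. `PermissionEntry` and `userPermissions` are hypothetical names (this is not the AccessControlClient API), and filtering by a table-name regular expression is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, self-contained model; not the AccessControlClient API.
class UserPermissionSketch {
    // One grant: a user holding some actions (e.g. "RW") on a table.
    record PermissionEntry(String user, String table, String actions) {}

    // Returns the entries whose table name fully matches tableRegex; null matches every table.
    static List<PermissionEntry> userPermissions(List<PermissionEntry> all, String tableRegex) {
        Pattern pattern = (tableRegex == null) ? null : Pattern.compile(tableRegex);
        List<PermissionEntry> result = new ArrayList<>();
        for (PermissionEntry e : all) {
            if (pattern == null || pattern.matcher(e.table()).matches()) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PermissionEntry> all = List.of(
            new PermissionEntry("demai", "t1", "RW"),
            new PermissionEntry("hbase", "t2", "RWXCA"));
        for (PermissionEntry e : userPermissions(all, "t1")) {
            System.out.println(e.user() + " " + e.table() + " " + e.actions());
        }
    }
}
```

If both the shell command and a Java caller went through one call like this, they would return the same rows, which is the consistency the issue is after.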





[jira] [Commented] (HBASE-10851) Wait for regionservers to join the cluster

2014-06-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047965#comment-14047965
 ] 

Demai Ni commented on HBASE-10851:
--

[~jxiang]

Many thanks for your response, and sorry for the delay; I was occupied over the weekend. 
My cluster is a single-node cluster, but I put this property
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
to mimic a distributed cluster. I guess that 'fools' the logic in
{code:title="LocalHBaseCluster"}
  /**
   * @param c Configuration to check.
   * @return True if a 'local' address in hbase.master value.
   */
  public static boolean isLocal(final Configuration c) {
    boolean mode = c.getBoolean(HConstants.CLUSTER_DISTRIBUTED,
        HConstants.DEFAULT_CLUSTER_DISTRIBUTED);
    return (mode == HConstants.CLUSTER_IS_LOCAL);
  }
{code}
{code:title="HMasterCommandLine"}
...
if (LocalHBaseCluster.isLocal(conf)) {
  DefaultMetricsSystem.setMiniClusterMode(true);
+ conf.setInt(ServerManager.WAIT_ON_REGIONSERVERS_MINTOSTART, 1);
  ...
}
{code}
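The reasoning in this comment can be checked with a small model. It assumes configuration is a plain `Map`, mirrors `isLocal` from the snippet above, and takes the new default minimum of 2 from another comment on this issue; `MinToStartSketch` and `effectiveMinToStart` are hypothetical names, not HBase code.

```java
import java.util.Map;

// Simplified model of the startup path quoted above; not HBase code.
class MinToStartSketch {
    static final String CLUSTER_DISTRIBUTED = "hbase.cluster.distributed";
    static final String MIN_TO_START = "hbase.master.wait.on.regionservers.mintostart";
    // Assumed default after HBASE-10851: 2, so that the active master is counted.
    static final int DEFAULT_MIN_TO_START = 2;

    // Mirrors LocalHBaseCluster.isLocal: "local" means not distributed (default false).
    static boolean isLocal(Map<String, String> conf) {
        return !Boolean.parseBoolean(conf.getOrDefault(CLUSTER_DISTRIBUTED, "false"));
    }

    // Effective minimum number of region servers the master waits for.
    static int effectiveMinToStart(Map<String, String> conf) {
        if (isLocal(conf)) {
            return 1; // HMasterCommandLine forces 1 for a standalone master
        }
        String value = conf.get(MIN_TO_START);
        return value == null ? DEFAULT_MIN_TO_START : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        System.out.println(effectiveMinToStart(Map.of()));                            // standalone
        System.out.println(effectiveMinToStart(Map.of(CLUSTER_DISTRIBUTED, "true"))); // single node, "distributed"
        System.out.println(effectiveMinToStart(
            Map.of(CLUSTER_DISTRIBUTED, "true", MIN_TO_START, "1")));                 // explicit override
    }
}
```

With `hbase.cluster.distributed` set to `true` on a single node, `isLocal` returns false, the override to 1 is skipped, and the model's master waits for 2 region servers unless the property is set explicitly.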

> Wait for regionservers to join the cluster
> --
>
> Key: HBASE-10851
> URL: https://issues.apache.org/jira/browse/HBASE-10851
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Critical
> Fix For: 0.99.0
>
> Attachments: hbase-10851.patch, hbase-10851_v2.patch
>
>
> With HBASE-10569, if regionservers are started a while after the master, all 
> regions will be assigned to the master.  That may not be what users expect.
> A work-around is to always start regionservers before masters.
> I was wondering if the master can wait a little for other regionservers to 
> join.





[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-06-27 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046562#comment-14046562
 ] 

Demai Ni commented on HBASE-9531:
-

[~apurtell], 

Can you please take a look at the new patch? 

About this: 
bq.  Might be good to have a summary option for replication by default but not necessary
For the default of 'replication', I am including both 'sink' and 'source' info, 
as shown below. It is more like a 'detailed' option than a 'summary'. 
{code}
hbase(main):002:0> status
1 servers, 0 dead, 19. average load

hbase(main):003:0> status 'replication'
version 0.99.0-SNAPSHOT
1 live servers
hdtest014.svl.ibm.com:
SOURCE:PeerID=15, AgeOfLastShippedOp=307, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Fri Jun 27 17:00:44 PDT 2014, Replication Lag=0
SINK  :AgeOfLastAppliedOp=1129746, TimeStampsOfLastAppliedOp=Fri Jun 27 17:10:18 PDT 2014
{code}



[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-06-27 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Attachment: HBASE-9531-master-v1.patch

Updated the patch to address Andrew's comments.

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, 
> HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
> sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also 
> to provide point-in-time information about replication lag (source only).
> I understand that HBase uses Hadoop metrics 
> (http://hbase.apache.org/metrics.html), which is a common way to monitor 
> metrics. This jira is meant to serve as a lightweight client interface, 
> compared to a complete (certainly better, but heavier) GUI monitoring 
> package. I have the code working on 0.94.9 now, and would like to use this 
> jira to get opinions about whether the feature is valuable to other 
> users/workshops. If so, I will build a trunk patch. 
> All input is greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell 
> command 'status', and to introduce a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), the local replication service objects are used 
> to get their loads, which can be wrapped in a ReplicationLoad object and 
> then simply passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions through the dev email list.)
> The replication lag will be calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
>   else lag = 0 // last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 14
>     hd
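The replication lag rule in the HBASE-9531 description quoted above can be written as a single function. This is a sketch that assumes all times are epoch milliseconds and reads the first branch as assigning `lag`; `ReplicationLagSketch` is a hypothetical name, not a class from the patch.

```java
// Sketch of the source-side replication lag rule; all values are milliseconds.
class ReplicationLagSketch {
    static long lag(long sizeOfLogQueue, long ageOfLastShippedOp,
                    long timeStampOfLastShippedOp, long now) {
        long sinceLastShipped = now - timeStampOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // Logs are still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipped);
        } else if (sinceLastShipped < 2 * ageOfLastShippedOp) {
            // The last shipment was recent, so its age is still a fair estimate.
            return ageOfLastShippedOp;
        } else {
            // Nothing queued and the last shipment is old: no real lag.
            return 0;
        }
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(lag(1, 14, now - 5_000, now));  // queued logs
        System.out.println(lag(0, 14, now - 10, now));     // recent shipment
        System.out.println(lag(0, 14, now - 60_000, now)); // idle source
    }
}
```

The `2 *` factor is the description's heuristic for "recent"; a source that shipped its last edit long ago with an empty queue reports zero even though `ageOfLastShippedOp` is non-zero.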

[jira] [Commented] (HBASE-11431) Add support of running from command line for 'hbase shell'

2014-06-27 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046547#comment-14046547
 ] 

Demai Ni commented on HBASE-11431:
--

Currently, I usually do this to get the same functionality:

1) Put the hbase shell commands into a text file. For example, I put this line 
put 't1_dn15','row5','cf1:q1','row5_from15'
into the file cmd.txt.
2) Then run this:
$ hbase shell < cmd.txt

It works like this:
{code}
$ hbase shell < cmd.txt 
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.99.0-SNAPSHOT, redfe6592dfcab57f6f2a78f73d4fc788e62707e9, Fri Jun 27 
15:36:44 PDT 2014

put 't1_dn15','row5','cf1:q1','row5_from15'
0 row(s) in 0.4750 seconds

{code}


Users can certainly put more than one command in the text file. Would that serve 
the requirement of this jira? 


> Add support of running from command line for 'hbase shell'
> --
>
> Key: HBASE-11431
> URL: https://issues.apache.org/jira/browse/HBASE-11431
> Project: HBase
>  Issue Type: New Feature
>  Components: Admin
>Affects Versions: 0.89-fb
>Reporter: Yi Deng
>Priority: Minor
>  Labels: shell
> Fix For: 0.89-fb
>
>
> Add support of running from command line for 'hbase shell'.
> Now you can execute shell command from the bash like this:
>   bin/hbase shell --exec='scan ".META"' 
> The result can be piped to grep or other command.





[jira] [Commented] (HBASE-10851) Wait for regionservers to join the cluster

2014-06-27 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046522#comment-14046522
 ] 

Demai Ni commented on HBASE-10851:
--

[~jxiang],

I encountered this message in the master log on a single-node cluster using a trunk build: 
"INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers 
count to settle; currently checked in 1, slept for 978938 ms, expecting 
*minimum of 2*, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms, 
selfCheckedIn true"

This jira changed the default value to 2, and I saw your comment "... The minimum 
regionservers to wait is changed from 1 to 2 so the active master is included. 
For standalone sever, the minimum regionservers to wait is set to 1."

After I added the property 'hbase.master.wait.on.regionservers.mintostart' to 
hbase-site.xml, my cluster was up and running again. 

My question is from a migration/configuration perspective. When an 
existing single-node cluster is migrated to 1.0 (in my case, the cluster was on 
0.98.2; I stopped hbase, replaced the hbase jars, restarted hbase, and hit the problem), 
is there a way to add such a configuration to hbase-site.xml automatically? I 
examined hbase-default.xml and couldn't find the property, and also couldn't 
figure out whether we can use two different default values for single-node vs. 
multi-node clusters.  

thanks

Demai
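The log line points at the cause: with one region server checked in and a minimum of 2, the master kept waiting long after the 4500 ms timeout. The sketch below is a simplified model of such a wait condition, assuming it combines the minimum, the timeout, and the settle interval as shown; it is not the actual `ServerManager` code, and `RegionServerWaitSketch` is a hypothetical name.

```java
// Simplified model (an assumption, not ServerManager's code) of whether the
// master keeps waiting for region servers to check in.
class RegionServerWaitSketch {
    static boolean keepWaiting(int checkedIn, int minToStart, int maxToStart,
                               long sleptMs, long timeoutMs,
                               long msSinceLastCountChange, long intervalMs) {
        if (checkedIn >= maxToStart) {
            return false;                           // enough servers: stop at once
        }
        return checkedIn < minToStart               // below the minimum: the timeout does not help
            || sleptMs < timeoutMs                  // still inside the overall timeout
            || msSinceLastCountChange < intervalMs; // count changed recently: let it settle
    }

    public static void main(String[] args) {
        // Values from the log line, assuming the count has not changed since startup.
        System.out.println(keepWaiting(1, 2, Integer.MAX_VALUE, 978_938L, 4_500L, 978_938L, 1_500L));
        // Same cluster with mintostart set to 1 in hbase-site.xml.
        System.out.println(keepWaiting(1, 1, Integer.MAX_VALUE, 978_938L, 4_500L, 978_938L, 1_500L));
    }
}
```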









[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-06-24 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043004#comment-14043004
 ] 

Demai Ni commented on HBASE-9531:
-

[~apurtell], 

Thanks for the review and comments. 
bq. Can we just not set the new fields in ClusterStatus if replication is not active?
Let me check the code. It should be OK not to set them when replication is disabled. 

bq. In ReplicationLoad.java, please don't start method names with capital letters
I will update the patch and correct them.

bq. The default status command is 'summary', so we shouldn't dump all of the source and sink information as default, that's not a summary by definition.
The code logic doesn't change the existing behavior of 'status' or the output 
of 'status summary'. The code in admin.rb only checks the 2nd argument if the 
first argument is 'replication'; otherwise, the code flow goes to the existing 
'summary'/default logic and won't contain the replication information. I will 
double-check it.

Again, I appreciate the help. I will put up an updated patch later this week. 

Demai
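The argument handling described in the last answer (the second argument is only consulted when the first is 'replication') can be sketched in Java. This is a hypothetical rendering, not the admin.rb code: it returns the sections that would be printed, and it assumes the pre-existing formats are 'simple', 'detailed', and a default summary.

```java
import java.util.List;

// Hypothetical Java rendering of the argument handling described for admin.rb;
// it returns which sections would be printed rather than printing them.
class StatusDispatchSketch {
    static List<String> sections(String format, String type) {
        if ("replication".equals(format)) {
            // The second argument is consulted only in the 'replication' branch.
            if ("source".equals(type)) {
                return List.of("SOURCE");
            }
            if ("sink".equals(type)) {
                return List.of("SINK");
            }
            return List.of("SOURCE", "SINK");     // default: both, a 'detailed'-style view
        }
        if ("simple".equals(format) || "detailed".equals(format)) {
            return List.of(format);               // pre-existing formats, unchanged
        }
        return List.of("summary");                // everything else falls back to summary
    }

    public static void main(String[] args) {
        System.out.println(sections("replication", null));   // [SOURCE, SINK]
        System.out.println(sections("replication", "sink")); // [SINK]
        System.out.println(sections("summary", "sink"));     // [summary]
    }
}
```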

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.5
>
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch

[jira] [Updated] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri

2014-06-11 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11327:
-

Attachment: HBASE-11327-0.98-v0.patch

> ExportSnapshot hit stackoverflow error when target snapshotDir doesn't 
> contain uri
> --
>
> Key: HBASE-11327
> URL: https://issues.apache.org/jira/browse/HBASE-11327
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 0.98.2
>Reporter: Demai Ni
>Assignee: Demai Ni
>Priority: Minor
> Fix For: 0.99.0, 0.98.4
>
> Attachments: HBASE-11327-0.98-v0.patch, HBASE-11327-trunk-v0.patch
>
>
> {code}
> $hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> snapshotT1_dn -copy-to /user/demai/backup1
> Exception in thread "main" java.lang.StackOverflowError
> at java.util.regex.Pattern$Slice.match(Pattern.java:3490)
> at java.util.regex.Pattern$Start.match(Pattern.java:3066)
> at java.util.regex.Matcher.search(Matcher.java:1116)
> at java.util.regex.Matcher.find(Matcher.java:546)
> at 
> org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:681)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:893)
> at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
> {code}
> The following command works with a URI:
> {code}
> hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn 
> -copy-to hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2
> {code}
> The bug is the same as 
> [HADOOP-9069|https://issues.apache.org/jira/browse/HADOOP-9069]. Since the 
> Hadoop jira has been open for more than a year, this jira is used for a 
> local HBase fix for now. 
> Many thanks to [~mbertozzi] for his help on this one. 
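The stack trace shows `FileSystem.get` and `getDefaultUri` calling each other until the stack runs out, the pattern described in HADOOP-9069. The toy model below reproduces that loop with an explicit depth limit instead of a real stack overflow. It assumes the scheme-less `-copy-to` path ends up being treated as the default file system URI (the trace itself only shows the loop), and `DefaultFsRecursionSketch` is a hypothetical name.

```java
import java.net.URI;
import java.util.Map;

// Toy model of the recursion in the stack trace (simplified; not Hadoop's source).
class DefaultFsRecursionSketch {
    static final String DEFAULT_FS = "fs.defaultFS";
    static final int MAX_DEPTH = 50; // stands in for the JVM stack limit

    // Like FileSystem.get(conf): look up the default URI and resolve it.
    static String get(Map<String, String> conf, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("unbounded recursion: default URI has no scheme");
        }
        return get(URI.create(conf.get(DEFAULT_FS)), conf, depth + 1);
    }

    // Like FileSystem.get(uri, conf): a URI without a scheme falls back to the default URI.
    static String get(URI uri, Map<String, String> conf, int depth) {
        if (uri.getScheme() == null) {
            return get(conf, depth + 1);
        }
        return uri.getScheme(); // stands in for returning the matching FileSystem
    }

    public static void main(String[] args) {
        System.out.println(get(Map.of(DEFAULT_FS, "hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2"), 0));
        try {
            get(Map.of(DEFAULT_FS, "/user/demai/backup1"), 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```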





[jira] [Updated] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri

2014-06-11 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11327:
-

Attachment: HBASE-11327-trunk-v0.patch






[jira] [Updated] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri

2014-06-11 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11327:
-

Status: Patch Available  (was: Open)






[jira] [Created] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri

2014-06-11 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11327:


 Summary: ExportSnapshot hit stackoverflow error when target 
snapshotDir doesn't contain uri
 Key: HBASE-11327
 URL: https://issues.apache.org/jira/browse/HBASE-11327
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.98.2
Reporter: Demai Ni
Assignee: Demai Ni
Priority: Minor
 Fix For: 0.99.0, 0.98.4


{code}
$hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn 
-copy-to /user/demai/backup1

Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$Slice.match(Pattern.java:3490)
at java.util.regex.Pattern$Start.match(Pattern.java:3066)
at java.util.regex.Matcher.search(Matcher.java:1116)
at java.util.regex.Matcher.find(Matcher.java:546)
at 
org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:681)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:893)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)

{code}

The following command works with a URI:
{code}
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn 
-copy-to hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2
{code}

The bug is the same as 
[HADOOP-9069|https://issues.apache.org/jira/browse/HADOOP-9069]. Since the 
Hadoop jira has been open for more than a year, this jira is used for a 
local HBase fix for now. 

Many thanks to [~mbertozzi] for his help on this one. 
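One way to avoid the loop is to qualify a scheme-less `-copy-to` path before it is used where a full URI is expected. The helper below is a hypothetical sketch, not the committed patch: it assumes the scheme and authority should come from a known file system URI (for example the one in the working command above) and it only handles absolute paths.

```java
import java.net.URI;

// Hypothetical guard, not the committed patch: qualify a scheme-less path.
class QualifyPathSketch {
    static URI qualify(String path, URI fileSystemUri) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already qualified, e.g. hdfs://host:9000/user/...
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("relative paths also need a working directory: " + path);
        }
        // An absolute path resolves against the file system URI, taking its scheme and authority.
        return fileSystemUri.resolve(p);
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://hdtest014.svl.ibm.com:9000");
        System.out.println(qualify("/user/demai/backup1", fs));
    }
}
```

With `hdfs://hdtest014.svl.ibm.com:9000` as the file system URI, `/user/demai/backup1` becomes `hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup1`, the same form as the command that works.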





[jira] [Updated] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2014-06-04 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-7912:


Attachment: HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf

Uploaded design doc v2 with a few minor changes and a list of limitations.

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature to provide a general 
> solution for common users. Our full backup uses a snapshot to capture 
> metadata locally and ExportSnapshot to move data to another cluster; 
> the incremental backup uses an offline WALPlayer to back up HLogs; we also 
> leverage distributed log roll and distributed flush to improve performance; and 
> there are other added-on values such as convert, merge, progress report, and CLI 
> commands, so that a common user can back up hbase data without in-depth 
> knowledge of hbase. Our solution also contains some usability features for 
> enterprise users. 
> The detailed design document and CLI commands will be attached to this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and to 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (e.g., merge weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice for backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a massive 
> amount of data, so combining full backups with incremental backups has 
> tremendous benefit for HBase as well. The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). 
> Then the incremental backups up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshots to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk load of 
> HFiles for fast incremental restore.
> * Support single-table, multi-table, and column-family-level backup and 
> restore.
> * Restore to different table names.
> * Support adding tables or CFs to the backup set without interrupting 
> the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer-period, 
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same 
> cluster or across clusters.  It has the flexibility to support backup to 
> other devices and servers in the future.  
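The rollup/combining feature listed above can be illustrated with a minimal Java sketch. It assumes a simplified model in which each incremental image is a map from a cell key to a single (timestamp, value) version; real HBase keeps multiple versions and delete markers, which this sketch ignores. `CellVersion` and `IncrementalRollup` are hypothetical names, not part of the attached design.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified cell version: a write timestamp and a value.
final class CellVersion {
  final long timestamp;
  final String value;

  CellVersion(long timestamp, String value) {
    this.timestamp = timestamp;
    this.value = value;
  }
}

// Hypothetical rollup helper; not part of any HBase API.
final class IncrementalRollup {
  // Merges incremental images given oldest first. Each image maps a cell key
  // such as "row100/cf1:q1" to its version. For every key, the version with
  // the highest timestamp is kept; on a timestamp tie, the newer image wins.
  static Map<String, CellVersion> merge(List<Map<String, CellVersion>> imagesOldestFirst) {
    Map<String, CellVersion> merged = new TreeMap<>();
    for (Map<String, CellVersion> image : imagesOldestFirst) {
      for (Map.Entry<String, CellVersion> entry : image.entrySet()) {
        merged.merge(entry.getKey(), entry.getValue(),
            (kept, incoming) -> incoming.timestamp >= kept.timestamp ? incoming : kept);
      }
    }
    return merged;
  }
}
```

Merging two weekly images this way yields one "monthly" image in which each cell holds its newest value, which is the effect the merge feature aims for under this simplified model.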



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2014-06-04 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017973#comment-14017973
 ] 

Demai Ni commented on HBASE-7912:
-

[~fenghh], many thanks for the comments 

{quote}
"Use case example 1" in page 3: The full backup doesn't contain data of table3 
and table4, so when restoring table3 and table4, their data are all restored 
from the incremental backups, right? Sounds it's not a typical 
scenario(full-backup + incremental backups) for backup/restore.
{quote}
During step c ("... user adds other tables ..."), an implicit full backup is actually 
triggered for table3 and table4. So when they are restored in the future, the data 
will come from both the full and the incremental backups. 

{quote}
"4. Full Backup": Does log roll take place after taking (full) snapshot? What 
if new writes arrive after taking snapshot but before log roll?
{quote} 
The logic is to roll the logs first and then take the snapshot. If new writes arrive 
in between, they will be saved in the full backup image, and the same writes will be 
saved again in the next incremental backup. This approach ensures no data loss by 
allowing duplicate puts during restore. 
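Why the duplicate puts are harmless can be sketched in Java, assuming that restore replays each WAL edit with its original timestamp: a put to the same row, column, and timestamp as an existing cell, carrying the same value, leaves the visible data unchanged. The model below is hypothetical (`WalPut`, `DuplicateReplay`) and ignores deletes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical WAL edit carrying the cell coordinates and original write timestamp.
final class WalPut {
  final String row;
  final String column;
  final long timestamp;
  final String value;

  WalPut(String row, String column, long timestamp, String value) {
    this.row = row;
    this.column = column;
    this.timestamp = timestamp;
    this.value = value;
  }
}

final class DuplicateReplay {
  // Applies puts to a simplified table keyed by "row/column@timestamp".
  // Re-applying a put that is already present writes the same value under
  // the same key, so the resulting state does not change.
  static Map<String, String> apply(Map<String, String> table, List<WalPut> puts) {
    Map<String, String> result = new TreeMap<>(table);
    for (WalPut p : puts) {
      result.put(p.row + "/" + p.column + "@" + p.timestamp, p.value);
    }
    return result;
  }
}
```

A write that lands between the log roll and the snapshot appears in both the full image and the next incremental image; applying the incremental on top of the full image gives the same state as applying each write once.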

{quote}
"5. Incremental Backup": What if some RS fails during the log roll procedure so 
that not all current log number are recorded onto ZooKeeper?
{quote}
In such a case, the backup process will abort, and the cleanup logic is the same 
as [HBASE-11172 cancel a backup process | 
https://issues.apache.org/jira/browse/HBASE-11172]. The code will remove the 
incomplete backup image and roll back the ZooKeeper state to the previous backup. 
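The all-or-nothing behavior described above can be sketched as follows, assuming the set of expected RegionServers is known before the roll and each server reports a single log number. `LogRollCommit` and `nextState` are hypothetical names; removing the incomplete backup image is not modeled here.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not the actual BackupManager logic.
final class LogRollCommit {
  // Returns the per-RegionServer log numbers to record for this backup.
  // If any expected server failed to report, the backup is treated as
  // aborted and the previous backup's state is returned unchanged.
  static Map<String, Long> nextState(Map<String, Long> previous,
                                     Set<String> expectedServers,
                                     Map<String, Long> reported) {
    if (!reported.keySet().containsAll(expectedServers)) {
      return previous; // abort: keep the previous backup's state
    }
    return Map.copyOf(reported);
  }
}
```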

{quote} 
What if some log files are archived/deleted between two incremental backups and 
are not included in any incremental backup? Is it possible?
{quote} 
Good point (thanks also to [~mbertozzi], who pointed out the same problem 
earlier). There is a log cleaner that hasn't been included in the patch yet. It 
is called BackupLogCleaner, extends BaseLogCleanerDelegate, and is part of 
hbase.master.logcleaner.plugins. It keeps the logs. The side effect is that, if 
the user doesn't run incremental backups often enough, too many log files are 
left. We have a stop -all feature that removes all backup tables, which will 
also free up the logs. 
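The retention rule can be sketched as a pure predicate, assuming each RegionServer's WALs carry increasing log numbers and the highest number already captured by a backup is recorded per server. The sketch does not use the real `BaseLogCleanerDelegate` API; `BackupLogRetention` and `isDeletable` are hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for the retention check a BackupLogCleaner-style
// plugin could perform; it does not use the real HBase cleaner API.
final class BackupLogRetention {
  // lastBackedUp maps a RegionServer name to the highest log number
  // already captured by a backup.
  static boolean isDeletable(Map<String, Long> lastBackedUp, String server, long logNumber) {
    Long last = lastBackedUp.get(server);
    // Logs from servers never covered by a backup are kept.
    return last != null && logNumber <= last;
  }
}
```

Because logs newer than the last backup are never deletable, infrequent incremental backups let logs accumulate, which is the side effect noted above.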

Thanks for pointing out the typos. I will fix them in the doc. 


> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we have completed the implementation of our backup/restore solution and 
> would like to share it with the community through this JIRA. 
> We leverage the existing HBase snapshot feature to provide a general 
> solution for common users. Our full backup uses snapshots to capture 
> metadata locally and ExportSnapshot to move data to another cluster; 
> the incremental backup uses an offline WALPlayer to back up HLogs. We also 
> leverage globally distributed log roll and flush to improve performance, and add 
> features such as convert, merge, progress reporting, and CLI commands, so 
> that a common user can back up HBase data without in-depth knowledge of HBase. 
>  Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached to this JIRA. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (e.g. merge weeklies into a monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achie

[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-06-03 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016977#comment-14016977
 ] 

Demai Ni commented on HBASE-10289:
--

[~andrew.purt...@gmail.com], Qiang mentioned offline last night that the 0.98 code 
may be slightly different compared to trunk. He is looking into that and will 
provide a patch if it is indeed different. He is in another time zone, so I 
responded for him. Demai

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch, hbase10289-master-v1.patch, 
> hbase10289-master-v2.patch
>
>
> If we enable the JMX MBean server for the HMaster or a RegionServer through VM 
> arguments, the process will use a random port, which we cannot configure.
> This can be a problem if that random port is already configured for some other 
> service.
> This issue can be avoided by supporting a custom JMX server.
> The ports can then be configured. If no ports are configured, it will 
> continue to behave the same way as now.
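For illustration, a JVM can export JMX on fixed ports by creating its own RMI registry and connector server with the standard `javax.management.remote` API instead of relying on VM arguments. This is a minimal sketch, not the patch attached to this issue; `FixedPortJmx` is a hypothetical name and the host and port values used with it are arbitrary examples.

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

final class FixedPortJmx {
  // Builds a JMX service URL that pins both ports: rmiServerPort is where the
  // RMI connector object is exported, registryPort is the RMI registry used
  // for the JNDI lookup.
  static String serviceUrl(String host, int rmiServerPort, int registryPort) {
    return "service:jmx:rmi://" + host + ":" + rmiServerPort
        + "/jndi/rmi://" + host + ":" + registryPort + "/jmxrmi";
  }

  // Starts a connector server for the platform MBeanServer on the given ports.
  static JMXConnectorServer start(String host, int rmiServerPort, int registryPort) throws Exception {
    LocateRegistry.createRegistry(registryPort);
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    JMXServiceURL url = new JMXServiceURL(serviceUrl(host, rmiServerPort, registryPort));
    Map<String, Object> env = new HashMap<>();
    JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs);
    server.start();
    return server;
  }
}
```

If no ports are configured, a caller would simply not call `start`, keeping the existing behavior as the description requires.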





[jira] [Commented] (HBASE-11274) More general single-row Condition Mutation

2014-05-30 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013871#comment-14013871
 ] 

Demai Ni commented on HBASE-11274:
--

[~lshmouse], this is definitely a valid use case. I think other SQL-layer 
components already provide similar functionality, so pushing it into the HBase 
core engine would need to give us better performance, right? Can you please 
share the design a bit? It would also be great if it could decide the order of 
the filters. 

thanks.

Demai

> More general single-row Condition Mutation
> --
>
> Key: HBASE-11274
> URL: https://issues.apache.org/jira/browse/HBASE-11274
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Priority: Minor
>
> Currently, the checkAndDelete and checkAndPut interfaces only support atomic 
> mutation with a single condition. But in actual apps, we need a more general 
> condition-mutation that supports multiple conditions and logical expressions 
> over those conditions.
> For example, to support the following SQL:
> {quote}
>   insert row  where (column A == 'X' and column B == 'Y') or (column C == 'z')
> {quote}
> Suggestions are welcome.
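The requested semantics (several conditions combined with AND/OR, checked and applied atomically on a single row) can be modeled in a few lines of Java. This is a hypothetical in-memory sketch built on `java.util.function.Predicate`; it does not show how HBase would implement it server-side, where the row lock would play the role of the `synchronized` methods. `RowConditions` and `SingleRowStore` are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model, not the HBase client API.
final class RowConditions {
  // Condition that a column in the row currently holds the given value.
  static Predicate<Map<String, String>> eq(String column, String value) {
    return row -> value.equals(row.get(column));
  }
}

final class SingleRowStore {
  private final Map<String, Map<String, String>> rows = new HashMap<>();

  synchronized void put(String rowKey, String column, String value) {
    rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
  }

  synchronized String get(String rowKey, String column) {
    return rows.getOrDefault(rowKey, Map.of()).get(column);
  }

  // Evaluates the whole condition and applies the put while holding one lock,
  // so no other caller of this store can change the row in between.
  synchronized boolean checkAndPut(String rowKey, Predicate<Map<String, String>> condition,
                                   String column, String value) {
    Map<String, String> row = rows.getOrDefault(rowKey, Map.of());
    if (!condition.test(row)) {
      return false;
    }
    put(rowKey, column, value);
    return true;
  }
}
```

The example condition from the description becomes `RowConditions.eq("A", "X").and(RowConditions.eq("B", "Y")).or(RowConditions.eq("C", "z"))`.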





[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-29 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Attachment: HLogPlayer.java

Attached a customized HLogPlayer, which is used in incremental backup for the 
'convert' feature, i.e. converting HLogs to HFiles. The code is very similar 
to WALPlayer except that it runs offline (without a live HBase cluster). 
Currently the code is somewhat messy and specific to backup, so it is posted 
here only as a prototype. 

We are working on a general version in 
[HBASE-11170|https://issues.apache.org/jira/browse/HBASE-11170], so the real 
patch will be provided there. 
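One step any offline HLog-to-HFile conversion needs is reordering: a WAL stores edits in write order, while an HFile must be written in sorted key order. The Java sketch below shows only that grouping and sorting step under a simplified model; it is not the attached HLogPlayer code, and `WalEdit`/`OfflineConvert` are hypothetical. The comparator approximates HBase cell ordering (row, then column, then newest timestamp first) and ignores cell types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified WAL entry.
final class WalEdit {
  final String table;
  final String row;
  final String column;
  final long timestamp;
  final String value;

  WalEdit(String table, String row, String column, long timestamp, String value) {
    this.table = table;
    this.row = row;
    this.column = column;
    this.timestamp = timestamp;
    this.value = value;
  }
}

final class OfflineConvert {
  // Approximates HBase cell order: row and column ascending, newest timestamp first.
  static final Comparator<WalEdit> CELL_ORDER =
      Comparator.comparing((WalEdit e) -> e.row)
          .thenComparing(e -> e.column)
          .thenComparing(Comparator.comparingLong((WalEdit e) -> e.timestamp).reversed());

  // Groups edits (given in WAL write order) by table and sorts each group so
  // it could be written out as a sorted file.
  static Map<String, List<WalEdit>> groupAndSort(List<WalEdit> walOrder) {
    Map<String, List<WalEdit>> byTable = new TreeMap<>();
    for (WalEdit e : walOrder) {
      byTable.computeIfAbsent(e.table, t -> new ArrayList<>()).add(e);
    }
    for (List<WalEdit> edits : byTable.values()) {
      edits.sort(CELL_ORDER);
    }
    return byTable;
  }
}
```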





> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, 
> HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v2.patch, HLogPlayer.java
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
> the full backup in [HBASE-10900| https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to [HBASE-10900| 
> https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager will check 
> the request and then kick off a global procedure via HBaseAdmin for all the 
> active RegionServers to roll their logs. Each RegionServer will record its log 
> number into ZooKeeper. Then we determine which logs need to be included in 
> this incremental backup, and use DistCp to copy them to the target location. At 
> the same time, the dependency of the backup image will be recorded, and later 
> saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup is restored.
> Because an incremental backup image depends on the prior full backup image and 
> on prior incremental images, if any, the Manifest file is used to store the 
> dependency lineage during backup, and used at restore time for PIT 
> (point-in-time) restore.  
> h2. Use case (i.e. example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1:  FULL backup from sourcecluster to targetcluster  
> /* if no table name specified, all tables from source cluster will be 
> backuped 
> /***/
> [sourcecluster]$ hbase backup create full 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request 
> backup_1399667695966 has been executed.
> /***/
> /* STEP2:   In HBase Shell, put a few rows
> 
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3:   Take the 1st incremental backup   
>  
> /***/
> [sourcecluster]$ hbase backup create incremental 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request 
> backup_1399667851020 has been executed.
> /***/
> /* STEP4:   In HBase Shell, put a few more rows.  
> 
> /*   update 'row100', and create new 'row101' 
>   
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /*

[jira] [Commented] (HBASE-11085) Incremental Backup Restore support

2014-05-28 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011284#comment-14011284
 ] 

Demai Ni commented on HBASE-11085:
--

Uploaded the v2 patch to Review Board: https://reviews.apache.org/r/21492/

Also put a combined review (both full and incremental backup) here: 
https://reviews.apache.org/r/21981/. 

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, 
> HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v2.patch
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
> the full backup in [HBASE-10900| https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to [HBASE-10900| 
> https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager will check 
> the request and then kick off a global procedure via HBaseAdmin for all the 
> active RegionServers to roll their logs. Each RegionServer will record its log 
> number into ZooKeeper. Then we determine which logs need to be included in 
> this incremental backup, and use DistCp to copy them to the target location. At 
> the same time, the dependency of the backup image will be recorded, and later 
> saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup is restored.
> Because an incremental backup image depends on the prior full backup image and 
> on prior incremental images, if any, the Manifest file is used to store the 
> dependency lineage during backup, and used at restore time for PIT 
> (point-in-time) restore.  
> h2. Use case (i.e. example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1:  FULL backup from sourcecluster to targetcluster  
> /* if no table name specified, all tables from source cluster will be 
> backuped 
> /***/
> [sourcecluster]$ hbase backup create full 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request 
> backup_1399667695966 has been executed.
> /***/
> /* STEP2:   In HBase Shell, put a few rows
> 
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3:   Take the 1st incremental backup   
>  
> /***/
> [sourcecluster]$ hbase backup create incremental 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request 
> backup_1399667851020 has been executed.
> /***/
> /* STEP4:   In HBase Shell, put a few more rows.  
> 
> /*   update 'row100', and create new 'row101' 
>   
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5:   Take the 2nd incremental backup   
> 
> /***/
> [sourcecluster

[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2014-05-28 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011276#comment-14011276
 ] 

Demai Ni commented on HBASE-7912:
-

Hi all,

I opened a review for the framework (patches for both full and incremental 
backup) here: https://reviews.apache.org/r/21981/. 
Thanks in advance for your suggestions/comments.

Demai

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we have completed the implementation of our backup/restore solution and 
> would like to share it with the community through this JIRA. 
> We leverage the existing HBase snapshot feature to provide a general 
> solution for common users. Our full backup uses snapshots to capture 
> metadata locally and ExportSnapshot to move data to another cluster; 
> the incremental backup uses an offline WALPlayer to back up HLogs. We also 
> leverage globally distributed log roll and flush to improve performance, and add 
> features such as convert, merge, progress reporting, and CLI commands, so 
> that a common user can back up HBase data without in-depth knowledge of HBase. 
>  Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached to this JIRA. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (e.g. merge weeklies into a monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice for backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a massive 
> amount of data, so combining full backups with incremental backups has 
> tremendous benefit for HBase as well. The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). 
> Then the incremental backups up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshots to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk load of 
> HFiles for fast incremental restore.
> * Support single-table, multi-table, and column-family-level backup and 
> restore.
> * Restore to different table names.
> * Support adding tables or CFs to the backup set without interrupting 
> the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer-period, 
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same 
> cluster or across clusters.  It has the flexibility to support backup to 
> other devices and servers in the future.  





[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-27 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Description: 
h2. Feature Description
This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
the full backup in [HBASE-10900| https://issues.apache.org/jira/browse/HBASE-10900]. 
For the detailed layout and framework, please refer to [HBASE-10900| 
https://issues.apache.org/jira/browse/HBASE-10900].

When a client issues an incremental backup request, BackupManager will check the 
request and then kick off a global procedure via HBaseAdmin for all the active 
RegionServers to roll their logs. Each RegionServer will record its log number 
into ZooKeeper. Then we determine which logs need to be included in this 
incremental backup, and use DistCp to copy them to the target location. At the 
same time, the dependency of the backup image will be recorded, and later saved 
in the Backup Manifest file.
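The log-selection step can be sketched in Java under explicit assumptions: each RegionServer's WALs are identified by increasing log numbers, and the number recorded in ZooKeeper is the newest log closed by the roll. Under those assumptions, an incremental backup takes, per server, the logs after the previous backup's mark up to the current mark. `IncrementalWalSelector` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the actual BackupManager code.
final class IncrementalWalSelector {
  // previous/current: per-RegionServer log number recorded by the last backup
  // and by the roll for this backup.
  // walsByServer: log numbers of the WAL files available for each server.
  // Returns "server/logNumber" entries to copy for this incremental backup.
  static List<String> select(Map<String, Long> previous,
                             Map<String, Long> current,
                             Map<String, List<Long>> walsByServer) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, List<Long>> entry : walsByServer.entrySet()) {
      String server = entry.getKey();
      // A server with no earlier record contributes all of its logs.
      long lower = previous.getOrDefault(server, Long.MIN_VALUE);
      // Assumption: a server absent from the current roll (e.g. one that has
      // died) has no active log, so everything after the previous mark is taken.
      long upper = current.getOrDefault(server, Long.MAX_VALUE);
      for (long logNumber : entry.getValue()) {
        if (logNumber > lower && logNumber <= upper) {
          selected.add(server + "/" + logNumber);
        }
      }
    }
    return selected;
  }
}
```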

Restore replays the backed-up WAL logs on the target HBase instance. The replay 
occurs after the full backup is restored.

Because an incremental backup image depends on the prior full backup image and on 
prior incremental images, if any, the Manifest file is used to store the dependency 
lineage during backup, and used at restore time for PIT (point-in-time) restore.  
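How the Manifest lineage drives a PIT restore can be sketched in Java: starting from the requested image, follow each image's dependency back to the full backup, then restore in the reverse order. The manifest model and the names `ImageRecord` and `RestoreChain` are hypothetical; the backup IDs are the ones from the example below, and the dependencies between them are assumed from the order in which they were taken.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical manifest entry: an image id, whether it is a full backup,
// and the id of the image it depends on (null for a full backup).
final class ImageRecord {
  final String id;
  final boolean full;
  final String parentId;

  ImageRecord(String id, boolean full, String parentId) {
    this.id = id;
    this.full = full;
    this.parentId = parentId;
  }
}

final class RestoreChain {
  // Walks the lineage from the target image back to its full backup and
  // returns the images in the order they must be restored (full first).
  static List<String> forTarget(Map<String, ImageRecord> manifest, String targetId) {
    Deque<String> chain = new ArrayDeque<>();
    ImageRecord image = manifest.get(targetId);
    // The size check stops the walk if the lineage contains a cycle.
    while (image != null && chain.size() <= manifest.size()) {
      chain.addFirst(image.id);
      if (image.full) {
        return List.copyOf(chain);
      }
      image = image.parentId == null ? null : manifest.get(image.parentId);
    }
    throw new IllegalStateException("no full backup in the lineage of " + targetId);
  }
}
```

Restoring to `backup_1399667851020` yields the full image followed by the first incremental image, which matches what the `-automatic` option is described as doing in STEP7.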

h2. Use case (i.e. example)
{code:title=Incremental Backup Restore example|borderStyle=solid}
/***/
/* STEP1:  FULL backup from sourcecluster to targetcluster  
/* if no table name is specified, all tables from the source cluster will be backed up 
/***/
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
...
14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
/***/
/* STEP2:   In HBase Shell, put a few rows  
/***/
hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'

/***/
/* STEP3:   Take the 1st incremental backup 
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.

/***/
/* STEP4:   In HBase Shell, put a few more rows.
/*   update 'row100', and create new 'row101'   
/***/
hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'

/***/
/* STEP5:   Take the 2nd incremental backup 
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.

/***/
/* STEP7:   Restore to the PIT of the 1st incremental backup 
/* specify the backup ID of the 1st incremental backup
/* option -automatic will trigger the restore of the full backup first, then the 1st   
/* incremental backup image 
/* t1_dn, etc. are the original table names. All tables will be restored if not specified 
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used
/***

[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-27 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Attachment: HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch

Attached the V2 patch, which:
1) adds a unit test case (thanks to [~tianq], who implemented it)
2) addresses comments from [~tedyu]
3) fixes a few long-line warnings and Javadoc warnings

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, 
> HBASE-11085-trunk-v2-contain-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v2.patch
>
>
> h2. Feature Description
> the jira is part of  
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], and depend on 
> full backup [HBASE-10900| https://issues.apache.org/jira/browse/HBASE-10900]. 
> for the detail layout and frame work, please reference to  [HBASE-10900| 
> https://issues.apache.org/jira/browse/HBASE-10900].
> When client issues an incremental backup request, BackupManager will check 
> the request and then kicks of a global procedure via HBaseAdmin for all the 
> active regionServer to roll log. Each Region server will record their log 
> number into zookeeper. Then we determine which log need to be included in 
> this incremental backup, and use DistCp to copy them to target location. At 
> the same time, a dependency of backup image will be recorded, and later on 
> saved in Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup image has been restored.
> Because an incremental backup image depends on the prior full backup image 
> and on any earlier incremental images, a Manifest file stores the 
> dependency lineage during backup and is used at restore time for 
> point-in-time (PIT) restore.
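To illustrate how the dependency lineage kept in the manifest could drive a point-in-time restore, here is a minimal Java sketch. It assumes the manifest can be reduced to a map from each backup id to the id of the image it depends on (null for a full backup); `RestoreChain` and `restoreOrder` are hypothetical names, and the backup ids in `main` are simply the ones printed by the example commands in this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derive a restore order from backup-image dependencies. */
public class RestoreChain {

    /**
     * @param parentOf maps each backup id to the image it depends on; a full backup maps to null
     * @param target   the backup id to restore up to
     * @return image ids in the order they must be applied: full backup first, then incrementals
     */
    public static List<String> restoreOrder(Map<String, String> parentOf, String target) {
        Deque<String> chain = new ArrayDeque<>();
        String current = target;
        while (current != null) {
            if (!parentOf.containsKey(current)) {
                throw new IllegalArgumentException("unknown backup image: " + current);
            }
            if (chain.contains(current)) {
                throw new IllegalStateException("dependency cycle at " + current);
            }
            chain.addFirst(current);         // walk back towards the full backup
            current = parentOf.get(current); // becomes null once the full backup is reached
        }
        return List.copyOf(chain);
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("backup_1399667695966", null);                   // full backup (STEP1)
        parentOf.put("backup_1399667851020", "backup_1399667695966"); // 1st incremental (STEP3)
        parentOf.put("backup_1399667959165", "backup_1399667851020"); // 2nd incremental (STEP5)
        System.out.println(restoreOrder(parentOf, "backup_1399667959165"));
    }
}
```

Restoring to an intermediate image only requires passing that image's id as `target`; later incrementals are never visited.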
> h2. Use case (example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1:  FULL backup from sourcecluster to targetcluster  
> /* if no table name is specified, all tables from the source cluster will be 
> backed up 
> /***/
> [sourcecluster]$ hbase backup create full 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request 
> backup_1399667695966 has been executed.
> /***/
> /* STEP2:   In HBase Shell, put a few rows
> 
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3:   Take the 1st incremental backup   
>  
> /***/
> [sourcecluster]$ hbase backup create incremental 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request 
> backup_1399667851020 has been executed.
> /***/
> /* STEP4:   In HBase Shell, put a few more rows.  
> 
> /*   update 'row100', and create new 'row101' 
>   
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5:   Take the 2nd incremental backup   
> 
> /*

[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-27 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Attachment: HBASE-11085-trunk-v2.patch

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch, HBASE-11085-trunk-v2.patch
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on the 
> full backup feature [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to 
> [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks 
> the request and then kicks off a global procedure via HBaseAdmin for all 
> active region servers to roll their logs. Each region server records its log 
> number in ZooKeeper. We then determine which logs need to be included in 
> this incremental backup and use DistCp to copy them to the target location. 
> At the same time, the dependency of the backup image is recorded and later 
> saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup image has been restored.
> Because an incremental backup image depends on the prior full backup image 
> and on any earlier incremental images, a Manifest file stores the 
> dependency lineage during backup and is used at restore time for 
> point-in-time (PIT) restore.
> h2. Use case (example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1:  FULL backup from sourcecluster to targetcluster  
> /* if no table name is specified, all tables from the source cluster will be 
> backed up 
> /***/
> [sourcecluster]$ hbase backup create full 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request 
> backup_1399667695966 has been executed.
> /***/
> /* STEP2:   In HBase Shell, put a few rows
> 
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3:   Take the 1st incremental backup   
>  
> /***/
> [sourcecluster]$ hbase backup create incremental 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request 
> backup_1399667851020 has been executed.
> /***/
> /* STEP4:   In HBase Shell, put a few more rows.  
> 
> /*   update 'row100', and create new 'row101' 
>   
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5:   Take the 2nd incremental backup   
> 
> /***/
> [sourcecluster]$ hbase backup create incremental 
> hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup request 
> backup_1399667959165 has been executed.
> /

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-05-27 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010262#comment-14010262
 ] 

Demai Ni commented on HBASE-9531:
-

[~jdcryans], may I know your take on this feature? Thanks, Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
>
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This JIRA proposes a command-line (hbase shell) interface to retrieve 
> replication metrics such as ageOfLastShippedOp, 
> timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and 
> timeStampsOfLastAppliedOp. It also provides point-in-time information about 
> the replication lag (source only).
> I understand that HBase uses Hadoop 
> metrics (http://hbase.apache.org/metrics.html), which is a common way to 
> monitor metric info. This JIRA is meant to serve as a lightweight client 
> interface, compared to a complete (certainly better, but heavier) GUI 
> monitoring package. I have the code working on 0.94.9 now, and would like to 
> use this JIRA to get opinions on whether the feature is valuable to other 
> users. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell 
> command 'status' and introduce a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), use the local replication service objects 
> to get their loads, which can be wrapped in a ReplicationLoad object and 
> then passed to the ServerLoad. In ReplicationSourceMetrics and 
> ReplicationSinkMetrics, a few getters and setters will be created, and 
> Replication will be asked to build a "ReplicationLoad". (Many thanks to 
> Jean-Daniel for his kind suggestions on the dev mailing list.)
> The replication lag is calculated for the source only, using this formula: 
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
>   else lag = 0 // last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 1

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-05-27 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Description: 
This JIRA proposes a command-line (hbase shell) interface to retrieve 
replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also 
provides point-in-time information about the replication lag (source only).

I understand that HBase uses Hadoop 
metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor 
metric info. This JIRA is meant to serve as a lightweight client interface, 
compared to a complete (certainly better, but heavier) GUI monitoring package. 
I have the code working on 0.94.9 now, and would like to use this JIRA to get 
opinions on whether the feature is valuable to other users. If so, I will 
build a trunk patch. 

All inputs are greatly appreciated. Thank you!

The overall design is to reuse the existing logic that supports the hbase shell 
command 'status' and introduce a new module called ReplicationLoad. In 
HRegionServer.buildServerLoad(), use the local replication service objects to 
get their loads, which can be wrapped in a ReplicationLoad object and then 
passed to the ServerLoad. In ReplicationSourceMetrics and 
ReplicationSinkMetrics, a few getters and setters will be created, and 
Replication will be asked to build a "ReplicationLoad". (Many thanks to 
Jean-Daniel for his kind suggestions on the dev mailing list.)
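A minimal Java sketch of the kind of per-server snapshot this design describes: source metrics for each peer plus one set of sink metrics, rendered as lines in the style of the proposed output. `ReplicationLoadSketch` and its nested classes are hypothetical; they are not the `ReplicationLoad` class from the patch. Timestamps are printed as raw epoch milliseconds rather than formatted dates so the sketch stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical per-region-server snapshot of replication metrics (not the HBase class). */
public class ReplicationLoadSketch {

    /** Source-side metrics for one replication peer; field names follow the JIRA text. */
    public static final class SourceLoad {
        final String peerId;
        final long ageOfLastShippedOp;
        final long sizeOfLogQueue;
        final long timeStampsOfLastShippedOp;

        public SourceLoad(String peerId, long ageOfLastShippedOp, long sizeOfLogQueue,
                          long timeStampsOfLastShippedOp) {
            this.peerId = peerId;
            this.ageOfLastShippedOp = ageOfLastShippedOp;
            this.sizeOfLogQueue = sizeOfLogQueue;
            this.timeStampsOfLastShippedOp = timeStampsOfLastShippedOp;
        }
    }

    /** Sink-side metrics for the region server. */
    public static final class SinkLoad {
        final long ageOfLastAppliedOp;
        final long timeStampsOfLastAppliedOp;

        public SinkLoad(long ageOfLastAppliedOp, long timeStampsOfLastAppliedOp) {
            this.ageOfLastAppliedOp = ageOfLastAppliedOp;
            this.timeStampsOfLastAppliedOp = timeStampsOfLastAppliedOp;
        }
    }

    private final List<SourceLoad> sources = new ArrayList<>();
    private SinkLoad sink;

    // In the proposed design these values would come from ReplicationSourceMetrics and
    // ReplicationSinkMetrics during HRegionServer.buildServerLoad(); here the caller
    // passes them in directly.
    public void addSource(SourceLoad load) { sources.add(load); }

    public void setSink(SinkLoad load) { sink = load; }

    /** Renders one SOURCE line per peer and, if sink metrics are present, one SINK line. */
    public List<String> render() {
        List<String> lines = new ArrayList<>();
        for (SourceLoad s : sources) {
            lines.add(String.format(Locale.ROOT,
                    "SOURCE:PeerID=%s, ageOfLastShippedOp=%d, sizeOfLogQueue=%d, timeStampsOfLastShippedOp=%d",
                    s.peerId, s.ageOfLastShippedOp, s.sizeOfLogQueue, s.timeStampsOfLastShippedOp));
        }
        if (sink != null) {
            lines.add(String.format(Locale.ROOT,
                    "SINK  :AgeOfLastAppliedOp=%d, TimeStampsOfLastAppliedOp=%d",
                    sink.ageOfLastAppliedOp, sink.timeStampsOfLastAppliedOp));
        }
        return lines;
    }

    public static void main(String[] args) {
        ReplicationLoadSketch load = new ReplicationLoadSketch();
        load.addSource(new SourceLoad("1", 14L, 0L, 1000L));
        load.setSink(new SinkLoad(0L, 900L));
        load.render().forEach(System.out::println);
    }
}
```

Keeping the snapshot as a plain value object means the region server only has to fill it in once per load report, and the client side only has to format it.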

The replication lag is calculated for the source only, using this formula: 
{code:title=Replication lag|borderStyle=solid}
if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
else lag = 0 // last shipment may have happened last night, so there is NO real lag although ageOfLastShippedOp is non-zero
{code}
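The lag rule above translates directly into a small Java method. This is a sketch, assuming all inputs share one time unit (milliseconds here) and that the current time is supplied by the caller; the class and method names are illustrative, not taken from the patch.

```java
/** Sketch of the source-side replication-lag rule proposed in HBASE-9531 (times in ms). */
public final class ReplicationLag {

    private ReplicationLag() {}

    public static long lag(long sizeOfLogQueue, long ageOfLastShippedOp,
                           long timeStampsOfLastShippedOp, long now) {
        long sinceLastShipped = now - timeStampsOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // WALs are still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipped);
        }
        if (sinceLastShipped < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still meaningful.
            return ageOfLastShippedOp;
        }
        // Empty queue and an old last shipment: no real lag, even though
        // ageOfLastShippedOp may be non-zero.
        return 0;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(lag(3, 10, now - 500, now));    // 500: queue non-empty, larger value wins
        System.out.println(lag(0, 400, now - 100, now));   // 400: shipped recently
        System.out.println(lag(0, 14, now - 60_000, now)); // 0: idle source
    }
}
```

Passing `now` in, instead of reading the clock inside the method, keeps the rule a pure function and makes the three branches easy to check.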

The external output will look something like:
{code:title=status 'replication'|borderStyle=solid}
hbase(main):001:0> status 'replication'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
    SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
14:48:48 PDT 2013
    hdtest018.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
    SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
14:50:59 PDT 2013
    hdtest015.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
    SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
14:48:48 PDT 2013

hbase(main):002:0> status 'replication','source'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
    hdtest018.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
    hdtest015.svl.ibm.com:
    SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013

hbase(main):003:0> status 'replication','sink'
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com:
    SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
14:48:48 PDT 2013
    hdtest018.svl.ibm.com:
    SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
14:50:59 PDT 2013
    hdtest015.svl.ibm.com:
    SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
14:48:48 PDT 2013

hbase(main):003:0> status 'replication','lag' 
version 0.94.9
3 live servers
    hdtest017.svl.ibm.com: lag = 0
    hdtest018.svl.ibm.com: lag = 14
    hdtest015.svl.ibm.com: lag = 0
{code}
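The four variants of the command shown above differ only in which parts of each server's metrics are printed. Below is a hypothetical Java sketch of that selection. The mode strings mirror the shell arguments in the examples, the per-server lines are assumed to be already rendered, and the actual shell-side code may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how `status 'replication', <mode>` could choose what to print. */
public class ReplicationStatusView {

    /** Pre-rendered replication metrics for one region server (illustrative only). */
    public static final class ServerReplication {
        final String host;
        final String sourceLine;
        final String sinkLine;
        final long lag;

        public ServerReplication(String host, String sourceLine, String sinkLine, long lag) {
            this.host = host;
            this.sourceLine = sourceLine;
            this.sinkLine = sinkLine;
            this.lag = lag;
        }
    }

    /** mode is "" (source and sink), "source", "sink" or "lag". */
    public static List<String> render(List<ServerReplication> servers, String mode) {
        List<String> out = new ArrayList<>();
        out.add(servers.size() + " live servers");
        for (ServerReplication s : servers) {
            switch (mode) {
                case "lag":
                    out.add("    " + s.host + ": lag = " + s.lag);
                    break;
                case "source":
                    out.add("    " + s.host + ":");
                    out.add("    " + s.sourceLine);
                    break;
                case "sink":
                    out.add("    " + s.host + ":");
                    out.add("    " + s.sinkLine);
                    break;
                case "":
                    out.add("    " + s.host + ":");
                    out.add("    " + s.sourceLine);
                    out.add("    " + s.sinkLine);
                    break;
                default:
                    throw new IllegalArgumentException("unknown mode: " + mode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ServerReplication> servers = new ArrayList<>();
        servers.add(new ServerReplication("hdtest017.svl.ibm.com",
                "SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0",
                "SINK  :AgeOfLastAppliedOp=0", 0));
        render(servers, "lag").forEach(System.out::println);
    }
}
```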



  was:
This JIRA proposes a command-line (hbase shell) interface to retrieve 
replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, 
sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also 
provides point-in-time information about the replication lag (source only).

I understand that HBase uses Hadoop 
metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor 
metric info. This JIRA is meant to serve as a lightweight client interface, 
compared to a complete (certainly better, but heavier) GUI monitoring package. 
I have the code working on 0.94.9 now, and would like to use this JIRA to get 
opinions on whether the feature is valuable to other users. If so, I will 
build a trunk patch. 

All inputs are greatly appreciated. Thank you!

The overall design is to

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-05-27 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010084#comment-14010084
 ] 

Demai Ni commented on HBASE-9531:
-

About the release audit warnings (31 of them): this file 
(https://builds.apache.org/job/PreCommit-HBASE-Build/9602//artifact/patchprocess/patchReleaseAuditWarnings.txt) 
points out problems for all the files under "patchprocess", such as: 
{quote}
31 Unknown Licenses

***

Unapproved licenses:

  patchprocess/newPatchFindbugsWarningshbase-examples.xml
  patchprocess/newPatchFindbugsWarningshbase-thrift.xml
  patchprocess/patchFindbugsWarningshbase-prefix-tree.xml
  patchprocess/newPatchFindbugsWarningshbase-server.xml
  patchprocess/patchFindbugsWarningshbase-hadoop-compat.xml
  patchprocess/patchFindbugsWarningshbase-common.xml
  patchprocess/newPatchFindbugsWarningshbase-thrift.html

{quote}

Probably a side effect of switching to git?

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
>
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-05-27 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010064#comment-14010064
 ] 

Demai Ni commented on HBASE-9531:
-

The 'The patch introduces the following lines longer than 100' warning comes 
from two places: 
the first two lines are from ClusterStatusProtos.java, which is generated;
the other two are from admin.rb, where there are many lines longer than 100, 
probably due to the Ruby coding style(?). 

Do we need to keep the lines under 100 in both cases? 

Demai

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
>
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-05-26 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Attachment: HBASE-9531-trunk-v0.patch

For some reason, the patch didn't trigger HadoopQA the first time, so I attached 
it again with fingers crossed.

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
>
> Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

2014-05-26 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Status: Patch Available  (was: In Progress)

> a command line (hbase shell) interface to retreive the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Affects Versions: 0.99.0
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0, 0.98.4
>
> Attachments: HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retreive 
> the replication metrics info such as:ageOfLastShippedOp, 
> timeStampsOfLastShippedOp, sizeOfLogQueue ageOfLastAppliedOp, and 
> timeStampsOfLastAppliedOp. And also to provide a point of time info of the 
> lag of replication(source only)
> Understand that hbase is using Hadoop 
> metrics(http://hbase.apache.org/metrics.html), which is a common way to 
> monitor metric info. This Jira is to serve as a light-weight client 
> interface, comparing to a completed(certainly better, but heavier)GUI 
> monitoring package. I made the code works on 0.94.9 now, and like to use this 
> jira to get opinions about whether the feature is valuable to other 
> users/workshop. If so, I will build a trunk patch. 
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic behind the hbase shell 
> 'status' command and introduce a new module called ReplicationLoad. In 
> HRegionServer.buildServerLoad(), the local replication service objects are 
> asked for their loads, which are wrapped in a ReplicationLoad object that is 
> then simply passed to the ServerLoad. A few getters and setters will be added 
> to ReplicationSourceMetrics and ReplicationSinkMetrics, and Replication will 
> be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his 
> kind suggestions on the dev mailing list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
>   if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
>   else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last op was shipped recently
>   else lag = 0 // last op may have been shipped last night, so there is no real lag even though ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like this:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>     SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, 
> timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>     SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 
> 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 14
>     hdtest015.svl.ibm.com: lag = 0
> {code}

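For illustration, the HBASE-9531 lag rule can be written as a small Java method. This is a minimal sketch, not the code in HBASE-9531-trunk-v0.patch. The class name `ReplicationLagSketch` and the method `computeLag` are hypothetical. All ages and timestamps are assumed to be in milliseconds, and the current time is passed in explicitly so the result is deterministic.

```java
/**
 * Hypothetical sketch of the source-side replication lag rule from HBASE-9531.
 * Assumes every age and timestamp is in milliseconds.
 */
public final class ReplicationLagSketch {

  private ReplicationLagSketch() {}

  /**
   * @param sizeOfLogQueue            number of WALs still queued for shipping
   * @param ageOfLastShippedOp        age of the last shipped edit (ms)
   * @param timeStampsOfLastShippedOp wall-clock time the last edit was shipped (ms)
   * @param now                       current wall-clock time (ms)
   * @return the estimated lag in ms
   */
  public static long computeLag(long sizeOfLogQueue, long ageOfLastShippedOp,
      long timeStampsOfLastShippedOp, long now) {
    long sinceLastShipped = now - timeStampsOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // WALs are still waiting to be shipped: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShipped);
    }
    if (sinceLastShipped < 2 * ageOfLastShippedOp) {
      // The last shipment happened recently, so its age is still meaningful.
      return ageOfLastShippedOp;
    }
    // Nothing is queued and the last shipment is old: no real lag.
    return 0;
  }
}
```

For example, with nothing queued, an age of 14 ms, and the last op shipped 10 ms ago, `computeLag(0, 14, now - 10, now)` returns 14. If the last op was shipped a minute ago, the same call returns 0.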

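Here is a hypothetical sketch of the ReplicationLoad idea in the same vein. One object per region server holds that server's source metrics (per peer) and its sink metrics. It renders them in the SOURCE/SINK line format of the sample 'status' output, with flags mirroring the 'source' and 'sink' options. The names `ReplicationLoadSketch`, `SourceMetrics`, `SinkMetrics`, and `render` are illustrative assumptions, not HBase's API. Timestamps are taken as preformatted strings so the sketch does not depend on the time zone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-region-server holder of replication metrics (not the HBase API). */
public final class ReplicationLoadSketch {

  /** Source-side metrics for one replication peer. */
  public static final class SourceMetrics {
    final String peerId;
    final long ageOfLastShippedOp;
    final long sizeOfLogQueue;
    final String timeStampsOfLastShippedOp; // preformatted for display

    public SourceMetrics(String peerId, long ageOfLastShippedOp, long sizeOfLogQueue,
        String timeStampsOfLastShippedOp) {
      this.peerId = peerId;
      this.ageOfLastShippedOp = ageOfLastShippedOp;
      this.sizeOfLogQueue = sizeOfLogQueue;
      this.timeStampsOfLastShippedOp = timeStampsOfLastShippedOp;
    }
  }

  /** Sink-side metrics for the region server. */
  public static final class SinkMetrics {
    final long ageOfLastAppliedOp;
    final String timeStampsOfLastAppliedOp; // preformatted for display

    public SinkMetrics(long ageOfLastAppliedOp, String timeStampsOfLastAppliedOp) {
      this.ageOfLastAppliedOp = ageOfLastAppliedOp;
      this.timeStampsOfLastAppliedOp = timeStampsOfLastAppliedOp;
    }
  }

  private final List<SourceMetrics> sources = new ArrayList<>();
  private SinkMetrics sink; // null if this server reports no sink metrics

  public void addSource(SourceMetrics source) {
    sources.add(source);
  }

  public void setSink(SinkMetrics sink) {
    this.sink = sink;
  }

  /** Renders the lines printed under one server; the flags select SOURCE and/or SINK lines. */
  public List<String> render(boolean includeSource, boolean includeSink) {
    List<String> lines = new ArrayList<>();
    if (includeSource) {
      for (SourceMetrics s : sources) {
        lines.add("SOURCE:PeerID=" + s.peerId
            + ", ageOfLastShippedOp=" + s.ageOfLastShippedOp
            + ", sizeOfLogQueue=" + s.sizeOfLogQueue
            + ", timeStampsOfLastShippedOp=" + s.timeStampsOfLastShippedOp);
      }
    }
    if (includeSink && sink != null) {
      lines.add("SINK  :AgeOfLastAppliedOp=" + sink.ageOfLastAppliedOp
          + ", TimeStampsOfLastAppliedOp=" + sink.timeStampsOfLastAppliedOp);
    }
    return lines;
  }
}
```

Keeping the data in one per-server object matches the design note that the load is built once in the region server and handed to ServerLoad. The shell side then only has to filter and print.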

--
This message was sent by Atlassian JIRA
(v6.2#6

[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retrieve the replication metrics and show replication lag

2014-05-26 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Status: In Progress  (was: Patch Available)


[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retrieve the replication metrics and show replication lag

2014-05-23 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Fix Version/s: 0.98.4
   0.99.0
Affects Version/s: 0.99.0
   Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retrieve the replication metrics and show replication lag

2014-05-23 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9531:


Attachment: HBASE-9531-trunk-v0.patch

Finally got back to building a trunk patch for this one. I can build a 0.98 
patch if needed.

> a command line (hbase shell) interface to retrieve the replication metrics 
> and show replication lag
> ---
>
> Key: HBASE-9531
> URL: https://issues.apache.org/jira/browse/HBASE-9531
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Demai Ni
>Assignee: Demai Ni
> Attachments: HBASE-9531-trunk-v0.patch
>
>

[jira] [Commented] (HBASE-11085) Incremental Backup Restore support

2014-05-16 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998969#comment-13998969
 ] 

Demai Ni commented on HBASE-11085:
--

Opened a review board request: https://reviews.apache.org/r/21492/

[~stack], [~tedyu], [~mbertozzi], and other folks, looking forward to your 
takes. Many thanks... Demai





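In HBASE-11085, each region server records its log number in ZooKeeper at the global log roll. The WALs for the incremental backup are then selected from those records. One way that selection could look is shown below as a hypothetical Java sketch. It rests on two assumptions. First, each WAL is identified by a per-server, monotonically increasing log number. Second, the number recorded at a roll is that of the last WAL closed by the roll, so an incremental backup covers the range (previous roll, current roll]. `IncrementalWalSelector` and `Wal` are illustrative names, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: pick the WALs that belong to one incremental backup. */
public final class IncrementalWalSelector {

  /** A WAL file on one region server, numbered in increasing order per server (assumed). */
  public static final class Wal {
    final String server;
    final long logNumber;
    final String path;

    public Wal(String server, long logNumber, String path) {
      this.server = server;
      this.logNumber = logNumber;
      this.path = path;
    }
  }

  private IncrementalWalSelector() {}

  /**
   * @param candidates         WALs currently known for all region servers
   * @param previousLogNumbers per-server log number recorded at the previous backup's roll
   * @param currentLogNumbers  per-server log number recorded at this backup's roll
   * @return WALs with previous < logNumber <= current, in candidate order
   */
  public static List<Wal> select(List<Wal> candidates,
      Map<String, Long> previousLogNumbers, Map<String, Long> currentLogNumbers) {
    List<Wal> selected = new ArrayList<>();
    for (Wal wal : candidates) {
      Long current = currentLogNumbers.get(wal.server);
      if (current == null) {
        continue; // server took no part in this roll, so none of its WALs are in scope
      }
      // A server with no earlier record contributes every WAL up to the current roll.
      Long previous = previousLogNumbers.get(wal.server);
      long lowerExclusive = previous == null ? Long.MIN_VALUE : previous;
      if (wal.logNumber > lowerExclusive && wal.logNumber <= current) {
        selected.add(wal);
      }
    }
    return selected;
  }
}
```

The bounds are kept per server because, under the stated assumption, log numbers are only comparable within one region server.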
[jira] [Updated] (HBASE-10900) FULL table backup and restore

2014-05-16 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-10900:
-

Attachment: HBASE-10900-trunk-v4.patch

Attached the v4 version for trunk, which includes:
1) unit tests (thanks to [~enhs8920])
2) a small interface change to use the snapshot manifest
3) a customized 'global log roll' plugged in to record info into ZK. A general 
'global log roll' JIRA will be opened a bit later, and the full backup code will 
depend on it.

Demai

> FULL table backup and restore
> -
>
> Key: HBASE-10900
> URL: https://issues.apache.org/jira/browse/HBASE-10900
> Project: HBase
>  Issue Type: Task
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: HBASE-10900-fullbackup-trunk-v1.patch, 
> HBASE-10900-trunk-v2.patch, HBASE-10900-trunk-v3.patch, 
> HBASE-10900-trunk-v4.patch
>
>
> h2. Feature Description
> This is a subtask of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL 
> backup/restore, and will implement the following functionality:
> {code:title=Backup Restore example|borderStyle=solid}
> /* backup from sourcecluster to targetcluster */
> /* if no table name is specified, all tables from the source cluster will be backed up */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> /* restore on targetcluster; this is a local restore */
> /* backup_1396650096738 - backup image name */
> /* t1_dn, etc. are the original table names. All tables will be restored if not specified */
> /* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
> [targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> /* restore from targetcluster back to sourcecluster; this is a remote restore */
> [sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
> {code}
> h2. Detailed layout and framework for the next JIRAs
> The patch is a wrapper around the existing snapshot and exportSnapshot, and will 
> serve as the base framework for the overall solution of 
> [HBase-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described 
> below:
> * *bin/hbase*: end-user command-line interface to invoke BackupClient and RestoreClient
> * *BackupClient.java*: 'main' entry for backup operations. This patch only supports 'full' backup. Future JIRAs will support:
> ** *create* an incremental backup
> ** *cancel* an ongoing backup
> ** *delete* an existing backup image
> ** *describe* the detailed information of a backup image
> ** show the *history* of all successful backups
> ** show the *status* of the latest backup request
> ** *convert* incremental backup WAL files into HFiles, either on the fly during create or after create
> ** *merge* backup images
> ** *stop* backing up a table of an existing backup image
> ** *show* the tables of a backup image
> * *BackupCommands.java*: a place to keep all the command usages and options
> * *BackupManager.java*: handles backup requests on the server side and creates BACKUP ZooKeeper nodes to keep track of backups. The timestamps kept in ZooKeeper will be used for future incremental backups (not included in this JIRA). Creates BackupContext and DispatchRequest.
> * *BackupHandler.java*: in this patch, a wrapper of snapshot and exportSnapshot. In future JIRAs it will:
> ** record *timestamps* info in ZK
> ** carry out *incremental* backup
> ** update backup *progress*
> ** set *status* flags
> ** build the *backupManifest* file (in this JIRA only limited info for full backup; later on, timestamps and the dependencies of multiple backup images are also recorded here)
> ** clean up after a *failed* backup
> ** clean up after a *cancelled* backup
> ** allow on-the-fly *convert* during incremental backup
> * *BackupContext.java*: encapsulates backup information such as backup ID, table names, directory info, phase, timestamps of backup progress, size of data, ancestor info, etc.
> * *BackupCopier.java*: the copy operation. Later on, it will support progress reporting and mapper estimation, and extend DistCp to report progress updates to ZK during backup.
> * *BackupException.java*: handles exceptions from backup/restore
> * *BackupManifest.java*: encapsulates all the backup image information. The manifest info is bundled as a manifest file together with the data, so that each backup image contains all the info needed for restore.
> * *BackupStatus.java*   : encapsulate b

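The restore usage in HBASE-10900 takes the original table list and an optional list of restored table names. If no table list is given, every table in the image is restored. If no target names are given, the original names are reused. A hypothetical Java sketch of that argument handling follows. `RestoreTableMapping.map` is an illustrative name, and the sketch assumes the list of tables in the image is available (for example, read from the backup manifest).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of pairing original and restored table names for 'hbase restore'. */
public final class RestoreTableMapping {

  private RestoreTableMapping() {}

  /**
   * @param tablesInImage tables contained in the backup image
   * @param tablesArg     comma-separated original table names, or null for all tables in the image
   * @param targetsArg    comma-separated restored table names, or null to reuse the original names
   * @return original table name -> table name to restore into, in argument order
   */
  public static Map<String, String> map(List<String> tablesInImage, String tablesArg,
      String targetsArg) {
    List<String> tables =
        tablesArg == null ? tablesInImage : Arrays.asList(tablesArg.split(","));
    for (String table : tables) {
      if (!tablesInImage.contains(table)) {
        throw new IllegalArgumentException("Table not in backup image: " + table);
      }
    }
    List<String> targets = targetsArg == null ? tables : Arrays.asList(targetsArg.split(","));
    if (targets.size() != tables.size()) {
      throw new IllegalArgumentException(
          "Expected " + tables.size() + " restored table names but got " + targets.size());
    }
    // Pair names positionally, keeping the order given on the command line.
    Map<String, String> mapping = new LinkedHashMap<>();
    for (int i = 0; i < tables.size(); i++) {
      mapping.put(tables.get(i), targets.get(i));
    }
    return mapping;
  }
}
```

With the example arguments `t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore`, `t2_dn` maps to `t2_dn_restore`. Rejecting mismatched counts early avoids silently restoring a table under the wrong name.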
[jira] [Created] (HBASE-11172) Cancel a backup process

2014-05-15 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11172:


 Summary: Cancel a backup process 
 Key: HBASE-11172
 URL: https://issues.apache.org/jira/browse/HBASE-11172
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
 Fix For: 0.99.0


h2. Feature Description
This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] 
and incremental backup [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. 
For the detailed layout and framework, please refer to 
[HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

A backup operation may need to move hundreds or thousands of GB of data and can 
take hours. Sometimes the operation may take longer than the maintenance time 
window originally planned by the administrators. So it is necessary to be able 
to cancel the operation and reset all the history/manifest info whenever 
needed, so that we can take a clean backup in the next time window.





[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-15 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Status: Patch Available  (was: Open)

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
> full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to 
> [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks the 
> request and then kicks off a global procedure via HBaseAdmin that makes all 
> active region servers roll their logs. Each region server records its log 
> number in ZooKeeper. We then determine which logs need to be included in this 
> incremental backup and use DistCp to copy them to the target location. At the 
> same time, the dependency of the backup image is recorded and later saved in 
> the backup manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup image is restored.
> Because an incremental backup image depends on the prior full backup image and 
> on any prior incremental images, the manifest file is used to store the 
> dependency lineage during backup, and is used at restore time for 
> point-in-time (PIT) restore.
> h2. Use case (i.e. example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /*
>  * STEP1: FULL backup from sourcecluster to targetcluster.
>  * If no table name is specified, all tables from the source cluster are backed up.
>  */
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
> /*
>  * STEP2: In HBase Shell, put a few rows.
>  */
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /*
>  * STEP3: Take the 1st incremental backup.
>  */
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
> /*
>  * STEP4: In HBase Shell, put a few more rows:
>  * update 'row100' and create a new 'row101'.
>  */
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /*
>  * STEP5: Take the 2nd incremental backup.
>  */
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
> /**

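The HBASE-11085 manifest lineage is what lets a point-in-time restore find every image it needs. In the hypothetical Java sketch below, each image records its type and the image it depends on. Restoring to a given image walks that chain back to the full backup, then lists the images oldest first for replay. `RestoreChainSketch`, `Image`, and the single-parent model are assumptions for illustration. The test treats the three backup IDs from the example as a full backup followed by two incrementals, each depending on the one before it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve the images needed to restore to a given backup image. */
public final class RestoreChainSketch {

  public enum Type { FULL, INCREMENTAL }

  /** One backup image as it might appear in a manifest (illustrative model). */
  public static final class Image {
    final String id;
    final Type type;
    final String parentId; // image this one depends on; null for a FULL image

    public Image(String id, Type type, String parentId) {
      this.id = id;
      this.type = type;
      this.parentId = parentId;
    }
  }

  private RestoreChainSketch() {}

  /** Returns image IDs to apply, oldest first: the full image, then each incremental. */
  public static List<String> chainFor(String targetId, Map<String, Image> manifest) {
    Deque<String> chain = new ArrayDeque<>();
    Image current = manifest.get(targetId);
    while (true) {
      if (current == null) {
        throw new IllegalStateException("Image missing from manifest lineage");
      }
      chain.addFirst(current.id);
      if (chain.size() > manifest.size()) {
        throw new IllegalStateException("Cycle in manifest lineage");
      }
      if (current.type == Type.FULL) {
        return new ArrayList<>(chain);
      }
      if (current.parentId == null) {
        throw new IllegalStateException("Incremental image " + current.id + " has no parent");
      }
      current = manifest.get(current.parentId);
    }
  }
}
```

Failing on a missing or cyclic link is deliberate. Without a complete chain back to a full image, replaying WALs would leave the restored table inconsistent.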
[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-15 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Attachment: HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch
HBASE-11085-trunk-v1.patch

> Incremental Backup Restore support
> --
>
> Key: HBASE-11085
> URL: https://issues.apache.org/jira/browse/HBASE-11085
> Project: HBase
>  Issue Type: New Feature
>Reporter: Demai Ni
>Assignee: Demai Ni
> Fix For: 0.99.0
>
> Attachments: 
> HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch, 
> HBASE-11085-trunk-v1.patch
>
>
> h2. Feature Description
> This JIRA is part of 
> [HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
> full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
> For the detailed layout and framework, please refer to 
> [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].
> When a client issues an incremental backup request, BackupManager checks 
> the request and then kicks off a global procedure via HBaseAdmin that makes all 
> active region servers roll their logs. Each region server records its log 
> number in ZooKeeper. We then determine which logs need to be included in 
> this incremental backup and use DistCp to copy them to the target location. At 
> the same time, the dependency of the backup image is recorded and later 
> saved in the Backup Manifest file.
> Restore replays the backed-up WAL logs on the target HBase instance. The 
> replay occurs after the full backup image has been restored.
> Because an incremental backup image depends on the prior full backup image and 
> on any prior incremental images, the manifest file is used to store the 
> dependency lineage during backup and is read at restore time for 
> point-in-time (PIT) restore.
> h2. Use case (example)
> {code:title=Incremental Backup Restore example|borderStyle=solid}
> /***/
> /* STEP1: FULL backup from sourcecluster to targetcluster */
> /* if no table name is specified, all tables from the source cluster will be backed up */
> /***/
> [sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
> ...
> 14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
> /***/
> /* STEP2: In HBase Shell, put a few rows */
> /***/
> hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
> hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'
> /***/
> /* STEP3: Take the 1st incremental backup */
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.
> /***/
> /* STEP4: In HBase Shell, put a few more rows: */
> /*        update 'row100', and create new 'row101' */
> /***/
> hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
> hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
> hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'
> /***/
> /* STEP5: Take the 2nd incremental backup */
> /***/
> [sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
> ...
> 14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.
> /***

[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-15 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Description: 
h2. Feature Description
This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
For the detailed layout and framework, please refer to 
[HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

When a client issues an incremental backup request, BackupManager checks the 
request and then kicks off a global procedure via HBaseAdmin that makes all active 
region servers roll their logs. Each region server records its log number in 
ZooKeeper. We then determine which logs need to be included in this incremental 
backup and use DistCp to copy them to the target location. At the same time, the 
dependency of the backup image is recorded and later saved in the Backup 
Manifest file.
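A minimal Python sketch of the "determine which logs need to be included" step. It assumes (not taken from the patch) that each region server's log number increases monotonically, that the previous backup recorded the newest log number it covered per server, and that this backup's log roll records the newest log number it closed per server. `WalFile` and `select_incremental_wals` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WalFile:
    server: str      # region server that wrote the WAL
    log_number: int  # assumed to increase monotonically per server
    path: str


def select_incremental_wals(
    wals: List[WalFile],
    last_backup: Dict[str, int],
    current_roll: Dict[str, int],
) -> List[WalFile]:
    """Return the WALs written since the previous backup, up to this roll.

    last_backup:  per-server newest log number covered by the previous backup
    current_roll: per-server newest log number closed by this backup's log roll
    """
    selected = []
    for wal in wals:
        upper = current_roll.get(wal.server)
        if upper is None:
            # Server did not report in this roll; skipped here (simplification).
            continue
        # No entry from the previous backup: include everything up to the roll.
        lower = last_backup.get(wal.server, -1)
        if lower < wal.log_number <= upper:
            selected.append(wal)
    return sorted(selected, key=lambda w: (w.server, w.log_number))
```

The selected files are what would then be handed to DistCp. Skipping servers that did not take part in the roll is only a simplification; a real implementation would need a policy for region servers that died between backups.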

Restore replays the backed-up WAL logs on the target HBase instance. The replay 
occurs after the full backup image has been restored.

Because an incremental backup image depends on the prior full backup image and on 
any prior incremental images, the manifest file is used to store the dependency 
lineage during backup and is read at restore time for point-in-time (PIT) restore.
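The lineage lookup for a PIT restore can be sketched as a walk from the requested image back to its full backup, then reversing the path. This assumes, hypothetically, that each manifest entry stores the ID of the single image it depends on; `ManifestEntry` and `restore_chain` are illustrative names, not the BackupManifest API from the patch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    backup_id: str
    kind: str              # "full" or "incremental"
    parent: Optional[str]  # image this one depends on; None for a full backup


def restore_chain(manifests: Dict[str, ManifestEntry], target_id: str) -> List[str]:
    """Backup IDs to restore, oldest first: the full image, then each
    incremental image up to and including target_id."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = target_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in backup lineage at {current}")
        seen.add(current)
        entry = manifests.get(current)
        if entry is None:
            raise KeyError(f"no manifest for {current}")
        chain.append(entry.backup_id)
        if entry.kind == "full":
            break
        current = entry.parent
    else:
        # Loop ended without reaching a full image.
        raise ValueError(f"lineage of {target_id} does not reach a full backup")
    return list(reversed(chain))
```

With the backup IDs from the example that follows, asking for backup_1399667851020 yields the full image and then the 1st incremental, the same order STEP7 describes for the -automatic option.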

h2. Use case (example)
{code:title=Incremental Backup Restore example|borderStyle=solid}
/***/
/* STEP1: FULL backup from sourcecluster to targetcluster */
/* if no table name is specified, all tables from the source cluster will be backed up */
/***/
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
...
14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
/***/
/* STEP2: In HBase Shell, put a few rows */
/***/
hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'

/***/
/* STEP3: Take the 1st incremental backup */
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.

/***/
/* STEP4: In HBase Shell, put a few more rows: */
/*        update 'row100', and create new 'row101' */
/***/
hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'

/***/
/* STEP5: Take the 2nd incremental backup */
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.

/***/
/* STEP7: Restore to the point in time (PIT) of the 1st incremental backup */
/* specify the backup ID of the 1st incremental backup */
/* option -automatic will trigger the restore of the full backup first, then the 1st */
/* incremental backup image */
/* t1_dn, etc. are the original table names. All tables will be restored if not specified */
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
/***

[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2014-05-15 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994060#comment-13994060
 ] 

Demai Ni commented on HBASE-7912:
-

hi, folks,

We uploaded patches for both full backup (v4) and incremental backup (v1) 
today. With them, it is easy to apply 
[HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch|https://issues.apache.org/jira/secure/attachment/12644215/HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch] 
directly on trunk and give it a try. 

Please see the examples in both the incremental backup JIRA 
[HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085] and the full backup 
JIRA [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

We will open more JIRAs and patches for other features (such as merge, 
convert, delete, history, and progress) in the coming weeks.

Also, thanks for the review comments from [~tedyu], [~mbertozzi], [~stack], and 
others. We will make a few follow-up improvements regarding ZooKeeper, protobuf, 
and leveraging the new snapshot manifest.

Demai

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> We have finally completed the implementation of our backup/restore solution and 
> would like to share it with the community through this JIRA. 
> We leverage the existing HBase snapshot feature to provide a general 
> solution for common users. Our full backup uses snapshots to capture 
> metadata locally and ExportSnapshot to move data to another cluster; 
> the incremental backup uses an offline WALPlayer to back up HLogs; we also 
> leverage globally distributed log roll and flush to improve performance; other 
> value-added features include convert, merge, progress report, and CLI commands, 
> so that a common user can back up HBase data without in-depth knowledge of HBase. 
> Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI command document will be attached to this JIRA. We 
> plan to use 10 to 12 subtasks to share each of the following features and 
> document the implementation details in the subtasks: 
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (e.g., merge weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, kept here as a record of the 
> design and discussion from 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a massive 
> amount of data. Combining full backups with incremental backups has 
> tremendous benefits for HBase as well. The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). 
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family l

[jira] [Created] (HBASE-11175) improve Backup/Restore framework by abstracting out zookeeper

2014-05-15 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11175:


 Summary: improve Backup/Restore framework by abstracting out zookeeper
 Key: HBASE-11175
 URL: https://issues.apache.org/jira/browse/HBASE-11175
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
 Fix For: 0.99.0


This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912].

h2. Feature Description
The current backup/restore patches use ZooKeeper to keep the backup history and 
dependency information on the source cluster. This JIRA abstracts out the ZooKeeper usage. 
It is a follow-up of sorts to 
[HBASE-10909|https://issues.apache.org/jira/browse/HBASE-10909] and 
[HBASE-10296|https://issues.apache.org/jira/browse/HBASE-10296].
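One way to abstract out ZooKeeper is to put the history and log-position bookkeeping behind an interface, so that ZooKeeper becomes one backend among several. This Python sketch is hypothetical: `BackupMetadataStore` and its method names are illustrative and do not come from the current patches, and the in-memory class only stands in for a real backend in tests.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class BackupMetadataStore(ABC):
    """Hides where backup history and per-server log positions are kept."""

    @abstractmethod
    def record_backup(self, backup_id: str, tables: List[str]) -> None: ...

    @abstractmethod
    def history(self) -> List[str]: ...

    @abstractmethod
    def set_log_position(self, server: str, log_number: int) -> None: ...

    @abstractmethod
    def get_log_position(self, server: str) -> Optional[int]: ...


class InMemoryBackupMetadataStore(BackupMetadataStore):
    """Test double; a ZooKeeper-backed class would implement the same methods."""

    def __init__(self) -> None:
        self._history: List[str] = []
        self._tables: Dict[str, List[str]] = {}
        self._positions: Dict[str, int] = {}

    def record_backup(self, backup_id: str, tables: List[str]) -> None:
        self._history.append(backup_id)
        self._tables[backup_id] = list(tables)

    def history(self) -> List[str]:
        return list(self._history)

    def set_log_position(self, server: str, log_number: int) -> None:
        self._positions[server] = log_number

    def get_log_position(self, server: str) -> Optional[int]:
        return self._positions.get(server)
```

Callers such as the backup manager would then depend only on the abstract class, which keeps ZooKeeper details in one place.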



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2014-05-15 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995387#comment-13995387
 ] 

Demai Ni commented on HBASE-7912:
-

HBASE-11148 is needed to roll the logs for [full 
backup|https://issues.apache.org/jira/browse/HBASE-10900] and [incremental 
backup|https://issues.apache.org/jira/browse/HBASE-11085], and also to mark a 
timestamp for the next incremental backup.

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> We have finally completed the implementation of our backup/restore solution and 
> would like to share it with the community through this JIRA. 
> We leverage the existing HBase snapshot feature to provide a general 
> solution for common users. Our full backup uses snapshots to capture 
> metadata locally and ExportSnapshot to move data to another cluster; 
> the incremental backup uses an offline WALPlayer to back up HLogs; we also 
> leverage globally distributed log roll and flush to improve performance; other 
> value-added features include convert, merge, progress report, and CLI commands, 
> so that a common user can back up HBase data without in-depth knowledge of HBase. 
> Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI command document will be attached to this JIRA. We 
> plan to use 10 to 12 subtasks to share each of the following features and 
> document the implementation details in the subtasks: 
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (e.g., merge weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, kept here as a record of the 
> design and discussion from 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a massive 
> amount of data. Combining full backups with incremental backups has 
> tremendous benefits for HBase as well. The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). 
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and 
> restore.
> * Restore to different table names.
> * Support adding tables or CFs to a backup set without interrupting the 
> incremental backup schedule.
> * Support rollup/combining of incremental backups into bigger incremental 
> backups covering longer periods.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same 
> cluster or across clusters.  It has the flexibility to support backup to 
> other devices and servers in the future.  
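The rollup/combining of incremental backups listed above could, at the metadata level, look like this Python sketch. It assumes (hypothetically) that each incremental image records its parent image ID and the WAL files it holds; `IncrementalImage` and `merge_incrementals` are illustrative names. The merged image keeps the newest ID and the oldest parent, so its lineage still reaches the full backup.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IncrementalImage:
    backup_id: str
    parent: Optional[str]       # image this one was taken on top of
    wal_files: Tuple[str, ...]  # WAL files copied into this image


def merge_incrementals(images: List[IncrementalImage]) -> IncrementalImage:
    """Roll consecutive incremental images (oldest first) into one image."""
    if not images:
        raise ValueError("nothing to merge")
    # Each image must build directly on the previous one.
    for prev, nxt in zip(images, images[1:]):
        if nxt.parent != prev.backup_id:
            raise ValueError(f"{nxt.backup_id} does not follow {prev.backup_id}")
    wals: List[str] = []
    for image in images:
        for wal in image.wal_files:
            if wal not in wals:  # keep first occurrence, preserve order
                wals.append(wal)
    # Newest ID plus oldest parent keeps the lineage intact.
    return IncrementalImage(images[-1].backup_id, images[0].parent, tuple(wals))
```

One trade-off of this approach: after a merge, the intermediate points in time can no longer be restored on their own.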





[jira] [Updated] (HBASE-10900) FULL table backup and restore

2014-05-15 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-10900:
-

Description: 
h2. Feature Description
This is a subtask of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] to support FULL 
backup/restore, and implements the following functionality:
{code:title=Backup Restore example|borderStyle=solid}
/* backup from sourcecluster to targetcluster */
/* if no table name is specified, all tables from the source cluster will be backed up */
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn

/* restore on targetcluster, this is a local restore */
/* backup_1396650096738 - backup image name */
/* t1_dn, etc. are the original table names. All tables will be restored if not specified */
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
[targetcluster]$ hbase restore /userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore

/* restore from targetcluster back to source cluster, this is a remote restore */
[sourcecluster]$ hbase restore hdfs://hostname.targetcluster.org:9000/userid/backupdir backup_1396650096738 t1_dn,t2_dn,t3_dn t1_dn_restore,t2_dn_restore,t3_dn_restore
{code}
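A Python sketch of how the table arguments in the `hbase restore` examples above could be paired up. It assumes the two comma-separated lists are matched by position, as the t1_dn/t1_dn_restore example suggests; `restore_table_mapping` and its parameters are hypothetical and not RestoreClient's actual code.

```python
from typing import Dict, List, Optional


def restore_table_mapping(
    backup_tables: List[str],
    tables_arg: Optional[str] = None,
    new_tables_arg: Optional[str] = None,
) -> Dict[str, str]:
    """Map each original table name to the name it is restored under.

    tables_arg:     e.g. "t1_dn,t2_dn"; None means every table in the image
    new_tables_arg: e.g. "t1_dn_restore,t2_dn_restore"; None keeps original names
    """
    originals = tables_arg.split(",") if tables_arg else list(backup_tables)
    unknown = [t for t in originals if t not in backup_tables]
    if unknown:
        raise ValueError(f"tables not in backup image: {unknown}")
    if new_tables_arg is None:
        return {t: t for t in originals}
    targets = new_tables_arg.split(",")
    if len(targets) != len(originals):
        raise ValueError("original and restore table lists differ in length")
    # Pair the two lists by position.
    return dict(zip(originals, targets))
```

Validating the table names up front lets a restore fail before any data is copied.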

h2. Detailed layout and framework for the next JIRAs
The patch is a wrapper around the existing snapshot and ExportSnapshot, and will 
serve as the base framework for the overall solution of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912], as described 
below:
* *bin/hbase*  : end-user command line interface to invoke BackupClient 
and RestoreClient
* *BackupClient.java*  : 'main' entry for backup operations. This patch only 
supports 'full' backup. Future JIRAs will support:
** *create* incremental backup
** *cancel* an ongoing backup
** *delete* an existing backup image
** *describe* the detailed information of a backup image
** show *history* of all successful backups 
** show the *status* of the latest backup request
** *convert* incremental backup WAL files into HFiles, either on the fly 
during create or after create
** *merge* backup images
** *stop* backing up a table of an existing backup image
** *show* tables of a backup image 
* *BackupCommands.java* : a place to keep all the command usages and options
* *BackupManager.java*  : handles backup requests on the server side and creates BACKUP 
ZOOKEEPER nodes to keep track of backups. The timestamps kept in ZooKeeper will be 
used for future incremental backups (not included in this JIRA). Creates 
BackupContext and DispatchRequest. 
* *BackupHandler.java*  : in this patch, a wrapper around snapshot and 
ExportSnapshot. In future JIRAs, it will:
** record *timestamps* info in ZK
** carry out *incremental* backup
** update backup *progress*
** set *status* flags
** build up the *backupManifest* file (in this JIRA, only limited info for full backup; 
later on, timestamps and dependencies of multiple backup images will also be recorded 
here)
** clean up after a *failed* backup 
** clean up after a *cancelled* backup
** allow on-the-fly *convert* during incremental backup 
* *BackupContext.java* : encapsulate backup information like backup ID, table 
names, directory info, phase, TimeStamps of backup progress, size of data, 
ancestor info, etc. 
* *BackupCopier.java*  : performs the copy operation. Later on, it will support progress 
reporting and mapper estimation, and extend DistCp to update progress in ZK 
during backup. 
* *BackupException.java*: handles exceptions from backup/restore
* *BackupManifest.java* : encapsulates all the backup image information. The 
manifest info will be bundled as a manifest file together with the data, so that each 
backup image contains all the info needed for restore. 
* *BackupStatus.java*   : encapsulate backup status at table level during 
backup progress
* *BackupUtil.java* : utility methods during backup process
* *RestoreClient.java*  : 'main' entry for restore operations. This patch only 
supports restoring 'full' backups. 
* *RestoreUtil.java*: utility methods during restore process
* *ExportSnapshot.java* : removes 'final' so that another class, 
SnapshotCopy.java, can extend it
* *SnapshotCopy.java*   : only a wrapper at the moment, but will be extended 
to keep track of progress (maybe this should be implemented in ExportSnapshot directly?)
* *BackupRestoreConstants.java* : add the constants used by backup/restore 
code.
* *HBackupFilesystem.java* :   the filesystem related api used by 
BackupClient and RestoreClient.

h2. Global log roll 
Currently, a customized implementation lives under *org.apache.hadoop.hbase.backup.master* and 
*org.apache.hadoop.hbase.backup.regionserver*.
[HBASE-11148|https://issues.apache.org/jira/browse/HBASE-11148] is opened to

[jira] [Created] (HBASE-11174) show backup/restore progress

2014-05-15 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11174:


 Summary: show backup/restore progress
 Key: HBASE-11174
 URL: https://issues.apache.org/jira/browse/HBASE-11174
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
 Fix For: 0.99.0


h2. Feature Description
This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] 
and incremental backup [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. 
For the detailed layout and framework, please refer to 
[HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

A backup/restore operation may take a while to complete, sometimes hours. It 
would be helpful to show users the estimated progress as a percentage. This JIRA 
will provide that functionality.
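A simple estimate weights each table by its size. This Python sketch assumes that per-table copied and total byte counts are available during the copy; where they would come from is not specified in this JIRA, and `estimate_progress` is a hypothetical name.

```python
from typing import Dict, Tuple


def estimate_progress(per_table: Dict[str, Tuple[int, int]]) -> float:
    """Overall progress in percent, weighting each table by its size.

    per_table maps table name -> (bytes_copied, total_bytes).
    """
    total = sum(size for _, size in per_table.values())
    if total == 0:
        return 100.0  # nothing to copy counts as done
    # Clamp so an overcounted table cannot push progress past 100%.
    copied = sum(min(done, size) for done, size in per_table.values())
    return round(100.0 * copied / total, 1)
```

Weighting by bytes rather than averaging per-table percentages keeps one small finished table from making a large backup look half done.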





[jira] [Created] (HBASE-11173) Show Backup History

2014-05-15 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11173:


 Summary: Show Backup History 
 Key: HBASE-11173
 URL: https://issues.apache.org/jira/browse/HBASE-11173
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.99.0
Reporter: Demai Ni
 Fix For: 0.99.0


h2. Feature Description
This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] 
and incremental backup [HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. 
For the detailed layout and framework, please refer to 
[HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

After several backup operations have been executed, a user may want to know which 
tables were backed up at what time, so that a restore or future backup operation 
can be performed accordingly. 
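A minimal history view in Python, assuming backup IDs keep the `backup_<epoch milliseconds>` form seen in the examples in these JIRAs (backup_1399667695966, for instance, decodes to 2014-05-09 20:34:55 UTC). The function names are hypothetical, and a real history would come from the recorded backup metadata rather than from the ID alone.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def backup_time(backup_id: str) -> datetime:
    """Decode an ID of the assumed form backup_<epoch milliseconds> to UTC."""
    prefix = "backup_"
    if not backup_id.startswith(prefix):
        raise ValueError(f"unexpected backup ID: {backup_id}")
    millis = int(backup_id[len(prefix):])
    return EPOCH + timedelta(milliseconds=millis)


def history_report(backups: Dict[str, List[str]]) -> List[str]:
    """One line per backup, oldest first: UTC time, backup ID, tables."""
    lines = []
    for backup_id in sorted(backups, key=backup_time):
        when = backup_time(backup_id).strftime("%Y-%m-%d %H:%M:%S UTC")
        lines.append(f"{when}  {backup_id}  {','.join(backups[backup_id])}")
    return lines
```

Sorting by decoded time rather than by string also stays correct if the millisecond part ever changes length.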





[jira] [Updated] (HBASE-11085) Incremental Backup Restore support

2014-05-15 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-11085:
-

Description: 
h2. Feature Description
This JIRA is part of 
[HBASE-7912|https://issues.apache.org/jira/browse/HBASE-7912] and depends on 
full backup [HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900]. 
For the detailed layout and framework, please refer to 
[HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900].

When a client issues an incremental backup request, BackupManager checks the 
request and then kicks off a global procedure via HBaseAdmin that makes all active 
region servers roll their logs. Each region server records its log number in 
ZooKeeper. We then determine which logs need to be included in this incremental 
backup and use DistCp to copy them to the target location. At the same time, the 
dependency of the backup image is recorded and later saved in the Backup 
Manifest file.

Restore replays the backed-up WAL logs on the target HBase instance. The replay 
occurs after the full backup image has been restored.

Because an incremental backup image depends on the prior full backup image and on 
any prior incremental images, the manifest file is used to store the dependency 
lineage during backup and is read at restore time for point-in-time (PIT) restore.

h2. Use case (example)
{code:title=Incremental Backup Restore example|borderStyle=solid}
/***/
/* STEP1: FULL backup from sourcecluster to targetcluster */
/* if no table name is specified, all tables from the source cluster will be backed up */
/***/
[sourcecluster]$ hbase backup create full hdfs://hostname.targetcluster.org:9000/userid/backupdir t1_dn,t2_dn,t3_dn
...
14/05/09 13:35:46 INFO backup.BackupManager: Backup request backup_1399667695966 has been executed.
/***/
/* STEP2: In HBase Shell, put a few rows */
/***/
hbase(main):002:0> put 't1_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):003:0> put 't2_dn','row100','cf1:q1','value100_0509_increm1'
hbase(main):004:0> put 't3_dn','row100','cf1:q1','value100_0509_increm1'

/***/
/* STEP3: Take the 1st incremental backup */
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:37:45 INFO backup.BackupManager: Backup request backup_1399667851020 has been executed.

/***/
/* STEP4: In HBase Shell, put a few more rows: */
/*        update 'row100', and create new 'row101' */
/***/
hbase(main):005:0> put 't3_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):006:0> put 't2_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):007:0> put 't1_dn','row100','cf1:q1','value101_0509_increm2'
hbase(main):009:0> put 't1_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):010:0> put 't2_dn','row101','cf1:q1','value101_0509_increm2'
hbase(main):011:0> put 't3_dn','row101','cf1:q1','value101_0509_increm2'

/***/
/* STEP5: Take the 2nd incremental backup */
/***/
[sourcecluster]$ hbase backup create incremental hdfs://hostname.targetcluster.org:9000/userid/backupdir
...
14/05/09 13:39:33 INFO backup.BackupManager: Backup request backup_1399667959165 has been executed.

/***/
/* STEP7: Restore to the point in time (PIT) of the 1st incremental backup */
/* specify the backup ID of the 1st incremental backup */
/* option -automatic will trigger the restore of the full backup first, then the 1st */
/* incremental backup image */
/* t1_dn, etc. are the original table names. All tables will be restored if not specified */
/* t1_dn_restore, etc. are the restored tables. If not specified, the original table names will be used */
/***

[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2014-05-13 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996599#comment-13996599
 ] 

Demai Ni commented on HBASE-7912:
-

[~stack], 

thanks for the comments. 

bq. This doc. with perhaps a little more commentary like it could go into the 
hbase refguide when this feature is committed?
In addition to the CLI PDF I attached to this JIRA, more complete documentation 
can be found here: [IBM BigInsights 
2.1.2|http://www-01.ibm.com/support/knowledgecenter/SSPT3X_2.1.2/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_hbase_bkuprestore_overview.html],
which was officially released in March 2014. We will open source all the 
Backup/Restore-related features from IBM BigInsights. We can move the 
documents into a 'backup' section of the HBase reference guide as you suggested, 
after incorporating the comments/suggestions from the community.

Regarding testing, thanks for [~jinghe]'s comment. We already did functional and stress 
testing internally before the release. For the current patches, since we made some 
changes per suggestions from the community, additional dev testing is being 
carried out. 

{quote}
bq. We’ll convert/replay the backed-up Hlogs into HFiles for fast incremental 
restore. 
This is interesting. It is done against a cluster or it is just a MR job/tool?
{quote}
About 70% of the code logic comes from WALPlayer, an MR job that runs against the target cluster. The 
difference is that we don't rely on a live HBase cluster when converting the HLogs to 
HFiles, because the code can access the table info offline. Currently the code is only 
useful for the backup/restore solution. We'd like to open another JIRA for this 
logic as a general tool/improvement of WALPlayer, and the new JIRA will 
depend on [HBASE-8083|https://issues.apache.org/jira/browse/HBASE-8073]. 
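For context on the offline HLog-to-HFile conversion: HFiles must hold cells in sorted order, so a converter has to keep one table's edits, split them per column family, and sort them before writing. This Python sketch covers only that grouping/sorting step, using a hypothetical `WalEdit` record; it ignores cell types (put vs. delete), region boundaries, and the actual HFile writing that WALPlayer handles.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class WalEdit(NamedTuple):
    table: str
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def group_for_hfiles(edits: Iterable[WalEdit], table: str) -> Dict[bytes, List[WalEdit]]:
    """Keep one table's edits, bucket them per column family, and sort each
    bucket in HBase cell order: row and qualifier ascending, newest first."""
    buckets: Dict[bytes, List[WalEdit]] = defaultdict(list)
    for edit in edits:
        if edit.table == table:
            buckets[edit.family].append(edit)
    for cells in buckets.values():
        # Python compares bytes as unsigned lexicographic, like HBase row keys.
        cells.sort(key=lambda e: (e.row, e.qualifier, -e.timestamp))
    return dict(buckets)
```

Because none of this needs a running cluster, only the table's descriptor, it fits the offline approach described above.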

bq. What needs to go in first? What should we review first?
Actually, I need your and other folks' suggestions here. 

From the dependency perspective, I'd like to have [Full backup HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] 
go in first, then [incremental backup HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085]. 
Once Jerry's [global log roll HBASE-11148|https://issues.apache.org/jira/browse/HBASE-11148] 
gets accepted, I will put up a patch to update full and incremental backup to use it 
immediately. Then I would like to improve it with protobuf and abstract out ZooKeeper. 

If the community accepts the general framework provided by [Full 
backup HBASE-10900|https://issues.apache.org/jira/browse/HBASE-10900] and 
[incremental backup 
HBASE-11085|https://issues.apache.org/jira/browse/HBASE-11085], we will build 
the patches for the other features on top of that framework. 

At the moment, I am thinking about opening another review board request for the combined 
patch of [both incremental and full backup|https://issues.apache.org/jira/secure/attachment/12644215/HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch].

I understand a lot of code is involved here, and I am open to any suggestions that make 
the review easier for everyone. :-) 

Demai

