[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020849#comment-16020849
 ] 

Hudson commented on HBASE-17938:


FAILURE: Integrated in Jenkins build HBase-HBASE-14614 #244 (See 
[https://builds.apache.org/job/HBase-HBASE-14614/244/])
HBASE-17938 General fault - tolerance framework for backup/restore (tedyu: rev 
305ffcb04025ea6f7880e9961120d309f55bf8ba)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupAdminImpl.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupClientFactory.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupDriver.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupRestoreConstants.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/TableBackupClient.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestFullBackupWithFailures.java


> General fault - tolerance framework for backup/restore operations
> -
>
> Key: HBASE-17938
> URL: https://issues.apache.org/jira/browse/HBASE-17938
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17938-v1.patch, HBASE-17938-v2.patch, 
> HBASE-17938-v3.patch, HBASE-17938-v4.patch, HBASE-17938-v5.patch, 
> HBASE-17938-v6.patch, HBASE-17938-v7.patch, HBASE-17938-v8.patch
>
>
> The framework must take care of all general types of failures during 
> backup/restore and restore the system to its original state in case of a failure.
> That won't solve all possible issues, but we have separate JIRAs for 
> them as sub-tasks of HBASE-15277.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008530#comment-16008530
 ] 

Hudson commented on HBASE-17938:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2998 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2998/])
HBASE-17938 General fault - tolerance framework for backup/restore (tedyu: rev 
305ffcb04025ea6f7880e9961120d309f55bf8ba)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/TableBackupClient.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupClientFactory.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupRestoreConstants.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupDriver.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupAdminImpl.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestFullBackupWithFailures.java




[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-11 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006711#comment-16006711
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

Test failures are unrelated. 



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005837#comment-16005837
 ] 

Hadoop QA commented on HBASE-17938:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 41s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 45s | master passed |
| +1 | compile | 1m 43s | master passed |
| +1 | checkstyle | 1m 38s | master passed |
| +1 | mvneclipse | 0m 29s | master passed |
| +1 | findbugs | 4m 59s | master passed |
| +1 | javadoc | 1m 7s | master passed |
| +1 | mvninstall | 1m 50s | the patch passed |
| +1 | compile | 1m 32s | the patch passed |
| +1 | javac | 1m 32s | the patch passed |
| +1 | checkstyle | 1m 27s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 57m 4s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | findbugs | 4m 32s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
| -1 | unit | 109m 58s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 197m 32s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.snapshot.TestMobSecureExportSnapshot |
| | org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot |
| | org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot |
| | org.apache.hadoop.hbase.snapshot.TestExportSnapshot |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12867473/HBASE-17938-v8.patch |
| JIRA Issue | HBASE-17938 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 1408fe7d28f1 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / c5cc81d |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/6757/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/6757/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6757/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6757/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |

[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005657#comment-16005657
 ] 

Ted Yu commented on HBASE-17938:


Can you submit the patch for a QA run?

> General fault - tolerance framework for backup/restore operations
> -
>
> Key: HBASE-17938
> URL: https://issues.apache.org/jira/browse/HBASE-17938
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17938-v1.patch, HBASE-17938-v2.patch, 
> HBASE-17938-v3.patch, HBASE-17938-v4.patch, HBASE-17938-v5.patch, 
> HBASE-17938-v6.patch, HBASE-17938-v7.patch


[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003970#comment-16003970
 ] 

Ted Yu commented on HBASE-17938:


For XXTableBackupClient, you can make failStageIf() a no-op.
In the test(s), create a test client that extends XXTableBackupClient and 
overrides failStageIf() to do fault injection.

This would reduce code duplication.
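
The suggestion above can be sketched roughly as follows. This is only an illustration, not code from the patch: `SketchBackupClient`, `FailingBackupClient`, and the `Stage` enum are hypothetical stand-ins for XXTableBackupClient and its stages. Only the `failStageIf()` hook name is taken from the comment.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a backup client; not the actual HBase class.
class SketchBackupClient {
  enum Stage { STAGE_0, STAGE_1, STAGE_2 }

  // Stages that finished without an injected failure.
  final List<Stage> completed = new ArrayList<>();

  // No-op in production code; a test subclass overrides it to inject a fault.
  protected void failStageIf(Stage stage) throws IOException {
  }

  void execute() throws IOException {
    for (Stage stage : Stage.values()) {
      failStageIf(stage);   // the hook is called before each stage's work
      completed.add(stage); // placeholder for the real work of the stage
    }
  }
}

// Test-only subclass: throws when the configured stage is reached.
class FailingBackupClient extends SketchBackupClient {
  private final Stage stageToFail;

  FailingBackupClient(Stage stageToFail) {
    this.stageToFail = stageToFail;
  }

  @Override
  protected void failStageIf(Stage stage) throws IOException {
    if (stage == stageToFail) {
      throw new IOException("Injected failure at " + stage);
    }
  }
}
```

With this shape, the production execute() path has a single implementation, and a test only needs to instantiate the failing subclass with the stage it wants to break.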

> General fault - tolerance framework for backup/restore operations
> -
>
> Key: HBASE-17938
> URL: https://issues.apache.org/jira/browse/HBASE-17938
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17938-v1.patch, HBASE-17938-v2.patch, 
> HBASE-17938-v3.patch, HBASE-17938-v4.patch, HBASE-17938-v5.patch, 
> HBASE-17938-v6.patch


[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-09 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003833#comment-16003833
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

We have discussed this already. If some step fails during failBackup execution, 
the user will be notified of the failure and advised to run the repair tool 
manually. I will fix the wording of the IOException thrown when the operation 
fails in the repair phase (failBackup).





[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003820#comment-16003820
 ] 

Ted Yu commented on HBASE-17938:


In cleanupAndRestoreBackupSystem(), 
{code}
 if (type == BackupType.FULL) {
   deleteSnapshots(conn, backupInfo, conf);
   cleanupExportSnapshotLog(conf);
 }
 restoreBackupTable(conn, conf);
{code}
What if deleteSnapshots() throws an exception? restoreBackupTable() would be 
skipped.
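
One way to address the concern above is to run the restore step in a finally block. The sketch below is an assumption, not the fix adopted in the patch: the method names mirror the quoted snippet, but their bodies are hypothetical placeholders that only record what ran.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: method names follow the quoted snippet, bodies are placeholders.
class CleanupOrderSketch {
  final List<String> log = new ArrayList<>();
  boolean failDeleteSnapshots; // set to true to simulate a failing snapshot delete

  void deleteSnapshots() throws IOException {
    if (failDeleteSnapshots) {
      throw new IOException("snapshot delete failed");
    }
    log.add("deleteSnapshots");
  }

  void cleanupExportSnapshotLog() {
    log.add("cleanupExportSnapshotLog");
  }

  void restoreBackupTable() {
    log.add("restoreBackupTable");
  }

  void cleanupAndRestore(boolean fullBackup) throws IOException {
    try {
      if (fullBackup) {
        deleteSnapshots();
        cleanupExportSnapshotLog();
      }
    } finally {
      // Runs even if the snapshot cleanup above throws; the exception
      // still propagates to the caller afterwards.
      restoreBackupTable();
    }
  }
}
```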



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995868#comment-15995868
 ] 

Ted Yu commented on HBASE-17938:


Somehow review board doesn't accept my review comments.
{code}
  conn = ConnectionFactory.createConnection(getConf());
{code}
Where is conn released?
{code}
  firstBackup = savedStartCode == null || Long.parseLong(savedStartCode) == 0L;
{code}
What if a second client comes and sees the savedStartCode as zero (written by 
line 130)?

There is duplicate code between executeForTesting() and execute(). Extract the 
common code.
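
On the question of where conn is released: HBase's Connection is Closeable, so one common pattern is try-with-resources. The sketch below only illustrates that pattern. `TrackedConnection` and `ConnectionScopeSketch` are hypothetical stand-ins so the example runs without an HBase cluster, and the supplier stands in for ConnectionFactory.createConnection(getConf()).

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for an HBase Connection; it only records that close() was called.
class TrackedConnection implements AutoCloseable {
  boolean closed;

  @Override
  public void close() {
    closed = true;
  }
}

class ConnectionScopeSketch {
  // The factory plays the role of ConnectionFactory.createConnection(getConf()).
  static void runCommand(Supplier<TrackedConnection> factory, boolean fail)
      throws IOException {
    try (TrackedConnection conn = factory.get()) {
      if (fail) {
        throw new IOException("command failed");
      }
      // ... use conn for the command here ...
    } // conn.close() is invoked here on both the normal and the failing path
  }
}
```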


> General fault - tolerance framework for backup/restore operations
> -
>
> Key: HBASE-17938
> URL: https://issues.apache.org/jira/browse/HBASE-17938
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17938-v1.patch, HBASE-17938-v2.patch, 
> HBASE-17938-v3.patch, HBASE-17938-v4.patch


[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-02 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993471#comment-15993471
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

OK.

> General fault - tolerance framework for backup/restore operations
> -
>
> Key: HBASE-17938
> URL: https://issues.apache.org/jira/browse/HBASE-17938
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17938-v1.patch, HBASE-17938-v2.patch, 
> HBASE-17938-v3.patch
>


[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993305#comment-15993305
 ] 

Ted Yu commented on HBASE-17938:


Vlad:
Mind adding a unit test which exercises the newly added code?



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-01 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991883#comment-15991883
 ] 

Devaraj Das commented on HBASE-17938:
-

I agree. Let's keep it simple for now. Once we get more experience, we can 
enhance as needed.



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-01 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991862#comment-15991862
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

{quote}
Currently fault handling is coarse grained (see cleanupAndRestoreBackupSystem).
I suggest investigating fine grained approach where last successful sub-step is 
recorded so that subsequent run can omit unnecessary work.
{quote}

The same as above. Complex algorithms have proved to be fragile. There is no 
need for additional complexity here. The repair operation is very fast (it 
deletes some files and restores the table from a snapshot).



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-01 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991857#comment-15991857
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

{quote}
Assuming IOE may come out of each of the calls above, shouldn't a state machine 
be designed for more robustness ?
{quote}

There is no need to overcomplicate the feature. If system repair fails in the 
middle, the user will be notified and will have a chance to fix everything by 
running the repair tool manually.

The repair operation is idempotent and can be run as many times as needed.

Did I address your concerns, [~tedyu]?
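
The idempotency claim above can be illustrated with a small in-memory sketch. It is an assumption about the general shape of such a repair (delete leftover files, restore the table from a snapshot), not the actual repair tool. `IdempotentRepairSketch` and its fields are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of a repair that is safe to run repeatedly.
class IdempotentRepairSketch {
  final Set<String> files = new HashSet<>();
  String tableState = "modified";
  String snapshot = "original"; // null once the snapshot has been consumed

  void repair() {
    // Deleting leftovers has no effect if they are already gone.
    files.removeIf(f -> f.startsWith("tmp/"));
    // Restoring is skipped if the snapshot was already applied and removed.
    if (snapshot != null) {
      tableState = snapshot;
      snapshot = null;
    }
  }
}
```

Each step checks the current state before acting, so a second or third run after a partial failure leaves the system in the same repaired state.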





[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991816#comment-15991816
 ] 

Ted Yu commented on HBASE-17938:


I did look at the review board 5 days ago - my latest comments were not 
addressed.

Currently fault handling is coarse-grained (see cleanupAndRestoreBackupSystem).
I suggest investigating a fine-grained approach where the last successful 
sub-step is recorded so that a subsequent run can omit unnecessary work.
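
For illustration, the fine-grained approach suggested above could look roughly like the sketch below. This is a hypothetical sketch: `CheckpointedCleanupSketch` and `Step` are made-up names, and the checkpoint is held in memory, whereas a real implementation would have to persist it somewhere durable. As the replies in this thread show, the patch author argued against this added complexity.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of resuming a multi-step cleanup after the last completed step.
class CheckpointedCleanupSketch {
  interface Step {
    void run() throws IOException;
  }

  // Index of the last step that completed; kept in memory here,
  // but it would need to be persisted in a real system.
  int lastCompleted = -1;

  void run(List<Step> steps) throws IOException {
    for (int i = lastCompleted + 1; i < steps.size(); i++) {
      steps.get(i).run(); // if this throws, lastCompleted is left unchanged
      lastCompleted = i;  // record progress only after the step succeeds
    }
  }
}
```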




[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-05-01 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991233#comment-15991233
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

[~tedyu], when you have time, please take a look at the above RB submission.



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-04-26 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985724#comment-15985724
 ] 

Vladimir Rodionov commented on HBASE-17938:
---

https://reviews.apache.org/r/58757/



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-04-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985614#comment-15985614
 ] 

Ted Yu commented on HBASE-17938:


{code}
+throw new IOException("Active session found, aborted command execution");
{code}
Include information about the first session in the message.
{code}
+ if (type == BackupType.FULL) {
+   deleteSnapshots(conn, backupInfo, conf);
+   cleanupExportSnapshotLog(conf);
+ }
+ restoreBackupTable(conn, conf);
+ deleteBackupTableSnapshot(conn, conf);
+ // clean up the uncompleted data at target directory if the ongoing backup has already entered
+ // the copy phase
+ // For incremental backup, DistCp logs will be cleaned with the targetDir.
+ cleanupTargetDir(backupInfo, conf);
{code}
Assuming an IOE may come out of each of the calls above, shouldn't a state 
machine be designed for more robustness?

Please put the next patch on review board.



[jira] [Commented] (HBASE-17938) General fault - tolerance framework for backup/restore operations

2017-04-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983725#comment-15983725
 ] 

Ted Yu commented on HBASE-17938:


{code}
+snapshotBackupSystemTable();
{code}
The backup table is not in the hbase namespace anymore, right? Rename the above 
method and the other new methods.
{code}
-LOG.debug("when deleting snapshot " + snapshotName, ioe);
+LOG.error("when deleting snapshot " + snapshotName, ioe);
{code}
What's the action for the above error?



> General fault - tolerance framework for backup/restore operations
> -
>
> Key: HBASE-17938
> URL: https://issues.apache.org/jira/browse/HBASE-17938
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17938-v1.patch