[ https://issues.apache.org/jira/browse/OOZIE-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002917#comment-16002917 ]
Hadoop QA commented on OOZIE-2854:
----------------------------------

Testing JIRA OOZIE-2854

Cleaning local git workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:red}-1{color} the patch contains 1 line(s) with trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 132
.    {color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:orange}0{color} There are [4] new bugs found in total that would be nice to have fixed.
.    {color:green}+1{color} There are no new bugs found in [server].
.    {color:green}+1{color} There are no new bugs found in [client].
.    {color:green}+1{color} There are no new bugs found in [docs].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive].
.    {color:green}+1{color} There are no new bugs found in [sharelib/spark].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.    {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.    {color:green}+1{color} There are no new bugs found in [sharelib/pig].
.    {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.    {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.    {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.    {color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.    {color:orange}0{color} There are [4] new bugs found in [core] that would be nice to have fixed.
.    You can find the FindBugs diff here: core/findbugs-new.html
.    {color:green}+1{color} There are no new bugs found in [tools].
.    {color:green}+1{color} There are no new bugs found in [examples].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3813/

> Oozie should handle transient DB problems
> -----------------------------------------
>
>                 Key: OOZIE-2854
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2854
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>         Attachments: OOZIE-2854-001.patch, OOZIE-2854-002.patch, OOZIE-2854-003.patch, OOZIE-2854-POC-001.patch
>
>
> There can be problems when Oozie cannot update the database properly. Recently, we have experienced erratic behavior with two setups:
> * MySQL with the Galera cluster manager. Galera uses cluster-wide optimistic locking, which might cause a transaction to roll back if there are two or more parallel transactions running and one of them cannot complete because of a conflict.
> * MySQL with Percona XtraDB Cluster. If one of the MySQL instances is killed, Oozie might get a "Communications link failure" exception during the failover.
> The problem is that failed DB transactions might later cause workflows (which are started/re-started by RecoveryService) to get stuck. It's not clear to us how this happens, but it has to do with the fact that certain DB updates are not executed.
> The solution is to use some sort of retry logic with exponential backoff if the DB update fails. We could start with a 100ms wait time which is doubled at every retry. The operation can be considered a failure if it still fails after 10 attempts. These values could be configurable. We should discuss initial values in the scope of this JIRA.
> Note that this solution is to handle *transient* failures. If the DB is down for a longer period of time, we have to accept that the internal state of Oozie is corrupted.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
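
For reference, the retry scheme proposed in the description (100ms initial wait, doubled at every retry, failure after 10 attempts) could look roughly like the sketch below. This is only an illustration under stated assumptions and is not taken from any of the attached patches: the class name DbRetrySketch, the methods delayMillis and callWithRetry, and the isTransient predicate are all hypothetical names. With the proposed values there are at most 9 waits (100ms up to 25600ms), about 51.1 seconds in total before the operation is reported as failed.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical sketch of retry with exponential backoff; not Oozie code.
final class DbRetrySketch {

    /** Wait before the given retry (1-based): initialDelayMs, doubled for each further retry. */
    public static long delayMillis(long initialDelayMs, int retry) {
        return initialDelayMs << (retry - 1);
    }

    /**
     * Runs op up to maxAttempts times. Sleeps between attempts, starting at
     * initialDelayMs and doubling each time. Errors that isTransient rejects
     * are rethrown immediately without a retry.
     */
    public static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialDelayMs,
                                      Predicate<Exception> isTransient) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up on non-transient errors or once the attempt budget is used up.
                if (!isTransient.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMillis(initialDelayMs, attempt));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated DB update that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated transient DB failure");
            }
            return "committed";
        }, 10, 100L, e -> e instanceof IllegalStateException);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Which exceptions count as transient (for example a Galera conflict rollback or a "Communications link failure") would have to be decided per database and driver; the predicate here is just a placeholder for that decision. The attempt count and initial wait are plain parameters so that they could be made configurable as suggested above.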