[ 
https://issues.apache.org/jira/browse/OOZIE-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076921#comment-16076921
 ] 

Hadoop QA commented on OOZIE-2854:
----------------------------------

Testing JIRA OOZIE-2854

Cleaning local git workspace

----------------------------

{color:green} 1 PATCH_APPLIES{color}
{color:green} 1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.    {color:green} 1{color} the patch does not introduce any @author tags
.    {color:green} 1{color} the patch does not introduce any tabs
.    {color:green} 1{color} the patch does not introduce any trailing spaces
.    {color:red}-1{color} the patch contains 1 line(s) longer than 132 
characters
.    {color:green} 1{color} the patch does adds/modifies 12 testcase(s)
{color:green} 1 RAT{color}
.    {color:green} 1{color} the patch does not seem to introduce new RAT 
warnings
{color:green} 1 JAVADOC{color}
.    {color:green} 1{color} the patch does not seem to introduce new Javadoc 
warnings
.    {color:red}WARNING{color}: the current HEAD has 1 Javadoc warning(s)
{color:green} 1 COMPILE{color}
.    {color:green} 1{color} HEAD compiles
.    {color:green} 1{color} patch compiles
.    {color:green} 1{color} the patch does not seem to introduce new javac 
warnings
{color:green} 1{color} There are no new bugs found in total.
{color:red}-1 BACKWARDS_COMPATIBILITY{color}
.    {color:green} 1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.    {color:red}-1{color} the patch modifies 1 JPA file(s), persistence.xml or 
*-orm.xml
{color:green} 1 TESTS{color}
.    Tests run: 1110
.    Tests rerun: 22
.    Tests failed at first run: org.apache.oozie.action.hadoop.TestLauncherAM,
{color:green} 1 DISTRO{color}
.    {color:green} 1{color} distro tarball builds with the patch 

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}

{color:red}. There is at least one warning, please check{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/3953/

> Oozie should handle transient database problems
> -----------------------------------------------
>
>                 Key: OOZIE-2854
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2854
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>            Reporter: Peter Bacsko
>            Assignee: Andras Piros
>         Attachments: OOZIE-2854-001.patch, OOZIE-2854-002.patch, 
> OOZIE-2854-003.patch, OOZIE-2854-004.patch, OOZIE-2854-005.patch, 
> OOZIE-2854.006.patch, OOZIE-2854.007.patch, OOZIE-2854.008.patch, 
> OOZIE-2854.009.patch, OOZIE-2854.010.patch, OOZIE-2854.011.patch, 
> OOZIE-2854-POC-001.patch
>
>
> There can be problems when Oozie cannot update the database properly. 
> Recently, we have experienced erratic behavior with two setups:
> * MySQL with the Galera cluster manager. Galera uses cluster-wide optimistic 
> locking, which might cause a transaction to roll back if two or more 
> parallel transactions are running and one of them cannot complete because 
> of a conflict.
> * MySQL with Percona XtraDB Cluster. If one of the MySQL instances is killed, 
> Oozie might get a "Communications link failure" exception during the 
> failover.
> The problem is that failed DB transactions might later cause workflows 
> (which are started/re-started by RecoveryService) to get stuck. It's not 
> clear to us how this happens, but it has to do with the fact that certain DB 
> updates are not executed.
> The proposed solution is to retry a failed DB update with exponential 
> backoff. We could start with a 100ms wait time, which is doubled at every 
> retry. The operation can be considered a failure if it still fails after 10 
> attempts. These values could be configurable. We should discuss initial 
> values in the scope of this JIRA.
> Note that this solution is meant to handle only *transient* failures. If the 
> DB is down for a longer period of time, we have to accept that the internal 
> state of Oozie is corrupted.
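
The retry scheme proposed in the description could look roughly like the
following. This is only a minimal sketch, not the code in any of the attached
patches: the class name RetryWithBackoff, the method call, and the isTransient
predicate are hypothetical names made up for illustration. It assumes that no
wait follows the final attempt, so with the suggested values (100ms initial
wait, doubled each time, 10 attempts) the nine waits add up to
100 * (2^9 - 1) = 51100ms, i.e. about 51 seconds before giving up.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not part of Oozie.
final class RetryWithBackoff {

    /**
     * Runs the operation, retrying on transient failures.
     * The wait starts at initialWaitMs and doubles after every failed attempt.
     */
    static <T> T call(Callable<T> operation,
                      Predicate<Exception> isTransient,
                      int maxAttempts,
                      long initialWaitMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long waitMs = initialWaitMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up on non-transient errors or once attempts are used up.
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
            }
            // Back off before the next attempt, then double the wait.
            Thread.sleep(waitMs);
            waitMs *= 2;
        }
    }
}
```

A caller might wrap a DB update as
RetryWithBackoff.call(() -> doUpdate(), e -> e instanceof java.sql.SQLException, 10, 100L),
where doUpdate is a stand-in for the real operation. Which exceptions count as
transient would depend on the database and the JDBC driver, so the predicate
here is an assumption rather than a recommendation.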



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
