[ 
https://issues.apache.org/jira/browse/OOZIE-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Shevchuk updated OOZIE-1525:
-----------------------------------

    Attachment: OOZIE-90.patch

I attached a patch in which before adding a new action to insert/update list 
check whether it has been created in db before. For this purpose get action 
executed to get workflow with id action has. If this query fails then this 
action doesn't really exist in db (suppose that if connection is broken then 
next insert action will fail too).
                
> Oozie workflow does not update status sometimes and is stuck in RUNNING state.
> ------------------------------------------------------------------------------
>
>                 Key: OOZIE-1525
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1525
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: 3.3.2
>            Reporter: Eugene Shevchuk
>         Attachments: OOZIE-90.patch
>
>
> Sometimes workflow is stuck in RUNNING state with the following info:
> {noformat}
> Job ID : 0000002-130822205833322-oozie-hdp-W
> ------------------------------------------------------------------------------------------------------------------------------------
> Workflow Name : wordcount-wf
> App Path      : ***
> Status        : RUNNING
> Run           : 0
> User          : ***
> Group         : -
> Created       : 2013-08-22 21:01 GMT
> Started       : 2013-08-22 21:01 GMT
> Last Modified : 2013-08-22 21:02 GMT
> Ended         : -
> CoordAction ID: -
> Actions
> ------------------------------------------------------------------------------------------------------------------------------------
> ID                                                                            
> Status    Ext ID                 Ext Status Err Code
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@:start:                                   
> OK        -                      OK         -
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@wc                                        
> OK        job_201308221727_0287  SUCCEEDED  -
> ------------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> In log there are some problems with jpaservice:
> {noformat}
> 2013-08-22 23:00:59,170 ERROR SignalXCommand:536 - USER[***] GROUP[-] TOKEN[] 
> APP[wordcount-wf] JOB[0000002-130822205833322-oozie-hdp-W] 
> ACTION[0000002-130822205833322-oozie-hdp-W@wc] Exception, 
> <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.RollbackException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
>       at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:277)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
>       at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> 
> org.apache.openjpa.persistence.PersistenceException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
>       at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
>       at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
>       at 
> org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
>       at 
> org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
>       at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
>       at 
> org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
>       ... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY 
> KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key 
> in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
>       at 
> org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
>       ... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of 
> PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert 
> duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
>       at 
> org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
>       ... 28 more
> 2013-08-22 23:00:59,170  WARN CallableQueueService$CompositeCallable:542 - 
> USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable 
> [signal], E0607: Other error in operation [signal], The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> org.apache.oozie.command.CommandException: E0607: Other error in operation 
> [signal], The transaction has been rolled back.  See the nested exceptions 
> for details on the errors that occurred.
>       at org.apache.oozie.command.XCommand.call(XCommand.java:317)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
>       at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.RollbackException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
>       at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:277)
>       ... 6 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> 
> org.apache.openjpa.persistence.PersistenceException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
>       at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
>       at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
>       at 
> org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
>       at 
> org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
>       at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
>       at 
> org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
>       ... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY 
> KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key 
> in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
>       at 
> org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
>       ... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of 
> PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert 
> duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
>       at 
> org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
>       ... 28 more
> {noformat}
> Possible problem may be that when oozie is inserting a new object, the JPA 
> entity manager needs to hear back from the connection to know the commit 
> succeeded.
> Sometimes when commit indeed succeeded but connection was lost, the JPA 
> entityManager will keep retrying to persist the same object since it assumes 
> the commit failed, therefore you see the "primary key constraint violation 
> exception" in the stacktrace. The workflow status remains in running state 
> until all of its child actions turn "SUCCEEDED". Since one of the child 
> action fails to be created, the workflow will always remain in running state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to