Marty Kube created OOZIE-1214:
---------------------------------

             Summary: Oozie coordinator jobs cannot be killed
                 Key: OOZIE-1214
                 URL: https://issues.apache.org/jira/browse/OOZIE-1214
             Project: Oozie
          Issue Type: Bug
          Components: coordinator
    Affects Versions: 3.2.0
            Reporter: Marty Kube


We can't reliably kill oozie jobs (mostly fails, occasionally succeeds).  When 
kill fails, an oozie server restart usually "fixes" the issue so the job can be 
killed after shutdown/startup sequence.

>oozie job -kill 0005437-130128204041186-oozie-oozi-C

Hangs forever or occasionally exits the command with no error on cmd line but 
several errors in oozie log.

We find the following errors in the logs:


2013-02-10 22:32:42,776 WARN org.apache.oozie.command.wf.ResumeXCommand: 
USER[oozie] GROUP[-] TOKEN[] APP[foo-logmapper] 
JOB[0005516-130128204041186-oozie-oozi-W] ACTION[-] E1100: Command precondition 
doe\
s not hold before execution, [workflow's status is SUCCEEDED is not SUSPENDED], 
Error Code: E1100
2013-02-10 22:32:42,776 ERROR org.apache.oozie.command.wf.ResumeXCommand: 
USER[oozie] GROUP[-] TOKEN[] APP[foo-logmapper] 
JOB[0005516-130128204041186-oozie-oozi-W] ACTION[-] Exception,
java.lang.IllegalMonitorStateException
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:317)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:992)
        at 
org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
        at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
        at org.apache.oozie.command.XCommand.call(XCommand.java:295)
        at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
        at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
        at 
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-02-10 22:32:42,776 WARN 
org.apache.oozie.service.CallableQueueService$CompositeCallable: USER[-] 
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [resume], E0607: 
Other error in operation\
 [java.lang.IllegalMonitorStateException], {1}
org.apache.oozie.command.CommandException: E0607: Other error in operation 
[java.lang.IllegalMonitorStateException], {1}
        at org.apache.oozie.command.XCommand.call(XCommand.java:317)
        at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
        at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
        at 
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalMonitorStateException
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:317)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:992)
        at 
org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
        at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
        at org.apache.oozie.command.XCommand.call(XCommand.java:295)
        ... 6 more
2013-02-10 22:32:43,377 WARN 
org.apache.oozie.command.coord.CoordActionReadyXCommand: USER[oozie] GROUP[-] 
TOKEN[] APP[foo-logs-cron-coord] JOB[0002766-130128204041186-oozie-oozi-C] 
ACTION[-] No actions t\
o start! for jobId=0002766-130128204041186-oozie-oozi-C
2013-02-10 22:32:43,964 WARN 
org.apache.oozie.command.coord.CoordActionReadyXCommand: USER[oozie] GROUP[-] 
TOKEN[] APP[foo-modelling-cron-coord] JOB[0005437-130128204041186-oozie-oozi-C] 
ACTION[-] No acti\
ons to start! for jobId=0005437-130128204041186-oozie-oozi-C
2013-02-10 22:33:00,608 WARN 
org.apache.hadoop.security.authentication.server.AuthenticationFilter: 
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:00,611 WARN 
org.apache.hadoop.security.authentication.server.AuthenticationFilter: 
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:00,627 WARN 
org.apache.hadoop.security.authentication.server.AuthenticationFilter: 
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:00,628 WARN 
org.apache.hadoop.security.authentication.server.AuthenticationFilter: 
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:04,333 INFO 
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] 
GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2013-02-10 22:33:04,333 INFO 
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] 
GROUP[-] Running coordinator status service from last instance time =  
2013-02-11T03:32Z
2013-02-10 22:33:29,476 INFO 
org.apache.oozie.service.CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:
 USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] 
CoordMaterializeTriggerService - Cur\
r Date= Sun Feb 10 22:34:29 EST 2013, Num jobs to materialize = 1
2013-02-10 22:33:33,000 INFO org.apache.oozie.service.PauseTransitService: 
USER[-] GROUP[-] Acquired lock for 
[org.apache.oozie.service.PauseTransitService]
2013-02-10 22:33:33,007 INFO org.apache.oozie.service.PauseTransitService: 
USER[-] GROUP[-] Released lock for 
[org.apache.oozie.service.PauseTransitService]
2013-02-10 22:34:11,834 WARN org.apache.oozie.service.JPAService: USER[-] 
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor 
[CoordActionsGetByLastModifiedTimeJPAExecutor] ended with an active transact\
ion, rolling back
2013-02-10 22:34:11,835 WARN 
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] 
GROUP[-] Exception happened during StatusTransitRunnable
org.apache.oozie.executor.jpa.JPAExecutorException: E0603: SQL error in 
operation [<openjpa-2.1.0-r422266:1071316 nonfatal store error> 
org.apache.openjpa.persistence.OptimisticLockException: Unable to ob\
tain an object lock on "null".
FailedObject: select a.jobId from CoordinatorActionBean a where 
a.lastModifiedTimestamp >= :lastModifiedTime [java.lang.String]], {1}
        at 
org.apache.oozie.executor.jpa.CoordActionsGetByLastModifiedTimeJPAExecutor.execute(CoordActionsGetByLastModifiedTimeJPAExecutor.java:62)
        at 
org.apache.oozie.executor.jpa.CoordActionsGetByLastModifiedTimeJPAExecutor.execute(CoordActionsGetByLastModifiedTimeJPAExecutor.java:34)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:211)
        at 
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable.coordTransit(StatusTransitService.java:710)
        at 
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable.run(StatusTransitService.java:108)
        at 
org.apache.oozie.service.SchedulerService$2.run(SchedulerService.java:164)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: <openjpa-2.1.0-r422266:1071316 nonfatal store error> 
org.apache.openjpa.persistence.OptimisticLockException: Unable to obtain an 
object lock on "null".
FailedObject: select a.jobId from CoordinatorActionBean a where 
a.lastModifiedTimestamp >= :lastModifiedTime [java.lang.String]
        at 
org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4851)
        at 
org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
        at 
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
        at 
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:118)
        at 
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:70)
        at 
org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.handleCheckedException(SelectResultObjectProvider.java:155)
        at 
org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.handleCheckedException(QueryImpl.java:2111)
        at 
org.apache.openjpa.lib.rop.EagerResultList.<init>(EagerResultList.java:40)
        at org.apache.openjpa.kernel.QueryImpl.toResult(QueryImpl.java:1251)
        at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:1007)
        at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:863)
        at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:794)
        at 
org.apache.openjpa.kernel.DelegatingQuery.execute(DelegatingQuery.java:542)
        at org.apache.openjpa.persistence.QueryImpl.execute(QueryImpl.java:305)
        at 
org.apache.openjpa.persistence.QueryImpl.getResultList(QueryImpl.java:319)
        at 
org.apache.oozie.executor.jpa.CoordActionsGetByLastModifiedTimeJPAExecutor.execute(CoordActionsGetByLastModifiedTimeJPAExecutor.java:58)
        ... 14 more
Caused by: java.sql.SQLTransactionRollbackException: A lock could not be 
obtained within the time requested
        at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
        at 
org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at 
org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at 
org.apache.openjpa.lib.jdbc.DelegatingResultSet.next(DelegatingResultSet.java:131)
        at 
org.apache.openjpa.jdbc.sql.ResultSetResult.nextInternal(ResultSetResult.java:222)
        at 
org.apache.openjpa.jdbc.sql.SelectImpl$SelectResult.nextInternal(SelectImpl.java:2445)
        at 
org.apache.openjpa.jdbc.sql.AbstractResult.next(AbstractResult.java:175)
        at 
org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.next(SelectResultObjectProvider.java:99)
        at 
org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.next(QueryImpl.java:2087)
        at 
org.apache.openjpa.lib.rop.EagerResultList.<init>(EagerResultList.java:35)
        ... 22 more
Caused by: java.sql.SQLException: A lock could not be obtained within the time 
requested
        at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
 Source)
        ... 40 more
Caused by: ERROR 40XL1: A lock could not be obtained within the time requested
        at org.apache.derby.iapi.error.StandardException.newException(Unknown 
Source)
        at 
org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown 
Source)
        at 
org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(Unknown
 Source)
        at 
org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(Unknown
 Source)
        at 
org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(Unknown
 Source)
        at 
org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(Unknown
 Source)
        at 
org.apache.derby.impl.store.access.heap.HeapController.lockRow(Unknown Source)
        at 
org.apache.derby.impl.store.access.heap.HeapController.lockRow(Unknown Source)
        at 
org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(Unknown
 Source)
        at 
org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(Unknown
 Source)
        at 
org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(Unknown
 Source)
        at 
org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(Unknown 
Source)
        at 
org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(Unknown 
Source)
        at 
org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(Unknown 
Source)
        at 
org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(Unknown 
Source)
        at 
org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown
 Source)
        at 
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
 Source)
        at 
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
 Source)
        at 
org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown 
Source)
        ... 33 more
2013-02-10 22:34:11,835 INFO 
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] 
GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2013-02-10 22:34:29,790 WARN org.apache.oozie.service.JPAService: USER[-] 
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor 
[CoordActionsActiveCountJPAExecutor] ended with an active transaction, rolli




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to