Marty Kube created OOZIE-1214:
---------------------------------
Summary: Oozie coordinator jobs cannot be killed
Key: OOZIE-1214
URL: https://issues.apache.org/jira/browse/OOZIE-1214
Project: Oozie
Issue Type: Bug
Components: coordinator
Affects Versions: 3.2.0
Reporter: Marty Kube
We can't reliably kill oozie jobs (mostly fails, occasionally succeeds). When
kill fails, an oozie server restart usually "fixes" the issue so the job can be
killed after shutdown/startup sequence.
>oozie job -kill 0005437-130128204041186-oozie-oozi-C
Hangs forever or occasionally exits the command with no error on cmd line but
several errors in oozie log.
We find the following errors in the logs:
2013-02-10 22:32:42,776 WARN org.apache.oozie.command.wf.ResumeXCommand:
USER[oozie] GROUP[-] TOKEN[] APP[foo-logmapper]
JOB[0005516-130128204041186-oozie-oozi-W] ACTION[-] E1100: Command precondition
doe\
s not hold before execution, [workflow's status is SUCCEEDED is not SUSPENDED],
Error Code: E1100
2013-02-10 22:32:42,776 ERROR org.apache.oozie.command.wf.ResumeXCommand:
USER[oozie] GROUP[-] TOKEN[] APP[foo-logmapper]
JOB[0005516-130128204041186-oozie-oozi-W] ACTION[-] Exception,
java.lang.IllegalMonitorStateException
at
java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:317)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
at
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:992)
at
org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
at org.apache.oozie.command.XCommand.call(XCommand.java:295)
at
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2013-02-10 22:32:42,776 WARN
org.apache.oozie.service.CallableQueueService$CompositeCallable: USER[-]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [resume], E0607:
Other error in operation\
[java.lang.IllegalMonitorStateException], {1}
org.apache.oozie.command.CommandException: E0607: Other error in operation
[java.lang.IllegalMonitorStateException], {1}
at org.apache.oozie.command.XCommand.call(XCommand.java:317)
at
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalMonitorStateException
at
java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:317)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
at
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:992)
at
org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
at org.apache.oozie.command.XCommand.call(XCommand.java:295)
... 6 more
2013-02-10 22:32:43,377 WARN
org.apache.oozie.command.coord.CoordActionReadyXCommand: USER[oozie] GROUP[-]
TOKEN[] APP[foo-logs-cron-coord] JOB[0002766-130128204041186-oozie-oozi-C]
ACTION[-] No actions t\
o start! for jobId=0002766-130128204041186-oozie-oozi-C
2013-02-10 22:32:43,964 WARN
org.apache.oozie.command.coord.CoordActionReadyXCommand: USER[oozie] GROUP[-]
TOKEN[] APP[foo-modelling-cron-coord] JOB[0005437-130128204041186-oozie-oozi-C]
ACTION[-] No acti\
ons to start! for jobId=0005437-130128204041186-oozie-oozi-C
2013-02-10 22:33:00,608 WARN
org.apache.hadoop.security.authentication.server.AuthenticationFilter:
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:00,611 WARN
org.apache.hadoop.security.authentication.server.AuthenticationFilter:
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:00,627 WARN
org.apache.hadoop.security.authentication.server.AuthenticationFilter:
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:00,628 WARN
org.apache.hadoop.security.authentication.server.AuthenticationFilter:
AuthenticationToken ignored: AuthenticationToken expired
2013-02-10 22:33:04,333 INFO
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-]
GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2013-02-10 22:33:04,333 INFO
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-]
GROUP[-] Running coordinator status service from last instance time =
2013-02-11T03:32Z
2013-02-10 22:33:29,476 INFO
org.apache.oozie.service.CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-]
CoordMaterializeTriggerService - Cur\
r Date= Sun Feb 10 22:34:29 EST 2013, Num jobs to materialize = 1
2013-02-10 22:33:33,000 INFO org.apache.oozie.service.PauseTransitService:
USER[-] GROUP[-] Acquired lock for
[org.apache.oozie.service.PauseTransitService]
2013-02-10 22:33:33,007 INFO org.apache.oozie.service.PauseTransitService:
USER[-] GROUP[-] Released lock for
[org.apache.oozie.service.PauseTransitService]
2013-02-10 22:34:11,834 WARN org.apache.oozie.service.JPAService: USER[-]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor
[CoordActionsGetByLastModifiedTimeJPAExecutor] ended with an active transact\
ion, rolling back
2013-02-10 22:34:11,835 WARN
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-]
GROUP[-] Exception happened during StatusTransitRunnable
org.apache.oozie.executor.jpa.JPAExecutorException: E0603: SQL error in
operation [<openjpa-2.1.0-r422266:1071316 nonfatal store error>
org.apache.openjpa.persistence.OptimisticLockException: Unable to ob\
tain an object lock on "null".
FailedObject: select a.jobId from CoordinatorActionBean a where
a.lastModifiedTimestamp >= :lastModifiedTime [java.lang.String]], {1}
at
org.apache.oozie.executor.jpa.CoordActionsGetByLastModifiedTimeJPAExecutor.execute(CoordActionsGetByLastModifiedTimeJPAExecutor.java:62)
at
org.apache.oozie.executor.jpa.CoordActionsGetByLastModifiedTimeJPAExecutor.execute(CoordActionsGetByLastModifiedTimeJPAExecutor.java:34)
at org.apache.oozie.service.JPAService.execute(JPAService.java:211)
at
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable.coordTransit(StatusTransitService.java:710)
at
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable.run(StatusTransitService.java:108)
at
org.apache.oozie.service.SchedulerService$2.run(SchedulerService.java:164)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: <openjpa-2.1.0-r422266:1071316 nonfatal store error>
org.apache.openjpa.persistence.OptimisticLockException: Unable to obtain an
object lock on "null".
FailedObject: select a.jobId from CoordinatorActionBean a where
a.lastModifiedTimestamp >= :lastModifiedTime [java.lang.String]
at
org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4851)
at
org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:118)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:70)
at
org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.handleCheckedException(SelectResultObjectProvider.java:155)
at
org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.handleCheckedException(QueryImpl.java:2111)
at
org.apache.openjpa.lib.rop.EagerResultList.<init>(EagerResultList.java:40)
at org.apache.openjpa.kernel.QueryImpl.toResult(QueryImpl.java:1251)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:1007)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:863)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:794)
at
org.apache.openjpa.kernel.DelegatingQuery.execute(DelegatingQuery.java:542)
at org.apache.openjpa.persistence.QueryImpl.execute(QueryImpl.java:305)
at
org.apache.openjpa.persistence.QueryImpl.getResultList(QueryImpl.java:319)
at
org.apache.oozie.executor.jpa.CoordActionsGetByLastModifiedTimeJPAExecutor.execute(CoordActionsGetByLastModifiedTimeJPAExecutor.java:58)
... 14 more
Caused by: java.sql.SQLTransactionRollbackException: A lock could not be
obtained within the time requested
at
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown
Source)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
Source)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown
Source)
at
org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown
Source)
at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown
Source)
at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
at
org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
at
org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
at
org.apache.openjpa.lib.jdbc.DelegatingResultSet.next(DelegatingResultSet.java:131)
at
org.apache.openjpa.jdbc.sql.ResultSetResult.nextInternal(ResultSetResult.java:222)
at
org.apache.openjpa.jdbc.sql.SelectImpl$SelectResult.nextInternal(SelectImpl.java:2445)
at
org.apache.openjpa.jdbc.sql.AbstractResult.next(AbstractResult.java:175)
at
org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.next(SelectResultObjectProvider.java:99)
at
org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.next(QueryImpl.java:2087)
at
org.apache.openjpa.lib.rop.EagerResultList.<init>(EagerResultList.java:35)
... 22 more
Caused by: java.sql.SQLException: A lock could not be obtained within the time
requested
at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
Source)
... 40 more
Caused by: ERROR 40XL1: A lock could not be obtained within the time requested
at org.apache.derby.iapi.error.StandardException.newException(Unknown
Source)
at
org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown
Source)
at
org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(Unknown
Source)
at
org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(Unknown
Source)
at
org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(Unknown
Source)
at
org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(Unknown
Source)
at
org.apache.derby.impl.store.access.heap.HeapController.lockRow(Unknown Source)
at
org.apache.derby.impl.store.access.heap.HeapController.lockRow(Unknown Source)
at
org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(Unknown
Source)
at
org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(Unknown
Source)
at
org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(Unknown
Source)
at
org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(Unknown
Source)
at
org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(Unknown
Source)
at
org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(Unknown
Source)
at
org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(Unknown
Source)
at
org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown
Source)
at
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
Source)
at
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
Source)
at
org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown
Source)
... 33 more
2013-02-10 22:34:11,835 INFO
org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-]
GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2013-02-10 22:34:29,790 WARN org.apache.oozie.service.JPAService: USER[-]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor
[CoordActionsActiveCountJPAExecutor] ended with an active transaction, rolli
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira