Aled Sage created BROOKLYN-313:
----------------------------------
Summary: Exception stacktrace never logged when resize effector
fails, called by AutoScaler policy
Key: BROOKLYN-313
URL: https://issues.apache.org/jira/browse/BROOKLYN-313
Project: Brooklyn
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Aled Sage
A customer was using the ControlledDynamicWebAppCluster, with a
AutoScalerPolicy. However, they also had an EntitlementManager that throws a
NullPointerException whenever it is called with {{context==null ||
context.user==null}} (which happens for automated calls not triggered
explicitly by a user, such as for policies).
The underlying cause of the NullPointerException was never logged. All that
made it into the logs was something like:
{noformat}
2016-07-09 16:07:57,491 DEBUG o.a.b.util.core.task.BasicTask
[brooklyn-autoscalerpolicy-0]: call from
Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until
'Task[Auto-scaler]@DHtSIGZf' finishes, ended with error:
java.util.concurrent.ExecutionException:
org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking
resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
{noformat}
We should *always* log (at least at debug) the underlying cause of an
exception, even if it risks logging it multiple times.
I've reproduced this in a test case. The thread stacktraces are shown below:
The thread below is about to throw a NullPointerException. The logging from
this thread is also shown. The underlying exception is recorded in the task (so
a subsequent task.get() would throw it):
{noformat}
Daemon Thread [brooklyn-execmanager-cU8EcGHT-39] (Suspended (breakpoint at line
64 in EffectorExceptionLoggingTest$ThrowingEntitlementManager))
EffectorExceptionLoggingTest$ThrowingEntitlementManager.isEntitled(EntitlementContext,
EntitlementClass<T>, T) line: 64
Entitlements.isEntitled(EntitlementManager, EntitlementClass<T>, T)
line: 409
Entitlements.checkEntitled(EntitlementManager, EntitlementClass<T>, T)
line: 414
EntityManagementSupport$EntityChangeListenerImpl.onEffectorStarting(Effector<?>,
Object) line: 469
EffectorUtils.invokeMethodEffector(Entity, Effector<T>, Map<String,?>)
line: 274
MethodEffector<T>.call(Entity, Map) line: 148
MethodEffector<T>(AbstractEffector<T>).call(Map, Entity) line: 61
AbstractEffector$1$1.call() line: 83
DynamicSequentialTask$DstJob.call() line: 359
BasicExecutionManager$SubmissionCallable<T>.call() line: 519
FutureTask<V>.run() line: 262
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 745
2016-07-09 16:33:47,333 WARN o.a.b.c.m.i.EffectorUtils
[brooklyn-execmanager-cU8EcGHT-39]: Error invoking resize at
DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
2016-07-09 16:33:51,213 DEBUG o.a.b.u.c.t.BasicExecutionManager
[brooklyn-execmanager-cU8EcGHT-39]: Exception running task
Task[resize]@ZbLY4WUg (rethrowing):
org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
NullPointerException
2016-07-09 16:33:51,214 DEBUG o.a.b.u.core.task.CompoundTask
[brooklyn-execmanager-cU8EcGHT-36]: Secondary job queue for
Task[Auto-scaler]@pOJ7r6Y0 ignoring error in inessential task
Task[resize]@ZbLY4WUg: java.util.concurrent.ExecutionException:
org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
NullPointerException
{noformat}
The thread below invoked the effector - note that the AutoScalerPolicy has
wrapped the invocation in a task. The logging from this thread is also shown.
Within that task, it will not catch the exception:
{noformat}
"brooklyn-execmanager-cU8EcGHT-0" daemon prio=5 tid=0x00007fcfe5b5f800
nid=0x6003 waiting on condition [0x00007000018de000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007f93d7ab0> (a
java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
at java.util.concurrent.FutureTask.get(FutureTask.java:187)
at
com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
at
org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
at
org.apache.brooklyn.util.core.task.DynamicTasks$TaskQueueingResult.andWaitForSuccess(DynamicTasks.java:160)
at
org.apache.brooklyn.core.objs.proxy.EntityProxyImpl.invoke(EntityProxyImpl.java:211)
at com.sun.proxy.$Proxy19.resize(Unknown Source)
at
org.apache.brooklyn.entity.webapp.ControlledDynamicWebAppClusterImpl.resize(ControlledDynamicWebAppClusterImpl.java:297)
at
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$3.resize(AutoScalerPolicy.java:364)
at
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1054)
at
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1)
at
org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
at
org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-07-09 16:36:54,385 DEBUG o.a.b.u.c.t.BasicExecutionManager
[brooklyn-execmanager-cU8EcGHT-0]: Exception running task
Task[Auto-scaler]@pOJ7r6Y0 (rethrowing):
org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking
resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
{noformat}
The thread below invoked the task. It has called {{task.blockUntilEnded}} to
wait for the result:
{noformat}
"brooklyn-autoscalerpolicy-0" prio=5 tid=0x00007fcfe5b49800 nid=0x1407 waiting
on condition [0x0000700002b14000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007f81a07b0> (a
java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
at java.util.concurrent.FutureTask.get(FutureTask.java:187)
at
com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
at
org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:417)
at
org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:407)
at
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.resizeNow(AutoScalerPolicy.java:1066)
at
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.access$15(AutoScalerPolicy.java:1021)
at
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$13.run(AutoScalerPolicy.java:952)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-07-09 16:38:45,089 DEBUG o.a.b.util.core.task.BasicTask
[brooklyn-autoscalerpolicy-0]: call from
Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until
'Task[Auto-scaler]@pOJ7r6Y0' finishes, ended with error:
java.util.concurrent.ExecutionException:
org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking
resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
{noformat}
---
Note we'd have had the same problem if the cluster.resize() effector itself had
thrown the exception (rather than the entitlements). We'd have only got logging
like:
{noformat}
2016-07-09 16:43:52,293 DEBUG o.a.b.u.c.t.BasicExecutionManager
[brooklyn-execmanager-NpLl6jt3-1]: Exception running task
Task[Auto-scaler]@uLBsngUp (rethrowing): java.lang.NullPointerException
2016-07-09 16:43:52,294 DEBUG o.a.b.util.core.task.BasicTask
[brooklyn-autoscalerpolicy-0]: call from
Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until
'Task[Auto-scaler]@uLBsngUp' finishes, ended with error:
java.util.concurrent.ExecutionException: java.lang.NullPointerException
{noformat}
This lack of logging is hard to reproduce in a simpler test case where
effectors are invoked directly. It is primarily because of the additional
layers of task, and the use of {{basicTask.blockUntilEnded}}, that we lose the
message.
However, I have in the past seen something similar when there is an exception
in an effector invoked from the web-console: the web-console's activity view
showed the full exception stacktrace, but the debug log did not have it. This
is very bad for when a customer sends us a debug log!
---
It's unclear what the holistic fix for this should be.
We do not want to require the caller of the effector to always wrap the call in
a try-catch with logging: the failure of an effector should get logged
automatically, and should include the underlying cause. Any other option risks
us being sent customer log files that give no indication of what caused the
problem.
---
We can hopefully log any effector invocation error in the same way.
Alternatively we could treat framework exceptions (e.g. NPE preparing to
execute) as different from exceptions during the actual effector
implementation.
I don't want to do it in low-level code like {{BasicTask.blockUntilEnded}},
because that will probably result in a huge amount of duplicate logging.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)