[
https://issues.apache.org/jira/browse/BROOKLYN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369183#comment-15369183
]
Aled Sage commented on BROOKLYN-313:
------------------------------------
My inclination is to catch+log+rethrow in EffectorUtils.invokeMethodEffector.
This already has a handleEffectorException, which just does a one-liner warn
message and then rethrows. I think this should also do a log.debug of the full
exception. The EffectorUtils.handleEffectorException already has smarts to
avoid logging exactly the same thing again.
Things get hard if we invoke effector X, which invokes Y on another entity. If
Y throws an exception that causes X to fail, then we're likely to log Y's
failure immediately and then again as the "caused by" of X's failure. I think
that's acceptable. We definitely need to log X's exception, because it shows
the callstack of where we invoked Y. We could try to do fiddly stuff to mark
Y's exception as already logged, so we exclude it from the "caused by" when
logging X's exception. But that seems too complicated and unusual for the value
it adds.
> Exception stacktrace never logged when resize effector fails, called by
> AutoScaler policy
> -----------------------------------------------------------------------------------------
>
> Key: BROOKLYN-313
> URL: https://issues.apache.org/jira/browse/BROOKLYN-313
> Project: Brooklyn
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Aled Sage
>
> A customer was using the ControlledDynamicWebAppCluster, with a
> AutoScalerPolicy. However, they also had an EntitlementManager that throws a
> NullPointerException whenever it is called with {{context==null ||
> context.user==null}} (which happens for automated calls not triggered
> explicitly by a user, such as for policies).
> The underlying cause of the NullPointerException was never logged. All that
> made it into the logs was something like:
> {noformat}
> 2016-07-09 16:07:57,491 DEBUG o.a.b.util.core.task.BasicTask
> [brooklyn-autoscalerpolicy-0]: call from
> Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until
> 'Task[Auto-scaler]@DHtSIGZf' finishes, ended with error:
> java.util.concurrent.ExecutionException:
> org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error
> invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
> NullPointerException
> {noformat}
> We should *always* log (at least at debug) the underlying cause of an
> exception, even if it risks logging it multiple times.
> I've reproduced this in a test case. The thread stacktraces are shown below:
> The thread below is about to throw a NullPointerException. The logging from
> this thread is also shown. The underlying exception is recorded in the task
> (so a subsequent task.get() would throw it):
> {noformat}
> Daemon Thread [brooklyn-execmanager-cU8EcGHT-39] (Suspended (breakpoint at
> line 64 in EffectorExceptionLoggingTest$ThrowingEntitlementManager))
>
> EffectorExceptionLoggingTest$ThrowingEntitlementManager.isEntitled(EntitlementContext,
> EntitlementClass<T>, T) line: 64
> Entitlements.isEntitled(EntitlementManager, EntitlementClass<T>, T)
> line: 409
> Entitlements.checkEntitled(EntitlementManager, EntitlementClass<T>, T)
> line: 414
>
> EntityManagementSupport$EntityChangeListenerImpl.onEffectorStarting(Effector<?>,
> Object) line: 469
> EffectorUtils.invokeMethodEffector(Entity, Effector<T>, Map<String,?>)
> line: 274
> MethodEffector<T>.call(Entity, Map) line: 148
> MethodEffector<T>(AbstractEffector<T>).call(Map, Entity) line: 61
> AbstractEffector$1$1.call() line: 83
> DynamicSequentialTask$DstJob.call() line: 359
> BasicExecutionManager$SubmissionCallable<T>.call() line: 519
> FutureTask<V>.run() line: 262
> ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
> ThreadPoolExecutor$Worker.run() line: 615
> Thread.run() line: 745
> 2016-07-09 16:33:47,333 WARN o.a.b.c.m.i.EffectorUtils
> [brooklyn-execmanager-cU8EcGHT-39]: Error invoking resize at
> DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> 2016-07-09 16:33:51,213 DEBUG o.a.b.u.c.t.BasicExecutionManager
> [brooklyn-execmanager-cU8EcGHT-39]: Exception running task
> Task[resize]@ZbLY4WUg (rethrowing):
> org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
> Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
> NullPointerException
> 2016-07-09 16:33:51,214 DEBUG o.a.b.u.core.task.CompoundTask
> [brooklyn-execmanager-cU8EcGHT-36]: Secondary job queue for
> Task[Auto-scaler]@pOJ7r6Y0 ignoring error in inessential task
> Task[resize]@ZbLY4WUg: java.util.concurrent.ExecutionException:
> org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
> Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
> NullPointerException
> {noformat}
> The thread below invoked the effector - note that the AutoScalerPolicy has
> wrapped the invocation in a task. The logging from this thread is also shown.
> Within that task, it will not catch the exception:
> {noformat}
> "brooklyn-execmanager-cU8EcGHT-0" daemon prio=5 tid=0x00007fcfe5b5f800
> nid=0x6003 waiting on condition [0x00007000018de000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007f93d7ab0> (a
> java.util.concurrent.FutureTask)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
> at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> at
> com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
> at
> org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
> at
> org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
> at
> org.apache.brooklyn.util.core.task.DynamicTasks$TaskQueueingResult.andWaitForSuccess(DynamicTasks.java:160)
> at
> org.apache.brooklyn.core.objs.proxy.EntityProxyImpl.invoke(EntityProxyImpl.java:211)
> at com.sun.proxy.$Proxy19.resize(Unknown Source)
> at
> org.apache.brooklyn.entity.webapp.ControlledDynamicWebAppClusterImpl.resize(ControlledDynamicWebAppClusterImpl.java:297)
> at
> org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$3.resize(AutoScalerPolicy.java:364)
> at
> org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1054)
> at
> org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1)
> at
> org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
> at
> org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2016-07-09 16:36:54,385 DEBUG o.a.b.u.c.t.BasicExecutionManager
> [brooklyn-execmanager-cU8EcGHT-0]: Exception running task
> Task[Auto-scaler]@pOJ7r6Y0 (rethrowing):
> org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error
> invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
> NullPointerException
> {noformat}
> The thread below invoked the task. It has called {{task.blockUntilEnded}} to
> wait for the result:
> {noformat}
> "brooklyn-autoscalerpolicy-0" prio=5 tid=0x00007fcfe5b49800 nid=0x1407
> waiting on condition [0x0000700002b14000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007f81a07b0> (a
> java.util.concurrent.FutureTask)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
> at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> at
> com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
> at
> org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:417)
> at
> org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:407)
> at
> org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.resizeNow(AutoScalerPolicy.java:1066)
> at
> org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.access$15(AutoScalerPolicy.java:1021)
> at
> org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$13.run(AutoScalerPolicy.java:952)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2016-07-09 16:38:45,089 DEBUG o.a.b.util.core.task.BasicTask
> [brooklyn-autoscalerpolicy-0]: call from
> Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until
> 'Task[Auto-scaler]@pOJ7r6Y0' finishes, ended with error:
> java.util.concurrent.ExecutionException:
> org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error
> invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}:
> NullPointerException
> {noformat}
> ---
> Note we'd have had the same problem if the cluster.resize() effector itself
> had thrown the exception (rather than the entitlements). We'd have only got
> logging like:
> {noformat}
> 2016-07-09 16:43:52,293 DEBUG o.a.b.u.c.t.BasicExecutionManager
> [brooklyn-execmanager-NpLl6jt3-1]: Exception running task
> Task[Auto-scaler]@uLBsngUp (rethrowing): java.lang.NullPointerException
> 2016-07-09 16:43:52,294 DEBUG o.a.b.util.core.task.BasicTask
> [brooklyn-autoscalerpolicy-0]: call from
> Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until
> 'Task[Auto-scaler]@uLBsngUp' finishes, ended with error:
> java.util.concurrent.ExecutionException: java.lang.NullPointerException
> {noformat}
> This lack of logging is hard to reproduce in a simpler test case where
> effectors are invoked directly. It is primarily because of the additional
> layers of task, and the use of {{basicTask.blockUntilEnded}}, that we lose
> the message.
> However, I have in the past seen something similar when there is an exception
> in an effector invoked from the web-console: the web-console's activity view
> showed the full exception stacktrace, but the debug log did not have it. This
> is very bad for when a customer sends us a debug log!
> ---
> It's unclear what the holistic fix for this should be.
> We do not want to require the caller of the effector to always wrap the call
> in a try-catch with logging: the failure of an effector should get logged
> automatically, and should include the underlying cause. Any other option
> risks us being sent customer log files that give no indication of what caused
> the problem.
> ---
> We can hopefully log any effector invocation error in the same way.
> Alternatively we could treat framework exceptions (e.g. NPE preparing to
> execute) as different from exceptions during the actual effector
> implementation.
> I don't want to do it in low-level code like {{BasicTask.blockUntilEnded}},
> because that will probably result in a huge amount of duplicate logging.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)