Aled Sage created BROOKLYN-313:
----------------------------------

             Summary: Exception stacktrace never logged when resize effector 
fails, called by AutoScaler policy
                 Key: BROOKLYN-313
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-313
             Project: Brooklyn
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Aled Sage


A customer was using the ControlledDynamicWebAppCluster, with a 
AutoScalerPolicy. However, they also had an EntitlementManager that throws a 
NullPointerException whenever it is called with {{context==null || 
context.user==null}} (which happens for automated calls not triggered 
explicitly by a user, such as for policies).

The underlying cause of the NullPointerException was never logged. All that 
made it into the logs was something like:
{noformat}
2016-07-09 16:07:57,491 DEBUG o.a.b.util.core.task.BasicTask 
[brooklyn-autoscalerpolicy-0]: call from 
Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until 
'Task[Auto-scaler]@DHtSIGZf' finishes, ended with error: 
java.util.concurrent.ExecutionException: 
org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking 
resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
{noformat}

We should *always* log (at least at debug) the underlying cause of an 
exception, even if it risks logging it multiple times.

I've reproduced this in a test case. The thread stacktraces are shown below:

The thread below is about to throw a NullPointerException. The logging from 
this thread is also shown. The underlying exception is recorded in the task (so 
a subsequent task.get() would throw it):

{noformat}
Daemon Thread [brooklyn-execmanager-cU8EcGHT-39] (Suspended (breakpoint at line 
64 in EffectorExceptionLoggingTest$ThrowingEntitlementManager)) 
        
EffectorExceptionLoggingTest$ThrowingEntitlementManager.isEntitled(EntitlementContext,
 EntitlementClass<T>, T) line: 64 
        Entitlements.isEntitled(EntitlementManager, EntitlementClass<T>, T) 
line: 409   
        Entitlements.checkEntitled(EntitlementManager, EntitlementClass<T>, T) 
line: 414        
        
EntityManagementSupport$EntityChangeListenerImpl.onEffectorStarting(Effector<?>,
 Object) line: 469      
        EffectorUtils.invokeMethodEffector(Entity, Effector<T>, Map<String,?>) 
line: 274        
        MethodEffector<T>.call(Entity, Map) line: 148   
        MethodEffector<T>(AbstractEffector<T>).call(Map, Entity) line: 61       
        AbstractEffector$1$1.call() line: 83    
        DynamicSequentialTask$DstJob.call() line: 359   
        BasicExecutionManager$SubmissionCallable<T>.call() line: 519    
        FutureTask<V>.run() line: 262   
        ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145      
        ThreadPoolExecutor$Worker.run() line: 615       
        Thread.run() line: 745  

2016-07-09 16:33:47,333 WARN  o.a.b.c.m.i.EffectorUtils 
[brooklyn-execmanager-cU8EcGHT-39]: Error invoking resize at 
DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
2016-07-09 16:33:51,213 DEBUG o.a.b.u.c.t.BasicExecutionManager 
[brooklyn-execmanager-cU8EcGHT-39]: Exception running task 
Task[resize]@ZbLY4WUg (rethrowing): 
org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
 Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: 
NullPointerException
2016-07-09 16:33:51,214 DEBUG o.a.b.u.core.task.CompoundTask 
[brooklyn-execmanager-cU8EcGHT-36]: Secondary job queue for 
Task[Auto-scaler]@pOJ7r6Y0 ignoring error in inessential task 
Task[resize]@ZbLY4WUg: java.util.concurrent.ExecutionException: 
org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
 Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: 
NullPointerException
{noformat}

The thread below invoked the effector - note that the AutoScalerPolicy has 
wrapped the invocation in a task. The logging from this thread is also shown. 
Within that task, it will not catch the exception:

{noformat}
"brooklyn-execmanager-cU8EcGHT-0" daemon prio=5 tid=0x00007fcfe5b5f800 
nid=0x6003 waiting on condition [0x00007000018de000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007f93d7ab0> (a 
java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
        at java.util.concurrent.FutureTask.get(FutureTask.java:187)
        at 
com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
        at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
        at 
org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
        at 
org.apache.brooklyn.util.core.task.DynamicTasks$TaskQueueingResult.andWaitForSuccess(DynamicTasks.java:160)
        at 
org.apache.brooklyn.core.objs.proxy.EntityProxyImpl.invoke(EntityProxyImpl.java:211)
        at com.sun.proxy.$Proxy19.resize(Unknown Source)
        at 
org.apache.brooklyn.entity.webapp.ControlledDynamicWebAppClusterImpl.resize(ControlledDynamicWebAppClusterImpl.java:297)
        at 
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$3.resize(AutoScalerPolicy.java:364)
        at 
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1054)
        at 
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1)
        at 
org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
        at 
org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

2016-07-09 16:36:54,385 DEBUG o.a.b.u.c.t.BasicExecutionManager 
[brooklyn-execmanager-cU8EcGHT-0]: Exception running task 
Task[Auto-scaler]@pOJ7r6Y0 (rethrowing): 
org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking 
resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
{noformat}

The thread below invoked the task. It has called {{task.blockUntilEnded}} to 
wait for the result:

{noformat}
"brooklyn-autoscalerpolicy-0" prio=5 tid=0x00007fcfe5b49800 nid=0x1407 waiting 
on condition [0x0000700002b14000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007f81a07b0> (a 
java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
        at java.util.concurrent.FutureTask.get(FutureTask.java:187)
        at 
com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
        at 
org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:417)
        at 
org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:407)
        at 
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.resizeNow(AutoScalerPolicy.java:1066)
        at 
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.access$15(AutoScalerPolicy.java:1021)
        at 
org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$13.run(AutoScalerPolicy.java:952)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

2016-07-09 16:38:45,089 DEBUG o.a.b.util.core.task.BasicTask 
[brooklyn-autoscalerpolicy-0]: call from 
Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until 
'Task[Auto-scaler]@pOJ7r6Y0' finishes, ended with error: 
java.util.concurrent.ExecutionException: 
org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking 
resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
{noformat}

---
Note we'd have had the same problem if the cluster.resize() effector itself had 
thrown the exception (rather than the entitlements). We'd have only got logging 
like:

{noformat}
2016-07-09 16:43:52,293 DEBUG o.a.b.u.c.t.BasicExecutionManager 
[brooklyn-execmanager-NpLl6jt3-1]: Exception running task 
Task[Auto-scaler]@uLBsngUp (rethrowing): java.lang.NullPointerException
2016-07-09 16:43:52,294 DEBUG o.a.b.util.core.task.BasicTask 
[brooklyn-autoscalerpolicy-0]: call from 
Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until 
'Task[Auto-scaler]@uLBsngUp' finishes, ended with error: 
java.util.concurrent.ExecutionException: java.lang.NullPointerException
{noformat}

This lack of logging is hard to reproduce in a simpler test case where 
effectors are invoked directly. It is primarily because of the additional 
layers of task, and the use of {{basicTask.blockUntilEnded}}, that we lose the 
message.

However, I have in the past seen something similar when there is an exception 
in an effector invoked from the web-console: the web-console's activity view 
showed the full exception stacktrace, but the debug log did not have it. This 
is very bad for when a customer sends us a debug log!

---
It's unclear what the holistic fix for this should be. 

We do not want to require the caller of the effector to always wrap the call in 
a try-catch with logging: the failure of an effector should get logged 
automatically, and should include the underlying cause. Any other option risks 
us being sent customer log files that give no indication of what caused the 
problem.

---
We can hopefully log any effector invocation error in the same way. 
Alternatively we could treat framework exceptions (e.g. NPE preparing to 
execute) as different from exceptions during the actual effector 
implementation. 

I don't want to do it in low-level code like {{BasicTask.blockUntilEnded}}, 
because that will probably result in a huge amount of duplicate logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to