Re: Catch ComputeJob failures on the offending Node.

2018-10-01 Thread Chris Berry
Thank you!



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Catch ComputeJob failures on the offending Node.

2018-10-01 Thread Maxim.Pudov
You can listen to Ignite events. I believe the event type you are looking for
is EVT_JOB_TIMEOUT.
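Here is a minimal sketch of that approach. It assumes Ignite 2.x is on the classpath. The class name `JobFailureListener` is hypothetical, and listening for `EVT_JOB_FAILED` alongside `EVT_JOB_TIMEOUT` is an added assumption so that jobs that throw are also covered. Ignite records most event types only when they are enabled with `IgniteConfiguration.setIncludeEventTypes`. `localListen` observes events only on the node it is called on, and for these job events that is the node that ran the job.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.events.JobEvent;
import org.apache.ignite.lang.IgnitePredicate;

// Hypothetical helper class; a sketch, not an official Ignite recipe.
public class JobFailureListener {
    // Job event types this sketch listens for. EVT_JOB_FAILED is an
    // assumption added to also catch jobs that threw an exception.
    static final int[] JOB_FAILURE_EVENTS = {
        EventType.EVT_JOB_TIMEOUT,
        EventType.EVT_JOB_FAILED
    };

    // Events are not recorded unless enabled, so turn these types on
    // in the node configuration before starting the node.
    static IgniteConfiguration configWithJobEvents() {
        return new IgniteConfiguration().setIncludeEventTypes(JOB_FAILURE_EVENTS);
    }

    // Registers a listener on the local node only, so it fires on the
    // node where the job actually ran and timed out or failed.
    static void listenLocally(Ignite ignite) {
        IgnitePredicate<JobEvent> lsnr = evt -> {
            System.out.println("Local job problem: type=" + evt.type()
                + " task=" + evt.taskName() + " jobId=" + evt.jobId());
            return true; // returning true keeps the listener registered
        };
        ignite.events().localListen(lsnr, JOB_FAILURE_EVENTS);
    }
}
```

You could wire it up by starting the node with `Ignition.start(JobFailureListener.configWithJobEvents())` and then calling `listenLocally` on the returned `Ignite` instance.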

  





Catch ComputeJob failures on the offending Node.

2018-09-29 Thread Chris Berry
Hi,

I need to somehow know, programmatically, when a Job times out, on the Node
where the timeout occurs.
Or, alternatively, to know programmatically whether I am the Node that threw an
Exception while processing a ComputeJob.

For example, I see this in the logs of the machine where it occurs:

[2018-09-29T15:14:37,970Z](grid-timeout-worker-#39)([]) did=3499c762
WARN - GridJobWorker - Job has timed out: GridJobSessionImpl
[ses=GridTaskSessionImpl
[taskName=compute.testing.BadThingsHappenLowVerbosityQuoteComputeTask,
dep=LocalDeployment [super=GridDeployment [ts=1538195815809,
depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6,
clsLdrId=73c6d932661-689f38c2-4d02-4895-94c5-782fab8e3982, userVer=0,
loc=true, sampleClsName=java.lang.String, pendingUndeploy=false,
undeployed=false, usage=0]],
taskClsName=compute.testing.BadThingsHappenLowVerbosityQuoteComputeTask,
sesId=a88a2452661-689f38c2-4d02-4895-94c5-782fab8e3982,
startTime=1538234072966, endTime=1538234077966,
taskNodeId=689f38c2-4d02-4895-94c5-782fab8e3982,
clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6,
closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1,
fullSup=false, internal=false,
topPred=compute.ReadyForComputeMonitor$$Lambda$963/95575133@653f9953,
subjId=689f38c2-4d02-4895-94c5-782fab8e3982, mapFut=IgniteFuture
[orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null,
hash=306886251]], execName=null],
jobId=e88a2452661-689f38c2-4d02-4895-94c5-782fab8e3982]

Is there a way for me to somehow catch this?
Or, can I somehow listen to the GridJobWorker for failure states?

Or, can I somehow query the ComputeTaskTimeoutException to know if I am on
the Node that caused the underlying failure?

Or, can I somehow catch this and convert it to a different, user-defined
exception on the Node that caused the problem?

Basically, I am trying to monitor for ComputeJob failures on the Node that
causes them, so that I can take that Node out of service if, say, the
exception rate on that Node gets too high.
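One way to act on such per-node failure signals is a sliding-window failure-rate tracker kept on each node. The sketch below is plain Java with no Ignite dependency. The class `NodeFailureRateMonitor`, the window policy, and the thresholds are all hypothetical. It assumes something calls `record(...)` for every job outcome on the local node. Failures could come from a local listener on `EVT_JOB_TIMEOUT`/`EVT_JOB_FAILED`, and successes from `EVT_JOB_FINISHED`, though treating that event as the success signal is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: tracks job outcomes on one node in a sliding time
// window and reports when the failure ratio exceeds a threshold.
public class NodeFailureRateMonitor {
    // One recorded job outcome.
    private static final class Outcome {
        final long timeMillis;
        final boolean failed;

        Outcome(long timeMillis, boolean failed) {
            this.timeMillis = timeMillis;
            this.failed = failed;
        }
    }

    private final long windowMillis;
    private final double maxFailureRatio;
    private final int minSamples;
    private final Deque<Outcome> outcomes = new ArrayDeque<>();
    private int failures;

    public NodeFailureRateMonitor(long windowMillis, double maxFailureRatio, int minSamples) {
        this.windowMillis = windowMillis;
        this.maxFailureRatio = maxFailureRatio;
        this.minSamples = minSamples;
    }

    // Records one job outcome observed on this node at time nowMillis.
    // Assumes callers pass non-decreasing timestamps.
    public synchronized void record(long nowMillis, boolean failed) {
        outcomes.addLast(new Outcome(nowMillis, failed));
        if (failed) {
            failures++;
        }
        evictOlderThanWindow(nowMillis);
    }

    // True when enough samples exist and the failure ratio is above the limit.
    public synchronized boolean shouldTakeOutOfService(long nowMillis) {
        evictOlderThanWindow(nowMillis);
        int total = outcomes.size();
        return total >= minSamples && (double) failures / total > maxFailureRatio;
    }

    // Drops outcomes outside the window, keeping only (now - window, now].
    private void evictOlderThanWindow(long nowMillis) {
        while (!outcomes.isEmpty()
                && outcomes.peekFirst().timeMillis <= nowMillis - windowMillis) {
            if (outcomes.removeFirst().failed) {
                failures--;
            }
        }
    }
}
```

The minimum sample count stops a single early failure from tripping the threshold. The time window lets a node recover once old failures age out.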

Thanks, 
-- Chris 





