[ 
https://issues.apache.org/jira/browse/IGNITE-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139627#comment-15139627
 ] 

Andrey Gura edited comment on IGNITE-169 at 2/9/16 9:57 PM:
------------------------------------------------------------

I see two options for fixing this:

# We can fix the test. The communication SPI should be replaced with one that 
catches job requests (e.g. the first one), gets the task session, invokes the 
{{saveCheckpoint}} method and counts the latch down.
# From my point of view it isn't a good idea to process delayed results from 
the caller thread (at least in the case of the async compute facade). So I 
suggest removing this behavior (i.e. removing the 
{{GridTaskWorker.processDelayedResponses}} method).
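
To illustrate why the second option matters, here is a minimal, self-contained sketch of the hazard. It does not use any Ignite API: the class {{StolenReduceSketch}}, the method {{run}} and the flag {{callerDrainsDelayed}} are hypothetical names made up for this example, the queue only stands in for the delayed responses, and the timed {{await}} is an assumption added so that the sketch terminates instead of hanging.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the situation in the issue; not Ignite code.
class StolenReduceSketch {
    /**
     * Returns true if the "reduce" step saw the latch released,
     * false if it gave up waiting.
     */
    static boolean run(boolean callerDrainsDelayed) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        Queue<Runnable> delayed = new ConcurrentLinkedQueue<>();
        boolean[] released = new boolean[1];

        // Stand-in for reduce(): waits until the caller counts the latch down.
        // The timeout exists only so that this sketch cannot hang forever.
        delayed.add(() -> {
            try {
                released[0] = latch.await(200, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Drains the queue and runs whatever it finds on the current thread.
        Runnable drain = () -> {
            Runnable r;
            while ((r = delayed.poll()) != null)
                r.run();
        };

        Thread pool = null;
        if (callerDrainsDelayed) {
            // The caller "steals" the reduce job and runs it inline,
            // so it blocks before it ever reaches countDown() below.
            drain.run();
        } else {
            // A separate thread (the "sys pool") runs the reduce job.
            pool = new Thread(drain, "sys-pool");
            pool.start();
        }

        // The caller's next step: do its work, then release the latch.
        latch.countDown();

        if (pool != null)
            pool.join(); // join() makes the pool thread's write visible here

        return released[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool thread reduces:   released=" + run(false));
        System.out.println("caller thread reduces: released=" + run(true));
    }
}
```

With {{run(false)}} the wait is released by the caller as intended. With {{run(true)}} the caller is the one waiting, so nobody is left to count the latch down and the wait can only end by timing out.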

Thoughts?


was (Author: agura):
I see two options for fixing this:

# We can fix the test. The communication SPI should be replaced. It should catch 
job requests (e.g. the first one), get the task session, invoke saveCheckpoint 
and count the latch down.
# From my point of view it isn't a good idea to process delayed results from 
the caller thread (especially in the case of the async compute facade). So I 
suggest removing this behavior (removing the {{processDelayedResponses}} method).

Thoughts?

> [Failed test] GridSessionCheckpointSelfTest fails
> -------------------------------------------------
>
>                 Key: IGNITE-169
>                 URL: https://issues.apache.org/jira/browse/IGNITE-169
>             Project: Ignite
>          Issue Type: Test
>          Components: general
>            Reporter: Irina Vasilinets
>             Fix For: 1.6
>
>
> {{GridSessionCheckpointSelfTest.testSharedFsCheckpoint}} fails from time to 
> time.
> The problem is that the test-runner thread invokes {{ignite.compute()}} in 
> async mode and, once all jobs are mapped, it invokes the 
> {{processDelayedResponses}} method. At that moment it is possible that all 
> responses have been received but the sys pool threads have not yet invoked the 
> {{reduce}} method. The test-runner thread then polls the delayed responses from 
> the queue and invokes {{reduce}} itself. So the test-runner thread "steals" the 
> reduce job, which waits on a latch that should be counted down by the same 
> thread.
> To reproduce it, just run the test N times in a loop.
> {noformat}
> [19:04:17,405][ERROR][main][root] Test failed.
> class org.apache.ignite.IgniteException: Failed to save checkpoint (session 
> closed): GridTaskSessionImpl [taskName=GridCheckpointTestTask, 
> dep=GridDeployment [ts=1454958227376, depMode=SHARED, 
> clsLdr=IsolatedClassLoader{roleName='test'}, 
> clsLdrId=16c7442c251-88ba4909-a876-4106-be63-3fec03d9dcff, userVer=0, 
> loc=true, 
> sampleClsName=org.apache.ignite.session.GridSessionCheckpointAbstractSelfTest$GridCheckpointTestTask,
>  pendingUndeploy=false, undeployed=false, usage=0], 
> taskClsName=org.apache.ignite.session.GridSessionCheckpointAbstractSelfTest$GridCheckpointTestTask,
>  sesId=36c7442c251-88ba4909-a876-4106-be63-3fec03d9dcff, 
> startTime=1454958227376, endTime=9223372036854775807, 
> taskNodeId=88ba4909-a876-4106-be63-3fec03d9dcff, 
> clsLdr=IsolatedClassLoader{roleName='test'}, closed=true, cpSpi=null, 
> failSpi=null, loadSpi=null, usage=0, fullSup=true, 
> subjId=88ba4909-a876-4106-be63-3fec03d9dcff, mapFut=IgniteFuture 
> [orig=GridFutureAdapter [resFlag=2, res=null, startTime=1454958227376, 
> endTime=1454958227386, ignoreInterrupts=false, state=DONE]]]
>     at 
> org.apache.ignite.internal.GridTaskSessionImpl.saveCheckpoint0(GridTaskSessionImpl.java:697)
>     at 
> org.apache.ignite.internal.GridTaskSessionImpl.saveCheckpoint(GridTaskSessionImpl.java:677)
>     at 
> org.apache.ignite.internal.GridTaskSessionImpl.saveCheckpoint(GridTaskSessionImpl.java:671)
>     at 
> org.apache.ignite.internal.GridTaskSessionImpl.saveCheckpoint(GridTaskSessionImpl.java:666)
>     at 
> org.apache.ignite.session.GridSessionCheckpointAbstractSelfTest.checkCheckpoints(GridSessionCheckpointAbstractSelfTest.java:145)
>     at 
> org.apache.ignite.session.GridSessionCheckpointSelfTest.testSharedFsCheckpoint(GridSessionCheckpointSelfTest.java:48)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at junit.framework.TestCase.runTest(TestCase.java:176)
>     at 
> org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1723)
>     at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:118)
>     at 
> org.apache.ignite.testframework.junits.GridAbstractTest$4.run(GridAbstractTest.java:1661)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
