[ 
https://issues.apache.org/jira/browse/YUNIKORN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840044#comment-17840044
 ] 

Yu-Lin Chen commented on YUNIKORN-2576:
---------------------------------------

[~wilfreds] Thanks for the review.
You're right, the assertion fails is prior to the RACE issue. 
Let me create another Jira to clear asyncDispatchCount in initDispatcher(). 
(Also change to use atomic.Int32)

 

I just did a quick test. After I added "asyncDispatchCount.Store(0)" to 
initDispatcher(), the RACE issue is temporarily gone. But I believet that the 
issue still exists.

The DATA RACE log I put in this Jira shows that the race issue occurs in below 
code during unit test:

 
 * 
[https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L73C1-L73C59]
 * 
[https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L188C8-L188C18]

Let's fix the assert issue first.

> Data Race: Flaky tests in dispatcher_test.go
> --------------------------------------------
>
>                 Key: YUNIKORN-2576
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2576
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: test - unit
>            Reporter: Yu-Lin Chen
>            Priority: Major
>         Attachments: shim-race.txt
>
>
> How to reproduce:
>  # In Shim, run 'go test ./pkg/... -race -count=10  > shim-race.txt' 
>  
> {code:java}
> WARNING: DATA RACE
> Write at 0x0000035315e0 by goroutine 88:
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.initDispatcher()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:73 
> +0x2c4
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.createDispatcher()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:305
>  +0x2f
>   runtime.Goexit()
>       /usr/local/go/src/runtime/panic.go:626 +0x5d
>   testing.(*T).FailNow()
>       <autogenerated>:1 +0x31
>   gotest.tools/v3/assert.Equal()
>       
> /home/chenyulin0719/go/pkg/mod/gotest.tools/v3@v3.5.1/assert/assert.go:205 
> +0x1aa
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.TestDispatchTimeout()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:244
>  +0x2ba
>   testing.tRunner()
>       /usr/local/go/src/testing/testing.go:1689 +0x21e
>   testing.(*T).Run.gowrap1()
>       /usr/local/go/src/testing/testing.go:1742 +0x44
> Previous read at 0x0000035315e0 by goroutine 90:
>   
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch.func1()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:188 
> +0x2f5
>   
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch.gowrap1()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:197 
> +0x6eGoroutine 88 (running) created at:
>   testing.(*T).Run()
>       /usr/local/go/src/testing/testing.go:1742 +0x825
>   testing.runTests.func1()
>       /usr/local/go/src/testing/testing.go:2161 +0x85
>   testing.tRunner()
>       /usr/local/go/src/testing/testing.go:1689 +0x21e
>   testing.runTests()
>       /usr/local/go/src/testing/testing.go:2159 +0x8be
>   testing.(*M).Run()
>       /usr/local/go/src/testing/testing.go:2027 +0xf17
>   main.main()
>       _testmain.go:55 +0x2bdGoroutine 90 (running) created at:
>   
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:178 
> +0x391
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).dispatch()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:164 
> +0xbb
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.Dispatch()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:142 
> +0x71
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.TestDispatchTimeout()
>       
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:232
>  +0x244
>   testing.tRunner()
>       /usr/local/go/src/testing/testing.go:1689 +0x21e
>   testing.(*T).Run.gowrap1()
>       /usr/local/go/src/testing/testing.go:1742 +0x44
> =================={code}
> Root Cause:
>  * The [globla 
> vairables|https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L46-L51]
>  in dispatcher.go is not protected when running unit tests. Each unit test 
> will run initDispatcher() through 
> [createDispatcher()|https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatch_test.go#L305].
>  * Race occurs if any other unit tests read/write the global variables before 
> or after initDispatcher(). ex: TestDispatchTimeout()   
> [https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L188]
>  
> Solution to be discussed:
>  # Refactor dispatcher.go and encapsulates global variables to Dispatcher 
> struct
> , change Dispatcher.Start(), Dispatcher.Stop() to type method
>  # Implement Singleton in getDispatcher() and add a new function 
> newDispatcher()
>  # Create a new Dispatcher for each unit test
>  
> The race issue only happens in unit test becasue the shared vairable was 
> protected by 
> once.Do(initDispatcher) in dispatcher.go : 
> [https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L131]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org

Reply via email to