Yu-Lin Chen created YUNIKORN-2576:
-------------------------------------

             Summary: Data Race: Flaky tests in dispatcher_test.go
                 Key: YUNIKORN-2576
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2576
             Project: Apache YuniKorn
          Issue Type: Bug
            Reporter: Yu-Lin Chen
         Attachments: shim-race.txt

How to reproduce:
 # In Shim, run 'go test ./pkg/... -race -count=10  > shim-race.txt' 

 
{code:java}
WARNING: DATA RACE
Write at 0x0000035315e0 by goroutine 88:
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.initDispatcher()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:73 
+0x2c4
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.createDispatcher()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:305
 +0x2f
  runtime.Goexit()
      /usr/local/go/src/runtime/panic.go:626 +0x5d
  testing.(*T).FailNow()
      <autogenerated>:1 +0x31
  gotest.tools/v3/assert.Equal()
      
/home/chenyulin0719/go/pkg/mod/gotest.tools/v3@v3.5.1/assert/assert.go:205 
+0x1aa
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.TestDispatchTimeout()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:244
 +0x2ba
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44Previous read at 
0x0000035315e0 by goroutine 90:
  
github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch.func1()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:188 
+0x2f5
  
github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch.gowrap1()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:197 
+0x6eGoroutine 88 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:55 +0x2bdGoroutine 90 (running) created at:
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:178 
+0x391
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).dispatch()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:164 
+0xbb
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.Dispatch()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:142 
+0x71
  github.com/apache/yunikorn-k8shim/pkg/dispatcher.TestDispatchTimeout()
      
/home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:232
 +0x244
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44
=================={code}
Root Cause:
 * The [globla 
vairables|https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L46-L51]
 in dispatcher.go is not protected when running unit tests. Each unit test will 
run initDispatcher() through 
[createDispatcher()|https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatch_test.go#L305].
 * Race occurs if any other unit tests read/write the global variables before 
or after initDispatcher(). ex: TestDispatchTimeout()   
[https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L188]

 

Solution to be discussed:
 # Refactor dispatcher.go and encapsulates global variables to Dispatcher struct
, change Dispatcher.Start(), Dispatcher.Stop() to type method
 # Implement Singleton in getDispatcher() and add a new function newDispatcher()
 # Create a new Dispatcher for each unit test

 

The race issue only happens in unit test becasue the shared vairable was 
protected by 
once.Do(initDispatcher) in dispatcher.go : 
[https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L131]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org

Reply via email to