[ https://issues.apache.org/jira/browse/YUNIKORN-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang updated YUNIKORN-1218:
----------------------------------
    Attachment: stacktrace.log

> Scheduler crashed with concurrent map access error in health checker
> --------------------------------------------------------------------
>
>                 Key: YUNIKORN-1218
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1218
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>         Attachments: reproduce.patch, stacktrace.log
>
>
> After YUNIKORN-1107, the health checker runs as a background thread at a 30s
> interval. We observed a few scheduler restarts in the past week that seem to
> be caused by this thread, because it accesses the partition context without
> holding the proper read lock. I have uploaded a patch to reproduce this
> locally, and a file with the stack trace captured when the crash happens.