Hi,

We run a dev/alpha stack of our application in Azure Kubernetes. Persistent storage is held in Azure Files NAS storage volumes, one per server node.

We ran an upgrade of Kubernetes today (from 1.24.9 to 1.26.3). During the upgrade various pods were stopped and restarted, as is normal for an upgrade. This included nodes running the dev/alpha stack. At least one node (of a cluster of four server nodes) failed to restart after the upgrade, with the following logging:

2023-07-18 01:23:55.171 [1] INF Restoring checkpoint after logical recovery, will start physical recovery from back pointer: WALPointer [idx=2431, fileOff=209031823, len=29]
2023-07-18 01:23:55.205 [28] ERR Failed to apply page delta. rec=[PagesListRemovePageRecord [rmvdPageId=0101000100000057, pageId=0101000100000004, grpId=-1476359018, super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004, super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431, fileOff=209169155, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
2023-07-18 01:23:55.217 [1] INF Cleanup cache stores [total=0, left=0, cleanFiles=false]
2023-07-18 01:23:55.218 [1] ERR Got exception while starting (will rollback startup routine).
2023-07-18 01:23:55.218 [1] ERR Exception during start processors, node will be stopped and close connections

I know Apache Ignite is very good at surviving 'Big Red Switch' scenarios, and we have our data regions configured with the strictest update protocol (full sync after each write). However, it's possible the NAS implementation does something different!

I think that if we delete the WAL files from the nodes that won't restart, those nodes may start cleanly, though we would lose any updates made since the last checkpoint. The stack has low use and checkpoints happen every 30-45 seconds or so, so the loss would not be significant.

Is this an error anyone else has noticed? Has anyone else had similar issues with Azure Files when using strict update/sync semantics?

Thanks,
Raymond.

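P.S. To make the WAL removal idea concrete, here is a minimal sketch of how it might be done: moving the files aside instead of deleting them, so the step can be undone. It has not been run against our deployment. The function name, the directory names, and the `*.wal` file pattern are assumptions for illustration, not our actual storage layout, and the node would need to be stopped first.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_wal(wal_dir: Path, backup_dir: Path, pattern: str = "*.wal") -> list[Path]:
    """Move files matching `pattern` from wal_dir into backup_dir.

    Returns the new locations of the moved files. Nothing is deleted,
    so the files can be moved back if the node still fails to start.
    The "*.wal" pattern is an assumption about how segments are named.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for segment in sorted(wal_dir.glob(pattern)):
        if segment.is_file():
            target = backup_dir / segment.name
            shutil.move(str(segment), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Demonstration on a throwaway temp directory with a dummy segment file;
    # the real WAL path for a node would be substituted here.
    with tempfile.TemporaryDirectory() as tmp:
        wal = Path(tmp) / "wal"
        wal.mkdir()
        (wal / "0000000000000001.wal").write_bytes(b"demo")
        moved = set_aside_wal(wal, Path(tmp) / "wal-backup")
        print([p.name for p in moved])
```

Keeping the set-aside files until the node has restarted and the data has been checked would leave a way back if removing the WAL turns out not to be the fix.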
--
Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com
<http://www.trimble.com/>