style95 opened a new pull request, #5432: URL: https://github.com/apache/openwhisk/pull/5432
## Description This is to fix a bug in that ETCD data for non-existent containers persist forever. This bug happens when a shared action is invoked. When the queue manager receives an activation message, it finds a queue based on the `docId` only and this is wrong. As a result, the activation message could be sent to the wrong queue as long as the `docId` is same. But the `docId` is same for all shared actions. ex) - original action: `/whisk.system/sharedPackage/hello` - bound action1: `/style95/myPackage/hello` (docId: `/whisk.system/sharedPackage/hello`) - bound action2: `/bdoyle/yourPackage/hello` (docId: `/whisk.system/sharedPackage/hello`) So an activation for `/style95/myPackage/hello` could be sent to `/bdoyle/yourPackage/hello`. Then a memory queue will send the activation to a container for `/style95/myPackage/hello`. The container is initialized with `/style95/myPackage/hello` and it registers the ETCD data for the running container. But after executing the activation for `/bdoyle/yourPackage/hello`, it tries to remove the ETCD data for the running container using the key prefix with `/bdoyle/yourPackage/hello` because the key now resides in the container data(`WarmData`). Accordingly, the original ETCD data for the running container is not deleted forever. Please refer to the following logs that I found. (The shared action is `whisk.system/sharedPackage/hello` and I replaced the name of two namespace to `style95` and `bdoyle` from the original logs just to hide proprietary information.) ``` [2023-08-04T16:31:49.392Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] Got activation message 98481e836089419d881e836089d19d00 for Namespace(style95,07e316f4-9393-4777-a1d3-650da22a83f0)/whisk.system/sharedPackage/hello@0.0.38 from kafka. [2023-08-04T16:31:49.392Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] [98481e836089419d881e836089d19d00] the key whisk/queue/style95/whisk.system/sharedPackage/hello/leader is not in the initRevisionMap. revision: 38-5e7fc51a452c685030fdfcec61e3bdf1 [2023-08-04T16:31:49.392Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] [98481e836089419d881e836089d19d00] send activation to remote queue, key: whisk/queue/style95/whisk.system/sharedPackage/hello/leader revision: 38-5e7fc51a452c685030fdfcec61e3bdf1 [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] add a new actor selection to a map with key: whisk/queue/style95/whisk.system/sharedPackage/hello/leader [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] Got activation message 98481e836089419d881e836089d19d00 for Namespace(style95,07e316f4-9393-4777-a1d3-650da22a83f0)/whisk.system/sharedPackage/hello@0.0.38 from remote queue manager. [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] Queue for action whisk.system/sharedPackage/hello@0.0.38 is already recovered, skip [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [MemoryQueue] [bdoyle:whisk.system/sharedPackage/hello@0.0.38:Running] got a new activation message 98481e836089419d881e836089d19d00 ``` As you can see above the activation(`98481e836089419d881e836089d19d00`) is originally for `style95` but it is finally sent to the queue for `bdoyle`. And the queue sent this activation to a container with an ID(`f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff`). ``` [2023-08-04T16:31:49.686Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [MemoryQueue] [bdoyle:whisk.system/sharedPackage/hello@0.0.38:Running] Get activation request f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff, send one message: 98481e836089419d881e836089d19d00 ``` This container was originally created for `bdoyle`. ``` [2023-08-04T16:31:49.623Z] [INFO] [#tid_sid_unknown] [FunctionPullingContainerPool] received a container creation message: f3dc23dbb34c43889c23dbb34c93882e [2023-08-04T16:31:49.683Z] [INFO] [#tid_sid_unknown] [WatcherService] watch endpoint: WatchEndpoint(whisk/namespace/bdoyle/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff,,false,data-management-service,Set(DeleteEvent)) [2023-08-04T16:31:49.685Z] [INFO] [#tid_sid_unknown] [FPCInvokerReactive] Posted success ack of container creation f3dc23dbb34c43889c23dbb34c93882e for bdoyle/whisk.system/sharedPackage/hello@0.0.38 ``` But when the same container(`f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff`) is being paused, it tries to unwatch endpoint based on the `style95` key because it executed an activation for `style95` namespace. ``` [2023-08-04T16:31:49.688Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [FunctionPullingContainerProxy] received a message 98481e836089419d881e836089d19d00 for whisk.system/sharedPackage/hello@0.0.38 in ClientCreated [2023-08-04T16:31:49.688Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] sending initialization to ContainerId(f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) ContainerAddress(172.17.0.34,8080) [2023-08-04T16:31:49.888Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] initialization result: ok [2023-08-04T16:31:49.888Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] sending arguments to /whisk.system/sharedPackage/hello at ContainerId(f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) ContainerAddress(172.17.0.34,8080) [2023-08-04T16:31:49.992Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] running result: ok [2023-08-04T16:31:49.992Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [MessagingActiveAck] posted completion of activation 98481e836089419d881e836089d19d00 [2023-08-04T16:32:00.007Z] [INFO] [#tid_sid_unknown] [FunctionPullingContainerProxy] No more run activation is coming in state: Running, action: ExecutableWhiskAction/whisk.system/sharedPackage/hello@0.0.38, container: ContainerId(f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) [2023-08-04T16:32:00.472Z] [INFO] [#tid_sid_invokerNanny] [RuncClient] running /usr/bin/docker-runc pause f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff (timeout: 10 seconds) [2023-08-04T16:32:00.544Z] [INFO] [#tid_sid_unknown] [WatcherService] watch endpoint: WatchEndpoint(whisk/warmed/style95/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker/0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff,,false,data-management-service,Set(DeleteEvent)) [2023-08-04T16:32:00.544Z] [INFO] [#tid_sid_unknown] [WatcherService] unwatch endpoint: UnwatchEndpoint(whisk/namespace/style95/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff,false,data-management-service,true) ``` Consequently, the data for `bdoyle`(`whisk/namespace/bdoyle/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff`) is not removed forever unless the invoker is restarted. ## Related issue and scope <!--- Please include a link to a related issue if there is one. --> - [ ] I opened an issue to propose and discuss this change (#????) ## My changes affect the following components <!--- Select below all system components are affected by your change. --> <!--- Enter an `x` in all applicable boxes. --> - [ ] API - [ ] Controller - [ ] Message Bus (e.g., Kafka) - [ ] Loadbalancer - [x] Scheduler - [x] Invoker - [ ] Intrinsic actions (e.g., sequences, conductors) - [ ] Data stores (e.g., CouchDB) - [ ] Tests - [ ] Deployment - [ ] CLI - [ ] General tooling - [ ] Documentation ## Types of changes <!--- What types of changes does your code introduce? Use `x` in all the boxes that apply: --> - [x] Bug fix (generally a non-breaking change which closes an issue). - [ ] Enhancement or new feature (adds new functionality). - [ ] Breaking change (a bug fix or enhancement which changes existing behavior). ## Checklist: <!--- Please review the points below which help you make sure you've covered all aspects of the change you're making. --> - [x] I signed an [Apache CLA](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md). - [x] I reviewed the [style guides](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md#coding-standards) and followed the recommendations (Travis CI will check :). - [ ] I added tests to cover my changes. - [ ] My changes require further changes to the documentation. - [ ] I updated the documentation where necessary. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@openwhisk.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org